Problem using the ERNIE reading-comprehension model on AI Studio
andwhatandwhat posted 2020-03 · Views: 4250 · Replies: 3

On AI Studio, I downloaded the ERNIE 1.0 Chinese Base model (max_len=512) and the data.

Running sh script/zh_task/ernie_base/run_drcd.sh fails with the following error:

 

export TASK_DATA_PATH=/task_data
+ export MODEL_PATH=/model
+ sh script/zh_task/ernie_base/run_drcd.sh
+ export FLAGS_eager_delete_tensor_gb=0
+ export FLAGS_sync_nccl_allreduce=1
+ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+ export PYTHONPATH=./ernie:
+ hostname -i
+ python ./ernie/finetune_launch.py --nproc_per_node 8 --selected_gpus 0,1,2,3,4,5,6,7 --node_ips 172.21.15.60 --node_id 0 ./ernie/run_mrc.py --use_cuda true --batch_size 16 --in_tokens false --use_fast_executor true --checkpoints ./checkpoints --vocab_path /model/vocab.txt --ernie_config_path /model/ernie_config.json --do_train true --do_val true --do_test true --verbose true --save_steps 1000 --validation_steps 100 --warmup_proportion 0.0 --weight_decay 0.01 --epoch 2 --max_seq_len 512 --do_lower_case true --doc_stride 128 --train_set /task_data/drcd/train.json --dev_set /task_data/drcd/dev.json --test_set /task_data/drcd/test.json --learning_rate 5e-5 --num_iteration_per_drop_scope 1 --init_pretraining_params /model/params --skip_steps 10
2020-03-10 22:43:27,051-INFO: init model: /model/params
[INFO] 2020-03-10 22:43:27,051 [finetune_launch.py:  185]:	init model: /model/params
syntax error at (eval 1) line 1, near "."
2020-03-10 22:43:27,084-INFO: -----------  Configuration Arguments -----------
[INFO] 2020-03-10 22:43:27,084 [     args.py:   68]:	-----------  Configuration Arguments -----------
2020-03-10 22:43:27,085-INFO: current_node_ip: None
[INFO] 2020-03-10 22:43:27,085 [     args.py:   70]:	current_node_ip: None
2020-03-10 22:43:27,085-INFO: log_prefix: 
[INFO] 2020-03-10 22:43:27,085 [     args.py:   70]:	log_prefix: 
2020-03-10 22:43:27,085-INFO: node_id: 0
[INFO] 2020-03-10 22:43:27,085 [     args.py:   70]:	node_id: 0
2020-03-10 22:43:27,085-INFO: node_ips: 172.21.15.60
[INFO] 2020-03-10 22:43:27,085 [     args.py:   70]:	node_ips: 172.21.15.60
2020-03-10 22:43:27,086-INFO: nproc_per_node: 8
[INFO] 2020-03-10 22:43:27,086 [     args.py:   70]:	nproc_per_node: 8
2020-03-10 22:43:27,086-INFO: print_config: True
[INFO] 2020-03-10 22:43:27,086 [     args.py:   70]:	print_config: True
2020-03-10 22:43:27,086-INFO: selected_gpus: 0,1,2,3,4,5,6,7
[INFO] 2020-03-10 22:43:27,086 [     args.py:   70]:	selected_gpus: 0,1,2,3,4,5,6,7
2020-03-10 22:43:27,086-INFO: split_log_path: log
[INFO] 2020-03-10 22:43:27,086 [     args.py:   70]:	split_log_path: log
2020-03-10 22:43:27,086-INFO: training_script: ./ernie/run_mrc.py
[INFO] 2020-03-10 22:43:27,086 [     args.py:   70]:	training_script: ./ernie/run_mrc.py
2020-03-10 22:43:27,087-INFO: training_script_args: ['--use_cuda', 'true', '--batch_size', '16', '--in_tokens', 'false', '--use_fast_executor', 'true', '--checkpoints', './checkpoints', '--vocab_path', '/model/vocab.txt', '--ernie_config_path', '/model/ernie_config.json', '--do_train', 'true', '--do_val', 'true', '--do_test', 'true', '--verbose', 'true', '--save_steps', '1000', '--validation_steps', '100', '--warmup_proportion', '0.0', '--weight_decay', '0.01', '--epoch', '2', '--max_seq_len', '512', '--do_lower_case', 'true', '--doc_stride', '128', '--train_set', '/task_data/drcd/train.json', '--dev_set', '/task_data/drcd/dev.json', '--test_set', '/task_data/drcd/test.json', '--learning_rate', '5e-5', '--num_iteration_per_drop_scope', '1', '--init_pretraining_params', '/model/params', '--skip_steps', '10']
[INFO] 2020-03-10 22:43:27,087 [     args.py:   70]:	training_script_args: ['--use_cuda', 'true', '--batch_size', '16', '--in_tokens', 'false', '--use_fast_executor', 'true', '--checkpoints', './checkpoints', '--vocab_path', '/model/vocab.txt', '--ernie_config_path', '/model/ernie_config.json', '--do_train', 'true', '--do_val', 'true', '--do_test', 'true', '--verbose', 'true', '--save_steps', '1000', '--validation_steps', '100', '--warmup_proportion', '0.0', '--weight_decay', '0.01', '--epoch', '2', '--max_seq_len', '512', '--do_lower_case', 'true', '--doc_stride', '128', '--train_set', '/task_data/drcd/train.json', '--dev_set', '/task_data/drcd/dev.json', '--test_set', '/task_data/drcd/test.json', '--learning_rate', '5e-5', '--num_iteration_per_drop_scope', '1', '--init_pretraining_params', '/model/params', '--skip_steps', '10']
2020-03-10 22:43:27,087-INFO: ------------------------------------------------
[INFO] 2020-03-10 22:43:27,087 [     args.py:   71]:	------------------------------------------------
2020-03-10 22:43:27,087-INFO: 172.21.15.60
[INFO] 2020-03-10 22:43:27,087 [finetune_launch.py:   74]:	172.21.15.60
2020-03-10 22:43:27,088-INFO: all_trainer_endpoints: 172.21.15.60:6170,172.21.15.60:6171,172.21.15.60:6172,172.21.15.60:6173,172.21.15.60:6174,172.21.15.60:6175,172.21.15.60:6176,172.21.15.60:6177, node_id: 0, current_ip: 172.21.15.60, num_nodes: 1, node_ips: [u'172.21.15.60'], gpus_per_proc: 1, selected_gpus_per_proc: [[u'0'], [u'1'], [u'2'], [u'3'], [u'4'], [u'5'], [u'6'], [u'7']], nranks: 8
[INFO] 2020-03-10 22:43:27,088 [finetune_launch.py:  112]:	all_trainer_endpoints: 172.21.15.60:6170,172.21.15.60:6171,172.21.15.60:6172,172.21.15.60:6173,172.21.15.60:6174,172.21.15.60:6175,172.21.15.60:6176,172.21.15.60:6177, node_id: 0, current_ip: 172.21.15.60, num_nodes: 1, node_ips: [u'172.21.15.60'], gpus_per_proc: 1, selected_gpus_per_proc: [[u'0'], [u'1'], [u'2'], [u'3'], [u'4'], [u'5'], [u'6'], [u'7']], nranks: 8
2020-03-10 22:43:27,100-INFO: subprocess launched, check log at log/job.log.0
[INFO] 2020-03-10 22:43:27,100 [finetune_launch.py:  150]:	subprocess launched, check log at log/job.log.0
2020-03-10 22:43:27,110-INFO: subprocess launched, check log at log/job.log.1
[INFO] 2020-03-10 22:43:27,110 [finetune_launch.py:  150]:	subprocess launched, check log at log/job.log.1
2020-03-10 22:43:27,124-INFO: subprocess launched, check log at log/job.log.2
[INFO] 2020-03-10 22:43:27,124 [finetune_launch.py:  150]:	subprocess launched, check log at log/job.log.2
2020-03-10 22:43:27,139-INFO: subprocess launched, check log at log/job.log.3
[INFO] 2020-03-10 22:43:27,139 [finetune_launch.py:  150]:	subprocess launched, check log at log/job.log.3
2020-03-10 22:43:27,150-INFO: subprocess launched, check log at log/job.log.4
[INFO] 2020-03-10 22:43:27,150 [finetune_launch.py:  150]:	subprocess launched, check log at log/job.log.4
2020-03-10 22:43:27,160-INFO: subprocess launched, check log at log/job.log.5
[INFO] 2020-03-10 22:43:27,160 [finetune_launch.py:  150]:	subprocess launched, check log at log/job.log.5
2020-03-10 22:43:27,170-INFO: subprocess launched, check log at log/job.log.6
[INFO] 2020-03-10 22:43:27,170 [finetune_launch.py:  150]:	subprocess launched, check log at log/job.log.6
2020-03-10 22:43:27,182-INFO: subprocess launched, check log at log/job.log.7
[INFO] 2020-03-10 22:43:27,182 [finetune_launch.py:  150]:	subprocess launched, check log at log/job.log.7
Traceback (most recent call last):
  File "./ernie/finetune_launch.py", line 188, in <module>
    main(lanch_args)
  File "./ernie/finetune_launch.py", line 176, in main
    start_procs(args)
  File "./ernie/finetune_launch.py", line 164, in start_procs
    cmd=cmds[i])
subprocess.CalledProcessError: Command '['/opt/conda/envs/python27-paddle120-env/bin/python', u'-u', './ernie/run_mrc.py', '--use_cuda', 'true', '--batch_size', '16', '--in_tokens', 'false', '--use_fast_executor', 'true', '--checkpoints', './checkpoints', '--vocab_path', '/model/vocab.txt', '--ernie_config_path', '/model/ernie_config.json', '--do_train', 'true', '--do_val', 'true', '--do_test', 'true', '--verbose', 'true', '--save_steps', '1000', '--validation_steps', '100', '--warmup_proportion', '0.0', '--weight_decay', '0.01', '--epoch', '2', '--max_seq_len', '512', '--do_lower_case', 'true', '--doc_stride', '128', '--train_set', '/task_data/drcd/train.json', '--dev_set', '/task_data/drcd/dev.json', '--test_set', '/task_data/drcd/test.json', '--learning_rate', '5e-5', '--num_iteration_per_drop_scope', '1', '--init_pretraining_params', '/model/params', '--skip_steps', '10', u'--is_distributed', u'true']' returned non-zero exit status 1
3 replies in total. Last reply by a muted user in 2022-03.
#12 Daniel_more replied 2021-07

Same question. I get a similar error when training on my own data. Has the original poster solved this?

#11 Daniel_more replied 2021-07
In reply to #2 阿布军师:
It looks like the error output you posted is incomplete. File "./ernie/finetune_launch.py", line 188: is there something wrong with this file?

That file should be fine. I just followed the official guide and this is what I got.

#2 阿布军师 replied 2020-03

It looks like the error output you posted is incomplete.

File "./ernie/finetune_launch.py", line 188: is there something wrong with this file?
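
For context on why the output looks incomplete: the traceback in the original post comes from the launcher (finetune_launch.py), which only reports that a worker running run_mrc.py exited with status 1. The launcher output itself says to check log/job.log.0 through log/job.log.7, so the underlying error is most likely in those files. Below is a minimal sketch for printing the last lines of each worker log. It assumes the logs sit in ./log relative to the directory the script was run from, and the helper name tail_worker_logs is made up for illustration; it is not part of ERNIE.

```python
import glob
import io
import os
from collections import deque


def tail_worker_logs(log_dir="log", pattern="job.log.*", n_lines=30):
    """Return {path: last n_lines lines} for every per-worker log found.

    log_dir and pattern are assumptions based on the launcher message
    "subprocess launched, check log at log/job.log.N".
    """
    tails = {}
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        # errors="replace" keeps the script from failing on odd bytes in a log
        with io.open(path, "r", encoding="utf-8", errors="replace") as f:
            # deque with maxlen keeps only the final n_lines lines
            tails[path] = list(deque(f, maxlen=n_lines))
    return tails


if __name__ == "__main__":
    for path, lines in tail_worker_logs().items():
        print("=" * 20 + " " + path + " " + "=" * 20)
        print("".join(lines))
```

Run it from the directory where run_drcd.sh was started. If a worker log ends in a Python traceback, that traceback, rather than the CalledProcessError from the launcher, is the useful thing to post.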
