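Tip: paste the content below into a Markdown previewer for a better reading experience.
#### Upload the DuEE_baseline.zip downloaded from GitHub and unzip it
#### Change to the code directory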
```
cd baseline/
```
#### Create the conda environment
```
conda create -n DuEE_2020 python=2.7 -y
source activate DuEE_2020
pip install -r ./requirements.txt
```
### Download the pre-trained ERNIE model
```
mkdir ./model
cd ./model
wget https://ernie.bj.bcebos.com/ERNIE_1.0_max-len-512.tar.gz --no-check-certificate
mkdir ERNIE_1.0_max-len-512
tar -zxf ERNIE_1.0_max-len-512.tar.gz -C ERNIE_1.0_max-len-512
rm ERNIE_1.0_max-len-512.tar.gz
cd ../
```
#### Rename the official `train.json` and `dev.json` to `train1.json` and `dev1.json`, then put `train1.json`, `dev1.json` and `test1.json` in the `./data/` directory
#### Edit `./bin/data_process.py`: add three functions and register them in the `func_mapping` inside `main`
```python
# Add to ./bin/data_process.py. These functions assume sys, json, random and
# utils are already imported at the top of the file (origin_events_process uses them).


def origin_events_process1train():
    """Flatten the official train1.json into one event per line (train.json)."""
    origin_events_path = sys.argv[2]
    save_dir = sys.argv[3]
    if not origin_events_path or not save_dir:
        raise Exception("set origin_events_path and save_dir first")
    output = []
    lines = utils.read_by_lines(origin_events_path)
    for line in lines:
        d_json = json.loads(line)
        for event in d_json["event_list"]:
            event["event_id"] = u"{}_{}".format(d_json["id"], event["trigger"])
            event["text"] = d_json["text"]
            event["id"] = d_json["id"]
            output.append(json.dumps(event, ensure_ascii=False))
    random.shuffle(output)  # shuffle the samples
    print(
        u"include sentences {}, events {}, train datas {}"
        .format(len(lines), len(output), len(output)))
    utils.write_by_lines(u"{}/train.json".format(save_dir), output)


def origin_events_process1dev():
    """Flatten the official dev1.json into one event per line (dev.json)."""
    origin_events_path = sys.argv[2]
    save_dir = sys.argv[3]
    if not origin_events_path or not save_dir:
        raise Exception("set origin_events_path and save_dir first")
    output = []
    lines = utils.read_by_lines(origin_events_path)
    for line in lines:
        d_json = json.loads(line)
        for event in d_json["event_list"]:
            event["event_id"] = u"{}_{}".format(d_json["id"], event["trigger"])
            event["text"] = d_json["text"]
            event["id"] = d_json["id"]
            output.append(json.dumps(event, ensure_ascii=False))
    random.shuffle(output)  # shuffle the samples
    print(
        u"include sentences {}, events {}, dev datas {}"
        .format(len(lines), len(output), len(output)))
    utils.write_by_lines(u"{}/dev.json".format(save_dir), output)


def origin_events_process1test():
    """Wrap each unlabeled test1.json sentence in a placeholder event (test.json)."""
    origin_events_path = sys.argv[2]
    save_dir = sys.argv[3]
    if not origin_events_path or not save_dir:
        raise Exception("set origin_events_path and save_dir first")
    output = []
    lines = utils.read_by_lines(origin_events_path)
    for line in lines:
        d_json = json.loads(line)
        # test1.json has no event_list, so build one empty event with the
        # same fields as a training event
        event = {}
        event["trigger"] = ""
        event["trigger_start_index"] = 0
        event["class"] = ""
        event["event_type"] = ""
        event["arguments"] = []
        argument = {}
        argument["argument_start_index"] = 0
        argument["role"] = ""
        argument["argument"] = ""
        argument["alias"] = []
        event["arguments"].append(argument)
        event["event_id"] = u"{}_{}".format(d_json["id"], "no_event")
        event["text"] = d_json["text"]
        event["id"] = d_json["id"]
        output.append(json.dumps(event, ensure_ascii=False))
    random.shuffle(output)  # shuffle the samples
    print(
        u"include sentences {}, events {}, test datas {}"
        .format(len(lines), len(output), len(output)))
    utils.write_by_lines(u"{}/test.json".format(save_dir), output)


def main():
    """main"""
    func_mapping = {
        "origin_events_process": origin_events_process,
        "schema_event_type_process": schema_event_type_process,
        "schema_role_process": schema_role_process,
        # Add the following three entries (plus the comma at the end of the line above)
        "origin_events_process1train": origin_events_process1train,
        "origin_events_process1dev": origin_events_process1dev,
        "origin_events_process1test": origin_events_process1test
    }
    func_name = sys.argv[1]
    if func_name not in func_mapping:
        raise Exception("no function {}, please select [ {} ]".format(
            func_name, u" | ".join(func_mapping.keys())))
    func_mapping[func_name]()
```
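The flattening done by `origin_events_process1train` and `origin_events_process1dev` is easy to check in isolation. Below is a minimal standalone sketch of the same loop in plain Python 3, working on in-memory JSON lines instead of `utils.read_by_lines`/`utils.write_by_lines`; `flatten_events` and the sample sentence are made up for illustration. Each input line is one sentence with an `event_list`, and each output line is one event that also carries the sentence's `id`, `text` and an `event_id` of the form `<id>_<trigger>`.

```python
import json
import random


def flatten_events(lines, shuffle=True, seed=None):
    """Turn sentence-level JSON lines into one JSON line per event.

    Hypothetical helper mirroring the loop in origin_events_process1train/dev:
    every event keeps its own fields and gains event_id, text and id.
    """
    output = []
    for line in lines:
        d_json = json.loads(line)
        for event in d_json["event_list"]:
            event["event_id"] = u"{}_{}".format(d_json["id"], event["trigger"])
            event["text"] = d_json["text"]
            event["id"] = d_json["id"]
            output.append(json.dumps(event, ensure_ascii=False))
    if shuffle:
        # plays the role of random.shuffle(output); a seed makes it repeatable
        random.Random(seed).shuffle(output)
    return output


if __name__ == "__main__":
    # made-up sample sentence in the train1.json layout
    sample = json.dumps({
        "id": "s1",
        "text": u"A公司收购B公司",
        "event_list": [{
            "event_type": u"财经/交易-出售/收购",
            "trigger": u"收购",
            "trigger_start_index": 3,
            "arguments": [],
        }],
    }, ensure_ascii=False)
    for out in flatten_events([sample], shuffle=False):
        print(out)
```

Passing a fixed `seed` makes the shuffle reproducible across runs, which the bare `random.shuffle(output)` call in the script does not guarantee.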
### Process the sample data
```
rm ./data/eet_events.json
python bin/data_process.py origin_events_process1train ./data/train1.json ./data/
python bin/data_process.py origin_events_process1dev ./data/dev1.json ./data/
python bin/data_process.py origin_events_process1test ./data/test1.json ./data/
```
### Generate the sequence-labeling label files from the schema
#### Overwrite `./dict/event_schema.json` with the official `event_schema.json`
#### Labels for the trigger recognition model, saved to `./dict/vocab_trigger_label_map.txt`
```
python bin/data_process.py schema_event_type_process ./dict/event_schema.json ./dict/vocab_trigger_label_map.txt
```
#### Labels for the argument role recognition model, saved to `./dict/vocab_roles_label_map.txt`
```
python bin/data_process.py schema_role_process ./dict/event_schema.json ./dict/vocab_roles_label_map.txt
```
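Both commands turn the schema into BIO tag vocabularies for sequence labeling: each event type (trigger model) or role (argument model) gets a `B-` and an `I-` tag, plus one shared `O` tag. The sketch below illustrates the idea; it assumes each line of `event_schema.json` is a JSON object with an `event_type` and a `role_list` of `{"role": ...}` entries, and produces `tag<TAB>index` lines. `build_label_map` is a hypothetical name, and the tag order and file layout of the real `schema_event_type_process`/`schema_role_process` may differ.

```python
import json


def build_label_map(schema_lines, kind="trigger"):
    """Build 'tag<TAB>index' lines for a BIO label vocabulary.

    kind="trigger": one B-/I- pair per event type.
    kind="role":    one B-/I- pair per distinct role across all event types.
    Assumes one JSON object per schema line with event_type and role_list.
    """
    names = []
    for line in schema_lines:
        d_json = json.loads(line)
        if kind == "trigger":
            candidates = [d_json["event_type"]]
        else:
            candidates = [r["role"] for r in d_json["role_list"]]
        for name in candidates:
            if name not in names:  # keep first-seen order, drop duplicates
                names.append(name)
    tags = []
    for name in names:
        tags.append(u"B-" + name)
        tags.append(u"I-" + name)
    tags.append(u"O")  # single "outside" tag shared by all labels
    return [u"{}\t{}".format(tag, i) for i, tag in enumerate(tags)]
```

Roles shared by several event types (for example a time role) appear only once in the role vocabulary, so the argument model does not need to know the event type to tag them.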
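### Train the trigger recognition model
#### Copy `train_trigger.sh`, `train_role.sh`, `predict_trigger.sh` and `predict_role.sh` from the official baseline system to the project root; in each script, change the value of `GPUID` to choose which GPU to run on
#### Each script first sets some temporary variables and then launches the actual script, so the program keeps running in the background even after you close the current SSH connection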
```
sh train_trigger.sh
```
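The trigger model labels a sentence character by character, so each training event becomes a tag sequence with `B-<event_type>` on the first character of the trigger, `I-<event_type>` on the rest and `O` elsewhere. A minimal sketch of that conversion for one line of the flattened `train.json`, assuming one tag per character and the `text`, `trigger`, `trigger_start_index` and `event_type` fields used in the processing code above; the baseline's own reader may tokenize differently for ERNIE, and `trigger_bio_tags` is a hypothetical helper.

```python
def trigger_bio_tags(event):
    """Character-level BIO tags marking one event's trigger in its sentence.

    Hypothetical helper; placeholder test events (empty trigger) come back
    as all "O".
    """
    text = event["text"]
    tags = [u"O"] * len(text)
    trigger = event["trigger"]
    if not trigger:
        return tags
    start = event["trigger_start_index"]
    end = start + len(trigger)
    # sanity check: the annotated offset must point at the trigger string
    if text[start:end] != trigger:
        raise ValueError(u"trigger offset does not match text: {}".format(event.get("event_id")))
    label = event["event_type"]
    tags[start] = u"B-" + label
    for i in range(start + 1, end):
        tags[i] = u"I-" + label
    return tags


if __name__ == "__main__":
    # made-up example event
    example = {"text": u"A公司收购B公司", "trigger": u"收购",
               "trigger_start_index": 3, "event_type": u"财经/交易-出售/收购"}
    print(list(zip(example["text"], trigger_bio_tags(example))))
```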
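### Predict triggers
#### Edit `./bin/script/predict_event_trigger.sh` and change line 21 to `--batch_size 32` (it only has to match the batch_size you used for training); remember to go back to the project root afterwards
#### To evaluate on the dev set with the CPU instead, add `export CPU_NUM=20` (pick your own value) to the same file and set `use_cuda` on line 17 to false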
```
sh predict_trigger.sh
```
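Prediction runs the other way: the model emits one tag per position, and the tags have to be grouped back into spans. The sketch below assumes the tags are already aligned with the characters of `text` (how `pred_trigger.json` actually stores them is not covered here) and returns `(label, start, span_text)` tuples; `bio_to_spans` is a hypothetical helper, and treating a stray `I-` tag as the start of a new span is one lenient decoding choice among several.

```python
def bio_to_spans(text, tags):
    """Collect (label, start_index, span_text) for each B-/I- run in tags.

    tags must be a list with one tag per character of text. An I- tag that
    does not continue a span with the same label opens a new span.
    """
    spans = []
    label, start = None, None
    for i, tag in enumerate(tags + [u"O"]):  # sentinel "O" flushes the last span
        if tag.startswith(u"I-") and label == tag[2:]:
            continue  # still inside the current span
        if label is not None:
            spans.append((label, start, text[start:i]))
            label, start = None, None
        if tag.startswith(u"B-") or tag.startswith(u"I-"):
            label, start = tag[2:], i
    return spans
```

The same function works for the role model's output, since both models use the same `B-`/`I-`/`O` scheme.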
### Train the argument role recognition model
```
sh train_role.sh
```
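The argument model is trained in the same way, except that the tags come from every entry in an event's `arguments` list and use the role name as the label. A minimal sketch for one flattened event, assuming the `argument`, `argument_start_index` and `role` fields seen in `origin_events_process1test`; `role_bio_tags` is hypothetical, and keeping the first of two overlapping arguments is an assumption rather than the baseline's documented behavior.

```python
def role_bio_tags(event):
    """Character-level BIO tags for every argument of one flattened event.

    Hypothetical helper. Empty placeholder arguments are ignored, and an
    argument overlapping one tagged earlier is skipped (first one wins).
    """
    text = event["text"]
    tags = [u"O"] * len(text)
    for arg in event["arguments"]:
        argument = arg["argument"]
        if not argument:  # placeholder arguments in test.json are empty
            continue
        start = arg["argument_start_index"]
        end = start + len(argument)
        if text[start:end] != argument:
            raise ValueError(u"argument offset does not match text: {}".format(argument))
        if any(t != u"O" for t in tags[start:end]):
            continue  # overlaps an earlier argument; keep the earlier one
        tags[start] = u"B-" + arg["role"]
        for i in range(start + 1, end):
            tags[i] = u"I-" + arg["role"]
    return tags
```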
### Predict argument roles
#### Edit `./bin/script/predict_event_role.sh` and change line 21 to `--batch_size 32`
```
sh predict_role.sh
```
### Evaluation
#### Convert the test set (`./data/test.json`) into the evaluation format, `./bin/evaluate/test.json`
```
mkdir ./bin/evaluate/
python bin/predict_eval_process.py test_data_2_eval ./data/test.json ./bin/evaluate/test.json
```
#### Merge the prediction results and convert them into the evaluation format
```
python bin/predict_eval_process.py predict_data_2_eval ./save_model/trigger/pred_trigger.json ./save_model/role/pred_role.json ./dict/event_schema.json ./bin/evaluate/pred.json
```
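This step has to join two independent predictions per sentence: the event types found by the trigger model and the argument spans found by the role model, with the schema deciding which roles belong to which event type. A rough sketch of that join, assuming both predictions are already decoded into `(label, start, text)` tuples, that the schema has been loaded into a dict mapping each event type to its set of roles, and that the target format is `{"id", "event_list": [{"event_type", "arguments": [{"role", "argument"}]}]}`. `merge_predictions` is a hypothetical name; the actual `predict_data_2_eval` may read and match the data differently.

```python
def merge_predictions(sent_id, trigger_spans, role_spans, schema_roles):
    """Combine trigger and role predictions for one sentence.

    trigger_spans: [(event_type, start, trigger_text), ...]
    role_spans:    [(role, start, argument_text), ...]
    schema_roles:  {event_type: set_of_roles}
    Each predicted event type yields one event whose arguments are the
    predicted spans whose role is allowed for that type by the schema.
    """
    event_list = []
    seen_types = []
    for event_type, _, _ in trigger_spans:
        if event_type in seen_types:  # one event per predicted type
            continue
        seen_types.append(event_type)
        allowed = schema_roles.get(event_type, set())
        arguments = [{"role": role, "argument": arg_text}
                     for role, _, arg_text in role_spans if role in allowed]
        event_list.append({"event_type": event_type, "arguments": arguments})
    return {"id": sent_id, "event_list": event_list}
```

Filtering by the schema is what lets two separately trained models produce consistent events: a role span is only attached to event types that can actually have that role.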
#### Upload `pred.json` (renaming it is fine); each system can submit 5 results per day
| precision | recall | f1_score |
| ---- | ---- | ---- |
| 0.77 | 0.663 | 0.713 |
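
The F1 score in the table is the harmonic mean of precision and recall, F1 = 2 * P * R / (P + R), so the three numbers can be cross-checked with a few lines of Python:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(0.77, 0.663), 3))  # 0.713
```

With P = 0.77 and R = 0.663 the exact value is about 0.7125, which rounds to the reported 0.713.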
#### References
[Official GitHub](https://github.com/PaddlePaddle/Research/tree/master/KG/DuEE_baseline/DuEE-PaddleHub)
[Official baseline](https://aistudio.baidu.com/aistudio/projectdetail/357419)
[阿布军事's write-up](https://ai.baidu.com/forum/topic/show/958717)