Deploying the event extraction baseline to a server

Posted by 风维月魄, 2020-04

Tip: paste the content below into a Markdown previewer for a better reading experience.

#### Upload the DuEE_baseline.zip downloaded from GitHub and unzip it

#### Switch to the code directory
```
cd baseline/
```

#### Create the conda environment
```
conda create -n DuEE_2020 python=2.7 -y
source activate DuEE_2020
pip install -r ./requirements.txt
```
### Download the pretrained ERNIE model
```
mkdir ./model
cd ./model
wget https://ernie.bj.bcebos.com/ERNIE_1.0_max-len-512.tar.gz --no-check-certificate
mkdir ERNIE_1.0_max-len-512
tar -zxf ERNIE_1.0_max-len-512.tar.gz -C ERNIE_1.0_max-len-512
rm ERNIE_1.0_max-len-512.tar.gz
cd ../
```

#### Rename the official `train.json` and `dev.json` to `train1.json` and `dev1.json`, then put `train1.json`, `dev1.json`, and `test1.json` in the `./data/` directory
#### Edit `./bin/data_process.py`: add three functions and update func_mapping in the main function
```
def origin_events_process1train():
    """origin_events_process"""
    origin_events_path = sys.argv[2]
    save_dir = sys.argv[3]
    if not origin_events_path or not save_dir:
        raise Exception("set origin_events_path and save_dir first")
    output = []
    lines = utils.read_by_lines(origin_events_path)
    for line in lines:
        d_json = json.loads(line)
        for event in d_json["event_list"]:
            event["event_id"] = u"{}_{}".format(d_json["id"], event["trigger"])
            event["text"] = d_json["text"]
            event["id"] = d_json["id"]
            output.append(json.dumps(event, ensure_ascii=False))
    random.shuffle(output)  # shuffle the output order

    print(
        u"include sentences {}, events {}, train datas {}"
        .format(
            len(lines), len(output), len(output)))
    utils.write_by_lines(u"{}/train.json".format(save_dir), output)

def origin_events_process1dev():
    """origin_events_process"""
    origin_events_path = sys.argv[2]
    save_dir = sys.argv[3]
    if not origin_events_path or not save_dir:
        raise Exception("set origin_events_path and save_dir first")
    output = []
    lines = utils.read_by_lines(origin_events_path)
    for line in lines:
        d_json = json.loads(line)
        for event in d_json["event_list"]:
            event["event_id"] = u"{}_{}".format(d_json["id"], event["trigger"])
            event["text"] = d_json["text"]
            event["id"] = d_json["id"]
            output.append(json.dumps(event, ensure_ascii=False))
    random.shuffle(output)  # shuffle the output order

    print(
        u"include sentences {}, events {}, dev datas {}"
        .format(
            len(lines), len(output), len(output)))
    utils.write_by_lines(u"{}/dev.json".format(save_dir), output)

def origin_events_process1test():
    """origin_events_process"""
    origin_events_path = sys.argv[2]
    save_dir = sys.argv[3]
    if not origin_events_path or not save_dir:
        raise Exception("set origin_events_path and save_dir first")
    output = []
    lines = utils.read_by_lines(origin_events_path)
    for line in lines:
        d_json = json.loads(line)
        event = {}
        event["trigger"] = ""
        event["trigger_start_index"] = 0
        event["class"] = ""
        event["event_type"] = ""
        event["arguments"] = []
        argument  = {}
        argument["argument_start_index"] = 0
        argument["role"] = ""
        argument["argument"] = ""
        argument["alias"] = []
        event["arguments"].append(argument)
        event["event_id"] = u"{}_{}".format(d_json["id"], "no_event")
        event["text"] = d_json["text"]
        event["id"] = d_json["id"]
        output.append(json.dumps(event, ensure_ascii=False))
    random.shuffle(output)  # shuffle the output order

    print(
        u"include sentences {}, events {}, test datas {}"
        .format(len(lines), len(output), len(output)))
    utils.write_by_lines(u"{}/test.json".format(save_dir), output)

def main():
    """main"""
    func_mapping = {
        "origin_events_process": origin_events_process,
        "schema_event_type_process": schema_event_type_process,
        "schema_role_process": schema_role_process,

        # Add the following three lines, plus the trailing comma on the entry above
        "origin_events_process1train": origin_events_process1train,
        "origin_events_process1dev": origin_events_process1dev,
        "origin_events_process1test": origin_events_process1test
    }
    func_name = sys.argv[1]
    if func_name not in func_mapping:
        raise Exception("no function {}, please select [ {} ]".format(
            func_name, u" | ".join(func_mapping.keys())))
    func_mapping[func_name]()

```
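
To make the effect of the train/dev functions above concrete, here is a minimal sketch that flattens one raw record in the same way. The helper name `flatten_events` and the sample values are hypothetical; only the field names (`id`, `text`, `event_list`, `trigger`) are taken from the code above.

```python
# -*- coding: utf-8 -*-
import json


def flatten_events(raw_line):
    """Turn one raw JSON line into one JSON string per event (hypothetical helper)."""
    d_json = json.loads(raw_line)
    flattened = []
    for event in d_json["event_list"]:
        # Same three fields the functions above attach to every event.
        event["event_id"] = u"{}_{}".format(d_json["id"], event["trigger"])
        event["text"] = d_json["text"]
        event["id"] = d_json["id"]
        flattened.append(json.dumps(event, ensure_ascii=False))
    return flattened


# Made-up sample record; real DuEE lines carry more fields per event.
sample = json.dumps({
    "id": "abc123",
    "text": "Company A acquired Company B.",
    "event_list": [
        {"trigger": "acquired", "event_type": "acquisition", "arguments": []},
    ],
})

records = flatten_events(sample)
print(records[0])
```

Each sentence with N events therefore becomes N output lines, which is why the processed training file can have more lines than the raw one.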
### Process the sample data
```
rm ./data/eet_events.json
python bin/data_process.py origin_events_process1train ./data/train1.json ./data/
python bin/data_process.py origin_events_process1dev ./data/dev1.json ./data/
python bin/data_process.py origin_events_process1test ./data/test1.json ./data/
```
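
After these commands it can be worth checking that nothing was lost: the processed train/dev files should contain one line per event in the raw files. The following is a rough sketch of such a check. The helper names are hypothetical, and it assumes one JSON object per line, as in the processing functions above.

```python
import io
import json
import os


def count_nonempty_lines(path):
    """Number of non-blank lines in a text file."""
    with io.open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def count_events(path):
    """Total number of events across all raw JSON lines."""
    total = 0
    with io.open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                total += len(json.loads(line)["event_list"])
    return total


def check_split(raw_path, processed_path):
    """True when the processed file has one line per event in the raw file."""
    return count_events(raw_path) == count_nonempty_lines(processed_path)


if __name__ == "__main__":
    pairs = [("./data/train1.json", "./data/train.json"),
             ("./data/dev1.json", "./data/dev.json")]
    for raw, processed in pairs:
        # Skip silently when the files are not there (e.g. outside the project).
        if os.path.exists(raw) and os.path.exists(processed):
            print("{} -> {}: {}".format(raw, processed, check_split(raw, processed)))
```

The test split is different: `origin_events_process1test` writes exactly one placeholder event per sentence, so there the line counts of `test1.json` and `test.json` should simply match.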

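### Process the schema to generate the sequence labeling label files
#### Overwrite `./dict/event_schema.json` with the official `event_schema.json`
#### Trigger recognition model labels, saved to the file `./dict/vocab_trigger_label_map.txt`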
```
python bin/data_process.py schema_event_type_process ./dict/event_schema.json ./dict/vocab_trigger_label_map.txt
```

#### Argument role recognition model labels, saved to the file `./dict/vocab_roles_label_map.txt`
```
python bin/data_process.py schema_role_process ./dict/event_schema.json ./dict/vocab_roles_label_map.txt
```
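
The post does not show what the two schema functions write, so the following is only an illustration of how sequence labeling labels are commonly derived from a schema. It assumes a BIO tagging scheme and a schema file with one JSON object per line carrying `event_type` and `role_list` (each entry with a `role` field). The helper names and sample values are hypothetical, and the baseline's actual output format may differ.

```python
import json


def collect_names(schema_lines):
    """Collect unique event types and roles, preserving first-seen order."""
    event_types, roles = [], []
    for line in schema_lines:
        d_json = json.loads(line)
        if d_json["event_type"] not in event_types:
            event_types.append(d_json["event_type"])
        for role in d_json.get("role_list", []):
            if role["role"] not in roles:
                roles.append(role["role"])
    return event_types, roles


def build_bio_labels(names):
    """One B- and one I- tag per name, plus a single O tag (assumed BIO scheme)."""
    labels = []
    for name in names:
        labels.append(u"B-{}".format(name))
        labels.append(u"I-{}".format(name))
    labels.append(u"O")
    return labels


# Made-up schema lines for illustration only.
schema_lines = [
    json.dumps({"event_type": "acquisition",
                "role_list": [{"role": "buyer"}, {"role": "target"}]}),
    json.dumps({"event_type": "merger", "role_list": [{"role": "buyer"}]}),
]

event_types, roles = collect_names(schema_lines)
trigger_labels = build_bio_labels(event_types)
role_labels = build_bio_labels(roles)
```

Under these assumptions the trigger model tags tokens with event types and the role model tags them with argument roles, which is why two separate label files are generated.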

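### Train the trigger recognition model
#### Copy `train_trigger.sh`, `train_role.sh`, `predict_trigger.sh`, and `predict_role.sh` from the official baseline system into the project root. In each script you can change the value of `GPUID` to choose which GPU to run on
#### Each script first sets temporary variables and then runs a new script, so the program keeps running in the background even if the current SSH connection is closed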
```
sh train_trigger.sh
```
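
The internals of the wrapper scripts are not shown here. As a rough illustration of the same idea (a job that survives a closed SSH session), the sketch below starts a command in its own session from Python and sends its output to a log file. The helper name `run_detached` and the log file name are hypothetical, and `os.setsid` is POSIX-only.

```python
import os
import subprocess


def run_detached(cmd, log_path):
    """Start cmd in a new session with output appended to log_path."""
    with open(log_path, "ab") as log_file, open(os.devnull, "rb") as devnull:
        return subprocess.Popen(
            cmd,
            stdin=devnull,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            # New session: the child is no longer tied to the login terminal.
            preexec_fn=os.setsid,
        )


# Example call (commented out so the snippet has no side effects):
# run_detached(["sh", "train_trigger.sh"], "train_trigger.log")
```

Progress can then be followed with `tail -f` on the log file from any later SSH session.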

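### Predict the trigger results
#### Edit the script `./bin/script/predict_event_trigger.sh` and change line 21 to `--batch_size 32`; the value only needs to match the batch_size you used for training. Remember to return to the project root afterwards
#### To evaluate on the dev set with the CPU, add `export CPU_NUM=20` (choose your own value) to that file and change the value of use_cuda on line 17 to false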
```
sh predict_trigger.sh
```

### Train the argument role recognition model
```
sh train_role.sh
```

### Predict the argument role results
#### Edit the script `./bin/script/predict_event_role.sh` and change line 21 to `--batch_size 32`
```
sh predict_role.sh
```

### Evaluation
#### Convert the test set (`./data/test.json`) to the evaluation format `./bin/evaluate/test.json`
```
mkdir ./bin/evaluate/
python bin/predict_eval_process.py test_data_2_eval ./data/test.json ./bin/evaluate/test.json
```
#### Merge the prediction results and convert them to the evaluation format
```
python bin/predict_eval_process.py predict_data_2_eval ./save_model/trigger/pred_trigger.json ./save_model/role/pred_role.json ./dict/event_schema.json ./bin/evaluate/pred.json
```

#### Upload `pred.json` (it can be renamed); each system can upload 5 results per day
| precision | recall | f1_score |
| ---- | ---- | ---- |
| 0.77 | 0.663 | 0.713 |
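
As a quick consistency check on the table above, F1 is the harmonic mean of precision and recall. The sketch below recomputes it from the two reported (already rounded) values; the helper name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.77, 0.663)
print(round(f1, 3))
```

Rounded to three decimals this agrees with the reported f1_score of 0.713.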

#### Reference links
[Official GitHub](https://github.com/PaddlePaddle/Research/tree/master/KG/DuEE_baseline/DuEE-PaddleHub)
[Official baseline](https://aistudio.baidu.com/aistudio/projectdetail/357419)
[Post shared by 阿布军事](https://ai.baidu.com/forum/topic/show/958717)

All comments (8)

AIStudio810258

#2, replied 2020-04

Good article, and nicely formatted too.

wangwei8638

#3, replied 2020-04

水水水的老师

#4, replied 2020-04

Nice, nice.

Rela0426

#5, replied 2021-01

I get an error when running pip install -r ./requirements.txt. The error message is below; could you help take a look?

ERROR: Command errored out with exit status 1:
command: /home/gaolu/miniconda3/envs/paddle_env2/bin/python /home/gaolu/miniconda3/envs/paddle_env2/lib/python2.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpfC4cKc
cwd: /tmp/pip-install-YtYcIj/opencv-python
Complete output (22 lines):
Traceback (most recent call last):
File "/home/gaolu/miniconda3/envs/paddle_env2/lib/python2.7/site-packages/pip/_vendor/pep517/_in_process.py", line 280, in
main()
File "/home/gaolu/miniconda3/envs/paddle_env2/lib/python2.7/site-packages/pip/_vendor/pep517/_in_process.py", line 263, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/home/gaolu/miniconda3/envs/paddle_env2/lib/python2.7/site-packages/pip/_vendor/pep517/_in_process.py", line 114, in get_requires_for_build_wheel
return hook(config_settings)
File "/tmp/pip-build-env-Y3F_xn/overlay/lib/python2.7/site-packages/setuptools/build_meta.py", line 146, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=['wheel'])
File "/tmp/pip-build-env-Y3F_xn/overlay/lib/python2.7/site-packages/setuptools/build_meta.py", line 127, in _get_build_requires
self.run_setup()
File "/tmp/pip-build-env-Y3F_xn/overlay/lib/python2.7/site-packages/setuptools/build_meta.py", line 243, in run_setup
self).run_setup(setup_script=setup_script)
File "/tmp/pip-build-env-Y3F_xn/overlay/lib/python2.7/site-packages/setuptools/build_meta.py", line 142, in run_setup
exec(compile(code, __file__, 'exec'), locals())
File "setup.py", line 448, in
main()
File "setup.py", line 99, in main
% {"ext": re.escape(sysconfig.get_config_var("EXT_SUFFIX"))}
File "/home/gaolu/miniconda3/envs/paddle_env2/lib/python2.7/re.py", line 210, in escape
s = list(pattern)
TypeError: 'NoneType' object is not iterable
----------------------------------------
ERROR: Command errored out with exit status 1: /home/gaolu/miniconda3/envs/paddle_env2/bin/python /home/gaolu/miniconda3/envs/paddle_env2/lib/python2.7/site-packages/pip/_vendor/pep517/_in_process.py get_requires_for_build_wheel /tmp/tmpfC4cKc Check the logs for full command output.

AIStudio810260

#6, replied 2021-01

Rela0426 #5


Can you tell which package is causing the problem?

周小鱼whoyou

#7, replied 2021-01

Bookmarking this to study later.

PSG.LGD

#8, replied 2021-10

Hi, I'd like to ask you about relation extraction. Could you leave your contact information?
