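PaddleHub's TextClassifierTask offers three evaluation metrics: accuracy, matthews, and f1. As is well known, when evaluating a classification problem with imbalanced classes, accuracy alone is not enough and recall must also be taken into account, which is why the F1 and Matthews metrics exist. In my tests, however, TextClassifierTask's f1 and matthews are only useful for binary classification. That is not surprising: both are defined on binary problems in the first place, although they can be generalized to multi-class problems, for example as micro/macro-F1. So does PaddleHub actually provide a more reliable evaluation tool for multi-class classification?
Using the matthews metric provided by PaddleHub on a multi-class problem: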
[2020-04-01 19:20:59,542] [ TRAIN] - step 810 / 2539: loss=0.12797 matthews=0.00000 acc=0.95937 [step/sec: 0.80]
[2020-04-01 19:21:11,955] [ TRAIN] - step 820 / 2539: loss=0.13541 matthews=0.00000 acc=0.96484 [step/sec: 0.81]
[2020-04-01 19:21:24,331] [ TRAIN] - step 830 / 2539: loss=0.10855 matthews=0.00000 acc=0.96875 [step/sec: 0.81]
[2020-04-01 19:21:36,635] [ TRAIN] - step 840 / 2539: loss=0.11830 matthews=1.00000 acc=0.96406 [step/sec: 0.81]
[2020-04-01 19:21:48,879] [ TRAIN] - step 850 / 2539: loss=0.10088 matthews=0.00000 acc=0.97031 [step/sec: 0.82]
[2020-04-01 19:21:48,880] [ INFO] - Evaluation on dev dataset start
[2020-04-01 19:21:50,759] [ EVAL] - [dev dataset evaluation result] loss=0.11813 matthews=0.00000 acc=0.95460 [step/sec: 2.87]
Using the f1 metric provided by PaddleHub on a multi-class problem:
[2020-04-02 09:54:45,881] [ TRAIN] - step 110 / 2539: loss=2.00415 f1=0.00000 acc=0.43555 [step/sec: 0.87]
[2020-04-02 09:54:57,282] [ TRAIN] - step 120 / 2539: loss=1.91379 f1=0.00000 acc=0.45156 [step/sec: 0.88]
[2020-04-02 09:55:08,684] [ TRAIN] - step 130 / 2539: loss=1.79662 f1=0.00000 acc=0.49844 [step/sec: 0.88]
[2020-04-02 09:55:20,117] [ TRAIN] - step 140 / 2539: loss=1.68962 f1=0.00000 acc=0.54531 [step/sec: 0.87]
[2020-04-02 09:55:31,565] [ TRAIN] - step 150 / 2539: loss=1.57737 f1=0.00000 acc=0.55937 [step/sec: 0.87]
[2020-04-02 09:55:31,566] [ INFO] - Evaluation on dev dataset start
[2020-04-02 09:55:33,442] [ EVAL] - [dev dataset evaluation result] loss=2.03867 f1=0.00000 acc=0.47656 [step/sec: 3.50]
[2020-04-02 09:55:45,055] [ TRAIN] - step 160 / 2539: loss=1.41776 f1=0.00000 acc=0.62500 [step/sec: 0.86]
[2020-04-02 09:55:56,542] [ TRAIN] - step 170 / 2539: loss=1.46184 f1=0.00000 acc=0.62031 [step/sec: 0.87]
[2020-04-02 09:56:08,013] [ TRAIN] - step 180 / 2539: loss=1.29388 f1=0.00000 acc=0.64531 [step/sec: 0.87]
[2020-04-02 09:56:19,521] [ TRAIN] - step 190 / 2539: loss=1.28302 f1=0.00000 acc=0.63906 [step/sec: 0.87]
[2020-04-02 09:56:31,012] [ TRAIN] - step 200 / 2539: loss=1.34620 f1=0.00000 acc=0.60469 [step/sec: 0.87]
[2020-04-02 09:56:31,014] [ INFO] - Evaluation on dev dataset start
[2020-04-02 09:56:32,885] [ EVAL] - [dev dataset evaluation result] loss=1.83094 f1=0.00000 acc=0.54167 [step/sec: 3.51]
I looked at the source code in paddlehub/finetune/evaluate, and both metrics are indeed written for binary classification only.
f1:
import numpy as np

def calculate_f1_np(preds, labels):
    preds = np.array(preds)
    labels = np.array(labels)
    # Only labels 0 and 1 are counted; any other class id is ignored.
    tp = np.sum((labels == 1) & (preds == 1))
    tn = np.sum((labels == 0) & (preds == 0))  # computed but not used below
    fp = np.sum((labels == 0) & (preds == 1))
    fn = np.sum((labels == 1) & (preds == 0))
    p = tp / (tp + fp) if (tp + fp) else 0
    r = tp / (tp + fn) if (tp + fn) else 0
    f1 = (2 * p * r) / (p + r) if p + r else 0
    return f1
I could change it to macro-F1 myself, but I need to think about how to do it...
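One possible way to do that, as a minimal sketch rather than anything shipped with PaddleHub: compute the same tp/fp/fn counts once per class in a one-vs-rest fashion and average the per-class F1 scores. The function name calculate_macro_f1_np and the num_classes parameter are my own assumptions for illustration, not part of the library.

```python
import numpy as np

def calculate_macro_f1_np(preds, labels, num_classes=None):
    # Hypothetical helper, not a PaddleHub API: macro-F1 over integer class ids.
    preds = np.asarray(preds).reshape(-1)
    labels = np.asarray(labels).reshape(-1)
    if num_classes is None:
        # Assumption: score every class that appears in preds or labels.
        classes = np.union1d(preds, labels)
    else:
        classes = np.arange(num_classes)
    f1_scores = []
    for c in classes:
        # One-vs-rest counts: class c is "positive", everything else "negative".
        tp = np.sum((labels == c) & (preds == c))
        fp = np.sum((labels != c) & (preds == c))
        fn = np.sum((labels == c) & (preds != c))
        p = tp / (tp + fp) if (tp + fp) else 0.0
        r = tp / (tp + fn) if (tp + fn) else 0.0
        f1_scores.append(2 * p * r / (p + r) if (p + r) else 0.0)
    # Unweighted mean, so rare classes count as much as frequent ones.
    return float(np.mean(f1_scores)) if len(f1_scores) else 0.0
```

For example, calculate_macro_f1_np([0, 1, 2, 2], [0, 1, 1, 2]) averages per-class F1 scores of 1, 2/3 and 2/3. Passing num_classes explicitly keeps the score comparable between batches that happen not to contain every class; a class absent from a batch then contributes an F1 of 0.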
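Which raises the question: how do I modify the PaddleHub source code on an online platform... Locally it is easy, but I need the online compute...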
The environment on AI Studio does let you modify the PaddleHub source: find where the package is installed and replace the file there.
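To find the install location, something like the following sketch should work in a notebook cell. It only uses the standard library; find_package_dir is a hypothetical helper name, and the exact file to edit under paddlehub/finetune is whatever your installed version contains.

```python
import importlib.util
import os

def find_package_dir(package_name):
    # Hypothetical helper: return the directory a package is installed in,
    # or None if the package cannot be found. Does not import the package.
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)

if __name__ == "__main__":
    # Prints None when paddlehub is not installed in this environment.
    print(find_package_dir("paddlehub"))
```

After editing the file, the notebook kernel most likely needs a restart so the modified module is re-imported.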
Hi, did the original poster figure out how to change it? Could you share?