Training script error: maximum recursion depth exceeded

ljjwxj1970 posted 2021-12

My training script fails with the error below. What is going on? I set batch_size_per_card to 1.

[2021/12/02 13:28:51] root INFO: Initialize indexs of datasets:['D:\\codes\\PaddleOCR-release-2.3\\train_data\\Label.txt']
INFO:root:If regularizer of a Parameter has been set by 'paddle.ParamAttr' or 'static.WeightNormParamAttr' already. The weight_decay[L2Decay, regularization_coeff=0.000010] in Optimizer will not take effect, and it will only be applied to other Parameters!
[2021/12/02 13:28:56] root ERROR: When parsing line train_images/lvmh_50e2c1d8-4ddc-11ec-a9fa-1418c37faa3e.png [{"transcription": "L", "points": [[7, 0], [19, 0], [19, 21], [7, 21]], "difficult": false}, {"transcription": "V", "points": [[35, 0], [49, 0], [49, 20], [35, 20]], "difficult": false}, {"transcription": "M", "points": [[61, 0], [84, 0], [84, 18], [61, 18]], "difficult": false}, {"transcription": "H", "points": [[88, 0], [104, 0], [104, 19], [88, 19]], "difficult": false}]
, error happened with msg: maximum recursion depth exceeded while calling a Python object
Fatal Python error: Cannot recover from stack overflow.
Python runtime state: initialized

Current thread 0x00003d58 (most recent call first):

Resolved (accepted reply: #12)

All comments (16)

ljjwxj1970

#2 replied 2021-12

I am training on CPU.

羊毛

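#3 replied 2021-12

This looks like a recursion depth error. You can raise the recursion limit, for example: import sys; sys.setrecursionlimit(10000). The underlying cause is probably something abnormal in your data. If it is a custom dataset, try plotting the samples and their labels to inspect them.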

ljjwxj1970

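#4 replied 2021-12

In reply to 羊毛 #3

I already tried import sys; sys.setrecursionlimit(10000). I went all the way up to 1,000,000 and still got the same error.

My data consists of CAPTCHA images downloaded from a website's responses, and there are only 100 of them.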

ljjwxj1970

#5 replied 2021-12

In reply to 羊毛 #3

The model I chose is en_number_mobile_v2.0_rec_train. Is that the wrong model?

羊毛

#6 replied 2021-12

To train text recognition on CAPTCHAs, each line of train.txt only needs to look like this: Po4RpR.png Po4RpR

ljjwxj1970

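#7 replied 2021-12

In reply to 羊毛 #6:

To train text recognition on CAPTCHAs, each line of train.txt only needs to look like this: Po4RpR.png Po4RpR

How exactly should I write it?

I changed my lines to this:

train_images/2dk6_3a82fbf4-4ddc-11ec-89b8-1418c37faa3e.png 2dk6

It still fails: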

[2021/12/02 17:02:44] root INFO: During the training process, after the 0th iteration, an evaluation is run every 20 iterations
[2021/12/02 17:02:44] root INFO: Initialize indexs of datasets:['D:\\codes\\PaddleOCR-release-2.3\\train_data\\Label2.txt']
INFO:root:If regularizer of a Parameter has been set by 'paddle.ParamAttr' or 'static.WeightNormParamAttr' already. The weight_decay[L2Decay, regularization_coeff=0.000010] in Optimizer will not take effect, and it will only be applied to other Parameters!
[2021/12/02 17:02:51] root ERROR: When parsing line train_images/vblb_ea12a631-4dba-11ec-bb0f-1418c37faa3e.png vblb
, error happened with msg: maximum recursion depth exceeded while calling a Python object
Fatal Python error: Cannot recover from stack overflow.

羊毛

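#8 replied 2021-12

Try lowering num_workers, and test the data pipeline with the test_reader() function in train.py. It is most likely still a data problem. Each line should be: file name + \t + label + \n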
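
The line format described in this reply (file name, a tab, the label, a newline) can be checked before training. The following is a minimal sketch, assuming the loader splits each line on a single tab character. The helper name find_malformed_lines and the sample lines are hypothetical and not part of PaddleOCR.

```python
def find_malformed_lines(lines, delimiter="\t"):
    """Return (line_number, line) pairs that are not 'file name<delimiter>label'."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        parts = line.split(delimiter)
        # Exactly two non-empty fields are expected.
        if len(parts) != 2 or not parts[0].strip() or not parts[1]:
            bad.append((number, line))
    return bad


# Hypothetical sample lines for illustration.
sample = [
    "train_images/2dk6.png\t2dk6\n",  # tab-separated: accepted
    "train_images/vblb.png vblb\n",   # space-separated: reported
]
for number, line in find_malformed_lines(sample):
    print(f"line {number} is malformed: {line!r}")
```

To check a real label file, pass an open file object instead of the sample list, for example find_malformed_lines(open(path, encoding="utf-8")).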

ljjwxj1970

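#9 replied 2021-12

In reply to 羊毛 #8:

Try lowering num_workers, and test the data pipeline with the test_reader() function in train.py. It is most likely still a data problem. Each line should be: file name + \t + label + \n

That is exactly how I generate it. Strangely, it still does not work.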

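# 7NXY.png
import os
import re

image_dir = r'D:\codes\PaddleOCR-release-2.3\train_data\train_images'
label_path = r'D:\codes\PaddleOCR-release-2.3\train_data\Label3.txt'

k = ''
for r, d, f in os.walk(image_dir):
    for a in f:
        # one line per image: file name + tab + label (text before the first dot)
        k += a + '\t' + re.search(r'(.*?)\.', a).group(1) + '\n'

with open(label_path, 'w') as f:
    f.write(k)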

羊毛

#11 replied 2021-12

Increase max_text_length in the config file, and check whether every character being decoded is present in the configured dictionary file.
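
Both checks suggested in this reply can be run over the label file before training. The following is a minimal sketch, assuming a tab-separated label file and a dictionary file with one character per line. The function name check_labels, the sample data, and the limit of 25 are illustrative only; substitute the max_text_length and dictionary from your own config.

```python
def check_labels(label_lines, dict_chars, max_text_length):
    """Report labels that are empty, too long, or use characters outside the dictionary."""
    problems = []
    for number, raw in enumerate(label_lines, start=1):
        line = raw.rstrip("\r\n")
        if "\t" not in line:
            problems.append((number, "no tab separator"))
            continue
        _, label = line.split("\t", 1)
        unknown = sorted(set(label) - dict_chars)
        if not label:
            problems.append((number, "empty label"))
        elif len(label) > max_text_length:
            problems.append((number, f"label length {len(label)} > {max_text_length}"))
        elif unknown:
            problems.append((number, f"characters not in dictionary: {unknown}"))
    return problems


# Hypothetical dictionary: lowercase letters and digits only.
dict_chars = set("abcdefghijklmnopqrstuvwxyz0123456789")
labels = [
    "a.png\t96g9\n",
    "b.png\tPo4RpR\n",            # uppercase letters are missing from this dictionary
    "c.png\t" + "x" * 30 + "\n",  # longer than the limit
]
for number, reason in check_labels(labels, dict_chars, max_text_length=25):
    print(f"line {number}: {reason}")
```

For a real dictionary file, dict_chars can be built by reading the file line by line and stripping the trailing newline from each line.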

ljjwxj1970

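#12 replied 2021-12

In reply to 羊毛 #11:

Increase max_text_length in the config file, and check whether every character being decoded is present in the configured dictionary file.

After I set cal_metric_during_train to False, the error went away.

But when I test recognition on a CAPTCHA whose text is 96g9, the result is:

D:\96g9_41481702-4ddc-11ec-b2ca-1418c37faa3e.png q 0.017899727

Why is the result only a single letter? Do I need a custom dictionary? I am using the default English dictionary.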

羊毛

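#13 replied 2021-12

The model probably has not converged. What do the training loss and acc look like? If the CAPTCHAs are fairly simple, acc can reach 0.95 or higher.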

ljjwxj1970

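#14 replied 2021-12

In reply to 羊毛 #13:

The model probably has not converged. What do the training loss and acc look like? If the CAPTCHAs are fairly simple, acc can reach 0.95 or higher.

OK, thanks. This was only a quick try with 100 images. I will experiment more.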

时间女神

#15 replied 2021-12

It may be infinite recursion caused by defining the same name twice. Check how your variables are named!
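
As an illustration of what this reply describes, the sketch below shows how reusing a name can turn an ordinary call into unbounded recursion. The function names are made up for the example, and nothing in this thread confirms that this is what happened in the training script.

```python
import json


def loads(text):
    # Bug: the author meant json.loads, but the bare name "loads" now
    # refers to this function, so it calls itself until the limit is hit.
    return loads(text.strip())


def loads_fixed(text):
    # Fix: call the library function through its module name.
    return json.loads(text.strip())


try:
    loads(' {"a": 1} ')
except RecursionError as err:
    print("RecursionError:", err)

print(loads_fixed(' {"a": 1} '))
```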

你是年少的欢喜

#16 replied 2022-08

In reply to ljjwxj1970 #12

I set it to False and still get the error. How can I fix this? Thanks.

ERROR:root:DataLoader reader thread raised an exception!

RecursionError: maximum recursion depth exceeded while calling a Python object

SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:166)

Jonny5659

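#17 replied 2023-01

In reply to 羊毛 #11:

Increase max_text_length in the config file, and check whether every character being decoded is present in the configured dictionary file.

That was indeed it. Increasing max_text_length fixed the problem. From my testing, the smallest value max_text_length can be set to is 3.

If you had not mentioned it, I would never have guessed from the error message that this setting was involved.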
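
One way a too-small max_text_length could surface as a recursion error is a dataset that, when a sample fails to load, retries with another sample by calling itself. The sketch below is a toy model of that pattern and not PaddleOCR's actual code: the class ToyDataset, its encode rule, and the retry logic are all assumptions made for illustration. If every label is rejected, the retry never terminates and Python raises RecursionError, no matter how high the recursion limit is set.

```python
import random


class ToyDataset:
    """Toy dataset that rejects labels longer than max_text_length (hypothetical)."""

    def __init__(self, labels, max_text_length):
        self.labels = labels
        self.max_text_length = max_text_length

    def encode(self, label):
        # Return None for labels that are empty or too long.
        if not label or len(label) > self.max_text_length:
            return None
        return [ord(ch) for ch in label]

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        encoded = self.encode(self.labels[idx])
        if encoded is None:
            # Retry with a random sample; unbounded if no sample is valid.
            return self[random.randrange(len(self))]
        return encoded


labels = ["96g9", "2dk6", "vblb"]

print(ToyDataset(labels, max_text_length=25)[0])  # every label fits

try:
    ToyDataset(labels, max_text_length=2)[0]      # every label is rejected
except RecursionError as err:
    print("RecursionError:", err)
```

In this toy model, bounding the number of retries, or failing loudly when no sample can be encoded, would turn the recursion error into a message that points at the labels.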

Jonny5659

#18 replied 2023-01

In reply to ljjwxj1970 #12

My labels really are a single character each, so I wanted to set it to 1, but the minimum seems to be 3.
