Why does my CNN training error out at step 300 every time?
The network input length is 20, with 25 channels.
At runtime the dataset is read correctly, and training runs fine for 290 steps.
But as soon as it reaches step 300 it errors out. The error output is:
25ms/step
step 290/140400 [..............................] - loss: 0.0101 - acc: 0.1143 - ETA: 56:43 - 24ms/step
step 300/140400 [..............................] - loss: 0.0042 - acc: 0.1118 - ETA: 55:32 - 24ms/step
2021-03-20 14:51:05,693 - WARNING - DataLoader reader thread raised an exception.
Exception in thread Thread-1:
Traceback (most recent call last):
File "/opt/_internal/cpython-3.7.0/lib/python3.7/threading.py", line 917, in _bootstrap_inner
self.run()
File "/opt/_internal/cpython-3.7.0/lib/python3.7/threading.py", line 865, in run
self._target(*self._args, **self._kwargs)
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 346, in _thread_loop
six.reraise(*sys.exc_info())
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 317, in _thread_loop
batch = self._dataset_fetcher.fetch(indices)
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/dataloader/fetcher.py", line 65, in fetch
data = self.collate_fn(data)
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 90, in default_collate_fn
tmp = np.stack(slot, axis=0)
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/numpy/core/shape_base.py", line 353, in stack
raise ValueError('all input arrays must have the same shape')
ValueError: all input arrays must have the same shape
Traceback (most recent call last):
File "train.py", line 50, in
verbose=1)
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/hapi/model.py", line 1495, in fit
logs = self._run_one_epoch(train_loader, cbks, 'train')
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/hapi/model.py", line 1779, in _run_one_epoch
for step, data in enumerate(data_loader):
File "/opt/_internal/cpython-3.7.0/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 351, in __next__
return self._reader.read_next_var_list()
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] (at /paddle/paddle/fluid/operators/reader/blocking_queue.h:158)
I'm using a self-made dataset.
I suspect my dataset is the problem, but the strange part is that the error always happens at exactly step 300…
Doesn't Paddle automatically shuffle the dataset during training? Why would the failure land on a fixed step? I can't figure it out…
Also, is there any way to make it automatically skip bad data like this and be a bit more robust? Otherwise every run dies at step 300 and all the progress is wasted, which is really frustrating.
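For context, the `ValueError` in the traceback comes from `np.stack`, which Paddle's `default_collate_fn` uses to batch samples and which refuses arrays of differing shapes. A minimal reproduction (the shapes here are illustrative, matching the stated input of length 20 with 25 channels):

```python
import numpy as np

a = np.zeros((20, 25))  # a full-length sample: 20 timesteps, 25 channels
b = np.zeros((17, 25))  # a truncated sample

try:
    np.stack([a, b], axis=0)  # what the collate step does per batch
except ValueError as e:
    print(e)  # all input arrays must have the same shape
```

So a single short sequence anywhere in the dataset is enough to kill the reader thread when it lands in a batch.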
How did the dataset section of my post get swallowed…
I've figured out where the problem is.
First, it really was my dataset: when slicing the time series you have to watch the direction. I sliced in the wrong direction, so some sequences came out shorter than the full length.
Second, the dataset is indeed shuffled automatically when passed to paddle.Model. Perhaps the shuffle is pseudorandom, so the order is the same on every run.
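A sketch of the kind of direction mistake described above (the indices and array sizes here are hypothetical, not the author's actual code): near the end of the series, slicing *forward* from an anchor index silently produces a short window, while slicing *backward* always yields the full length.

```python
import numpy as np

series = np.random.rand(1000, 25)  # full time series: 1000 timesteps, 25 channels
window, anchor = 20, 990           # a window anchored near the end of the series

# Buggy direction: slicing forward runs past the end and silently comes up short
bad = series[anchor:anchor + window]   # shape (10, 25) -- truncated
# Correct direction: slicing backward from the anchor keeps the full length
good = series[anchor - window:anchor]  # shape (20, 25)

print(bad.shape, good.shape)
```

NumPy clips out-of-range slices instead of raising, which is why the truncation only surfaces later, inside `np.stack`.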
The shuffle shouldn't be pseudorandom, should it?
You can check the source code.
It seems there's no way to automatically ignore dataset read errors.
What do you mean by that?
99% of problems are dataset problems.
Clean up your dataset.
If a read error occurs, automatically skip that batch of data and let training continue, emitting only a warning.
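One way to approximate that is a custom collate function that filters out malformed samples before stacking. A sketch, assuming each sample is a `(data, label)` pair of NumPy arrays and the expected data shape is `(20, 25)` as stated in the question (`tolerant_collate` and `EXPECTED` are made-up names):

```python
import warnings

import numpy as np

EXPECTED = (20, 25)  # assumed input shape: length 20, 25 channels

def tolerant_collate(batch):
    """Drop samples whose data shape is wrong, warn, and stack the rest."""
    good = [(x, y) for x, y in batch if np.asarray(x).shape == EXPECTED]
    dropped = len(batch) - len(good)
    if dropped:
        warnings.warn(f"skipped {dropped} malformed sample(s) in this batch")
    if not good:
        return None  # caller must handle an entirely-bad batch
    xs, ys = zip(*good)
    return np.stack(xs, axis=0), np.stack(ys, axis=0)
```

If your Paddle version's `paddle.io.DataLoader` exposes a `collate_fn` parameter, you can pass this in; otherwise, filtering or padding inside the `Dataset.__getitem__` is the fallback. Fixing the data itself is still the better cure.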
Got it.
Before training, print out the dataset's shapes and see how they change. If this is NLP, it could be a padding problem.
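A quick pre-flight scan along those lines (a sketch; `scan_shapes` is a made-up helper, and `dataset` stands in for any indexable dataset returning `(data, label)` pairs):

```python
from collections import Counter

import numpy as np

def scan_shapes(dataset):
    """Count how many samples have each data shape; outliers stand out."""
    counts = Counter(tuple(np.asarray(x).shape) for x, _ in dataset)
    for shape, n in counts.most_common():
        print(shape, n)
    return counts
```

Running this once before `model.fit` would have flagged the short sequences immediately, instead of 300 steps in.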
Brother Yang has arrived, haha.