Training fails with the following error. How do I fix it?

DataLoader worker (pid 454) is killed by signal: Bus error. It is possible that dataloader's workers are out of shared memory. Please try to raise your shared memory limit.

After that, training stopped automatically.
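A quick first check is how much shared memory the environment actually provides. A minimal sketch, assuming a standard Linux container where /dev/shm is mounted as tmpfs; the numbers in the sample output are illustrative, not measured on AI Studio:

```bash
# Show the size and usage of the shared-memory mount.
# DataLoader workers pass tensors through /dev/shm, so a small
# mount (Docker's default is commonly 64M) triggers exactly this bus error.
df -h /dev/shm

# Example output (hypothetical):
# Filesystem      Size  Used Avail Use% Mounted on
# shm              64M   60M  4.0M  94% /dev/shm
```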

All comments (26)
朱小表
#22 · Replied 2020-04
How can I check how much memory is allocated to each session?

free
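For reference, a hedged example of reading `free` on a typical Linux box; the figures below are illustrative only:

```bash
# -h prints human-readable units. The "available" column is the kernel's
# estimate of memory free for new workloads; the "shared" column includes
# tmpfs usage such as /dev/shm, which is what DataLoader workers consume.
free -h

# Example output (illustrative):
#               total        used        free      shared  buff/cache   available
# Mem:           31Gi       4.2Gi        20Gi       1.0Gi       6.8Gi        26Gi
```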

micahvista
#23 · Replied 2020-04

Out of GPU memory?

倍七
#24 · Replied 2020-04

OP, did you solve this? Setting num_workers=0 didn't help for me, and shrinking batch_size made no difference either.

MLTcola
#25 · Replied 2020-04

Probably taking up too much GPU memory; try reducing batch_size?

xenos_an
#26 · Replied 2020-12

Problem
ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)

This error shows up when training code runs inside Docker on a server and the batch size is set too large for the available shared memory (Docker caps shm by default).

According to the PyTorch README:

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.
Solutions
1. PyTorch's IPC goes through shared memory, so the shm segment must be large enough; increase it via docker run --shm-size (see the sketch below).
2. Alternatively, launch the container with --ipc=host so it shares the host's IPC namespace.
3. Set the DataLoader's num_workers to 0. This avoids the shared-memory path entirely, but training will be slower.
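A minimal sketch of both container-level fixes. Only the --shm-size and --ipc=host flags come from the PyTorch README; the image name, script, and size are placeholders:

```bash
# Option 1: give the container a larger shared-memory segment.
# 8g is an arbitrary example; size it to your batch size and worker count.
docker run --gpus all --shm-size=8g my-training-image python train.py

# Option 2: share the host's IPC namespace instead, which removes
# the per-container shm cap altogether.
docker run --gpus all --ipc=host my-training-image python train.py
```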

AIStudio810258
#27 · Replied 2020-12
In reply to xenos_an #26

Doesn't this require root privileges, though?
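Right: both docker flags have to be set by whoever launches the container, and resizing shm from inside an already-running container needs elevated privileges. A hedged illustration, assuming a privileged shell; on a managed platform like AI Studio this will normally be denied:

```bash
# Remount /dev/shm (a tmpfs) with a larger size from inside the container.
# Requires root plus CAP_SYS_ADMIN (e.g. a container started with
# docker run --privileged); the 8G figure is an arbitrary example.
sudo mount -o remount,size=8G /dev/shm
```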
