Training ("alchemy furnace") blew up again: Unexpected BUS error encountered in DataLoader worker
炼丹房 (Alchemy Room) · Q&A / Getting Started · 5140 views · 16 replies

My training blew up again...

The resource view below doesn't seem to show much memory in use, though.

ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
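The error asks to check the free space in /dev/shm (the shell equivalent is `df -h /dev/shm`, as the full log below also suggests). A general resource view most likely reports overall RAM, while /dev/shm is normally a separate tmpfs mount with its own size cap, so the two can disagree. A minimal sketch to read that cap from inside the notebook, standard library only (the function name `shm_usage` is hypothetical):

```python
import os
import shutil


def shm_usage(path="/dev/shm"):
    """Return total/used/free bytes of the filesystem that holds `path`.

    /dev/shm is usually a tmpfs mount with its own size cap (64 MB by
    default in Docker containers), so it can fill up even when the overall
    RAM graph looks mostly idle.
    """
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return {"total": usage.total, "used": usage.used, "free": usage.free}


if __name__ == "__main__":
    mib = 1024 ** 2
    if os.path.isdir("/dev/shm"):
        info = shm_usage()
        print(f"/dev/shm: total {info['total'] / mib:.1f} MiB, "
              f"used {info['used'] / mib:.1f} MiB, "
              f"free {info['free'] / mib:.1f} MiB")
    else:
        print("/dev/shm not found (probably not a Linux host)")
```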
All replies (16)
JavaRoom
#2 replied 2021-08

Full log:

/home/aistudio/PaddleClas
/home/aistudio/PaddleClas/ppcls/arch/backbone/model_zoo/vision_transformer.py:15: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  from collections import Callable
[2021/08/02 02:56:30] root INFO: 
===========================================================
==        PaddleClas is powered by PaddlePaddle !        ==
===========================================================
==                                                       ==
==   For more info please go to the following website.   ==
==                                                       ==
==       https://github.com/PaddlePaddle/PaddleClas      ==
===========================================================

[2021/08/02 02:56:30] root INFO: Arch : 
[2021/08/02 02:56:30] root INFO:     class_num : 2
[2021/08/02 02:56:30] root INFO:     name : MobileNetV3_large_x1_0
[2021/08/02 02:56:30] root INFO: DataLoader : 
[2021/08/02 02:56:30] root INFO:     Eval : 
[2021/08/02 02:56:30] root INFO:         dataset : 
[2021/08/02 02:56:30] root INFO:             cls_label_path : /home/aistudio/val.txt
[2021/08/02 02:56:30] root INFO:             image_root : /home/aistudio/
[2021/08/02 02:56:30] root INFO:             name : MyDataset
[2021/08/02 02:56:30] root INFO:             transform_ops : 
[2021/08/02 02:56:30] root INFO:                 DecodeImage : 
[2021/08/02 02:56:30] root INFO:                     channel_first : False
[2021/08/02 02:56:30] root INFO:                     to_rgb : True
[2021/08/02 02:56:30] root INFO:                 ResizeImage : 
[2021/08/02 02:56:30] root INFO:                     resize_short : 256
[2021/08/02 02:56:30] root INFO:                 CropImage : 
[2021/08/02 02:56:30] root INFO:                     size : 224
[2021/08/02 02:56:30] root INFO:                 NormalizeImage : 
[2021/08/02 02:56:30] root INFO:                     mean : [0.485, 0.456, 0.406]
[2021/08/02 02:56:30] root INFO:                     order : 
[2021/08/02 02:56:30] root INFO:                     scale : 1.0/255.0
[2021/08/02 02:56:30] root INFO:                     std : [0.229, 0.224, 0.225]
[2021/08/02 02:56:30] root INFO:         loader : 
[2021/08/02 02:56:30] root INFO:             num_workers : 4
[2021/08/02 02:56:30] root INFO:             use_shared_memory : True
[2021/08/02 02:56:30] root INFO:         sampler : 
[2021/08/02 02:56:30] root INFO:             batch_size : 32
[2021/08/02 02:56:30] root INFO:             drop_last : False
[2021/08/02 02:56:30] root INFO:             name : DistributedBatchSampler
[2021/08/02 02:56:30] root INFO:             shuffle : False
[2021/08/02 02:56:30] root INFO:     Train : 
[2021/08/02 02:56:30] root INFO:         dataset : 
[2021/08/02 02:56:30] root INFO:             cls_label_path : /home/aistudio/train.txt
[2021/08/02 02:56:30] root INFO:             image_root : /home/aistudio/
[2021/08/02 02:56:30] root INFO:             name : MyDataset
[2021/08/02 02:56:30] root INFO:             transform_ops : 
[2021/08/02 02:56:30] root INFO:                 DecodeImage : 
[2021/08/02 02:56:30] root INFO:                     channel_first : False
[2021/08/02 02:56:30] root INFO:                     to_rgb : True
[2021/08/02 02:56:30] root INFO:                 RandCropImage : 
[2021/08/02 02:56:30] root INFO:                     size : 224
[2021/08/02 02:56:30] root INFO:                 RandFlipImage : 
[2021/08/02 02:56:30] root INFO:                     flip_code : 1
[2021/08/02 02:56:30] root INFO:                 NormalizeImage : 
[2021/08/02 02:56:30] root INFO:                     mean : [0.485, 0.456, 0.406]
[2021/08/02 02:56:30] root INFO:                     order : 
[2021/08/02 02:56:30] root INFO:                     scale : 1.0/255.0
[2021/08/02 02:56:30] root INFO:                     std : [0.229, 0.224, 0.225]
[2021/08/02 02:56:30] root INFO:         loader : 
[2021/08/02 02:56:30] root INFO:             num_workers : 4
[2021/08/02 02:56:30] root INFO:             use_shared_memory : True
[2021/08/02 02:56:30] root INFO:         sampler : 
[2021/08/02 02:56:30] root INFO:             batch_size : 32
[2021/08/02 02:56:30] root INFO:             drop_last : False
[2021/08/02 02:56:30] root INFO:             name : DistributedBatchSampler
[2021/08/02 02:56:30] root INFO:             shuffle : True
[2021/08/02 02:56:30] root INFO: Global : 
[2021/08/02 02:56:30] root INFO:     checkpoints : None
[2021/08/02 02:56:30] root INFO:     device : gpu
[2021/08/02 02:56:30] root INFO:     epochs : 20
[2021/08/02 02:56:30] root INFO:     eval_during_train : True
[2021/08/02 02:56:30] root INFO:     eval_interval : 1
[2021/08/02 02:56:30] root INFO:     image_shape : [3, 224, 224]
[2021/08/02 02:56:30] root INFO:     output_dir : ./output/
[2021/08/02 02:56:30] root INFO:     pretrained_model : None
[2021/08/02 02:56:30] root INFO:     print_batch_step : 10
[2021/08/02 02:56:30] root INFO:     save_inference_dir : ./inference
[2021/08/02 02:56:30] root INFO:     save_interval : 1
[2021/08/02 02:56:30] root INFO:     use_visualdl : False
[2021/08/02 02:56:30] root INFO: Infer : 
[2021/08/02 02:56:30] root INFO:     PostProcess : 
[2021/08/02 02:56:30] root INFO:         class_id_map_file : /home/aistudio/label_list.txt
[2021/08/02 02:56:30] root INFO:         name : Topk
[2021/08/02 02:56:30] root INFO:         topk : 5
[2021/08/02 02:56:30] root INFO:     batch_size : 10
[2021/08/02 02:56:30] root INFO:     infer_imgs : datasets/hymenoptera_data/val/bees/2173503984_9c6aaaa7e2.jpg
[2021/08/02 02:56:30] root INFO:     transforms : 
[2021/08/02 02:56:30] root INFO:         DecodeImage : 
[2021/08/02 02:56:30] root INFO:             channel_first : False
[2021/08/02 02:56:30] root INFO:             to_rgb : True
[2021/08/02 02:56:30] root INFO:         ResizeImage : 
[2021/08/02 02:56:30] root INFO:             resize_short : 256
[2021/08/02 02:56:30] root INFO:         CropImage : 
[2021/08/02 02:56:30] root INFO:             size : 224
[2021/08/02 02:56:30] root INFO:         NormalizeImage : 
[2021/08/02 02:56:30] root INFO:             mean : [0.485, 0.456, 0.406]
[2021/08/02 02:56:30] root INFO:             order : 
[2021/08/02 02:56:30] root INFO:             scale : 1.0/255.0
[2021/08/02 02:56:30] root INFO:             std : [0.229, 0.224, 0.225]
[2021/08/02 02:56:30] root INFO:         ToCHWImage : None
[2021/08/02 02:56:30] root INFO: Loss : 
[2021/08/02 02:56:30] root INFO:     Eval : 
[2021/08/02 02:56:30] root INFO:         CELoss : 
[2021/08/02 02:56:30] root INFO:             weight : 1.0
[2021/08/02 02:56:30] root INFO:     Train : 
[2021/08/02 02:56:30] root INFO:         CELoss : 
[2021/08/02 02:56:30] root INFO:             weight : 1.0
[2021/08/02 02:56:30] root INFO: Metric : 
[2021/08/02 02:56:30] root INFO:     Eval : 
[2021/08/02 02:56:30] root INFO:         TopkAcc : 
[2021/08/02 02:56:30] root INFO:             topk : [1]
[2021/08/02 02:56:30] root INFO:     Train : 
[2021/08/02 02:56:30] root INFO:         TopkAcc : 
[2021/08/02 02:56:30] root INFO:             topk : [1]
[2021/08/02 02:56:30] root INFO: Optimizer : 
[2021/08/02 02:56:30] root INFO:     lr : 
[2021/08/02 02:56:30] root INFO:         last_epoch : -1
[2021/08/02 02:56:30] root INFO:         learning_rate : 0.00375
[2021/08/02 02:56:30] root INFO:         name : Cosine
[2021/08/02 02:56:30] root INFO:         warmup_epoch : 5
[2021/08/02 02:56:30] root INFO:     momentum : 0.9
[2021/08/02 02:56:30] root INFO:     name : Momentum
[2021/08/02 02:56:30] root INFO:     regularizer : 
[2021/08/02 02:56:30] root INFO:         coeff : 1e-06
[2021/08/02 02:56:30] root INFO:         name : L2
W0802 02:56:30.461478  8465 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0802 02:56:30.466107  8465 device_context.cc:422] device: 0, cuDNN Version: 7.6.
[2021/08/02 02:56:35] root INFO: train with paddle 2.1.0 and device CUDAPlace(0)
{'CELoss': {'weight': 1.0}}
ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
Traceback (most recent call last):
  File "tools/train.py", line 31, in <module>
    trainer.train()
  File "/home/aistudio/PaddleClas/ppcls/engine/trainer.py", line 191, in train
    loss_dict["loss"].backward()
  File "", line 2, in backward
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 225, in __impl__
    return func(*args, **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 236, in backward
    framework._dygraph_tracer())
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/multiprocess_utils.py", line 138, in __handler__
    core._throw_error_if_process_failed()
SystemError: (Fatal) DataLoader process (pid   1. If run DataLoader by DataLoader.from_generator(...), queue capacity is set by from_generator(..., capacity=xx, ...).
  2. If run DataLoader by DataLoader(dataset, ...), queue capacity is set as 2 times of the max value of num_workers and len(places).
  3. If run by DataLoader(dataset, ..., use_shared_memory=True), set use_shared_memory=False for not using shared memory.) exited is killed by signal: 8564.
  It may be caused by insufficient shared storage space. This problem usually occurs when using docker as a development environment.
  Please use command `df -h` to check the storage space of `/dev/shm`. Shared storage space needs to be greater than (DataLoader Num * DataLoader queue capacity * 1 batch data size).
  You can solve this problem by increasing the shared storage space or reducing the queue capacity appropriately.
Bus error (at /paddle/paddle/fluid/imperative/data_loader.cc:177)
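The hint in the log gives a rule of thumb: /dev/shm must be larger than (DataLoader Num * queue capacity * 1 batch of data), with capacity = 2 * max(num_workers, len(places)) for DataLoader(dataset, ...). A back-of-the-envelope sketch that plugs in the Train settings from the config dump, assuming each batch is dominated by a float32 image tensor of shape [3, 224, 224] and ignoring labels (the function name `estimate_shm_bytes` is hypothetical):

```python
def estimate_shm_bytes(batch_size, image_shape, num_workers,
                       num_places=1, num_loaders=1, bytes_per_elem=4):
    """Rough /dev/shm requirement following the hint in the log:
    DataLoader Num * queue capacity * size of one batch.

    Assumes each batch is dominated by one image tensor of `image_shape`
    stored as float32 (4 bytes per element); labels are ignored.
    """
    elems_per_image = 1
    for dim in image_shape:
        elems_per_image *= dim
    batch_bytes = batch_size * elems_per_image * bytes_per_elem
    # From the log: for DataLoader(dataset, ...), capacity is
    # 2 * max(num_workers, len(places)).
    capacity = 2 * max(num_workers, num_places)
    return num_loaders * capacity * batch_bytes


if __name__ == "__main__":
    # Train loader settings from the config dump: batch_size 32,
    # image_shape [3, 224, 224], num_workers 4, one GPU.
    need = estimate_shm_bytes(batch_size=32, image_shape=(3, 224, 224),
                              num_workers=4)
    print(f"about {need / 1024 ** 2:.1f} MiB")  # about 147.0 MiB
```

That is well above Docker's default 64 MB /dev/shm. The thread never shows the actual /dev/shm size on this machine, so treat this as a plausible explanation rather than a confirmed one; lowering batch_size or num_workers shrinks the estimate.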
JavaRoom
#3 replied 2021-08

Replying to myself: turning off shared memory in the config file fixed it.

FutureSI
#4 replied 2021-08

Right: set shared memory to False and num_workers=0.
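Both fixes come down to two keys under each DataLoader mode in the config dump above: loader.use_shared_memory and loader.num_workers. A sketch that applies them to the config as a Python dict, for example after loading the YAML file passed to tools/train.py with PyYAML's yaml.safe_load (the helper name `disable_shared_memory` is hypothetical):

```python
def disable_shared_memory(config, num_workers=None):
    """Turn off shared memory (and optionally set num_workers) for every
    DataLoader mode (Train, Eval, ...) in a PaddleClas-style config dict.

    The key layout DataLoader -> <mode> -> loader mirrors the config dump
    printed at the start of the log.
    """
    for mode_cfg in config.get("DataLoader", {}).values():
        if not isinstance(mode_cfg, dict):
            continue  # skip anything that isn't a per-mode section
        loader = mode_cfg.setdefault("loader", {})
        loader["use_shared_memory"] = False
        if num_workers is not None:
            loader["num_workers"] = num_workers
    return config


if __name__ == "__main__":
    # Mirrors the relevant part of the config dump above.
    cfg = {
        "DataLoader": {
            "Train": {"loader": {"num_workers": 4, "use_shared_memory": True}},
            "Eval": {"loader": {"num_workers": 4, "use_shared_memory": True}},
        }
    }
    print(disable_shared_memory(cfg, num_workers=0)["DataLoader"]["Train"]["loader"])
    # {'num_workers': 0, 'use_shared_memory': False}
```

Editing the two use_shared_memory lines by hand in the YAML works just as well. PaddleClas's tools/train.py also accepts -o overrides (e.g. -o DataLoader.Train.loader.use_shared_memory=False), though it's worth confirming your version supports them. num_workers=0 loads data in the main process, which sidesteps worker processes entirely but is usually slower.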

nidb222
#6 replied 2022-03
Quote: "Replying to myself: turning off shared memory in the config file fixed it."

How do I turn it off?

1179276863
#8 replied 2022-08

Just delete all the cache files under /dev/shm whose names start with paddle.
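A cautious version of this cleanup, assuming the paddle* files are shared-memory segments left behind by crashed runs. It defaults to a dry run that only lists matches; run it with deletion enabled only when no Paddle job is active (the function name `clean_paddle_shm` is hypothetical):

```python
from pathlib import Path


def clean_paddle_shm(shm_dir="/dev/shm", prefix="paddle", dry_run=True):
    """Delete (or, when dry_run=True, only list) regular files in `shm_dir`
    whose names start with `prefix`. Returns the matched file names.

    Only disable dry_run when no Paddle job is running: a live DataLoader
    may still be using those segments.
    """
    matched = []
    for path in sorted(Path(shm_dir).glob(prefix + "*")):
        if path.is_file():
            if not dry_run:
                path.unlink()
            matched.append(path.name)
    return matched


if __name__ == "__main__":
    for name in clean_paddle_shm(dry_run=True):
        print("would remove:", name)
```

This does the same as rm /dev/shm/paddle* in a shell, but listing first makes it harder to delete something a running job still needs.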

被褐怀玉
#9 replied 2022-09
Quote: "Replying to myself: turning off shared memory in the config file fixed it."

How do I turn it off?

JavaRoom
#10 replied 2022-11
Quote: "How do I turn it off?"

Edit the training config file.

范德蒙大帝
#11 replied 2023-02
Quote (JavaRoom #10): "Edit the training config file."

Which config file?
風雪殘月
#12 replied 2023-05

Same question here. Total beginner, and I ran into this exact problem today.
風雪殘月
#13 replied 2023-05

Did the original poster ever solve it?
JavaRoom
#14 replied 2023-05
Quote: "Same question here. Total beginner, and I ran into this exact problem today."

Turn it off in the config file.
wpj
#15 replied 2023-06

Which parameter in the config file should I change?
JavaRoom
#16 replied 2023-06
Quote (wpj #15): "Which parameter in the config file should I change?"

The shared memory one: use_shared_memory.
守望Alex
#17 replied 2023-12
Quote: "Replying to myself: turning off shared memory in the config file fixed it."

Where is the config file?
IvanAXu
#18 replied 2024-04
Quote: "Just delete all the cache files under /dev/shm whose names start with paddle."

It works! Thanks!
IvanAXu
#19 replied 2024-04
Quote: "Just delete all the cache files under /dev/shm whose names start with paddle."

Thanks!