PaddleDetection training error: Memory map failed when create shared memory.

loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
W1224 19:45:22.058293 13701 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1224 19:45:22.062646 13701 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
[12/24 19:45:24] ppdet.utils.checkpoint INFO: Finish loading model weights: /home/aistudio/.cache/paddle/weights/CSPResNetb_l_pretrained.pdparams
[12/24 19:45:26] ppdet.engine INFO: Epoch: [0] [ 0/38] learning_rate: 0.000000 loss: 27.027630 loss_cls: 25.446932 loss_iou: 0.452059 loss_dfl: 9.011045 eta: 0:44:26 batch_cost: 1.9494 data_cost: 0.0025 ips: 1.0260 images/s
[12/24 19:45:41] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_r_crn_l_3x_dota
[12/24 19:45:41] ppdet.engine INFO: Epoch: [1] [ 0/38] learning_rate: 0.000076 loss: 3.448476 loss_cls: 1.313157 loss_iou: 0.689530 loss_dfl: 9.010893 eta: 0:05:00 batch_cost: 0.2262 data_cost: 0.0003 ips: 8.8414 images/s
[12/24 19:45:56] ppdet.utils.checkpoint INFO: Save checkpoint: output/ppyoloe_r_crn_l_3x_dota
[12/24 19:45:56] ppdet.engine INFO: Epoch: [2] [ 0/38] learning_rate: 0.000152 loss: 2.897534 loss_cls: 0.613523 loss_iou: 0.709885 loss_dfl: 8.997499 eta: 0:04:19 batch_cost: 0.1739 data_cost: 0.0003 ips: 11.4980 images/s
Process Process-10:
Traceback (most recent call last):
File "/home/aistudio/PaddleDetection/ppdet/data/reader.py", line 209, in __next__
return next(self.loader)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 776, in __next__
six.reraise(*sys.exc_info())
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 719, in reraise
raise value
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 745, in __next__
self._reader.read_next_list()[0])
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/worker.py", line 371, in _worker_loop
six.reraise(*sys.exc_info())
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 719, in reraise
raise value
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/worker.py", line 361, in _worker_loop
else tensor_share_memory(b) for b in batch
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dataloader/worker.py", line 361, in
else tensor_share_memory(b) for b in batch
RuntimeError: (Unavailable) Memory map failed when create shared memory.
[Hint: Expected ptr != ((void *) -1), but received ptr:0xffffffffffffffff == ((void *) -1):0xffffffffffffffff.] (at /paddle/paddle/fluid/memory/allocation/mmap_allocator.cc:246)

Traceback (most recent call last):
File "tools/train.py", line 183, in
main()
File "tools/train.py", line 179, in main
run(FLAGS, cfg)
File "tools/train.py", line 132, in run
trainer.train(FLAGS.eval)
File "/home/aistudio/PaddleDetection/ppdet/engine/trainer.py", line 529, in train
self.optimizer.step()
File "", line 2, in step
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 319, in __impl__
return func(*args, **kwargs)
File "", line 2, in step
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 26, in __impl__
return wrapped_func(*args, **kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 534, in __impl__
return func(*args, **kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/optimizer/optimizer.py", line 1440, in step
param_group_idx=0,
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/optimizer/optimizer.py", line 1152, in _apply_optimize
params_grads = self._grad_clip(params_grads)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/clip.py", line 193, in __call__
return self._dygraph_clip(params_grads)
File "", line 2, in _dygraph_clip
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 319, in __impl__
return func(*args, **kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/clip.py", line 561, in _dygraph_clip
new_grad = layers.elementwise_mul(g, clip_input)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 12493, in elementwise_mul
op_name='elementwise_mul')
File "", line 2, in _elementwise_op_in_dygraph
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 26, in __impl__
return wrapped_func(*args, **kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 534, in __impl__
return func(*args, **kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/nn.py", line 231, in _elementwise_op_in_dygraph
out = op(x, y)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/multiprocess_utils.py", line 135, in __handler__
core._throw_error_if_process_failed()
SystemError: (Fatal) DataLoader process (pid 13923) exited unexpectedly with code 1. Error detailed are lost due to multiprocessing. Rerunning with:
1. If run DataLoader by DataLoader.from_generator(...), run with DataLoader.from_generator(..., use_multiprocess=False) may give better error trace.
2. If run DataLoader by DataLoader(dataset, ...), run with DataLoader(dataset, ..., num_workers=0) may give better error trace (at /paddle/paddle/fluid/imperative/data_loader.cc:161)
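The `Hint: Expected ptr != ((void *) -1)` in the worker's RuntimeError means the mmap call for a shared-memory block returned MAP_FAILED. Paddle's multiprocess DataLoader hands batches from the worker processes back to the trainer through shared memory, which on Linux is typically backed by /dev/shm, so a /dev/shm that is too small for batch size times worker count is a common cause. A minimal check, assuming a Linux environment such as AI Studio:

```python
import shutil

# Paddle's multiprocess DataLoader passes batches between worker processes and
# the trainer through shared memory, typically backed by /dev/shm on Linux.
# If the free space reported here is small, mmap can fail exactly as in the
# RuntimeError above.
total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: total={total / 2**20:.0f} MB, free={free / 2**20:.0f} MB")
```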

All comments (3)
宇宙物语
#2 Replied 2022-12

Could it be that the input data wasn't wrapped with DataLoader?
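For context, "wrapping the input data with DataLoader" in plain Paddle looks roughly like the sketch below; the toy dataset and shapes are placeholders, not taken from this thread. Note that it is `num_workers > 0` (or `worker_num` in PaddleDetection configs) that moves batch loading into subprocesses and onto the shared-memory path; `num_workers=0`, as suggested in the SystemError hint above, keeps everything in the main process.

```python
import numpy as np
from paddle.io import Dataset, DataLoader

# Toy dataset standing in for the real DOTA reader; shapes are placeholders.
class RandomDetDataset(Dataset):
    def __init__(self, n=16):
        self.images = np.random.rand(n, 3, 64, 64).astype("float32")
        self.labels = np.random.randint(0, 15, (n, 1)).astype("int64")

    def __getitem__(self, idx):
        return self.images[idx], self.labels[idx]

    def __len__(self):
        return len(self.images)

# num_workers=0 loads batches in the main process: no worker subprocesses,
# no shared-memory maps, and a full traceback if something goes wrong.
loader = DataLoader(RandomDetDataset(), batch_size=4, shuffle=True, num_workers=0)

for images, labels in loader:
    print(images.shape, labels.shape)  # [4, 3, 64, 64] and [4, 1]
    break
```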

Automate
#3 Replied 2022-12

Set `worker_num: 0` in the yml.
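For reference, `worker_num` is a top-level key in PaddleDetection configs, so the same change can be applied either by editing the yml passed with `-c` or by overriding it at load time. A minimal sketch, assuming the config path `configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml` (inferred from the `output/ppyoloe_r_crn_l_3x_dota` checkpoint directory in the log, not stated in the thread):

```python
from ppdet.core.workspace import load_config

# Config path is an assumption based on the checkpoint directory in the log;
# use whatever was actually passed to tools/train.py with -c.
cfg = load_config("configs/rotate/ppyoloe_r/ppyoloe_r_crn_l_3x_dota.yml")
print("worker_num before:", cfg.get("worker_num"))

# 0 means the DataLoader runs in the main process, so no shared-memory maps
# are created. Editing the yml by hand or passing "-o worker_num=0" on the
# tools/train.py command line should have the same effect.
cfg["worker_num"] = 0
```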

李长安
#4 Replied 2023-01
> Set `worker_num: 0` in the yml.

Great answer!
