Could anyone tell me how to change the batch_size value and the image size? When running the create_virtual_human.py file from https://github.com/JiehangXie/PaddleBoBo, I get "Out of memory error on GPU 0", but I'm not sure where these can be changed... Many thanks!!
```
Out of memory error on GPU 0. Cannot allocate 2.861023GB memory on GPU 0, 5.999512GB memory has been allocated and available memory is only 0.000000B.
Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.
(at ..\paddle\fluid\memory\allocation\cuda_allocator.cc:86)
```
I'd suggest filing an issue on the official repo, and include your OS version and PaddlePaddle version so the team can reproduce and respond.
This is a GPU out-of-memory error: try reducing the batch size, or crop/downscale the images, to lower GPU memory usage.
Alternatively, don't run it locally: train on a higher-spec server, export the model parameters, and then do inference and deployment locally.
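For PaddleSeg-style projects, both knobs usually live in the training yml config. A hedged sketch (the field names follow PaddleSeg's config convention; the values and the exact layout of your repo's config may differ):

```yml
batch_size: 2               # halve repeatedly until the OOM goes away
train_dataset:
  transforms:
    - type: RandomPaddingCrop
      crop_size: [256, 256]  # smaller crops -> smaller activations
```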
I first thought something else was occupying the GPU; in the end, reducing the batch size fixed it.
Spoken like a true "alchemist" (ML slang for a seasoned model trainer).
1. Reduce the batch size;
2. Shrink the network's input image size;
3. Switch to a smaller network;
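All three suggestions work for the same reason: activation memory scales linearly with batch size, pixel count, and channel count. A back-of-the-envelope sketch (the layer shapes are made up for illustration):

```python
def activation_bytes(batch, channels, height, width, dtype_bytes=4):
    """Rough memory footprint of one float32 feature map, in bytes."""
    return batch * channels * height * width * dtype_bytes

base = activation_bytes(8, 64, 512, 512)          # a hypothetical layer
print(activation_bytes(4, 64, 512, 512) / base)   # halving batch -> 0.5
print(activation_bytes(8, 64, 256, 256) / base)   # halving H and W -> 0.25
print(activation_bytes(8, 32, 512, 512) / base)   # fewer channels (smaller net) -> 0.5
```

The same logic applies per layer, so halving the input side length roughly quarters the whole network's activation memory.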
Learned something new, thanks!
Hit the same problem; reducing the batch size didn't help either.
Close the program and rerun it. I had two models in a notebook; after running one, I ran the other and got the GPU out-of-memory error, with the system suggesting I reduce the batch size.
OP, did you get this fixed? I'm hitting the same problem.
Reduce the batch size,
or resize the images down,
and if that still fails, use half-precision (FP16) data.
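Half precision does cut tensor memory in half; in Paddle this is typically done with automatic mixed precision (`paddle.amp.auto_cast`). A minimal NumPy sketch of just the memory effect (NumPy stands in here to show sizes; it is not Paddle code, and the batch shape is hypothetical):

```python
import numpy as np

batch = np.zeros((8, 3, 512, 512), dtype=np.float32)  # hypothetical input batch
half = batch.astype(np.float16)                       # same values, 2 bytes/element

print(batch.nbytes)  # 25165824 bytes (~24 MB)
print(half.nbytes)   # 12582912 bytes (~12 MB), exactly half
```

Note that with AMP the weights often stay in FP32 and only activations/matmuls run in FP16, so real savings are somewhat less than a flat 50%.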
Did you solve it? I've set the batch size to 1 and still get this error.
Same here; mine is a 6GB card.
My GPU memory usage sits at about half, and training itself never errors. But the evaluation step that runs during training (before saving a checkpoint) throws this error:
```
2024-04-27 12:11:10 [INFO] [TRAIN] epoch: 1, iter: 480/40000, loss: 0.0562, lr: 0.009892, batch_cost: 0.1013, reader_cost: 0.05119, ips: 19.7454 samples/sec | ETA 01:06:42
2024-04-27 12:11:13 [INFO] [TRAIN] epoch: 1, iter: 490/40000, loss: 0.0353, lr: 0.009890, batch_cost: 0.2332, reader_cost: 0.18838, ips: 8.5772 samples/sec | ETA 02:33:32
2024-04-27 12:11:14 [INFO] [TRAIN] epoch: 1, iter: 500/40000, loss: 0.0906, lr: 0.009888, batch_cost: 0.1161, reader_cost: 0.07341, ips: 17.2338 samples/sec | ETA 01:16:24
2024-04-27 12:11:14 [INFO] Start evaluating (total_samples: 1582, total_iters: 1582)...
Traceback (most recent call last):
  File "tools/train.py", line 213, in
    main(args)
  File "tools/train.py", line 188, in main
    train(
  File "d:\paddleseg\paddleseg\core\train.py", line 316, in train
    mean_iou, acc, _, _, _ = evaluate(
  File "d:\paddleseg\paddleseg\core\val.py", line 158, in evaluate
    pred, logits = infer.inference(
  File "d:\paddleseg\paddleseg\core\infer.py", line 160, in inference
    logits = model(im)
  File "F:\anaconda\envs\paddleseg\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "d:\paddleseg\paddleseg\models\unet.py", line 66, in forward
    x = self.decode(x, short_cuts)
  File "F:\anaconda\envs\paddleseg\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "d:\paddleseg\paddleseg\models\unet.py", line 116, in forward
    x = self.up_sample_list[i](x, short_cuts[-(i + 1)])
  File "F:\anaconda\envs\paddleseg\lib\site-packages\paddle\nn\layer\layers.py", line 1254, in __call__
    return self.forward(*inputs, **kwargs)
  File "d:\paddleseg\paddleseg\models\unet.py", line 155, in forward
    x = paddle.concat([x, short_cut], axis=1)
  File "F:\anaconda\envs\paddleseg\lib\site-packages\paddle\tensor\manipulation.py", line 1121, in concat
    return _C_ops.concat(input, axis)
MemoryError:
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
Not support stack backtrace yet.
----------------------
Error Message Summary:
----------------------
ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 2.861023GB memory on GPU 0, 5.999512GB memory has been allocated and available memory is only 0.000000B.
Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please decrease the batch size of your model.
(at ..\paddle\fluid\memory\allocation\cuda_allocator.cc:86)
```
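Training fits because it runs on small random crops, but evaluation feeds the full-resolution image through the UNet, so the decoder's `paddle.concat` must materialize a much larger feature map, hence the OOM only at eval time. The usual fix is sliding-window evaluation: run overlapping crops and stitch the logits. A framework-free NumPy sketch of the idea (crop/stride values and the model stub are hypothetical):

```python
import numpy as np

def slide_predict(image, model, crop=256, stride=128):
    """Evaluate a large (C, H, W) image as overlapping crops and average
    the logits, so only one crop-sized activation is alive at a time.
    Assumes (H - crop) and (W - crop) are multiples of stride, so the
    windows tile the image completely."""
    _, h, w = image.shape
    probe = model(image[:, :crop, :crop])           # discover num_classes
    logits = np.zeros((probe.shape[0], h, w), dtype=np.float32)
    count = np.zeros((h, w), dtype=np.float32)
    for top in range(0, h - crop + 1, stride):
        for left in range(0, w - crop + 1, stride):
            patch = image[:, top:top + crop, left:left + crop]
            logits[:, top:top + crop, left:left + crop] += model(patch)
            count[top:top + crop, left:left + crop] += 1
    return logits / count                           # average the overlaps

# toy stand-in for a segmentation model: constant logits for 2 classes
dummy_model = lambda patch: np.ones((2,) + patch.shape[1:], dtype=np.float32)
out = slide_predict(np.zeros((3, 384, 384), dtype=np.float32), dummy_model)
print(out.shape)  # (2, 384, 384)
```

In actual PaddleSeg this is what the `is_slide`, `crop_size`, and `stride` arguments of `evaluate` are for; check the signature in your version before relying on them.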
Does anyone know why this happens?
I changed use_space_char to false and got this error too.
I had to switch back to true.
```yml
use_space_char: true
```