A problem encountered during single-machine multi-GPU training. OSError: (External) CUBLAS error(13).

OSError: (External) CUBLAS error(13). 

  [Hint: 'CUBLAS_STATUS_EXECUTION_FAILED'.  The GPU program failed to execute. This is often caused by a launch failure of the kernel on the GPU, which can be caused by multiple reasons.  To correct: check that the hardware, an appropriate version of the driver, and the cuBLAS library are correctly installed. ] (at /paddle/paddle/phi/kernels/funcs/blas/blas_impl.cu.h:35)

  [operator < matmul_v2 > error]

 


  6%|▌         | 1597/25932 [08:37<2:11:33,  3.08it/s]

terminate called after throwing an instance of 'phi::enforce::EnforceNotMet'

  what():  (External) CUDA error(700), an illegal memory access was encountered. 

  [Hint: 'cudaErrorIllegalAddress'. The device encountered a load or store instruction on an invalid memory address. This leaves the process in an inconsistentstate and any further CUDA work will return the same error. To continue using CUDA, the process must be terminated and relaunched. ] (at /paddle/paddle/fluid/memory/allocation/cuda_device_context_allocator.h:99)

 


INFO 2022-08-17 13:27:29,212 launch_utils.py:322] terminate process group gid:4437

INFO 2022-08-17 13:27:33,215 launch_utils.py:343] terminate all the procs

ERROR 2022-08-17 13:27:33,215 launch_utils.py:640] ABORT!!! Out of all 2 trainers, the trainer process with rank=[0] was aborted. Please check its log.

INFO 2022-08-17 13:27:37,220 launch_utils.py:343] terminate all the procs

INFO 2022-08-17 13:27:37,220 launch.py:402] Local processes completed.

All comments (3)
zhuningxian
#2 Replied 2022-08

666

love_myy
#3 Replied 2022-08
666

What is this 66666 supposed to mean?

天马行空
#4 Replied 2022-08

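Do what the message says: "To correct: check that the hardware, an appropriate version of the driver, and the cuBLAS library are correctly installed."

Start by testing each GPU on its own in a single-card run, to confirm that none of the cards is faulty.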
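
One way to run that per-card test is sketched below. It is only a sketch under a few assumptions: the GPU ids 0 and 1 are guessed from the "2 trainers" in the log, the helper names `single_gpu_env` and `check_each_gpu` are made up for illustration, and the smoke test uses `paddle.utils.run_check()` as a stand-in for your real training script. Setting `CUDA_VISIBLE_DEVICES` hides all but one card from the child process, and `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, so an error such as the illegal memory access above is more likely to be reported at the call that caused it.

```python
import os
import subprocess
import sys


def single_gpu_env(gpu_id, base_env=None):
    """Return a copy of the environment that exposes only one GPU to a child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # Synchronous kernel launches: slower, but errors surface at the failing call.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def check_each_gpu(gpu_ids, command):
    """Run `command` once per GPU and return a dict {gpu_id: exit_code}."""
    results = {}
    for gpu_id in gpu_ids:
        proc = subprocess.run(command, env=single_gpu_env(gpu_id))
        results[gpu_id] = proc.returncode
    return results


if __name__ == "__main__":
    # Assumption: swap this for your own entry point, e.g. [sys.executable, "train.py"]
    # ("train.py" is a hypothetical name, not taken from the original post).
    smoke_test = [sys.executable, "-c", "import paddle; paddle.utils.run_check()"]
    for gpu_id, code in check_each_gpu([0, 1], smoke_test).items():
        status = "ok" if code == 0 else f"failed with exit code {code}"
        print(f"GPU {gpu_id}: {status}")
```

If the job fails on one card only, that points toward the hardware or that card's memory. If every card passes alone and only the two-card run crashes, the driver, the cuBLAS/CUDA install, or the multi-card communication setup is the more likely place to look.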

 
