Environment:
PaddlePaddle version: paddlepaddle-gpu==1.8.5.post107
GPU: Tesla T4
Driver Version: 418.211
CUDA Version: 10.1
cuDNN version: 8.0.5
python version: 3.6.8 (GCC 4.8.5)
CentOS Linux release 7.9.2009 (Core)
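Problem description:
The code runs in Jupyter. Static-graph training works fine on the CPU, but when I switch to the GPU for static-graph training the Jupyter kernel dies, and the operating system reports "A problem was detected in the python3-3.6.8-18.el17 package".
Full error output: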
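One way to keep the notebook kernel alive while reproducing a crash like this is to run the training code in a separate Python process and inspect how it exited. The following is only a minimal sketch using the standard library; `run_isolated` is a hypothetical helper name, and the one-line child program stands in for the real GPU training script, which is not shown in this post.

```python
import signal
import subprocess
import sys


def run_isolated(code):
    """Run `code` in a fresh Python process and describe how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exited with code {}".format(proc.returncode)


# Stand-in child program (an assumption, not the real training code):
# it sends SIGSEGV to itself to imitate a native crash.
crash_demo = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
print(run_isolated(crash_demo))
```

If the real training script is run this way, a segmentation fault in native code should only kill the child process, so the notebook can still report the result.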
W0926 16:16:56.865593 10933 device_context.cc:260] device: 0, cuDNN Version: 8.0.
W0926 16:16:59.773133 10933 init.cc:226] Warning: PaddlePaddle catches a failure signal, it may not work properly
W0926 16:16:59.773157 10933 init.cc:228] You could check whether you killed PaddlePaddle thread/process accidentally or report the case to PaddlePaddle
W0926 16:16:59.773160 10933 init.cc:231] The detail failure signal is:
W0926 16:16:59.773165 10933 init.cc:234] *** Aborted at 1632644219 (unix time) try "date -d @1632644219" if you are using GNU date ***
W0926 16:16:59.774632 10933 init.cc:234] PC: @ 0x0 (unknown)
W0926 16:16:59.774744 10933 init.cc:234] *** SIGSEGV (@0x0) received by PID 10933 (TID 0x7f5ce6b3f740) from PID 0; stack trace: ***
W0926 16:16:59.776057 10933 init.cc:234] @ 0x7f5ce6208630 (unknown)
W0926 16:16:59.777312 10933 init.cc:234] @ 0x0 (unknown)
[I 16:18:42.495 NotebookApp] Saving file at /x_program/program_6/xcv_lab6.ipynb
[I 16:23:41.826 NotebookApp] Kernel interrupted: 8a085c39-91e4-4f0a-a7b0-39001f15c252
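The log suggests running `date -d @1632644219` to decode the abort time. For anyone without GNU date at hand, the same conversion can be sketched in Python. The UTC+8 offset below is an assumption about the machine's time zone, chosen because it matches the 16:16:59 timestamps in the log.

```python
from datetime import datetime, timedelta, timezone

ABORT_UNIX_TIME = 1632644219  # copied from the "Aborted at ..." log line

utc_time = datetime.fromtimestamp(ABORT_UNIX_TIME, tz=timezone.utc)
# Assumption: the server clock is UTC+8, which lines up with the log timestamps.
local_time = utc_time.astimezone(timezone(timedelta(hours=8)))

print(utc_time.isoformat())    # 2021-09-26T08:16:59+00:00
print(local_time.isoformat())  # 2021-09-26T16:16:59+08:00
```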
I'd suggest opening an issue: https://github.com/PaddlePaddle/Paddle/issues
Already filed: "Training error message: Aborted at 1632644219 (unix time) try date -d @1632644219", Issue #36130 in PaddlePaddle/Paddle (github.com). Hoping for an answer, thanks.
Issues there usually get a fairly quick reply.