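I have a dual-T4 server. Back when PaddlePaddle did not support CUDA 10.2, I downgraded CUDA to 10.1, then 10.0, then 9.x, and reinstalled several times before it finally worked. It is a painful story, so I won't dwell on it.
Yesterday, fully expecting to hit more pitfalls, I upgraded PaddlePaddle to 2.0rc. Sure enough, it did not go smoothly. The error: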
--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0 paddle::imperative::Tracer::TraceOp(std::string const&, paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap, paddle::platform::Place const&, bool)
1 paddle::imperative::PreparedOp::Run(paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap const&)
2 paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
3 paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
4 paddle::memory::allocation::AllocatorFacade::Instance()
5 paddle::memory::allocation::AllocatorFacade::AllocatorFacade()
6 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
7 std::string paddle::platform::GetTraceBackString(char const*&&, char const*, int)
8 paddle::platform::GetCurrentTraceBackString[abi:cxx11]()
----------------------
Error Message Summary:
----------------------
ExternalError: Cuda error(35), CUDA driver version is insufficient for CUDA runtime version.
[Advise: This indicates that the installed NVIDIA CUDA driver is older than the CUDA runtime library. This is not a supported configuration.Users should install an updated NVIDIA display driver to allow the application to run.] (at /paddle/paddle/fluid/platform/gpu_info.cc:68)
[operator < uniform_random > error]
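Cuda error(35) means the installed NVIDIA driver is older than the minimum the CUDA runtime requires. The check itself is just a version comparison, sketched below in Python. The minimum Linux driver versions in `MIN_DRIVER` are taken from NVIDIA's CUDA Toolkit release notes (treat them as an assumption and verify against the official compatibility table for your exact toolkit build), the helper name `driver_supports` is hypothetical, and `418.67` is only an example of a 418-series version, since the exact old driver version is not given here.

```python
# Minimal sketch: is a given NVIDIA driver new enough for a CUDA runtime?
# ASSUMPTION: minimum Linux driver versions copied from NVIDIA's CUDA Toolkit
# release notes; double-check the official table for your exact toolkit build.
MIN_DRIVER = {
    "10.0": "410.48",
    "10.1": "418.39",
    "10.2": "440.33",
}


def _as_tuple(version: str) -> tuple:
    """Turn '440.118.02' into (440, 118, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(driver_version: str, cuda_runtime: str) -> bool:
    """Hypothetical helper: True if the driver meets the runtime's minimum."""
    if cuda_runtime not in MIN_DRIVER:
        raise KeyError(f"no minimum driver recorded for CUDA {cuda_runtime}")
    return _as_tuple(driver_version) >= _as_tuple(MIN_DRIVER[cuda_runtime])


if __name__ == "__main__":
    # A 418-series driver (418.67 is just an example) is fine for CUDA 10.1
    # but too old for a 10.2 runtime, which is what triggers Cuda error(35).
    print(driver_supports("418.67", "10.1"))      # True
    print(driver_supports("418.67", "10.2"))      # False
    print(driver_supports("440.118.02", "10.2"))  # True
```

Comparing integer tuples instead of raw strings avoids the classic trap where "418.67" would sort after "1000.0" lexically.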
Or, in other cases, import paddle printed this:
>>> import paddle
W1104 09:44:02.803095 16040 init.cc:157] Compiled with WITH_GPU, but no GPU found in runtime.
/home/skywalk/anaconda3/lib/python3.7/site-packages/paddle/fluid/framework.py:295: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
"You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default."
Here is what I did.
First, I downloaded CUDA 10.2 from the NVIDIA website and installed it:
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
sudo sh cuda_10.2.89_440.33.01_linux.run
Since I had not let the CUDA installer install its bundled driver, I then upgraded the driver separately, from 418 to 440:
wget -c https://cn.download.nvidia.com/tesla/440.118.02/NVIDIA-Linux-x86_64-440.118.02.run
sudo sh NVIDIA-Linux-x86_64-440.118.02.run
After installation:
(base) [skywalk@ecs-ai nvidia]$ nvidia-smi
Wed Nov 4 09:57:59 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.118.02   Driver Version: 440.118.02   CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:21:01.0 Off |                    0 |
| N/A   35C    P0    26W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:21:02.0 Off |                    0 |
| N/A   35C    P0    15W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
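Note that the "CUDA Version" in the nvidia-smi header is the highest CUDA version the installed driver supports, not necessarily the toolkit installed on disk. If you want to script this check, a minimal sketch that pulls both fields out of that header line with a regular expression is below; the helper name `parse_smi_header` is hypothetical. For the driver version alone, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` prints it directly.

```python
import re

# Matches the header line, e.g.
# "| NVIDIA-SMI 440.118.02   Driver Version: 440.118.02   CUDA Version: 10.2     |"
_SMI_HEADER = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_smi_header(text: str) -> dict:
    """Hypothetical helper: extract the driver and max-supported CUDA version."""
    match = _SMI_HEADER.search(text)
    if match is None:
        raise ValueError("no 'Driver Version ... CUDA Version' line found")
    return {"driver": match.group("driver"), "cuda": match.group("cuda")}


if __name__ == "__main__":
    header = "| NVIDIA-SMI 440.118.02   Driver Version: 440.118.02   CUDA Version: 10.2     |"
    print(parse_smi_header(header))  # {'driver': '440.118.02', 'cuda': '10.2'}
```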
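You may find the 10.1 shown there an eyesore. I have given up on it, since the version display on this machine has always been odd. What matters is that nvcc reports the correct version: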
(base) [skywalk@ecs-ai ~]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
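nvcc -V reports the toolkit that is actually installed, which is the number to compare against what the Paddle build expects. A minimal sketch for scripting that check is below; the helper names are hypothetical, and it only shells out when `nvcc` is found on `PATH` (it needs Python 3.7+ for `capture_output`, which matches the Python 3.7 environment above).

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(text: str) -> str:
    """Pull '10.2' out of 'Cuda compilation tools, release 10.2, V10.2.89'."""
    match = re.search(r"release\s+(\d+\.\d+)", text)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return match.group(1)


def installed_nvcc_release() -> Optional[str]:
    """Hypothetical helper: run `nvcc -V` if nvcc is on PATH, else return None."""
    if shutil.which("nvcc") is None:
        return None
    out = subprocess.run(["nvcc", "-V"], capture_output=True, text=True, check=True)
    return parse_nvcc_release(out.stdout)


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 10.2, V10.2.89"
    print(parse_nvcc_release(sample))  # 10.2
    print(installed_nvcc_release())    # None when nvcc is not on PATH
```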
PaddlePaddle 2.0rc finally works!
import paddle
paddle.fluid.install_check.run_check()
Running Verify Fluid Program ...
Your Paddle Fluid works well on SINGLE GPU or CPU.
Your Paddle Fluid works well on MUTIPLE GPU or CPU.
Your Paddle Fluid is installed successfully! Let's start deep Learning with Paddle Fluid now