Installing PaddlePaddle 2.0 rc on a dual-T4 server

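I have a server with two T4 cards. Back then PaddlePaddle did not support CUDA 10.2, so I downgraded CUDA to 10.1, 10.0 and 9.x, and had to reinstall several times before it finally worked. Those were painful days, so I will not go into them.

Yesterday, fully expecting to hit problems again, I upgraded PaddlePaddle to 2.0 rc. Sure enough, it did not go smoothly, and I got this error: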

 

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0 paddle::imperative::Tracer::TraceOp(std::string const&, paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap, paddle::platform::Place const&, bool)
1 paddle::imperative::PreparedOp::Run(paddle::imperative::NameVarBaseMap const&, paddle::imperative::NameVarBaseMap const&, paddle::framework::AttributeMap const&)
2 paddle::framework::Tensor::mutable_data(paddle::platform::Place const&, paddle::framework::proto::VarType_Type, unsigned long)
3 paddle::memory::AllocShared(paddle::platform::Place const&, unsigned long)
4 paddle::memory::allocation::AllocatorFacade::Instance()
5 paddle::memory::allocation::AllocatorFacade::AllocatorFacade()
6 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
7 std::string paddle::platform::GetTraceBackString(char const*&&, char const*, int)
8 paddle::platform::GetCurrentTraceBackString[abi:cxx11]()

----------------------
Error Message Summary:
----------------------
ExternalError: Cuda error(35), CUDA driver version is insufficient for CUDA runtime version.
[Advise: This indicates that the installed NVIDIA CUDA driver is older than the CUDA runtime library. This is not a supported configuration.Users should install an updated NVIDIA display driver to allow the application to run.] (at /paddle/paddle/fluid/platform/gpu_info.cc:68)
[operator < uniform_random > error]

 

Or this error on import paddle:

>>> import paddle
W1104 09:44:02.803095 16040 init.cc:157] Compiled with WITH_GPU, but no GPU found in runtime.
/home/skywalk/anaconda3/lib/python3.7/site-packages/paddle/fluid/framework.py:295: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
  "You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default."

 

Here is what I did next.

Download CUDA 10.2 from the NVIDIA website and install it:

wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run

sudo sh cuda_10.2.89_440.33.01_linux.run

 

Because I did not install the GPU driver along with the toolkit, I then had to upgrade the driver separately, from 418 to 440:

wget -c https://cn.download.nvidia.com/tesla/440.118.02/NVIDIA-Linux-x86_64-440.118.02.run

sudo sh NVIDIA-Linux-x86_64-440.118.02.run
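For anyone hitting the same Cuda error(35): it means the driver is older than what the CUDA runtime needs. Below is a minimal sketch of that check. The minimum driver numbers are the Linux values I remember from NVIDIA's CUDA release notes (10.2 matches the 440.33 in the installer file name above), so please verify them against the release notes for your exact toolkit. The names `MIN_DRIVER`, `version_tuple` and `driver_supports` are made up for illustration, and "418.39" is only an example of a 418-series driver, not necessarily the exact build that was on this machine.

```python
# Minimum Linux driver version per CUDA toolkit release (assumption: values
# taken from NVIDIA's CUDA release notes; only releases named in this post).
MIN_DRIVER = {
    "10.0": "410.48",
    "10.1": "418.39",
    "10.2": "440.33",
}


def version_tuple(version):
    """Turn a dotted version string such as '440.118.02' into (440, 118, 2)."""
    return tuple(int(part) for part in version.split("."))


def driver_supports(driver_version, cuda_version):
    """Return True if the driver is at least the minimum for that CUDA release."""
    required = MIN_DRIVER[cuda_version]
    return version_tuple(driver_version) >= version_tuple(required)


if __name__ == "__main__":
    # A 418-series driver is too old for the CUDA 10.2 runtime.
    print(driver_supports("418.39", "10.2"))
    # The 440.118.02 driver installed above is new enough.
    print(driver_supports("440.118.02", "10.2"))
```

Comparing integer tuples instead of raw strings avoids mistakes such as "440.9" sorting after "440.118".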

 

After the installation:

(base) [skywalk@ecs-ai nvidia]$ nvidia-smi

Wed Nov  4 09:57:59 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.118.02   Driver Version: 440.118.02   CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:21:01.0 Off |                    0 |
| N/A   35C    P0    26W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla T4            Off  | 00000000:21:02.0 Off |                    0 |
| N/A   35C    P0    15W /  70W |      0MiB / 15109MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
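If you want to read these numbers from a script rather than by eye, here is a small sketch that pulls the driver version and the CUDA version out of the nvidia-smi header line. Keep in mind that the "CUDA Version" field in this header is the highest CUDA version the driver supports, not the toolkit that is actually installed. The names `HEADER_PATTERN`, `parse_smi_header` and `SAMPLE` are my own, and the regex assumes the header layout shown above, which may differ on other driver releases.

```python
import re

# Matches the "Driver Version: ... CUDA Version: ..." part of the header.
# Assumption: the header looks like the one pasted in this post.
HEADER_PATTERN = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_smi_header(text):
    """Return (driver_version, cuda_version) as strings, or None if not found."""
    match = HEADER_PATTERN.search(text)
    if match is None:
        return None
    return match.group("driver"), match.group("cuda")


# Header line copied from the nvidia-smi output above.
SAMPLE = (
    "| NVIDIA-SMI 440.118.02   Driver Version: 440.118.02"
    "   CUDA Version: 10.2     |"
)

if __name__ == "__main__":
    print(parse_smi_header(SAMPLE))
```

On a real machine you would feed it the text captured from running nvidia-smi instead of `SAMPLE`.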

 

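Some of you may not like the 10.1 that shows up in there. I have given up on it; the version this machine reports has always been odd. As long as nvcc reports the right version, that is good enough: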

(base) [skywalk@ecs-ai ~]$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
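The two version numbers mean different things: nvcc reports the installed toolkit, while nvidia-smi reports the highest CUDA version the driver can serve. A rough sanity check is that the toolkit version should not be newer than the driver-supported one. The sketch below does only that comparison; `parse_nvcc_release` and `toolkit_fits_driver` are hypothetical helper names, and the rule ignores NVIDIA's forward-compatibility packages, so treat it as a first-pass check rather than the full compatibility story.

```python
import re


def parse_nvcc_release(text):
    """Extract (major, minor) from the 'release X.Y' part of nvcc -V output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_fits_driver(nvcc_output, driver_cuda):
    """True if the toolkit is not newer than the driver-supported CUDA version.

    driver_cuda is the 'CUDA Version' string from nvidia-smi, e.g. '10.2'.
    """
    toolkit = parse_nvcc_release(nvcc_output)
    if toolkit is None:
        return False
    major, minor = (int(part) for part in driver_cuda.split("."))
    return toolkit <= (major, minor)


if __name__ == "__main__":
    nvcc_line = "Cuda compilation tools, release 10.2, V10.2.89"
    print(toolkit_fits_driver(nvcc_line, "10.2"))  # same version on both sides
    print(toolkit_fits_driver(nvcc_line, "10.1"))  # driver older than toolkit
```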

 

PaddlePaddle 2.0 rc finally works now!

 

import paddle

paddle.fluid.install_check.run_check()

 

Running Verify Fluid Program ...
Your Paddle Fluid works well on SINGLE GPU or CPU.
Your Paddle Fluid works well on MUTIPLE GPU or CPU.
Your Paddle Fluid is installed successfully! Let's start deep Learning with Paddle Fluid now
