
Uninstalling an old CUDA version and installing a new one on Linux



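The wave of deep learning on GPUs with CUDA and cuDNN has been running for years, and over that time the GPU driver, CUDA, and the cuDNN deep learning library have each gone through many releases. With TensorFlow 2.0 and PyTorch 1.3 now out, machines used for deep learning need their runtime environment brought up to date as well, especially any that are still on CUDA 8.0. This post walks through uninstalling an old CUDA version (8.0 here) and installing a newer one (10.0).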

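Back in 2017 the AI Lemon blogger wrote a post on installing the GPU build of TensorFlow, "Installing the GPU version of TensorFlow on Linux" (《Linux系统下安装TensorFlow的GPU版本》). Refer to that post for the TensorFlow installation itself; it is kept up to date on TensorFlow issues such as matching dependency versions.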

What you need

First, download the following two files from the NVIDIA website: CUDA 10.0 and cuDNN 7.4.

  • cuda_10.0.130_410.48_linux
  • cudnn-10.0-linux-x64-v7.4.2.24.solitairetheme8
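The runfile name itself records which toolkit and driver versions it carries, which is handy when several downloads are lying around. Below is a minimal Python sketch that splits the name apart. The pattern `cuda_<toolkit>_<driver>_linux` is an assumption inferred from the file name above, and `parse_runfile_name` is a hypothetical helper, not part of any NVIDIA tool.

```python
import re
from typing import NamedTuple


class RunfileInfo(NamedTuple):
    cuda_version: str    # e.g. "10.0.130"
    driver_version: str  # e.g. "410.48"


# Assumed naming pattern: cuda_<toolkit version>_<bundled driver version>_linux
_RUNFILE_RE = re.compile(
    r"^cuda_(\d+\.\d+\.\d+)_(\d+(?:\.\d+)+)_linux(?:\.run)?$"
)


def parse_runfile_name(name: str) -> RunfileInfo:
    """Split a CUDA runfile name into toolkit and driver versions."""
    match = _RUNFILE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized runfile name: {name!r}")
    return RunfileInfo(match.group(1), match.group(2))


if __name__ == "__main__":
    info = parse_runfile_name("cuda_10.0.130_410.48_linux")
    print(info.cuda_version, info.driver_version)
```

For the file used in this post, the two parts line up with what the installer prompts for later: the CUDA 10.0 toolkit and the 410.48 driver.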

Uninstalling the old CUDA version

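Before uninstalling, stop any graphics-related services, such as the X display manager lightdm. Press Ctrl+Alt+F1 to switch to a text console, log in with your username and password, and then run the following commands.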

$ sudo systemctl stop lightdm
$ cd /usr/local/cuda-8.0/bin
$ sudo ./uninstall_cuda_8.0.pl

This starts the CUDA 8.0 uninstall. The "cuda-8.0/" directory left behind afterwards can simply be deleted.
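To see which versioned CUDA directories are still present after the uninstall, a small Python sketch like the following can list them. It assumes the default install prefix /usr/local used in this post; `find_cuda_dirs` is a hypothetical helper name, and the sketch only lists directories, it deletes nothing.

```python
from pathlib import Path
from typing import List


def find_cuda_dirs(base: str = "/usr/local") -> List[Path]:
    """Return versioned CUDA directories (cuda-*) found directly under base."""
    root = Path(base)
    if not root.is_dir():
        return []
    # Skip symlinks so that only real install directories are reported.
    return sorted(
        p for p in root.glob("cuda-*") if p.is_dir() and not p.is_symlink()
    )


if __name__ == "__main__":
    for path in find_cuda_dirs():
        print(path)
```

Anything it prints for the old version, such as /usr/local/cuda-8.0, is a candidate for manual removal.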

Installing the new CUDA version

Locate the CUDA 10 and cuDNN 7.4 files downloaded earlier, and install CUDA 10 first with the following command.

$ sudo sh cuda_10.0.130_410.48_linux

The installer first displays the CUDA end user license agreement. You can press Ctrl+C to skip through it, then type "accept" to accept the agreement.

Logging to /tmp/cuda_install_11026.log
Using more to view the EULA.
End User License Agreement
--------------------------

Preface
-------

The Software License Agreement in Chapter 1 and the Supplement
in Chapter 2 contain license terms and conditions that govern
the use of NVIDIA software. By accepting this agreement, you
agree to comply with all the terms and conditions applicable
to the product(s) included herein.

NVIDIA Driver

Description

This package contains the operating system driver and
fundamental system software components for NVIDIA GPUs.

NVIDIA CUDA Toolkit

Description

The NVIDIA CUDA Toolkit provides command-line and graphical
tools for building, debugging and optimizing the performance
of applications accelerated by NVIDIA GPUs, runtime and math
libraries, and documentation including programming guides,
user manuals, and API references.

Default Install Location of CUDA Toolkit

Windows platform:

%ProgramFiles%\NVIDIA GPU Computing Toolkit\CUDA\v#.#

Linux platform:

/usr/local/cuda-#.#

Mac platform:

/Developer/NVIDIA/CUDA-#.#

NVIDIA CUDA Samples

Description

This package includes over 100+ CUDA examples that demonstrate
various CUDA programming principles, and efficient CUDA
implementation of algorithms in specific application domains.
Do you accept the previously read EULA?
accept/decline/quit: accept

Because the NVIDIA driver needs updating as well, answer "y" at the "Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?" prompt to install the new driver.

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48?
(y)es/(n)o/(q)uit: y

Do you want to install the OpenGL libraries?
(y)es/(n)o/(q)uit [ default is yes ]: y

Do you want to run nvidia-xconfig?
This will update the system X configuration file so that the NVIDIA X driver
is used. The pre-existing X configuration file will be backed up.
This option should not be used on systems that require a custom
X configuration, such as systems with multiple GPU vendors.
(y)es/(n)o/(q)uit [ default is no ]:

Install the CUDA 10.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
[ default is /usr/local/cuda-10.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 10.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
[ default is /home/gpu ]:

Installing the NVIDIA display driver...
Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
Missing recommended library: libGLU.so
Missing recommended library: libXmu.so

Installing the CUDA Samples in /home/gpu ...
Copying samples to /home/gpu/NVIDIA_CUDA-10.0_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-10.0
Samples:  Installed in /home/gpu, but missing recommended libraries

Please make sure that
-   PATH includes /usr/local/cuda-10.0/bin
-   LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.

Logfile is /tmp/cuda_install_11026.log
Signal caught, cleaning up
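If the installer output is saved to a file, the Summary block in the log above can also be checked programmatically. The following Python sketch pulls out the three status lines; `parse_install_summary` is a hypothetical helper, and it assumes the "Driver:", "Toolkit:" and "Samples:" labels appear exactly as in this log.

```python
from typing import Dict


def parse_install_summary(log: str) -> Dict[str, str]:
    """Map each component in the installer's Summary block to its status text."""
    wanted = ("Driver", "Toolkit", "Samples")
    summary: Dict[str, str] = {}
    for line in log.splitlines():
        key, sep, value = line.partition(":")
        # Only keep the three summary labels; other "name: value" lines,
        # such as "Missing recommended library: ...", are ignored.
        if sep and key.strip() in wanted:
            summary[key.strip()] = value.strip()
    return summary


if __name__ == "__main__":
    sample = (
        "Driver:   Installed\n"
        "Toolkit:  Installed in /usr/local/cuda-10.0\n"
        "Samples:  Installed in /home/gpu, but missing recommended libraries\n"
    )
    for component, status in parse_install_summary(sample).items():
        print(component, "->", status)
```

A component whose status does not start with "Installed" is the one to look up in the log file named at the end of the output.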

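Once the installer ends with a summary like the one in the log above and reports no other errors, the new CUDA version is installed. Next, set up the environment variables by appending the following to the end of ~/.bashrc: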

export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda

The first two (PATH and LD_LIBRARY_PATH) are the variables recommended by the official CUDA installation guide. The third (CUDA_HOME) is needed by the GPU build of TensorFlow.
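The `${PATH:+:${PATH}}` part of those export lines can look cryptic. It appends ":" plus the old value only when the variable is already set and non-empty, so an unset variable does not leave a stray trailing colon (for PATH, an empty entry is treated as the current directory). The Python sketch below mimics that behaviour purely as an illustration; `prepend_path` is a hypothetical helper, not something the shell or CUDA provides.

```python
from typing import Mapping


def prepend_path(env: Mapping[str, str], var: str, new_dir: str) -> str:
    """Mimic the shell form `export VAR=new_dir${VAR:+:${VAR}}`."""
    current = env.get(var, "")
    # ${VAR:+...} expands to nothing when VAR is unset or empty,
    # so no ':' separator is added in that case.
    return f"{new_dir}:{current}" if current else new_dir


if __name__ == "__main__":
    print(prepend_path({"PATH": "/usr/bin"}, "PATH", "/usr/local/cuda-10.0/bin"))
    print(prepend_path({}, "LD_LIBRARY_PATH", "/usr/local/cuda-10.0/lib64"))
```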

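After editing the environment variables, be sure to reload them; otherwise they do not take effect in the current shell. Restarting the machine also works.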

$ source ~/.bashrc

Skipping this step is a classic trap for newcomers: everything was configured exactly as the tutorial says, yet the GPU cannot be used at all.
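One way to confirm that the variables really are in effect is to inspect the environment from a freshly started process. The Python sketch below reports which of the three settings are missing. `missing_cuda_settings` is a hypothetical helper, and the default `cuda_root` of /usr/local/cuda-10.0 is taken from the install location used in this post.

```python
import os
from typing import List, Mapping


def missing_cuda_settings(
    env: Mapping[str, str], cuda_root: str = "/usr/local/cuda-10.0"
) -> List[str]:
    """Return the names of CUDA-related variables that are not set up in env."""
    problems: List[str] = []
    if f"{cuda_root}/bin" not in env.get("PATH", "").split(":"):
        problems.append("PATH")
    if f"{cuda_root}/lib64" not in env.get("LD_LIBRARY_PATH", "").split(":"):
        problems.append("LD_LIBRARY_PATH")
    if not env.get("CUDA_HOME"):
        problems.append("CUDA_HOME")
    return problems


if __name__ == "__main__":
    missing = missing_cuda_settings(os.environ)
    print("missing: " + ", ".join(missing) if missing else "all set")
```

If it still lists variables after the edit to ~/.bashrc, the file has most likely not been reloaded in that shell.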

Next, check that the new GPU driver installed correctly:

$ nvidia-smi

Fri Oct 27 15:46:57 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:06:00.0 Off |                    0 |
| N/A   29C    P0    24W / 250W |      0MiB / 12198MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
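The line to look at in this output is "Driver Version": it should show the driver that came with the runfile. As a rough sketch, the Python below extracts that version from captured nvidia-smi text and compares it against 410.48. That threshold is simply the driver version bundled with the CUDA 10.0 runfile in this post, used here as an assumed minimum, and both function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: the driver bundled with the CUDA 10.0 runfile used in this post.
MIN_DRIVER_FOR_CUDA_10_0 = (410, 48)


def parse_driver_version(smi_output: str) -> Optional[Tuple[int, ...]]:
    """Extract the driver version from nvidia-smi text as a tuple of ints."""
    match = re.search(r"Driver Version:\s*(\d+(?:\.\d+)*)", smi_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def driver_ok(
    smi_output: str, minimum: Tuple[int, ...] = MIN_DRIVER_FOR_CUDA_10_0
) -> bool:
    """Return True if the reported driver version is at least `minimum`."""
    version = parse_driver_version(smi_output)
    return version is not None and version >= minimum


if __name__ == "__main__":
    header = "| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |"
    print(parse_driver_version(header), driver_ok(header))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.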

Finally, bring the graphical display back:

$ sudo systemctl start lightdm

Configuring the cuDNN library

First, make a copy of the cuDNN file with a .tgz extension so it is easy to extract. For other versions, adjust the file name accordingly.

$ cp cudnn-10.0-linux-x64-v7.4.2.24.solitairetheme8 cudnn-10.0-linux-x64-v7.4.2.24.tgz

Then extract it:

$ tar zxvf cudnn-10.0-linux-x64-v7.4.2.24.tgz

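Then copy the libraries and the header file into the CUDA directory. This must be your own install directory, e.g. /usr/local/cuda-10.0. After a correct installation, though, Ubuntu will normally have the symbolic link /usr/local/cuda -> /usr/local/cuda-10.0/, so that path can be used as the target.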

$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64

Next, fix the file permissions:

$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
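To double-check which cuDNN version ended up in the CUDA directory, the version macros in the copied header can be read back. The Python sketch below assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as cuDNN 7.x headers do; `parse_cudnn_version` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple


def parse_cudnn_version(header_text: str) -> Optional[Tuple[int, int, int]]:
    """Read (major, minor, patch) from the #define lines of cudnn.h."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            return None
        values.append(int(match.group(1)))
    return (values[0], values[1], values[2])


if __name__ == "__main__":
    path = "/usr/local/cuda/include/cudnn.h"
    try:
        with open(path, encoding="utf-8", errors="replace") as fh:
            print(parse_cudnn_version(fh.read()))
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
```

For the archive used in this post, the expected result would correspond to the 7.4.2 in the file name.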

With that done, you can go ahead and install the latest TensorFlow and PyTorch.

Appendix: verifying that TensorFlow can use the GPU

Open a terminal and enter the following:

$ python

Python 3.7.4 (default, Aug 13 2019, 20:35:49)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
True
>>>

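If this prints "True", the GPU is usable. Otherwise, go back over the steps above guided by the specific error message, or search Google or Baidu to see whether something was missed.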
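Later TensorFlow 2.x releases mark `tf.test.is_gpu_available()` as deprecated in favour of listing physical devices, so a check that copes with both may be useful. The sketch below is one possible way to write it, not an official recipe: `gpu_visible` is a hypothetical helper that takes the imported TensorFlow module as an argument and falls back to the call used in the session above when the newer function is absent.

```python
def gpu_visible(tf) -> bool:
    """Return True if the given TensorFlow module reports at least one GPU."""
    config = getattr(tf, "config", None)
    list_devices = getattr(config, "list_physical_devices", None)
    if list_devices is None:
        # TensorFlow 2.0 exposes the function under tf.config.experimental.
        experimental = getattr(config, "experimental", None)
        list_devices = getattr(experimental, "list_physical_devices", None)
    if list_devices is not None:
        return len(list_devices("GPU")) > 0
    # Fall back to the older call shown in the interactive session.
    return bool(tf.test.is_gpu_available())


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed in this environment")
    else:
        print(gpu_visible(tf))
```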

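Copyright notice
Unless otherwise noted, the articles on this blog are original and copyright belongs to the author. Reposting is welcome; please credit the author and link to the source. Thank you.
Article URL: https://blog.ailemon.net/2019/10/28/linux-remove-old-cuda-and-install-new-cuda/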
All articles are under Attribution-NonCommercial-ShareAlike 4.0
