id: 12703    nodeId: 12703    type: General    point: 199.0    linkPoint: .0    maker: cella    permission: linkable    made at: 2020.03.24 05:09    edited at: 2021.06.10 02:55
system upgrade 2021-06-09

I want to use TensorFlow in a Docker container. That requires the Nvidia driver (also called the CUDA driver) on the host, but NOT the CUDA Toolkit.
The Nvidia Container Toolkit is also needed.
Docker is already installed.

https://www.tensorflow.org/install/docker
https://github.com/NVIDIA/nvidia-docker

======== Nvidia driver


// check installed driver

$ apt list --installed |grep nvidia

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libnvidia-cfg1-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
libnvidia-common-440/bionic-updates,bionic-updates,bionic-security,bionic-security,now 440.100-0ubuntu0.18.04.1 all [installed,upgradable to: 440.118.02-0ubuntu1]
libnvidia-compute-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
libnvidia-decode-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
libnvidia-encode-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
libnvidia-extra-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 450.119.03-0ubuntu0.18.04.1]
libnvidia-fbc1-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
libnvidia-gl-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
libnvidia-ifr1-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
nvidia-compute-utils-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
nvidia-dkms-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
nvidia-driver-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
nvidia-kernel-common-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
nvidia-kernel-source-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
nvidia-modprobe/unknown,now 440.64.00-0ubuntu1 amd64 [installed,upgradable to: 465.19.01-0ubuntu1]
nvidia-prime/now 0.8.8.2 all [installed,upgradable to: 0.8.16~0.18.04.1]
nvidia-settings/unknown,now 440.64.00-0ubuntu1 amd64 [installed,upgradable to: 465.19.01-0ubuntu1]
nvidia-utils-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
xserver-xorg-video-nvidia-440/now 440.100-0ubuntu0.18.04.1 amd64 [installed,upgradable to: 440.118.02-0ubuntu1]
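
// The listing above is noisy. A minimal sketch for reducing it to "package version" pairs, assuming the `apt list --installed` line format shown above. The helper name nvidia_pkg_versions is my own, not part of apt.

```shell
# Read `apt list --installed` output on stdin and print "package version"
# for every package whose name contains "nvidia".
# nvidia_pkg_versions is a hypothetical helper name, not an apt feature.
nvidia_pkg_versions() {
  # Split on "/" and " ": $1 = package name, $3 = installed version.
  awk -F'[/ ]' '$1 ~ /nvidia/ { print $1, $3 }'
}

# Usage (2>/dev/null hides the "no stable CLI interface" warning):
#   apt list --installed 2>/dev/null | nvidia_pkg_versions
```

// Recording this short list before purging makes it easier to compare with the versions installed afterwards.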


// remove the installed driver
// Nvidia has an uninstallation guide: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-uninstallation
// but I did not try it, since I had already removed everything by following https://stackoverflow.com/questions/56431461/how-to-remove-cuda-completely-from-ubuntu

There are two things you may want to remove: the Nvidia drivers and the CUDA toolkit. If you installed them using apt-get, use the following to remove the packages completely from the system:

To remove cuda toolkit:
$ sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"
To remove Nvidia drivers:
$ sudo apt-get --purge remove "*nvidia*"
If you installed from source files (assuming the default location /usr/local), remove them using:
$ sudo rm -rf /usr/local/cuda*
If you get broken-package errors, it is because you added the Nvidia repo to /etc/apt/sources.list. Open the file to disable that entry:
$ sudo vim /etc/apt/sources.list
Go to the line that references the Nvidia repo and comment it out by putting # at the start of the line, e.g.:
#deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/ /
Then run
$ sudo apt-get update
This will fix the problem.
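
// The manual vim edit can also be scripted. A minimal sketch, assuming the repo entry is a `deb` line that mentions developer.download.nvidia.com as in the example above. The helper name comment_out_nvidia_repo is my own, and the entry may live in a file under /etc/apt/sources.list.d/ instead.

```shell
# Comment out every active "deb" line that points at the Nvidia download host.
# comment_out_nvidia_repo is a hypothetical helper; the host pattern is an
# assumption taken from the example line in this note.
comment_out_nvidia_repo() {
  # -i.bak edits the file in place and keeps a backup copy as FILE.bak.
  sed -i.bak -E '/^[[:space:]]*deb .*developer\.download\.nvidia\.com/ s/^/#/' "$1"
}

# Usage:
#   sudo sh -c '. ./helpers.sh; comment_out_nvidia_repo /etc/apt/sources.list'
#   sudo apt-get update
```

// helpers.sh in the usage line is only a placeholder for wherever the function is saved. Lines that are already commented are not touched, so running it twice is harmless.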


// now install the new driver with a package manager, following the guide at https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html

// I skipped the post-installation actions, since I don't install the CUDA Toolkit and don't have a POWER9 system.

// reboot

// check
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 465.19.01 Fri Mar 19 07:44:41 UTC 2021
GCC version: gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
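
// To use the version number in a script, it can be pulled out of that file. A minimal sketch, assuming the "Kernel Module <version>" layout shown above (other driver builds may format the line differently). The helper name driver_version is my own.

```shell
# Read /proc/driver/nvidia/version text on stdin and print only the
# driver version number (e.g. the part after "Kernel Module").
# driver_version is a hypothetical helper name.
driver_version() {
  sed -n 's/.*Kernel Module[[:space:]]*\([0-9][0-9.]*\).*/\1/p'
}

# Usage:
#   driver_version < /proc/driver/nvidia/version
```

// Lines without "Kernel Module", such as the GCC line, produce no output.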


======== Nvidia Container Toolkit

installation guide
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker

user guide
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/user-guide.html
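
// After installing the toolkit, a quick check is to run nvidia-smi inside a GPU-enabled container. A minimal sketch, assuming Docker 19.03 or newer (for the --gpus flag) and that the tensorflow/tensorflow:latest-gpu image from the TensorFlow Docker page is what I want to test. The helper name gpu_smoke_test and the DOCKER override variable are my own.

```shell
# Run nvidia-smi inside a container with all GPUs exposed.
# gpu_smoke_test and the DOCKER variable are hypothetical conveniences;
# set DOCKER="sudo docker" if the user is not in the docker group.
gpu_smoke_test() {
  image="${1:-tensorflow/tensorflow:latest-gpu}"
  # --rm removes the container afterwards; --gpus all needs the
  # Nvidia Container Toolkit to be installed on the host.
  ${DOCKER:-docker} run --rm --gpus all "$image" nvidia-smi
}

# Usage:
#   gpu_smoke_test
#   gpu_smoke_test some/other-image:tag
```

// If the driver and the toolkit are set up correctly, the nvidia-smi table printed from inside the container should show the same driver version as the host.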
