While installing containerd on a kubernetes-worker node (an NVIDIA DGX A100), a repository error caused the containerd charm's install hook to fail. After some research, it appears that NVIDIA's CUDA repository served over http is unreliable (apt reports a BADSIG error for it) and that it is safer to pull from the https endpoint, according to this forum thread: https://forums.developer.nvidia.com/t/the-following-signatures-were-invalid-badsig-f60f4b3d7fa2af80-cudatools-cudatools-nvidia-com/193642/3
After changing the CUDA source to https, I was able to pull the CUDA packages.
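A minimal sketch of that change, assuming the CUDA entry lives in /etc/apt/sources.list.d/nvidia.list as in the listing from dgx05 further down. It only rewrites active deb/deb-src lines that point at http://developer.download.nvidia.com/ and leaves the nvidia.github.io entries and comments untouched. The script name and command-line usage are illustrative, not part of the charm.

```python
import sys
from pathlib import Path

# Host of the CUDA apt repository that failed with BADSIG over http.
CUDA_HOST = "developer.download.nvidia.com"


def force_https(sources: str, host: str = CUDA_HOST) -> str:
    """Return `sources` with http:// deb/deb-src entries for `host` switched to https://."""
    out = []
    for line in sources.splitlines(keepends=True):
        fields = line.split()
        # Only touch active repository lines; comments and blank lines pass through unchanged.
        if fields and fields[0] in ("deb", "deb-src"):
            line = line.replace(f"http://{host}/", f"https://{host}/", 1)
        out.append(line)
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: sudo python3 force_https.py /etc/apt/sources.list.d/nvidia.list
    path = Path(sys.argv[1])
    original = path.read_text()
    updated = force_https(original)
    if updated != original:
        path.write_text(updated)
        print(f"Updated {path}; run 'sudo apt update' to refresh the package lists.")
    else:
        print(f"No http://{CUDA_HOST}/ entries found in {path}.")
```

Running it twice is harmless: once the entry is https, force_https returns the text unchanged and nothing is written.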
Logs:
Juju logs:
2022-02-23 16:53:46 INFO unit.containerd/8.juju-log server.go:327 Invoking reactive handler: reactive/containerd.py:413:check_for_gpu
2022-02-23 16:53:46 INFO unit.containerd/8.juju-log server.go:327 Invoking reactive handler: reactive/containerd.py:446:configure_nvidia
2022-02-23 16:53:46 INFO unit.containerd/8.juju-log server.go:327 status-set: maintenance: Installing Nvidia drivers.
2022-02-23 16:53:49 WARNING unit.containerd/8.install logger.go:60 W: GPG error: http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools <email address hidden>
2022-02-23 16:53:49 WARNING unit.containerd/8.install logger.go:60 E: The repository 'http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release' is not signed.
2022-02-23 16:53:49 INFO unit.containerd/8.juju-log server.go:327 Installing ['cuda-drivers', 'nvidia-container-runtime'] with options: ['--option=Dpkg::Options::=--force-confold']
2022-02-23 16:53:50 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:53:50 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:54:00 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:54:00 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:54:10 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:54:10 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:54:21 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:54:21 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:54:31 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:54:31 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:54:41 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:54:41 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:54:52 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:54:52 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:55:02 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:55:02 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:55:12 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:55:33 ERROR unit.containerd/8.juju-log server.go:327 Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-containerd-8/charm/reactive/containerd.py", line 488, in configure_nvidia
    apt_install(packages, fatal=True)
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charmhelpers/fetch/ubuntu.py", line 371, in apt_install
    _run_apt_command(cmd, fatal, quiet=quiet)
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charmhelpers/fetch/ubuntu.py", line 953, in _run_apt_command
    _run_with_retries(
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charmhelpers/fetch/ubuntu.py", line 930, in _run_with_retries
    result = subprocess.check_call(cmd, env=env, **kwargs)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['apt-get', '--assume-yes', '--option=Dpkg::Options::=--force-confold', 'install', 'cuda-drivers', 'nvidia-container-runtime']' returned non-zero exit status 100.
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 Traceback (most recent call last):
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/charm/hooks/install", line 22, in <module>
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 main()
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 bus.dispatch(restricted=restricted_mode)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 _invoke(other_handlers)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 handler.invoke()
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 self._action(*args)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/charm/reactive/containerd.py", line 488, in configure_nvidia
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 apt_install(packages, fatal=True)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charmhelpers/fetch/ubuntu.py", line 371, in apt_install
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 _run_apt_command(cmd, fatal, quiet=quiet)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charmhelpers/fetch/ubuntu.py", line 953, in _run_apt_command
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 _run_with_retries(
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charmhelpers/fetch/ubuntu.py", line 930, in _run_with_retries
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 result = subprocess.check_call(cmd, env=env, **kwargs)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 raise CalledProcessError(retcode, cmd)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 subprocess.CalledProcessError: Command '['apt-get', '--assume-yes', '--option=Dpkg::Options::=--force-confold', 'install', 'cuda-drivers', 'nvidia-container-runtime']' returned non-zero exit status 100.
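The repeated "Couldn't acquire DPKG lock" messages in these logs look misleading. apt-get exits with status 100 on ordinary errors such as "Unable to locate package", and the traceback shows apt_install going through charmhelpers' _run_with_retries, which appears to retry on that exit code and log the lock message each time. The following is a simplified, hypothetical re-implementation of that behaviour (not the charmhelpers code; the function name, parameters and retry count are assumptions) to show why every retry fails the same way while the CUDA repository is unusable.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# apt-get exits 100 on *any* error, so a lock failure and a missing package look the same.
APT_NO_LOCK = 100


def run_with_retries(
    cmd: Sequence[str],
    retry_exitcodes: Sequence[int] = (1, APT_NO_LOCK),
    max_retries: int = 3,
    delay: float = 10,
    runner: Callable[[Sequence[str]], int] = subprocess.call,
    log: Callable[[str], None] = print,
) -> int:
    """Run `cmd`, retrying while it exits with one of `retry_exitcodes`; return the last exit code."""
    retries = 0
    rc = runner(cmd)
    while rc in retry_exitcodes and retries < max_retries:
        retries += 1
        # Logged for every retryable exit code, even when the real cause is not a lock.
        log(f"Couldn't acquire DPKG lock. Will retry in {delay:g} seconds")
        time.sleep(delay)
        rc = runner(cmd)
    return rc


if __name__ == "__main__":
    calls: List[Sequence[str]] = []

    def fake_apt(cmd: Sequence[str]) -> int:
        # Simulates "E: Unable to locate package cuda-drivers" on every attempt.
        calls.append(cmd)
        return APT_NO_LOCK

    rc = run_with_retries(["apt-get", "install", "cuda-drivers"], delay=0, runner=fake_apt)
    print(rc, len(calls))  # 100 4
```

Retrying cannot help here: until the CUDA repository's Release file verifies, cuda-drivers is never in the package lists, so each attempt exits 100 again and the hook finally raises CalledProcessError.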
If I run apt update manually, the same GPG error appears:
ubuntu@dgx05:~$ sudo apt update
Hit:1 https://nvidia.github.io/libnvidia-container/ubuntu20.04/amd64 InRelease
Hit:2 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu20.04/amd64 InRelease
Hit:4 https://artifacts.elastic.co/packages/6.x/apt stable InRelease
Ign:5 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease
Get:6 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release [696 B]
Get:7 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release.gpg [836 B]
Hit:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:9 http://ppa.launchpad.net/telegraf-devs/ppa/ubuntu focal InRelease
Hit:10 http://archive.ubuntu.com/ubuntu focal-security InRelease
Hit:11 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Ign:7 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release.gpg
Reading package lists... Done
W: GPG error: http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools <email address hidden>
E: The repository 'http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
ubuntu@dgx05:~$
ubuntu@dgx05:~$ cat /etc/apt/sources.list.d/nvidia.list
deb https://nvidia.github.io/libnvidia-container/ubuntu20.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/ubuntu20.04/$(ARCH) /
deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /
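To pick out which entry apt rejected, a small filter over the apt output can help. This is a sketch under the assumption that apt words its messages as on dgx05 ("W: GPG error: <url> ..." and "E: The repository '<url> Release' is not signed."); other apt versions may phrase them differently. It uses a search rather than an anchored match so it also works on the prefixed Juju log lines. The script name and usage are hypothetical.

```python
import re
import sys
from typing import List

# Patterns for the two apt messages seen on dgx05.
_GPG_ERROR = re.compile(r"W: GPG error: (\S+) ")
_NOT_SIGNED = re.compile(r"E: The repository '(\S+) [^']*' is not signed")


def unsigned_repos(apt_output: str) -> List[str]:
    """Return repository URLs apt reported as badly signed or unsigned, in order, without duplicates."""
    found: List[str] = []
    for line in apt_output.splitlines():
        for pattern in (_GPG_ERROR, _NOT_SIGNED):
            match = pattern.search(line)
            if match and match.group(1) not in found:
                found.append(match.group(1))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: sudo apt update 2>&1 | tee apt.log; python3 unsigned_repos.py apt.log
    with open(sys.argv[1], encoding="utf-8") as fh:
        for url in unsigned_repos(fh.read()):
            print(url)
```

On the apt update output from dgx05 this would report only the http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 entry, which matches the third line of nvidia.list.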
If I change the CUDA source to https, apt update succeeds:
ubuntu@dgx05:~$ sudo apt update
Hit:1 https://nvidia.github.io/libnvidia-container/ubuntu20.04/amd64 InRelease
Hit:2 https://nvidia.github.io/nvidia-container-runtime/ubuntu20.04/amd64 InRelease
Ign:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease
Hit:4 https://artifacts.elastic.co/packages/6.x/apt stable InRelease
Get:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release [696 B]
Hit:6 http://ppa.launchpad.net/telegraf-devs/ppa/ubuntu focal InRelease
Get:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release.gpg [836 B]
Hit:8 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:9 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:10 http://archive.ubuntu.com/ubuntu focal-security InRelease
Hit:11 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Get:12 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Packages [611 kB]
Fetched 613 kB in 1s (484 kB/s)
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.
https://github.com/charmed-kubernetes/charm-containerd/pull/68