nvidia apt source should be https

Bug #1962032 reported by Camille Rodriguez
This bug affects 2 people
Affects: Containerd Subordinate Charm
Status: Fix Released
Importance: Medium
Assigned to: Adam Dyess
Milestone: 1.24+ck1

Bug Description

While installing containerd on a kubernetes-worker running on an NVIDIA DGX A100, an apt repository error caused the containerd charm's install hook to fail. After some research, it appears that the plain-http NVIDIA CUDA repository is unreliable (apt reports a BADSIG error on its Release file and then disables the repository) and that it is safer to pull from the https repository, according to this forum thread: https://forums.developer.nvidia.com/t/the-following-signatures-were-invalid-badsig-f60f4b3d7fa2af80-cudatools-cudatools-nvidia-com/193642/3

After changing the CUDA source to https, I was able to pull the CUDA packages.
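A minimal sketch of the source change described above, assuming the entry is a one-line-style `deb` line like the CUDA entry in /etc/apt/sources.list.d/nvidia.list. The function `force_https` and the constant `NVIDIA_CUDA_HOST` are hypothetical names for illustration, not part of the charm:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical default: only the CUDA repository host is switched to https.
NVIDIA_CUDA_HOST = "developer.download.nvidia.com"


def force_https(source_line: str, host: str = NVIDIA_CUDA_HOST) -> str:
    """Return a one-line apt 'deb' entry with its URI switched from http to
    https when the URI points at `host`; other lines are returned unchanged."""
    fields = source_line.split()
    if not fields or fields[0] != "deb":
        return source_line  # comments, blank lines, deb-src, etc.
    for i, field in enumerate(fields):
        if "://" not in field:
            continue  # the 'deb' keyword or an [option] block
        parts = urlsplit(field)
        if parts.scheme == "http" and parts.hostname == host:
            fields[i] = urlunsplit(parts._replace(scheme="https"))
            return " ".join(fields)  # note: whitespace is normalized
        break  # only the first URI in the entry is considered
    return source_line


if __name__ == "__main__":
    print(force_https(
        "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /"))
    # deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /
```

Restricting the rewrite to a single host leaves other http sources (for example archive.ubuntu.com in the sessions in this report) untouched.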

Logs:

Juju logs:

2022-02-23 16:53:46 INFO unit.containerd/8.juju-log server.go:327 Invoking reactive handler: reactive/containerd.py:413:check_for_gpu
2022-02-23 16:53:46 INFO unit.containerd/8.juju-log server.go:327 Invoking reactive handler: reactive/containerd.py:446:configure_nvidia
2022-02-23 16:53:46 INFO unit.containerd/8.juju-log server.go:327 status-set: maintenance: Installing Nvidia drivers.
2022-02-23 16:53:49 WARNING unit.containerd/8.install logger.go:60 W: GPG error: http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools <email address hidden>
2022-02-23 16:53:49 WARNING unit.containerd/8.install logger.go:60 E: The repository 'http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release' is not signed.
2022-02-23 16:53:49 INFO unit.containerd/8.juju-log server.go:327 Installing ['cuda-drivers', 'nvidia-container-runtime'] with options: ['--option=Dpkg::Options::=--force-confold']
2022-02-23 16:53:50 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:53:50 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:54:00 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:54:00 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:54:10 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:54:10 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:54:21 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:54:21 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:54:31 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:54:31 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:54:41 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:54:41 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:54:52 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:54:52 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:55:02 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:55:02 INFO unit.containerd/8.juju-log server.go:327 Couldn't acquire DPKG lock. Will retry in 10 seconds
2022-02-23 16:55:12 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 E: Unable to locate package cuda-drivers
2022-02-23 16:55:33 ERROR unit.containerd/8.juju-log server.go:327 Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-containerd-8/charm/reactive/containerd.py", line 488, in configure_nvidia
    apt_install(packages, fatal=True)
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charmhelpers/fetch/ubuntu.py", line 371, in apt_install
    _run_apt_command(cmd, fatal, quiet=quiet)
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charmhelpers/fetch/ubuntu.py", line 953, in _run_apt_command
    _run_with_retries(
  File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charmhelpers/fetch/ubuntu.py", line 930, in _run_with_retries
    result = subprocess.check_call(cmd, env=env, **kwargs)
  File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['apt-get', '--assume-yes', '--option=Dpkg::Options::=--force-confold', 'install', 'cuda-drivers', 'nvidia-container-runtime']' returned non-zero exit status 100.

2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 Traceback (most recent call last):
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/charm/hooks/install", line 22, in <module>
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 main()
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/__init__.py", line 74, in main
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 bus.dispatch(restricted=restricted_mode)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 390, in dispatch
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 _invoke(other_handlers)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 359, in _invoke
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 handler.invoke()
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charms/reactive/bus.py", line 181, in invoke
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 self._action(*args)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/charm/reactive/containerd.py", line 488, in configure_nvidia
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 apt_install(packages, fatal=True)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charmhelpers/fetch/ubuntu.py", line 371, in apt_install
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 _run_apt_command(cmd, fatal, quiet=quiet)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charmhelpers/fetch/ubuntu.py", line 953, in _run_apt_command
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 _run_with_retries(
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/var/lib/juju/agents/unit-containerd-8/.venv/lib/python3.8/site-packages/charmhelpers/fetch/ubuntu.py", line 930, in _run_with_retries
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 result = subprocess.check_call(cmd, env=env, **kwargs)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 raise CalledProcessError(retcode, cmd)
2022-02-23 16:55:33 WARNING unit.containerd/8.install logger.go:60 subprocess.CalledProcessError: Command '['apt-get', '--assume-yes', '--option=Dpkg::Options::=--force-confold', 'install', 'cuda-drivers', 'nvidia-container-runtime']' returned non-zero exit status 100.
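In the Juju log, each "Unable to locate package cuda-drivers" error is followed by "Couldn't acquire DPKG lock. Will retry in 10 seconds", even though apt never reports a lock problem. A likely explanation is that apt-get returns exit status 100 for errors in general (the CalledProcessError above shows status 100 for the missing package), and the charmhelpers apt wrapper retries fatal commands on that status with a lock-specific message. The following is a simplified, hypothetical model of such a retry loop, not the charmhelpers source; `run_with_retries`, `CommandFailed` and the default values are illustrative assumptions:

```python
import time
from typing import Callable, Sequence

APT_EXIT_ERROR = 100  # apt-get uses 100 for lock failures *and* other errors


class CommandFailed(Exception):
    def __init__(self, returncode: int):
        super().__init__(f"command returned non-zero exit status {returncode}")
        self.returncode = returncode


def run_with_retries(run: Callable[[], int],
                     max_retries: int = 10,
                     retry_exitcodes: Sequence[int] = (1, APT_EXIT_ERROR),
                     delay: float = 10.0,
                     log: Callable[[str], None] = print) -> int:
    """Call `run` until it returns 0 and return the number of retries used.

    Any exit code in `retry_exitcodes` is retried with the same lock message,
    so a missing package is reported as a lock problem until retries run out.
    """
    attempts = 0
    while True:
        code = run()
        if code == 0:
            return attempts
        if code not in retry_exitcodes or attempts >= max_retries:
            raise CommandFailed(code)
        attempts += 1
        log(f"Couldn't acquire DPKG lock. Will retry in {delay:g} seconds")
        time.sleep(delay)
```

Under this model, a repository that stays disabled makes every retry fail the same way, so the hook only spends time before raising the same exit status 100.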

If I run apt update manually:

ubuntu@dgx05:~$ sudo apt update
Hit:1 https://nvidia.github.io/libnvidia-container/ubuntu20.04/amd64 InRelease
Hit:2 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:3 https://nvidia.github.io/nvidia-container-runtime/ubuntu20.04/amd64 InRelease
Hit:4 https://artifacts.elastic.co/packages/6.x/apt stable InRelease
Ign:5 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease
Get:6 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release [696 B]
Get:7 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release.gpg [836 B]
Hit:8 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:9 http://ppa.launchpad.net/telegraf-devs/ppa/ubuntu focal InRelease
Hit:10 http://archive.ubuntu.com/ubuntu focal-security InRelease
Hit:11 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Ign:7 http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release.gpg
Reading package lists... Done
W: GPG error: http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools <email address hidden>
E: The repository 'http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release' is not signed.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.
ubuntu@dgx05:~$
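The failure is visible in the W:/E: lines of the apt update output. As a minimal diagnostic sketch, assuming English-locale apt messages in exactly the format shown in this report, the hypothetical helper `failing_repositories` extracts each repository URI that failed signature verification and the key id from any BADSIG message:

```python
import re
from typing import Dict, Optional

# "W: GPG error: <uri> <dist>: ... BADSIG <key id> ..."
BADSIG = re.compile(r"^W: GPG error: (\S+) .*\bBADSIG ([0-9A-F]+)")
# "E: The repository '<uri> <dist>' is not signed."
NOT_SIGNED = re.compile(r"^E: The repository '(\S+) [^']*' is not signed\.")


def failing_repositories(apt_output: str) -> Dict[str, Optional[str]]:
    """Map each repository URI that failed signature checks to the key id
    reported by BADSIG, or None if only 'is not signed' was reported."""
    failures: Dict[str, Optional[str]] = {}
    for line in apt_output.splitlines():
        line = line.strip()
        badsig = BADSIG.match(line)
        if badsig:
            failures[badsig.group(1)] = badsig.group(2)
            continue
        unsigned = NOT_SIGNED.match(line)
        if unsigned:
            # Keep a key id already recorded from a BADSIG line.
            failures.setdefault(unsigned.group(1), None)
    return failures
```

Fed the two error lines above, this should yield a single entry for the http CUDA repository with key id F60F4B3D7FA2AF80. The regexes are illustrative and were only checked against the messages in this report.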

ubuntu@dgx05:~$ cat /etc/apt/sources.list.d/nvidia.list
deb https://nvidia.github.io/libnvidia-container/ubuntu20.04/$(ARCH) /
deb https://nvidia.github.io/nvidia-container-runtime/ubuntu20.04/$(ARCH) /
deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 /
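Only the third entry in this file uses plain http. A small audit sketch, assuming one-line-style `deb`/`deb-src` entries with `#` comments as in the file above; `plain_http_entries` is a hypothetical helper, not part of apt or the charm:

```python
from typing import List, Tuple


def plain_http_entries(sources_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, uri) for every 'deb'/'deb-src' entry whose URI
    uses plain http, skipping blank lines and comments."""
    found = []
    for lineno, raw in enumerate(sources_text.splitlines(), start=1):
        fields = raw.split("#", 1)[0].split()  # drop trailing comments
        if not fields or fields[0] not in ("deb", "deb-src"):
            continue
        # The URI is the first field with a scheme; [options] have none.
        uri = next((f for f in fields[1:] if "://" in f), None)
        if uri is not None and uri.startswith("http://"):
            found.append((lineno, uri))
    return found
```

Run on the contents of /etc/apt/sources.list.d/nvidia.list shown above, it should report line 3, the developer.download.nvidia.com CUDA entry.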

If I change the CUDA source to https:
ubuntu@dgx05:~$ sudo apt update
Hit:1 https://nvidia.github.io/libnvidia-container/ubuntu20.04/amd64 InRelease
Hit:2 https://nvidia.github.io/nvidia-container-runtime/ubuntu20.04/amd64 InRelease
Ign:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease
Hit:4 https://artifacts.elastic.co/packages/6.x/apt stable InRelease
Get:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release [696 B]
Hit:6 http://ppa.launchpad.net/telegraf-devs/ppa/ubuntu focal InRelease
Get:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Release.gpg [836 B]
Hit:8 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:9 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:10 http://archive.ubuntu.com/ubuntu focal-security InRelease
Hit:11 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Get:12 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 Packages [611 kB]
Fetched 613 kB in 1s (484 kB/s)
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.

Adam Dyess (addyess) wrote :
Changed in charm-containerd:
status: New → In Progress
importance: Undecided → Medium
milestone: none → 1.25
assignee: nobody → Adam Dyess (addyess)
Changed in charm-containerd:
status: In Progress → Fix Committed
milestone: 1.25 → 1.24+ck1
Adam Dyess (addyess)
tags: added: backport-needed
Adam Dyess (addyess)
tags: removed: backport-needed
Adam Dyess (addyess)
Changed in charm-containerd:
status: Fix Committed → Fix Released