cuda-drivers : Depends: cuda-drivers-515 (= 515.48.07-1) but it is not going to be installed

Bug #1982197 reported by Chris Johnston
Affects: Containerd Subordinate Charm
Status: Fix Released
Importance: High
Assigned to: Adam Dyess

Bug Description

When upgrading the charm from 1.22 to 1.24/stable we are seeing an error:

2022-07-16 05:41:59 DEBUG unit.containerd/110.config-changed logger.go:60 cuda-drivers : Depends: cuda-drivers-515 (= 515.48.07-1) but it is not going to be installed
2022-07-16 05:41:59 WARNING unit.containerd/110.config-changed logger.go:60 E: Unable to correct problems, you have held broken packages.
2022-07-16 05:41:59 ERROR unit.containerd/110.juju-log server.go:319 Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-containerd-110/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-containerd-110/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 390, in dispatch
    _invoke(other_handlers)
  File "/var/lib/juju/agents/unit-containerd-110/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-containerd-110/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-containerd-110/charm/reactive/containerd.py", line 488, in configure_nvidia
    apt_install(packages, fatal=True)
  File "/var/lib/juju/agents/unit-containerd-110/.venv/lib/python3.6/site-packages/charmhelpers/fetch/ubuntu.py", line 369, in apt_install
    _run_apt_command(cmd, fatal, quiet=quiet)
  File "/var/lib/juju/agents/unit-containerd-110/.venv/lib/python3.6/site-packages/charmhelpers/fetch/ubuntu.py", line 948, in _run_apt_command
    quiet=quiet)
  File "/var/lib/juju/agents/unit-containerd-110/.venv/lib/python3.6/site-packages/charmhelpers/fetch/ubuntu.py", line 922, in _run_with_retries
    result = subprocess.check_call(cmd, env=env, **kwargs)
  File "/usr/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['apt-get', '--assume-yes', '--option=Dpkg::Options::=--force-confold', 'install', 'cuda-drivers', 'nvidia-container-runtime']' returned non-zero exit status 100.

Notes from attempted manual install: https://paste.ubuntu.com/p/xSVrtW6zRj/

The issue appears to be that, in order to install cuda-drivers-515, apt would need to remove the -460/-470 packages, which it will not do on its own.
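
To check which versioned driver packages are present before attempting the upgrade, something like the following could be used. This is only a sketch: the helper names are hypothetical, the sample output is illustrative rather than taken from the affected unit, and it assumes the packages follow the cuda-drivers-NNN naming seen in the error above.

```python
import re

# Matches versioned driver packages such as "cuda-drivers-460".
# The unversioned "cuda-drivers" metapackage is deliberately not matched.
_VERSIONED = re.compile(r"^cuda-drivers-(\d+)$")


def installed_driver_branches(dpkg_output):
    r"""Return the sorted driver branch numbers found in dpkg-query output.

    Expects lines of the form "<package> <status>", as produced by:
        dpkg-query -W -f='${Package} ${Status}\n' 'cuda-drivers*'
    """
    branches = set()
    for line in dpkg_output.splitlines():
        name, _, status = line.partition(" ")
        match = _VERSIONED.match(name)
        # Only count packages whose final status word is "installed"
        # (skips removed packages that only left config files behind).
        if match and status.split()[-1:] == ["installed"]:
            branches.add(int(match.group(1)))
    return sorted(branches)


def older_branches(dpkg_output, target):
    """Installed branches older than the target (assumed to block the upgrade)."""
    return [b for b in installed_driver_branches(dpkg_output) if b < target]


# Illustrative input only, not real output from the unit in this bug.
sample = (
    "cuda-drivers install ok installed\n"
    "cuda-drivers-450 deinstall ok config-files\n"
    "cuda-drivers-460 install ok installed\n"
    "cuda-drivers-470 install ok installed\n"
)
print(older_branches(sample, 515))
```

The parsing is kept separate from any call to dpkg-query so it can be exercised without touching the system.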

Adam Dyess (addyess) wrote :

From the [NVIDIA docs](https://docs.nvidia.com/cuda/archive/9.1/cuda-installation-guide-linux/index.html#handle-uninstallation):

> Before installing CUDA, any previously installations that could conflict should be uninstalled. This will not affect systems which have not had CUDA installed previously, or systems where the installation method has been preserved (RPM/Deb vs. Runfile). See the following charts for specifics.

The charm will need to purge the cuda-drivers packages before running an install or upgrade on them.
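
A rough sketch of what "purge, then install" could look like follows. This is not the charm's actual fix: `purge_then_install` is a hypothetical helper, and the install command simply mirrors the one in the traceback above.

```python
import subprocess


def purge_then_install(stale, packages, run=subprocess.check_call):
    """Purge stale driver packages, then install the requested ones.

    `stale` and `packages` are lists of package names. `run` is injectable
    so the command sequence can be inspected without touching the system.
    Returns the list of commands that were passed to `run`.
    """
    commands = []
    if stale:
        # Only issue a purge when there is something to remove.
        commands.append(["apt-get", "--assume-yes", "purge"] + list(stale))
    commands.append(
        ["apt-get", "--assume-yes",
         "--option=Dpkg::Options::=--force-confold", "install"]
        + list(packages)
    )
    for cmd in commands:
        # subprocess.check_call raises CalledProcessError on a non-zero exit.
        run(cmd)
    return commands
```

Passing `run=print` (or a list's `append`) gives a dry run that shows the commands without executing them.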

Chris Johnston (cjohnston) wrote :

My reading of the comments there, and especially of the table, is that going from deb to deb should require no extra work.

I have done some additional testing and have found that the -460 packages need to be removed prior to upgrading to 515. This may mean:

- install cuda-drivers-470 # this allows the -460 packages to be removed
- apt autoremove --purge # this removes the -460 packages

I was then able to install cuda-drivers, which pulls in cuda-drivers-515.
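
The manual sequence above can be sketched as a short script. This is an assumption-laden illustration: the function names are hypothetical, `apt-get` is used in place of `apt` for scripting, and the exact steps on a given machine may differ from what the paste below records.

```python
import subprocess


def workaround_commands():
    """The three manual steps, in order, as apt-get invocations."""
    return [
        # Step 1: move to the 470 branch so the -460 packages become removable.
        ["apt-get", "--assume-yes", "install", "cuda-drivers-470"],
        # Step 2: remove the now-unneeded -460 packages and their config.
        ["apt-get", "--assume-yes", "autoremove", "--purge"],
        # Step 3: install the metapackage, which pulls in cuda-drivers-515.
        ["apt-get", "--assume-yes", "install", "cuda-drivers"],
    ]


def apply_workaround(run=subprocess.check_call):
    """Run each step in order; the default runner stops at the first failure."""
    for cmd in workaround_commands():
        run(cmd)
```

Calling `apply_workaround(run=print)` prints the commands without executing anything, which is a safer first pass on a production unit.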

Full steps:

https://paste.ubuntu.com/p/wWNFv7bqTx/

Adam Dyess (addyess) wrote :
Changed in charm-containerd:
importance: Undecided → High
status: New → In Progress
assignee: nobody → Adam Dyess (addyess)
milestone: none → 1.24+ck1
Adam Dyess (addyess)
Changed in charm-containerd:
status: In Progress → Fix Committed
Adam Dyess (addyess)
tags: added: backport-needed
Adam Dyess (addyess)
tags: removed: backport-needed
Adam Dyess (addyess)
Changed in charm-containerd:
status: Fix Committed → Fix Released