"couldn't communicate with the NVIDIA driver" when installing open dkms and LRM drivers concurrently

Bug #2023042 reported by Francis Ginther
This bug affects 1 person
Affects                               | Status  | Importance | Assigned to | Milestone
nvidia-graphics-drivers-525 (Ubuntu)  | Invalid | Low        | Unassigned  |

Bug Description

Installing "nvidia-driver-525-open" followed by "nvidia-headless-no-dkms-525 linux-modules-nvidia-525-gcp nvidia-utils-525" led to a system that reported a "Driver/library version mismatch". Specifically, the steps were:

Deploy a clean Google Cloud VM with:

gcloud compute instances create fginther-kinetic-gpgpu-525 --image-project ubuntu-os-cloud --image-family ubuntu-2210-amd64 --machine-type n1-standard-4 --boot-disk-size=32GB --accelerator type=nvidia-tesla-t4,count=1 --maintenance-policy TERMINATE --restart-on-failure

Enable kinetic-proposed (this run used the 525.116.04-0ubuntu0.22.10.1 driver package).

Install the 525-open driver first:

apt-get install -y nvidia-driver-525-open

Then install the proprietary driver:

apt-get install nvidia-headless-no-dkms-525 linux-modules-nvidia-525-gcp nvidia-utils-525
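The install order above can be captured in a small script so the sequence is easy to re-run. This is a hypothetical sketch and not the test automation attached later in this report. The `run` helper and the `DRY_RUN` variable are assumptions added so the commands can be printed without changing the system. Creating the VM and enabling kinetic-proposed are left out, and `-y` is added to the second install for non-interactive use.

```shell
# Hypothetical reproduction sketch: open DKMS driver first, then the LRM packages.
# DRY_RUN=1 (an assumption, not part of the original report) only prints commands.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # Step 1: open kernel modules, built locally through DKMS.
    run sudo apt-get install -y nvidia-driver-525-open
    # Step 2: prebuilt (LRM) modules for the gcp kernel flavour plus userspace.
    run sudo apt-get install -y nvidia-headless-no-dkms-525 \
        linux-modules-nvidia-525-gcp nvidia-utils-525
    # Step 3: the failure only shows up after a reboot.
    echo "Reboot, then run: nvidia-smi"
}
```

Usage: source the file, then run `DRY_RUN=1 reproduce` to review the sequence, or `reproduce` on a disposable VM.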

After rebooting, "nvidia-smi" could not communicate with the driver:

ubuntu@fginther-kinetic-gpgpu-525:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

The /var/log/apt/history.log is attached which details the packages installed and removed.

Tags: kinetic
Francis Ginther (fginther) wrote:

Daniel van Vugt (vanvugt) wrote:

Sounds like low priority because I get the feeling that the deb packages just need to declare more conflicts.

Changed in nvidia-graphics-drivers-525 (Ubuntu):
importance: Undecided → Low
tags: added: kinetic
Francis Ginther (fginther) wrote:

Francis Ginther (fginther) wrote:

Francis Ginther (fginther) wrote:

I've created some test automation for this and have attached two logs:
* kinetic-525-open-to-lrm.txt
* lunar-525-open-to-lrm.txt

The failure occurs when going from the open dkms packages to the LRM packages. Going from the open dkms packages to the proprietary dkms packages is working as expected.

I accidentally used the proposed pocket when filing this bug. With only the release and updates pockets enabled, it still fails, but with a different error message (I will update the title and description to match).

summary: - "Driver/library version mismatch" when installing open and proprietary
- drivers concurrently
+ "couldn't communicate with the NVIDIA driver" when installing open dkms
+ and LRM drivers concurrently
description: updated
Alberto Milone (albertomilone) wrote:

Everything looks correct to me, from the package installation point of view. Perhaps the module is not being loaded the second time around. Was the first module ever unloaded?

I would like to see the output of "lsmod" and of "sudo modinfo nvidia-525" after every driver installation, please.
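The requested state can be captured after each install with a helper that prints a labelled snapshot. `nvidia_snapshot` is a hypothetical name. It calls `modinfo nvidia`, which is the module name used in the follow-up logs. It also tolerates a missing module or tool, so the snapshot records "not loaded" instead of aborting.

```shell
# Hypothetical helper: print a labelled snapshot of NVIDIA kernel module state.
nvidia_snapshot() {
    echo "=== $1 ==="
    echo "--- lsmod (nvidia entries) ---"
    # grep exits non-zero when nothing matches; record that instead of failing.
    lsmod 2>/dev/null | grep -i nvidia || echo "(no nvidia modules loaded)"
    echo "--- modinfo nvidia ---"
    # The filename line shows which installed module file would be loaded.
    modinfo nvidia 2>/dev/null || echo "(modinfo found no nvidia module)"
}
```

For example: `sudo apt-get install -y nvidia-driver-525-open && nvidia_snapshot after-open-install | tee -a nvidia-state.log`, and the same again with a new label after the LRM install and after the reboot.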

Francis Ginther (fginther) wrote:

I found a flaw in the test script: it was installing the wrong LRM modules for the running kernel, namely the generic modules on a gcp kernel. After correcting it to install the gcp modules, the test passes.
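The root cause was a mismatch between the kernel flavour and the LRM package, so a test script can derive the package from the running kernel instead of hard-coding it. This is a minimal sketch under two assumptions. First, it assumes the release string has the form `<version>-<abi>-<flavour>`; `5.19.0-1024-gcp` below is a hypothetical example. Second, it assumes the flavour meta package follows the `linux-modules-nvidia-525-gcp` naming used in this report. `lrm_package_for_kernel` is a hypothetical name. The sketch does not handle meta packages that track other kernel series, such as HWE variants, so check the result against `apt-cache search linux-modules-nvidia-525`.

```shell
# Hypothetical helper: map a kernel release such as "5.19.0-1024-gcp" to the
# LRM flavour meta package, e.g. "linux-modules-nvidia-525-gcp".
lrm_package_for_kernel() {
    release="${1:-$(uname -r)}"   # default to the running kernel
    series="${2:-525}"            # NVIDIA driver series
    flavour="${release#*-*-}"     # drop "<version>-<abi>-", keep e.g. "gcp"
    if [ "$flavour" = "$release" ]; then
        echo "cannot parse a flavour from '$release'" >&2
        return 1
    fi
    echo "linux-modules-nvidia-${series}-${flavour}"
}
```

Usage: `sudo apt-get install -y nvidia-headless-no-dkms-525 "$(lrm_package_for_kernel)" nvidia-utils-525`. On a gcp kernel this selects the gcp modules, which is the fix described in the comment above.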

Attached are the logs, now including the output of `lsmod` and `modinfo nvidia`.

I think this can now be closed as a test error.

Changed in nvidia-graphics-drivers-525 (Ubuntu):
status: New → Invalid