ubuntu-drivers installs the wrong version of the NVIDIA driver on some machines

Bug #2044973 reported by Kevin Yeh
This bug affects 2 people
Affects: ubuntu-drivers-common (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Alberto Milone

Bug Description

During SRU testing, I found that some machines installed the wrong version of the NVIDIA driver.

They shouldn't install nvidia-driver-535-server-open on a desktop.

Please have a look, thanks.

If you need more information, please let me know.

+ _run sudo ubuntu-drivers devices
+ ssh -t -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null ubuntu@10.102.154.51 sudo ubuntu-drivers devices

== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001FB8sv00001028sd00000906bc03sc02i00
vendor : NVIDIA Corporation
model : TU117GLM [Quadro T2000 Mobile / Max-Q]
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-535 - distro non-free
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-525 - distro non-free
driver : nvidia-driver-535-open - distro non-free
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-535-server-open - distro non-free recommended
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-525-open - distro non-free
driver : nvidia-driver-525-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
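For SRU test scripts, a check like the one below could catch this case automatically. It is a minimal sketch that assumes the plain-text layout shown above: a `== /sys/devices/... ==` header followed by `model :` and `driver :` lines, with `recommended` as a trailing tag. That layout is not a documented stable interface. The helper names `parse_devices` and `suspicious` are hypothetical, and treating any `-server` or `-open` recommendation as wrong is an assumption based on the GPUs in this report, not a rule taken from ubuntu-drivers.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Device:
    """One '== /sys/devices/... ==' stanza from `ubuntu-drivers devices`."""
    path: str
    model: str = ""
    drivers: list = field(default_factory=list)
    recommended: str = ""


# Matches lines such as "driver : nvidia-driver-535 - distro non-free recommended"
DRIVER_RE = re.compile(r"^driver\s*:\s*(\S+)\s+-\s+(.*)$")


def parse_devices(text):
    """Parse the plain-text output of `ubuntu-drivers devices` (assumed layout)."""
    devices = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("==") and line.endswith("=="):
            # Strip the surrounding '=' characters and spaces to get the sysfs path.
            devices.append(Device(path=line.strip("= ")))
        elif not devices:
            continue  # skip WARNING/ERROR lines printed before the first stanza
        elif line.startswith("model"):
            devices[-1].model = line.split(":", 1)[1].strip()
        else:
            m = DRIVER_RE.match(line)
            if m:
                name, tags = m.group(1), m.group(2).split()
                devices[-1].drivers.append(name)
                if "recommended" in tags:
                    devices[-1].recommended = name
    return devices


def suspicious(device):
    """Flag a recommended NVIDIA package that is a -server or -open variant.

    Assumption: for the GeForce/Quadro GPUs in this report, the plain
    nvidia-driver-NNN package is the expected recommendation.
    """
    rec = device.recommended
    return rec.startswith("nvidia-driver-") and ("-server" in rec or rec.endswith("-open"))
```

Applied to the listings in this report, `suspicious()` is true for the outputs that recommend nvidia-driver-535-server-open and false for a listing that recommends nvidia-driver-535.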

Tags: cert-sru
Kevin Yeh (kevinyeh)
Changed in ubuntu-drivers-common (Ubuntu):
assignee: nobody → Alberto Milone (albertomilone)
Shane Williams (shanew) wrote :

We discovered the same issue with T1000 cards in a bunch of Dell Precision 3660s. We believe the correct choice should be "nvidia-driver-535", but it recommends "nvidia-driver-535-server-open" instead, which prevents gdm and X from starting. dmesg shows the following errors:

[ 231.608701] NVRM objClInitPcieChipset: *** Chipset Setup Function Error!
[ 233.147366] NVRM: Open nvidia.ko is only ready for use on Data Center GPUs.
[ 233.147368] NVRM: To force use of Open nvidia.ko on other GPUs, see the
[ 233.147369] NVRM: 'OpenRmEnableUnsupportedGpus' kernel module parameter described
[ 233.147369] NVRM: in the README.
[ 233.426588] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1904)
[ 233.426882] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 235.365019] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1904)
[ 235.365336] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 237.455019] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1904)
[ 237.455443] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 239.430998] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1904)
[ 239.431436] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 241.403724] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1904)
[ 241.404223] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 243.384996] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1904)
[ 243.385389] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
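The two signatures that matter in this log are the open kernel module refusing a GPU that is not a Data Center part, and the repeated `RmInitAdapter failed!` lines that follow. A small sketch like the one below, assuming dmesg text captured from the affected machine (for example with `sudo dmesg`), reports both. The function name `diagnose_dmesg` is hypothetical, and the message strings are copied from the log above, so other driver releases may word them differently.

```python
import re

# Message text copied from the dmesg output in this report.
OPEN_DC_ONLY = "Open nvidia.ko is only ready for use on Data Center GPUs"
# e.g. "[ 233.426588] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x0:1904)"
INIT_FAIL_RE = re.compile(r"NVRM: GPU ([0-9a-fA-F:.]+): RmInitAdapter failed!")


def diagnose_dmesg(text):
    """Summarise the NVRM messages relevant to this bug from dmesg output."""
    open_refused = False
    failures = {}  # PCI address -> number of RmInitAdapter failures
    for line in text.splitlines():
        if OPEN_DC_ONLY in line:
            open_refused = True
        m = INIT_FAIL_RE.search(line)
        if m:
            failures[m.group(1)] = failures.get(m.group(1), 0) + 1
    return {"open_module_refused": open_refused, "init_failures": failures}
```

On the log above this would report the open-module refusal plus six `RmInitAdapter` failures for 0000:01:00.0, which points at the -open package choice rather than a hardware fault.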

Here is the output of lspci and ubuntu-drivers devices:

shanew@prius:~$ sudo lspci -s 01:00.0
01:00.0 VGA compatible controller: NVIDIA Corporation TU117GL [T1000 8GB] (rev a1)

shanew@prius:~$ sudo ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001FF0sv00001028sd00001612bc03sc00i00
vendor : NVIDIA Corporation
driver : nvidia-driver-525-server - distro non-free
driver : nvidia-driver-525 - distro non-free
driver : nvidia-driver-535-open - distro non-free
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-525-open - distro non-free
driver : nvidia-driver-535-server-open - distro non-free recommended
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-535 - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ubuntu-drivers-common (Ubuntu):
status: New → Confirmed
Francis Ginther (fginther) wrote :

@kevinyeh, @shanew, It looks like you are hitting this on focal. Can you please confirm?

If so, can you please retest with the latest ubuntu-drivers from focal-proposed? This should be version 1:0.9.0~0.20.04.8. This appears to fix a bug (discussed in https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-515/+bug/1988836) which caused the -open drivers to be preferred.
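To confirm which ubuntu-drivers-common a test machine actually has, `dpkg-query -W -f='${Version}' ubuntu-drivers-common` prints the installed version, and `dpkg --compare-versions` can compare it against 1:0.9.0~0.20.04.8. If a test harness needs the same comparison in Python without python-apt, the sketch below reimplements dpkg's ordering rules: epoch first, then upstream version, then Debian revision, with `~` sorting before everything, including the end of the string. It is an illustrative reimplementation, not dpkg's or python-apt's code, and the function names are hypothetical.

```python
def _isdigit(c):
    return "0" <= c <= "9"


def _order(c):
    """dpkg-style character weight; '' stands for end of string."""
    if c == "":
        return 0
    if c == "~":
        return -1  # '~' sorts before anything, even the end of the string
    if _isdigit(c):
        return 0
    if c.isalpha():
        return ord(c)  # letters sort before other punctuation
    return ord(c) + 256


def _verrevcmp(a, b):
    """Compare two upstream (or revision) strings using dpkg's rules."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character by character using _order().
        while (i < len(a) and not _isdigit(a[i])) or (j < len(b) and not _isdigit(b[j])):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return -1 if ac < bc else 1
            i += 1
            j += 1
        # Digit run: compare numerically, so leading zeros do not matter.
        si, sj = i, j
        while i < len(a) and _isdigit(a[i]):
            i += 1
        while j < len(b) and _isdigit(b[j]):
            j += 1
        na, nb = int(a[si:i] or "0"), int(b[sj:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0


def parse_version(v):
    """Split 'epoch:upstream-revision' into (epoch, upstream, revision)."""
    epoch = 0
    if ":" in v:
        e, v = v.split(":", 1)
        epoch = int(e)
    upstream, sep, revision = v.rpartition("-")
    if not sep:
        upstream, revision = v, ""
    return epoch, upstream, revision


def version_compare(a, b):
    """Return -1, 0 or 1 as Debian version a sorts before, equal to or after b."""
    ea, ua, ra = parse_version(a)
    eb, ub, rb = parse_version(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _verrevcmp(ua, ub) or _verrevcmp(ra, rb)
```

For example, `version_compare("1:0.9.0~0.20.04.8", "1:0.9.0~0.20.04.7")` is positive, so a harness can assert that the proposed package really is newer than the focal-updates one before running the test.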

Kevin Yeh (kevinyeh) wrote :

@Francis

I've confirmed that the DUT is already using the latest ubuntu-drivers from proposed:
Setting up ubuntu-drivers-common (1:0.9.0~0.20.04.8) ...

But it doesn't seem to solve the issue. Please check the following messages.

+ _run sudo ubuntu-drivers devices
+ ssh -t -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null ubuntu@10.102.153.239 sudo ubuntu-drivers devices
WARNING:root:_pkg_get_support nvidia-driver-525: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-535: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-535-open: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-525-server: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-535-server-open: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-525-open: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-535-server: package has invalid Support PBheader, cannot determine support level
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00002191sv00001028sd00000949bc03sc00i00
vendor : NVIDIA Corporation
model : TU116M [GeForce GTX 1660 Ti Mobile]
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-525 - distro non-free
driver : nvidia-driver-535 - distro non-free
driver : nvidia-driver-535-open - distro non-free
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-525-server - distro non-free
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-535-server-open - distro non-free recommended
driver : nvidia-driver-525-open - distro non-free
driver : nvidia-driver-535-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin

Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 linux-modules-nvidia-535-server-open-generic : Depends: linux-modules-nvidia-535-server-open-5.4.0-173-generic (= 5.4.0-173.191) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Francis Ginther (fginther) wrote :

@Kevin,

Thank you for testing again. This latest output indicates something may not be correct in your test environment. This line:

WARNING:root:_pkg_get_support nvidia-driver-525: package has invalid Support PBheader, cannot determine support level

indicates that the driver packages provide "PB" in the Support field, but the installed ubuntu-drivers doesn't know how to handle it. Support for PB drivers was added in version 1:0.9.0~0.20.04.7, which is the latest version in focal-updates (see lp:1964747 for details). This suggests that this particular test was run with an even older version, but that doesn't make sense if you've also updated to 1:0.9.0~0.20.04.8. I can only speculate about what is going on here.
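The failure mode described here can be modelled roughly as follows: the driver packaging carries a `Support:` field, and an ubuntu-drivers release that does not recognise the value falls back to an unknown support level and logs a warning. The sketch below only illustrates that idea. It assumes the field is read from a Debian control-style stanza such as `apt-cache show` prints, and `parse_stanza`, `support_level`, and the `known_levels` parameter are hypothetical; they do not reproduce ubuntu-drivers' actual implementation or its exact warning text.

```python
import logging


def parse_stanza(text):
    """Parse one Debian control-style stanza (e.g. from `apt-cache show`) into a dict."""
    fields, key = {}, None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the stanza
        if line[0] in " \t" and key:
            fields[key] += "\n" + line.strip()  # continuation of the previous field
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def support_level(fields, known_levels):
    """Return the Support value if recognised, else warn and fall back to 'Unknown'.

    known_levels is a hypothetical stand-in for whatever values a given
    ubuntu-drivers release understands (e.g. one that predates "PB").
    """
    name = fields.get("Package", "?")
    support = fields.get("Support")
    if support is None:
        return "Unknown"
    if support not in known_levels:
        logging.warning("%s: unrecognised Support value %r, cannot determine support level",
                        name, support)
        return "Unknown"
    return support
```

With a `known_levels` set that lacks "PB", every PB driver falls back to "Unknown", which matches the symptom of an ubuntu-drivers build older than the one that added PB handling.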

I did run an experiment on a device attached to testflinger, 202007-28059. This is an HP ZBook Studio G7 with a TU117GLM [Quadro T2000 Mobile / Max-Q]. This appears to match the devices used in the original bug description. I first verified that the problem exists with the version of the package in focal-updates:

ubuntu@hp-zbook-studio-g7-c28059:~$ dpkg -l|grep ubuntu-drivers
ii ubuntu-drivers-common 1:0.9.0~0.20.04.7 amd64 Detect and install additional Ubuntu driver packages
ubuntu@hp-zbook-studio-g7-c28059:~$ sudo ubuntu-drivers devices
ERROR:root:could not open aplay -l
Traceback (most recent call last):
  File "/usr/share/ubuntu-drivers-common/detect/sl-modem.py", line 35, in detect
    aplay = subprocess.Popen(
  File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'aplay'
== /sys/devices/pci0000:00/0000:00:1f.4 ==
modalias : pci:v00008086d000006A3sv0000103Csd00008736bc0Csc05i00
vendor : Intel Corporation
driver : oem-stella.cmit-meowth-meta - distro free

== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001FB8sv0000103Csd00008736bc03sc00i00
vendor : NVIDIA Corporation
model : TU117GLM [Quadro T2000 Mobile / Max-Q]
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-535-server-open - distro non-free recommended
driver : nvidia-driver-535 - distro non-free
driver : nvidia-driver-525-open - distro non-free
driver : nvidia-driver-535-open - distro non-free
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-driver-525 - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-525-server - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin

Then updated to the package from proposed:

ubuntu@hp-zbook-studio-g7-c28059:~$ dpkg -l|grep ubuntu-drivers
ii ubuntu-drivers-common 1:0.9.0~0.20.04.8 amd64 Detect and install additional Ubuntu driver packages
ubuntu@hp-zbook-studio-g7-c28059:~$ sudo ubuntu-drivers devices
ERROR:root:could not open a...


Shane Williams (shanew) wrote :

Finally had a chance to update today, and ubuntu-drivers-common 1:0.9.0~0.20.04.8 does appear to fix the issue for us.

01:00.0 VGA compatible controller: NVIDIA Corporation TU117GL [T1000 8GB] (rev a1) (prog-if 00 [VGA controller])

shanew@prius:~$ ubuntu-drivers devices
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001FF0sv00001028sd00001612bc03sc00i00
vendor : NVIDIA Corporation
driver : nvidia-driver-535-open - distro non-free
driver : nvidia-driver-535-server - distro non-free
driver : nvidia-driver-535-server-open - distro non-free
driver : nvidia-driver-535 - distro non-free recommended
driver : nvidia-driver-525-server - distro non-free
driver : nvidia-driver-525-open - distro non-free
driver : nvidia-driver-470 - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-525 - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
