systemd-udevd busyloops when nvidia kernel module fails to load

Bug #1655584 reported by Lauri Tirkkonen
This bug affects 18 people
Affects            Status      Importance   Assigned to   Milestone
systemd (Ubuntu)   Won't Fix   Undecided    Unassigned

Bug Description

On a machine where nvidia-367 is installed, but the driver won't attach (due to
the nvidia card being older and requiring legacy drivers), systemd-udevd
repeatedly attempts to load it anyway, causing high CPU and memory usage. I
observed the systemd-udevd process at 98% CPU usage and 2.5G RSS reported by
top(1); restarting the service reduces resource usage momentarily, but it
appears to be climbing back up (I suspect the number of events is growing).

Workaround is of course installing the correct driver package, but udevd really
ought not behave like this if something fails to load.

Some journal entries for systemd-udevd.service:

    Jan 11 11:22:11 systemd-udevd[11651]: Process '/sbin/modprobe nvidia-drm' failed with exit code 1.
    Jan 11 11:22:11 systemd-udevd[11651]: Process '/sbin/modprobe nvidia-uvm' failed with exit code 1.
    Jan 11 11:22:11 systemd-udevd[11651]: Process '/usr/bin/nvidia-smi' failed with exit code 9.
    Jan 11 11:22:11 systemd-udevd[11651]: Process '/sbin/modprobe nvidia-modeset' failed with exit code 1.
    Jan 11 11:22:11 systemd-udevd[11651]: Process '/sbin/modprobe nvidia-drm' failed with exit code 1.
    Jan 11 11:22:11 systemd-udevd[11651]: Process '/sbin/modprobe nvidia-uvm' failed with exit code 1.
    Jan 11 11:22:12 systemd-udevd[11651]: Process '/usr/bin/nvidia-smi' failed with exit code 9.
    Jan 11 11:22:12 systemd-udevd[11651]: Process '/sbin/modprobe nvidia-modeset' failed with exit code 1.
    Jan 11 11:22:12 systemd-udevd[11651]: Process '/sbin/modprobe nvidia-drm' failed with exit code 1.
    Jan 11 11:22:12 systemd-udevd[11651]: Process '/sbin/modprobe nvidia-uvm' failed with exit code 1.
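To gauge how quickly these failures repeat, the journal lines can be tallied per command. The following is only a minimal sketch: the function name `count_failures` and the sample list are illustrative assumptions, and the regular expression is based solely on the message format quoted above. In practice the lines could be taken from something like `journalctl -u systemd-udevd.service`.

```python
import re
from collections import Counter

# Matches the failure messages quoted above, e.g.
#   Process '/sbin/modprobe nvidia-drm' failed with exit code 1.
FAILURE_RE = re.compile(r"Process '([^']+)' failed with exit code (\d+)\.")


def count_failures(lines):
    """Return a Counter keyed by (command, exit_code) for udevd failure lines."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


# A few of the journal lines from this report, used as sample input.
sample = [
    "Jan 11 11:22:11 systemd-udevd[11651]: Process '/sbin/modprobe nvidia-drm' failed with exit code 1.",
    "Jan 11 11:22:11 systemd-udevd[11651]: Process '/sbin/modprobe nvidia-uvm' failed with exit code 1.",
    "Jan 11 11:22:11 systemd-udevd[11651]: Process '/usr/bin/nvidia-smi' failed with exit code 9.",
    "Jan 11 11:22:11 systemd-udevd[11651]: Process '/sbin/modprobe nvidia-modeset' failed with exit code 1.",
    "Jan 11 11:22:11 systemd-udevd[11651]: Process '/sbin/modprobe nvidia-drm' failed with exit code 1.",
]

for (command, code), n in count_failures(sample).most_common():
    print(f"{n:3d}  exit {code}  {command}")
```

A steadily growing count for the same few commands within one second would be consistent with the busy loop described here.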

dmesg shows repeated failures for loading the nvidia module:

    [1675406.587926] NVRM: The NVIDIA GeForce 9300 GE GPU installed in this system is
       NVRM: supported through the NVIDIA 340.xx Legacy drivers. Please
       NVRM: visit http://www.nvidia.com/object/unix.html for more
       NVRM: information. The 367.57 NVIDIA driver will ignore
       NVRM: this GPU. Continuing probe...
    [1675406.587936] NVRM: No NVIDIA graphics adapter found!
    [1675406.588078] NVRM: NVIDIA init module failed!

Release information:

    # lsb_release -rd
    Description: Ubuntu 16.04.1 LTS
    Release: 16.04
    # apt policy systemd udev
    systemd:
      Installed: 229-4ubuntu13
      Candidate: 229-4ubuntu13
      Version table:
     *** 229-4ubuntu13 500
            500 http://ftp.funet.fi/pub/Linux/mirrors/ubuntu/archive xenial-updates/main amd64 Packages
            100 /var/lib/dpkg/status
         229-4ubuntu10 500
            500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
         229-4ubuntu4 500
            500 http://ftp.funet.fi/pub/Linux/mirrors/ubuntu/archive xenial/main amd64 Packages
    udev:
      Installed: 229-4ubuntu13
      Candidate: 229-4ubuntu13
      Version table:
     *** 229-4ubuntu13 500
            500 http://ftp.funet.fi/pub/Linux/mirrors/ubuntu/archive xenial-updates/main amd64 Packages
            100 /var/lib/dpkg/status
         229-4ubuntu10 500
            500 http://security.ubuntu.com/ubuntu xenial-security/main amd64 Packages
         229-4ubuntu4 500
            500 http://ftp.funet.fi/pub/Linux/mirrors/ubuntu/archive xenial/main amd64 Packages

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed

Ross Boylan (rossboylan) wrote :

I tried:

    systemctl stop nvidia-persistenced
    nvidia-persistenced --no-persistence-mode
    nvidia-smi --persistence-mode=Disabled

but none of them helped. nvidia-smi failed because it couldn't communicate with the nvidia driver.

The system was created using a raw disk in a virtual machine (VirtualBox). I then booted off the disk directly, so it was running on real hardware, and accepted the offered nvidia drivers. Finally, I shut down and restarted in the VM. That is the context for the loop. The VM has no nvidia hardware, so there is no reason for the driver to load.

Ross

Ross Boylan (rossboylan) wrote :

I was able to stop the loop by blacklisting the nvidia modules and restarting the system.
I added /etc/modprobe.d/nvidia-kill.conf, deliberately named so that it sorts after the existing nvidia conf files in the same directory (not sure if that's essential):

    blacklist nvidia_367
    blacklist nvidia_367_uvm
    blacklist nvidia_367_modeset
    blacklist nvidia_367_drm

    alias nvidia off
    alias nvidia-uvm off
    alias nvidia-modeset off
    alias nvidia-drm off

This was all guesswork; I don't know how much is essential.
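For other driver versions, the same file contents could be generated rather than typed by hand. This is a rough sketch under a loud assumption: it simply substitutes the version number into the `nvidia_367*` names used above, and nothing in this thread confirms that other driver packages name their modules the same way. The helper name `render_blacklist` is hypothetical.

```python
# Module name suffixes and aliases copied from the workaround above.
MODULE_SUFFIXES = ["", "_uvm", "_modeset", "_drm"]
ALIASES = ["nvidia", "nvidia-uvm", "nvidia-modeset", "nvidia-drm"]


def render_blacklist(version):
    """Return modprobe.d file contents for the given driver version string.

    Assumption: modules are named nvidia_<version>[_suffix], as they are
    for nvidia-367 in this report. Check the actual module names first.
    """
    lines = [f"blacklist nvidia_{version}{suffix}" for suffix in MODULE_SUFFIXES]
    lines.append("")
    lines.extend(f"alias {alias} off" for alias in ALIASES)
    return "\n".join(lines) + "\n"


print(render_blacklist("367"), end="")
```

With `"367"` this reproduces the file shown above; the output would still have to be written to a file under /etc/modprobe.d/ by hand.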

João Vitor Sell (joaovictor-joi) wrote :

Affects me too.

Andrea Bocci (fwyzard) wrote :

Something similar happens on a laptop with both an integrated Intel card and a discrete Nvidia card: systemd-udevd keeps loading and unloading the nvidia modules.

Dimitri John Ledkov (xnox) wrote :

How did the nvidia package get installed? By default nvidia drivers are not installed, and only pulled in when compatible. Does the metadata on the package claim that it is compatible with your nvidia card - when in fact, it is not? E.g. if you use ubuntu-desktop, and remove the package, is it still offered by additional drivers / ubuntu-drivers?

In other words, do the Modaliases listed on the nvidia-driver-390 package match the modalias of your card?
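One way to check this by hand is to compare the card's modalias string against the wildcard patterns from the package metadata. The sketch below assumes the patterns behave like shell-style globs and uses Python's `fnmatch` for the comparison; the device string and the patterns are made-up examples, not values from any real package or card. On a real system the device modalias can typically be read from the `modalias` file under /sys/bus/pci/devices/ for that device.

```python
from fnmatch import fnmatchcase


def matching_patterns(device_modalias, patterns):
    """Return the glob-style patterns that match the device modalias."""
    return [p for p in patterns if fnmatchcase(device_modalias, p)]


# Hypothetical values for illustration only.
device = "pci:v000010DEd00000DEFsv00001043sd00001234bc03sc00i00"
package_patterns = [
    "pci:v000010DEd00000DEFsv*sd*bc03sc*i*",  # same device ID: should match
    "pci:v000010DEd00000ABCsv*sd*bc03sc*i*",  # different device ID: should not
]

hits = matching_patterns(device, package_patterns)
print("compatible" if hits else "no matching modalias", hits)
```

If no pattern matches, the package metadata does not claim support for the card, which would suggest the driver was installed some other way.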

Chris Good (chris-good) wrote :

I've got the same problem. nvidia-384 was installed (using Software Updater, Settings, Additional Drivers) while booting the partition natively (mbr). Natively, everything works fine with nvidia-384, but this problem occurs when running in VirtualBox. I had to install nvidia-384 because I could not get a high resolution when booting natively with the nouveau drivers.

Chris Good (chris-good) wrote :

In case it is not clear, I am using the same VirtualBox raw disk facility as Ross Boylan to use the raw disk partition as a virtual disk.

Lyubomir (mystiquewolf) wrote :

I might be experiencing this problem too. I use KVM/QEMU and load vfio-pci driver in place of NVidia driver. nvidia-driver-455 here.

Dan Streetman (ddstreet) wrote :

Please reopen if this is still an issue.

Changed in systemd (Ubuntu):
status: Confirmed → Won't Fix

grimmy70 (grimmy70) wrote :

Happens in Linux Mint 20.2 (nvidia driver 470, kernel 5.11.0).

Repeated messages:

    systemd-udevd[482]: nvidia: Process '/sbin/modprobe nvidia-modeset' failed with exit code 1.
    kernel: [ 13.707771] nvidia-nvlink: Nvlink Core is being initialized, major device number 511
    kernel: [ 13.707775] NVRM: request_mem_region failed for 0M @ 0x0. This can
    kernel: [ 13.707775] NVRM: occur when a driver such as rivatv is loaded and claims
    kernel: [ 13.707775] NVRM: ownership of the device's registers.
    kernel: [ 13.708267] nvidia: probe of 0000:01:00.0 failed with error -1
    kernel: [ 13.708279] NVRM: The NVIDIA probe routine failed for 1 device(s).
    kernel: [ 13.708280] NVRM: None of the NVIDIA devices were initialized.
    kernel: [ 13.708468] nvidia-nvlink: Unregistered the Nvlink Core, major device number 511
