nvidia driver 515 fails to boot on kernel 6.2

Bug #2012908 reported by Simon Chopin
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
linux-signed (Ubuntu) | Confirmed | Undecided | Unassigned |
nvidia-graphics-drivers-515 (Ubuntu) | In Progress | Undecided | Unassigned |

Bug Description

I just updated my Lunar install, which brought in the 6.2 kernel, and the system failed to boot. It stalls after enumerating my USB devices until something times out with a message saying that the udev event queue could not be drained.

When I try to continue from rescue mode to the normal graphical boot once the timeout is hit, I simply get a black screen.

I'm blaming the nvidia driver because this appears in the log:

mars 27 10:33:54 gandalf kernel: INFO: task systemd-udevd:304 blocked for more than 120 seconds.
mars 27 10:33:54 gandalf kernel: Tainted: P OE 6.2.0-18-generic #18-Ubuntu
mars 27 10:33:54 gandalf kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mars 27 10:33:54 gandalf kernel: task:systemd-udevd state:D stack:0 pid:304 ppid:258 flags:0x00004006
mars 27 10:33:54 gandalf kernel: Call Trace:
mars 27 10:33:54 gandalf kernel: <TASK>
mars 27 10:33:54 gandalf kernel: __schedule+0x2aa/0x610
mars 27 10:33:54 gandalf kernel: schedule+0x63/0x110
mars 27 10:33:54 gandalf kernel: schedule_preempt_disabled+0x15/0x30
mars 27 10:33:54 gandalf kernel: __mutex_lock.constprop.0+0x3f8/0x7a0
mars 27 10:33:54 gandalf kernel: ? __kmem_cache_alloc_node+0x19d/0x340
mars 27 10:33:54 gandalf kernel: ? nv_drm_calloc+0x1e/0x40 [nvidia_drm]
mars 27 10:33:54 gandalf kernel: __mutex_lock_slowpath+0x13/0x20
mars 27 10:33:54 gandalf kernel: mutex_lock+0x3c/0x50
mars 27 10:33:54 gandalf kernel: __nv_drm_connector_detect_internal+0x15c/0x2f0 [nvidia_drm]
mars 27 10:33:54 gandalf kernel: nv_drm_connector_detect+0xe/0x20 [nvidia_drm]
mars 27 10:33:54 gandalf kernel: drm_helper_probe_detect_ctx+0xa3/0x120 [drm_kms_helper]
mars 27 10:33:54 gandalf kernel: check_connector_changed+0x52/0x200 [drm_kms_helper]
mars 27 10:33:54 gandalf kernel: drm_helper_hpd_irq_event+0xbc/0x170 [drm_kms_helper]
mars 27 10:33:54 gandalf kernel: nv_drm_load+0x2e7/0x480 [nvidia_drm]
mars 27 10:33:54 gandalf kernel: ? __pfx_nv_drm_event_callback+0x10/0x10 [nvidia_drm]
mars 27 10:33:54 gandalf kernel: drm_dev_register+0x10e/0x250 [drm]
mars 27 10:33:54 gandalf kernel: nv_drm_probe_devices+0x111/0x200 [nvidia_drm]
mars 27 10:33:54 gandalf kernel: ? __pfx_init_module+0x10/0x10 [nvidia_drm]
mars 27 10:33:54 gandalf kernel: nv_drm_init+0x1e/0x60 [nvidia_drm]
mars 27 10:33:54 gandalf kernel: nv_linux_drm_init+0xe/0xff0 [nvidia_drm]
mars 27 10:33:54 gandalf kernel: do_one_initcall+0x5e/0x250
mars 27 10:33:54 gandalf kernel: do_init_module+0x7b/0x260
mars 27 10:33:54 gandalf kernel: load_module+0xc76/0xd60
mars 27 10:33:54 gandalf kernel: ? kernel_read_file+0x2a4/0x320
mars 27 10:33:54 gandalf kernel: __do_sys_finit_module+0xc4/0x140
mars 27 10:33:54 gandalf kernel: ? __do_sys_finit_module+0xc4/0x140
mars 27 10:33:54 gandalf kernel: __x64_sys_finit_module+0x18/0x30
mars 27 10:33:54 gandalf kernel: do_syscall_64+0x5b/0x90
mars 27 10:33:54 gandalf kernel: ? ksys_mmap_pgoff+0x120/0x260
mars 27 10:33:54 gandalf kernel: ? exit_to_user_mode_prepare+0x30/0xb0
mars 27 10:33:54 gandalf kernel: ? exit_to_user_mode_prepare+0x30/0xb0
mars 27 10:33:54 gandalf kernel: ? syscall_exit_to_user_mode+0x29/0x50
mars 27 10:33:54 gandalf kernel: ? do_syscall_64+0x67/0x90
mars 27 10:33:54 gandalf kernel: ? do_syscall_64+0x67/0x90
mars 27 10:33:54 gandalf kernel: ? exit_to_user_mode_prepare+0x30/0xb0
mars 27 10:33:54 gandalf kernel: ? syscall_exit_to_user_mode+0x29/0x50
mars 27 10:33:54 gandalf kernel: ? do_syscall_64+0x67/0x90
mars 27 10:33:54 gandalf kernel: ? syscall_exit_to_user_mode+0x29/0x50
mars 27 10:33:54 gandalf kernel: ? do_syscall_64+0x67/0x90
mars 27 10:33:54 gandalf kernel: ? do_syscall_64+0x67/0x90
mars 27 10:33:54 gandalf kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
mars 27 10:33:54 gandalf kernel: RIP: 0033:0x7fd3dc85d89d
mars 27 10:33:54 gandalf kernel: RSP: 002b:00007ffc801034a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
mars 27 10:33:54 gandalf kernel: RAX: ffffffffffffffda RBX: 000055f373907c60 RCX: 00007fd3dc85d89d
mars 27 10:33:54 gandalf kernel: RDX: 0000000000000000 RSI: 000055f373889af0 RDI: 0000000000000012
mars 27 10:33:54 gandalf kernel: RBP: 000055f373889af0 R08: 0000000000000000 R09: 00007ffc801035d0
mars 27 10:33:54 gandalf kernel: R10: 0000000000000012 R11: 0000000000000246 R12: 0000000000020000
mars 27 10:33:54 gandalf kernel: R13: 000055f37389c120 R14: 0000000000000000 R15: 000055f373909080
mars 27 10:33:54 gandalf kernel: </TASK>
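
The reasoning above (the blocked task is waiting on a mutex inside nvidia_drm code) can be checked mechanically by counting which kernel modules own the frames of the trace. The following is a minimal sketch, not an official tool: the function name `modules_in_trace` and the "(core)" label for built-in kernel symbols are made up for illustration, and it assumes journal-style lines like the ones quoted above. Frames prefixed with "? " are skipped because the unwinder marks them as unreliable.

```python
import re
from collections import Counter

# Matches a stack frame such as "nv_drm_load+0x2e7/0x480 [nvidia_drm]".
# A leading "? " marks a frame the kernel unwinder is not sure about.
FRAME_RE = re.compile(
    r"kernel:\s+(?P<unsure>\? )?(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def modules_in_trace(lines):
    """Count reliable stack frames per module ("(core)" means built-in)."""
    counts = Counter()
    for line in lines:
        match = FRAME_RE.search(line)
        if match is None or match.group("unsure"):
            continue  # not a stack frame, or an unreliable one
        counts[match.group("module") or "(core)"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "mars 27 10:33:54 gandalf kernel: mutex_lock+0x3c/0x50",
        "mars 27 10:33:54 gandalf kernel: __nv_drm_connector_detect_internal+0x15c/0x2f0 [nvidia_drm]",
        "mars 27 10:33:54 gandalf kernel: ? nv_drm_calloc+0x1e/0x40 [nvidia_drm]",
        "mars 27 10:33:54 gandalf kernel: drm_helper_probe_detect_ctx+0xa3/0x120 [drm_kms_helper]",
    ]
    for module, count in modules_in_trace(sample).most_common():
        print(module, count)
```

Fed the full trace, a count dominated by nvidia_drm frames directly under `mutex_lock` is consistent with the hang being inside the NVIDIA DRM module's connector detection, though it does not by itself prove which side holds the lock.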

Also, the last entry before I had to do a hard shutdown was this:

mars 27 10:34:54 gandalf systemd-udevd[1023]: nvidia: Spawned process '/sbin/modprobe nvidia-drm' [1124] is taking longer than 59s to complete

I'm attaching the kernel logs up to the beginning of userspace logs. I can provide more if necessary.

My driver package:
ii nvidia-driver-515 515.86.01-0ubuntu3 amd64

My graphics card:
0b:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3080 Ti] (rev a1)
---
ProblemType: Bug
ApportVersion: 2.26.0-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: schopin 12971 F.... wireplumber
 /dev/snd/controlC0: schopin 12971 F.... wireplumber
 /dev/snd/controlC2: schopin 12971 F.... wireplumber
 /dev/snd/seq: schopin 12960 F.... pipewire
CRDA: N/A
CasperMD5CheckResult: pass
CurrentDesktop: GNOME
DistroRelease: Ubuntu 23.04
InstallationDate: Installed on 2021-11-03 (508 days ago)
InstallationMedia: Ubuntu 21.10 "Impish Indri" - Release amd64 (20211012)
MachineType: ASUS System Product Name
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair nvidia_modeset nvidia
Package: nvidia-graphics-drivers-515
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.1.0-16-generic root=/dev/mapper/vgubuntu-root ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 6.1.0-16.16-generic 6.1.6
RelatedPackageVersions:
 linux-restricted-modules-6.1.0-16-generic N/A
 linux-backports-modules-6.1.0-16-generic N/A
 linux-firmware 20230323.gitbcdcfbcf-0ubuntu1
Tags: lunar
Uname: Linux 6.1.0-16-generic x86_64
UpgradeStatus: Upgraded to lunar on 2023-03-13 (13 days ago)
UserGroups: adm cdrom dip docker kvm libvirt lpadmin lxd plugdev sambashare sbuild sudo
_MarkForUpload: True
dmi.bios.date: 07/30/2021
dmi.bios.release: 5.17
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 3801
dmi.board.asset.tag: Default string
dmi.board.name: ROG CROSSHAIR VIII DARK HERO
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3801:bd07/30/2021:br5.17:svnASUS:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnROGCROSSHAIRVIIIDARKHERO:rvrRevX.0x:cvnDefaultstring:ct3:cvrDefaultstring:skuSKU:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.version: System Version
dmi.sys.vendor: ASUS
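
One detail worth noticing when reading the apport data: its `Uname` field names kernel 6.1.0-16-generic, while the hung-task trace is tainted on 6.2.0-18-generic. Presumably the report was collected after booting an older, working kernel, but that is an assumption. A small sketch of how the two kernel releases could be extracted and compared (both helper names are hypothetical, chosen for this example):

```python
import re


def kernel_from_uname(field):
    """Extract the release from an apport Uname value.

    "Linux 6.1.0-16-generic x86_64" -> "6.1.0-16-generic"
    """
    return field.split()[1]


def kernel_from_taint_line(line):
    """Extract the release from a "Tainted: ..." kernel log line, or None."""
    match = re.search(r"Tainted: .*?(\d+\.\d+\.\d+-\d+-\w+)", line)
    return match.group(1) if match else None


if __name__ == "__main__":
    reported = kernel_from_uname("Linux 6.1.0-16-generic x86_64")
    failing = kernel_from_taint_line(
        "mars 27 10:33:54 gandalf kernel: Tainted: P OE 6.2.0-18-generic #18-Ubuntu"
    )
    if reported != failing:
        print(f"apport data collected on {reported}, hang logged on {failing}")
```

This matters when triaging: fields such as `NonfreeKernelModules` and `ProcKernelCmdLine` describe the kernel the report was filed from, not necessarily the one that hung.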

Simon Chopin (schopin) wrote :
tags: added: apport-collected lunar
description: updated

Simon Chopin (schopin) wrote : apport information

Attachments: AlsaInfo.txt, CurrentDmesg.txt, IwConfig.txt, Lspci.txt, Lspci-vt.txt, Lsusb.txt, Lsusb-t.txt, Lsusb-v.txt, ProcCpuinfo.txt, ProcCpuinfoMinimal.txt, ProcEnviron.txt, ProcInterrupts.txt, ProcModules.txt, RfKill.txt, UdevDb.txt, WifiSyslog.txt, acpidump.txt

Simon Chopin (schopin) wrote :

The issue doesn't show when using the 525 driver.

Keeley Hoek (khoek) wrote :

I have exactly the same problem (on a Gigabyte Aero 15 YD 11th-gen laptop), and the same fix worked for me: migrating from v515 to v525. Annoyingly, Software & Updates -> Additional Drivers is totally broken for me too: when I select the driver and click Apply, a progress bar goes across the screen and then I just get an empty error box (no title or actual message). If someone could tell me how to extract logs from that so I could file a new issue, I'd be happy to oblige.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-signed (Ubuntu):
status: New → Confirmed
Changed in nvidia-graphics-drivers-515 (Ubuntu):
status: New → Confirmed
Keeley Hoek (khoek) wrote :

(Extra sad: external monitors are broken on the ~3000 series laptop GPUs with v525, see https://github.com/NVIDIA/open-gpu-kernel-modules/issues/419. So 23.04 might cause some problems upon release, until a package for the fixed v530 comes out.)

Paolo Pisati (p-pisati)
Changed in nvidia-graphics-drivers-515 (Ubuntu):
status: Confirmed → In Progress
Paolo Pisati (p-pisati) wrote :

New drivers are available in proposed (e.g. 515.105.01-0ubuntu1). Can you test them and report your results?
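
When checking whether an installed package is at least the proposed version, note that a plain string comparison gets this wrong ("515.86.01" sorts after "515.105.01" as text). Below is a minimal sketch that compares only the numeric upstream part; the helper names are invented for this example, and it is not a full Debian version comparison (no epochs, no handling of the Ubuntu revision). On a real system `dpkg --compare-versions` is the authoritative tool.

```python
def upstream_version(package_version):
    """Return the numeric upstream part of a package version as a tuple.

    "515.86.01-0ubuntu3" -> (515, 86, 1)
    Assumes a purely numeric, dot-separated upstream version.
    """
    upstream = package_version.rsplit("-", 1)[0]  # drop the "-0ubuntuN" revision
    return tuple(int(part) for part in upstream.split("."))


def is_at_least(installed, wanted):
    """True if the installed upstream version is >= the wanted one."""
    return upstream_version(installed) >= upstream_version(wanted)


if __name__ == "__main__":
    installed = "515.86.01-0ubuntu3"   # version from the original report
    proposed = "515.105.01-0ubuntu1"   # version mentioned as available in proposed
    print(is_at_least(installed, proposed))  # False
    print(installed > proposed)              # True: the misleading string comparison
```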

Thanks!

Keeley Hoek (khoek) wrote :

After updating, the machine doesn't hang anymore.

(Though alas, the new version of 515 seems to bork external displays on my machine too... :( For anyone else with this/related problems, I think the NVIDIA driver `nvidia-driver-530` package from the graphics drivers PPA is your best bet.)
