Spontaneous NVIDIA panic

Bug #1942271 reported by Kevin McMurtrie
This bug affects 2 people
Affects: nvidia-graphics-drivers-470 (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

All graphics and USB devices spontaneously halted.
SSH, networking, and filesystems remained working.
Xorg was stuck and could not exit, even after an ABRT signal.
"shutdown" command never completed.

Aug 31 18:17:12 fuel kernel: [ 3494.973431] BUG: unable to handle page fault for address: 0000000000020018
Aug 31 18:17:12 fuel kernel: [ 3494.973436] #PF: supervisor read access in kernel mode
Aug 31 18:17:12 fuel kernel: [ 3494.973438] #PF: error_code(0x0000) - not-present page
Aug 31 18:17:12 fuel kernel: [ 3494.973439] PGD 0 P4D 0
Aug 31 18:17:12 fuel kernel: [ 3494.973442] Oops: 0000 [#1] SMP NOPTI
Aug 31 18:17:12 fuel kernel: [ 3494.973444] CPU: 2 PID: 33911 Comm: chrome Tainted: P OE 5.11.0-31-generic #33-Ubuntu
Aug 31 18:17:12 fuel kernel: [ 3494.973447] Hardware name: Gigabyte Technology Co., Ltd. Default string/X99P-SLI-CF, BIOS F25b 03/13/2018
Aug 31 18:17:12 fuel kernel: [ 3494.973448] RIP: 0010:_nv028963rm+0x35/0x90 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.973792] Code: 57 10 31 c0 48 85 d2 74 2e 48 8b 4f 08 31 c0 48 85 c9 74 0d 48 63 41 14 48 89 d6 48 29 c6 48 89 f0 48 3b 57 18 48 89 07 74 1b <48> 8b 42 08 48 89 47 10 b8 01 00 00 00 48 83 c4 08 c3 66 0f 1f 84
Aug 31 18:17:12 fuel kernel: [ 3494.973795] RSP: 0018:ffffb11742f43bd0 EFLAGS: 00010203
Aug 31 18:17:12 fuel kernel: [ 3494.973797] RAX: 0000000000020010 RBX: ffff9c819c336c30 RCX: ffff9c7e48cec978
Aug 31 18:17:12 fuel kernel: [ 3494.973798] RDX: 0000000000020010 RSI: 0000000000020010 RDI: ffff9c7cf6695d20
Aug 31 18:17:12 fuel kernel: [ 3494.973800] RBP: ffff9c7cf6695d20 R08: 0000000000000020 R09: ffff9c7cf6695d28
Aug 31 18:17:12 fuel kernel: [ 3494.973801] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9c7d3d867738
Aug 31 18:17:12 fuel kernel: [ 3494.973802] R13: ffff9c7e1c027060 R14: ffff9c7cf6695d98 R15: ffff9c819c336c30
Aug 31 18:17:12 fuel kernel: [ 3494.973804] FS: 0000000000000000(0000) GS:ffff9c8bcfc80000(0000) knlGS:0000000000000000
Aug 31 18:17:12 fuel kernel: [ 3494.973806] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 31 18:17:12 fuel kernel: [ 3494.973807] CR2: 0000000000020018 CR3: 00000006dcc10005 CR4: 00000000003706e0
Aug 31 18:17:12 fuel kernel: [ 3494.973809] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 31 18:17:12 fuel kernel: [ 3494.973810] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 31 18:17:12 fuel kernel: [ 3494.973811] Call Trace:
Aug 31 18:17:12 fuel kernel: [ 3494.973814] ? _nv035844rm+0xa8/0xe0 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.974111] ? _nv014655rm+0x2ee/0x770 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.974414] ? _nv037695rm+0xb3/0x150 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.974719] ? _nv037694rm+0x297/0x4e0 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.975018] ? _nv037689rm+0x60/0x70 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.975317] ? _nv037690rm+0x7b/0xb0 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.975618] ? _nv036056rm+0x40/0xe0 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.975840] ? _nv000699rm+0x68/0x80 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.976092] ? rm_cleanup_file_private+0xea/0x160 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.976341] ? fsnotify+0x23c/0x2f0
Aug 31 18:17:12 fuel kernel: [ 3494.976347] ? nvidia_close+0x156/0x320 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.976539] ? nvidia_frontend_close+0x2f/0x50 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.976728] ? __fput+0x9f/0x250
Aug 31 18:17:12 fuel kernel: [ 3494.976731] ? ____fput+0xe/0x10
Aug 31 18:17:12 fuel kernel: [ 3494.976733] ? task_work_run+0x6d/0xa0
Aug 31 18:17:12 fuel kernel: [ 3494.976738] ? do_exit+0x233/0x3e0
Aug 31 18:17:12 fuel kernel: [ 3494.976742] ? do_group_exit+0x3b/0xb0
Aug 31 18:17:12 fuel kernel: [ 3494.976744] ? __x64_sys_exit_group+0x18/0x20
Aug 31 18:17:12 fuel kernel: [ 3494.976747] ? do_syscall_64+0x38/0x90
Aug 31 18:17:12 fuel kernel: [ 3494.976749] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 31 18:17:12 fuel kernel: [ 3494.976754] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport nft_compat nft_counter nf_tables libcrc32c nfnetlink ccm md4 cmac nls_utf8 cifs libarc4 fscache libdes binfmt_misc snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic zfs(PO) ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec x86_pkg_temp_thermal zunicode(PO) intel_powerclamp snd_hda_core zzstd(O) snd_hwdep coretemp soundwire_bus zlua(O) snd_soc_core kvm_intel nls_iso8859_1 zavl(PO) icp(PO) snd_compress ac97_bus snd_pcm_dmaengine snd_pcm kvm snd_seq_midi zcommon(PO) snd_seq_midi_event znvpair(PO) crct10dif_pclmul spl(O) joydev input_leds snd_rawmidi ghash_clmulni_intel snd_seq snd_seq_device aesni_intel snd_timer crypto_simd cryptd glue_helper snd rapl mei_me intel_cstate efi_pstore mxm_wmi mei soundcore intel_wmi_thunderbolt mac_hid nvidia_uvm(POE) sch_fq_codel msr parport_pc ppdev lp parport ip_tables
Aug 31 18:17:12 fuel kernel: [ 3494.976804] x_tables autofs4 hid_generic usbhid hid nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core crc32_pclmul e1000e drm i2c_i801 ahci i2c_smbus lpc_ich thunderbolt libahci nvme xhci_pci nvme_core xhci_pci_renesas wmi
Aug 31 18:17:12 fuel kernel: [ 3494.976823] CR2: 0000000000020018
Aug 31 18:17:12 fuel kernel: [ 3494.976824] ---[ end trace 16a13b81d497db97 ]---
Aug 31 18:17:12 fuel kernel: [ 3494.996770] RIP: 0010:_nv028963rm+0x35/0x90 [nvidia]
Aug 31 18:17:12 fuel kernel: [ 3494.997107] Code: 57 10 31 c0 48 85 d2 74 2e 48 8b 4f 08 31 c0 48 85 c9 74 0d 48 63 41 14 48 89 d6 48 29 c6 48 89 f0 48 3b 57 18 48 89 07 74 1b <48> 8b 42 08 48 89 47 10 b8 01 00 00 00 48 83 c4 08 c3 66 0f 1f 84
Aug 31 18:17:12 fuel kernel: [ 3494.997109] RSP: 0018:ffffb11742f43bd0 EFLAGS: 00010203
Aug 31 18:17:12 fuel kernel: [ 3494.997112] RAX: 0000000000020010 RBX: ffff9c819c336c30 RCX: ffff9c7e48cec978
Aug 31 18:17:12 fuel kernel: [ 3494.997113] RDX: 0000000000020010 RSI: 0000000000020010 RDI: ffff9c7cf6695d20
Aug 31 18:17:12 fuel kernel: [ 3494.997115] RBP: ffff9c7cf6695d20 R08: 0000000000000020 R09: ffff9c7cf6695d28
Aug 31 18:17:12 fuel kernel: [ 3494.997116] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9c7d3d867738
Aug 31 18:17:12 fuel kernel: [ 3494.997117] R13: ffff9c7e1c027060 R14: ffff9c7cf6695d98 R15: ffff9c819c336c30
Aug 31 18:17:12 fuel kernel: [ 3494.997119] FS: 0000000000000000(0000) GS:ffff9c8bcfc80000(0000) knlGS:0000000000000000
Aug 31 18:17:12 fuel kernel: [ 3494.997121] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 31 18:17:12 fuel kernel: [ 3494.997122] CR2: 0000000000020018 CR3: 00000004b3340006 CR4: 00000000003706e0
Aug 31 18:17:12 fuel kernel: [ 3494.997124] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 31 18:17:12 fuel kernel: [ 3494.997125] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 31 18:17:12 fuel kernel: [ 3494.997127] Fixing recursive fault but reboot is needed!
Aug 31 18:17:17 fuel kernel: [ 3499.968653] BUG: kernel NULL pointer dereference, address: 0000000000000009
Aug 31 18:17:17 fuel kernel: [ 3499.968663] #PF: supervisor read access in kernel mode
Aug 31 18:17:17 fuel kernel: [ 3499.968667] #PF: error_code(0x0000) - not-present page
Aug 31 18:17:17 fuel kernel: [ 3499.968670] PGD 0 P4D 0
Aug 31 18:17:17 fuel kernel: [ 3499.968677] Oops: 0000 [#2] SMP NOPTI
Aug 31 18:17:17 fuel kernel: [ 3499.968682] CPU: 11 PID: 3575 Comm: Xorg Tainted: P D OE 5.11.0-31-generic #33-Ubuntu
Aug 31 18:17:17 fuel kernel: [ 3499.968688] Hardware name: Gigabyte Technology Co., Ltd. Default string/X99P-SLI-CF, BIOS F25b 03/13/2018
Aug 31 18:17:17 fuel kernel: [ 3499.968691] RIP: 0010:_nv010150rm+0x3c/0x340 [nvidia]
Aug 31 18:17:17 fuel kernel: [ 3499.969503] Code: 07 0f 1f 44 00 00 31 d2 48 8b 07 48 85 c0 75 1a e9 a1 02 00 00 66 0f 1f 84 00 00 00 00 00 48 8b 48 10 48 85 c9 74 17 48 89 c8 <48> 39 30 77 ef 0f 83 29 02 00 00 48 8b 48 18 48 85 c9 75 e9 48 89
Aug 31 18:17:17 fuel kernel: [ 3499.969509] RSP: 0018:ffffb11741adfce8 EFLAGS: 00010006
Aug 31 18:17:17 fuel kernel: [ 3499.969514] RAX: 0000000000000009 RBX: ffff9c7db0945ee8 RCX: 0000000000000009
Aug 31 18:17:17 fuel kernel: [ 3499.969517] RDX: ffff9c7db0945f38 RSI: 0000000000000df7 RDI: ffffffffc25a7498
Aug 31 18:17:17 fuel kernel: [ 3499.969521] RBP: ffff9c7db0945ed0 R08: ffffb11741adfe44 R09: ffff9c7db0945ee8
Aug 31 18:17:17 fuel kernel: [ 3499.969524] R10: ffffffffc067ce80 R11: 0000000000000000 R12: ffffffffc067ceb5
Aug 31 18:17:17 fuel kernel: [ 3499.969527] R13: 0000000000000004 R14: ffffffffc25a87a0 R15: 0000000000010008
Aug 31 18:17:17 fuel kernel: [ 3499.969531] FS: 00007f7d9219fa40(0000) GS:ffff9c8bcfec0000(0000) knlGS:0000000000000000
Aug 31 18:17:17 fuel kernel: [ 3499.969536] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 31 18:17:17 fuel kernel: [ 3499.969540] CR2: 0000000000000009 CR3: 0000000171e5e003 CR4: 00000000003706e0
Aug 31 18:17:17 fuel kernel: [ 3499.969544] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 31 18:17:17 fuel kernel: [ 3499.969547] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Aug 31 18:17:17 fuel kernel: [ 3499.969550] Call Trace:
Aug 31 18:17:17 fuel kernel: [ 3499.969554] ? _nv039714rm+0xb0/0x1a0 [nvidia]
Aug 31 18:17:17 fuel kernel: [ 3499.969946] ? _nv036047rm+0x35/0xb0 [nvidia]
Aug 31 18:17:17 fuel kernel: [ 3499.970456] ? _nv000724kms+0xe0/0xe0 [nvidia_modeset]
Aug 31 18:17:17 fuel kernel: [ 3499.970494] ? _nv010249rm+0x52/0xa0 [nvidia]
Aug 31 18:17:17 fuel kernel: [ 3499.971002] ? _nv010248rm+0x46/0x50 [nvidia]
Aug 31 18:17:17 fuel kernel: [ 3499.971510] ? _nv010248rm+0x2f/0x50 [nvidia]
Aug 31 18:17:17 fuel kernel: [ 3499.972017] ? rm_kernel_rmapi_op+0x159/0x1b0 [nvidia]
Aug 31 18:17:17 fuel kernel: [ 3499.972657] ? nvidia_modeset_rm_ops_alloc_stack+0x1e/0x50 [nvidia]
Aug 31 18:17:17 fuel kernel: [ 3499.973037] ? nvkms_call_rm+0x50/0x80 [nvidia_modeset]
Aug 31 18:17:17 fuel kernel: [ 3499.973070] ? _nv002512kms+0x51/0x60 [nvidia_modeset]
Aug 31 18:17:17 fuel kernel: [ 3499.973122] ? nvkms_copyin+0x39/0x60 [nvidia_modeset]
Aug 31 18:17:17 fuel kernel: [ 3499.973153] ? _nv000724kms+0xc9/0xe0 [nvidia_modeset]
Aug 31 18:17:17 fuel kernel: [ 3499.973185] ? nvKmsIoctl+0x96/0x1d0 [nvidia_modeset]
Aug 31 18:17:17 fuel kernel: [ 3499.973217] ? nvkms_ioctl+0x107/0x180 [nvidia_modeset]
Aug 31 18:17:17 fuel kernel: [ 3499.973248] ? nvidia_frontend_unlocked_ioctl+0x3b/0x50 [nvidia]
Aug 31 18:17:17 fuel kernel: [ 3499.973629] ? __x64_sys_ioctl+0x91/0xc0
Aug 31 18:17:17 fuel kernel: [ 3499.973639] ? do_syscall_64+0x38/0x90
Aug 31 18:17:17 fuel kernel: [ 3499.973644] ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Aug 31 18:17:17 fuel kernel: [ 3499.973655] Modules linked in: ipt_REJECT nf_reject_ipv4 xt_multiport nft_compat nft_counter nf_tables libcrc32c nfnetlink ccm md4 cmac nls_utf8 cifs libarc4 fscache libdes binfmt_misc snd_hda_codec_hdmi intel_rapl_msr intel_rapl_common snd_hda_codec_realtek snd_hda_codec_generic zfs(PO) ledtrig_audio snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec x86_pkg_temp_thermal zunicode(PO) intel_powerclamp snd_hda_core zzstd(O) snd_hwdep coretemp soundwire_bus zlua(O) snd_soc_core kvm_intel nls_iso8859_1 zavl(PO) icp(PO) snd_compress ac97_bus snd_pcm_dmaengine snd_pcm kvm snd_seq_midi zcommon(PO) snd_seq_midi_event znvpair(PO) crct10dif_pclmul spl(O) joydev input_leds snd_rawmidi ghash_clmulni_intel snd_seq snd_seq_device aesni_intel snd_timer crypto_simd cryptd glue_helper snd rapl mei_me intel_cstate efi_pstore mxm_wmi mei soundcore intel_wmi_thunderbolt mac_hid nvidia_uvm(POE) sch_fq_codel msr parport_pc ppdev lp parport ip_tables
Aug 31 18:17:17 fuel kernel: [ 3499.973768] x_tables autofs4 hid_generic usbhid hid nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core crc32_pclmul e1000e drm i2c_i801 ahci i2c_smbus lpc_ich thunderbolt libahci nvme xhci_pci nvme_core xhci_pci_renesas wmi
Aug 31 18:17:17 fuel kernel: [ 3499.973810] CR2: 0000000000000009
Aug 31 18:17:17 fuel kernel: [ 3499.973814] ---[ end trace 16a13b81d497db98 ]---
Aug 31 18:17:17 fuel kernel: [ 3500.022556] RIP: 0010:_nv028963rm+0x35/0x90 [nvidia]
Aug 31 18:17:17 fuel kernel: [ 3500.023346] Code: 57 10 31 c0 48 85 d2 74 2e 48 8b 4f 08 31 c0 48 85 c9 74 0d 48 63 41 14 48 89 d6 48 29 c6 48 89 f0 48 3b 57 18 48 89 07 74 1b <48> 8b 42 08 48 89 47 10 b8 01 00 00 00 48 83 c4 08 c3 66 0f 1f 84
Aug 31 18:17:17 fuel kernel: [ 3500.023351] RSP: 0018:ffffb11742f43bd0 EFLAGS: 00010203
Aug 31 18:17:17 fuel kernel: [ 3500.023357] RAX: 0000000000020010 RBX: ffff9c819c336c30 RCX: ffff9c7e48cec978
Aug 31 18:17:17 fuel kernel: [ 3500.023361] RDX: 0000000000020010 RSI: 0000000000020010 RDI: ffff9c7cf6695d20
Aug 31 18:17:17 fuel kernel: [ 3500.023364] RBP: ffff9c7cf6695d20 R08: 0000000000000020 R09: ffff9c7cf6695d28
Aug 31 18:17:17 fuel kernel: [ 3500.023368] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9c7d3d867738
Aug 31 18:17:17 fuel kernel: [ 3500.023371] R13: ffff9c7e1c027060 R14: ffff9c7cf6695d98 R15: ffff9c819c336c30
Aug 31 18:17:17 fuel kernel: [ 3500.023374] FS: 00007f7d9219fa40(0000) GS:ffff9c8bcfec0000(0000) knlGS:0000000000000000
Aug 31 18:17:17 fuel kernel: [ 3500.023379] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 31 18:17:17 fuel kernel: [ 3500.023383] CR2: 0000000000000009 CR3: 0000000171e5e003 CR4: 00000000003706e0
Aug 31 18:17:17 fuel kernel: [ 3500.023386] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 31 18:17:17 fuel kernel: [ 3500.023389] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
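
For anyone comparing their own logs against this report, the following is a minimal sketch of how the key fields of each oops (fault type, fault address, faulting process, and the symbol at RIP) could be pulled out of a syslog excerpt like the one above. The function name `summarize_oops` and the regular expressions are illustrative assumptions, not part of any existing tool, and the patterns are only matched against the line shapes seen in this log.

```python
import re

# Hypothetical patterns, written against the line shapes in this report only.
BUG_RE = re.compile(r"BUG: (?P<kind>.*?)(?:,| for) address: (?P<addr>[0-9a-f]+)")
COMM_RE = re.compile(r"CPU: \d+ PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")
RIP_RE = re.compile(r"RIP: 0010:(?P<sym>[^+\s]+)\+0x[0-9a-f]+/0x[0-9a-f]+ \[(?P<mod>\w+)\]")


def summarize_oops(lines):
    """Return one dict per 'BUG:' line, filled from the lines that follow it."""
    records = []
    for line in lines:
        m = BUG_RE.search(line)
        if m:
            # A new oops starts here.
            records.append({
                "kind": m["kind"],
                "addr": int(m["addr"], 16),
                "comm": None,
                "pid": None,
                "rip": None,
                "module": None,
            })
            continue
        if not records:
            continue
        cur = records[-1]
        m = COMM_RE.search(line)
        if m and cur["comm"] is None:
            cur["comm"] = m["comm"]
            cur["pid"] = int(m["pid"])
            continue
        m = RIP_RE.search(line)
        # Keep only the first RIP per oops; the register dump is printed
        # again after the "end trace" line.
        if m and cur["rip"] is None:
            cur["rip"] = m["sym"]
            cur["module"] = m["mod"]
    return records


SAMPLE = [
    "Aug 31 18:17:12 fuel kernel: [ 3494.973431] BUG: unable to handle page fault for address: 0000000000020018",
    "Aug 31 18:17:12 fuel kernel: [ 3494.973444] CPU: 2 PID: 33911 Comm: chrome Tainted: P OE 5.11.0-31-generic #33-Ubuntu",
    "Aug 31 18:17:12 fuel kernel: [ 3494.973448] RIP: 0010:_nv028963rm+0x35/0x90 [nvidia]",
    "Aug 31 18:17:12 fuel kernel: [ 3494.996770] RIP: 0010:_nv028963rm+0x35/0x90 [nvidia]",
    "Aug 31 18:17:17 fuel kernel: [ 3499.968653] BUG: kernel NULL pointer dereference, address: 0000000000000009",
    "Aug 31 18:17:17 fuel kernel: [ 3499.968682] CPU: 11 PID: 3575 Comm: Xorg Tainted: P D OE 5.11.0-31-generic #33-Ubuntu",
    "Aug 31 18:17:17 fuel kernel: [ 3499.968691] RIP: 0010:_nv010150rm+0x3c/0x340 [nvidia]",
    "Aug 31 18:17:17 fuel kernel: [ 3500.022556] RIP: 0010:_nv028963rm+0x35/0x90 [nvidia]",
]

if __name__ == "__main__":
    for rec in summarize_oops(SAMPLE):
        print(f"{rec['comm']} (pid {rec['pid']}): {rec['kind']} at "
              f"{rec['addr']:#x} in {rec['rip']} [{rec['module']}]")
```

Run on the sample lines taken from this log, the sketch reports chrome faulting at 0x20018 in _nv028963rm and Xorg faulting at 0x9 in _nv010150rm, both inside the nvidia module.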

ProblemType: Bug
DistroRelease: Ubuntu 21.04
Package: nvidia-driver-470 470.57.02-0ubuntu0.21.04.1
ProcVersionSignature: Ubuntu 5.11.0-31.33-generic 5.11.22
Uname: Linux 5.11.0-31-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu65.1
Architecture: amd64
CasperMD5CheckResult: unknown
CurrentDesktop: X-Cinnamon
Date: Tue Aug 31 18:29:27 2021
InstallationDate: Installed on 2018-10-28 (1038 days ago)
InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
SourcePackage: nvidia-graphics-drivers-470
UpgradeStatus: Upgraded to hirsute on 2021-06-21 (71 days ago)

JK (m0d) wrote :

This happens to me too, but with Ubuntu 20.04. It started shortly after the latest nvidia-graphics-driver-470 update. Symptoms are:

* graphical output freezes completely
* no reaction to any kind of input (no switching to tty either)
* ssh works and shows Xorg as using 100% CPU
* Xorg process can't be killed
* shutdown command is not executed (only ssh connection is terminated)

My GPU: GTX970. I've never had this problem before, and my PC has not been modified for years. Now it happens multiple times per day!

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-470 (Ubuntu):
status: New → Confirmed