whole system hangs [kernel: watchdog: BUG: soft lockup - CPU#n stuck]

Bug #2069092 reported by Tomasz Makara
This bug affects 1 person
Affects: linux (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

With the "tested" NVIDIA driver the system hangs. On 23.10, when I compiled a much newer driver manually, I never had a hard crash, but it was annoying because I had to recompile the driver after every kernel update.
Secondly, on 23.10 I was using Wayland; now I am on X11. I am not sure what is causing the hard crash. The system goes totally dead: just before it happens, everything slows down for about 1 to 2 seconds, then it freezes. Ctrl+Alt+F2/F3/F4/F5 does not work; only the power button does.

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: xorg 1:7.7+23ubuntu3
ProcVersionSignature: Ubuntu 6.8.0-35.35-generic 6.8.4
Uname: Linux 6.8.0-35-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
.proc.driver.nvidia.capabilities.gpu0: Error: path was not a regular file.
.proc.driver.nvidia.capabilities.mig: Error: path was not a regular file.
.proc.driver.nvidia.gpus.0000.01.00.0: Error: path was not a regular file.
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.suspend: suspend hibernate resume
.proc.driver.nvidia.suspend_depth: default modeset uvm
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.171.04 Tue Mar 19 20:30:00 UTC 2024
 GCC version:
ApportVersion: 2.28.1-0ubuntu3
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: unknown
CompositorRunning: None
CurrentDesktop: KDE
Date: Tue Jun 11 21:52:14 2024
DistUpgraded: 2024-05-23 01:17:19,732 DEBUG migrateToDeb822Sources()
DistroCodename: noble
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes
GpuHangFrequency: Several times a week
GpuHangReproducibility: Seems to happen randomly
GpuHangStarted: Since before I upgraded
GraphicsCard:
 NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] [10de:25a2] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: CLEVO/KAPOK Computer GA107M [GeForce RTX 3050 Mobile] [1558:5e00]
 Advanced Micro Devices, Inc. [AMD/ATI] Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] [1002:1638] (rev c6) (prog-if 00 [VGA controller])
   Subsystem: CLEVO/KAPOK Computer Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] [1558:5e00]
InstallationDate: Installed on 2024-03-16 (87 days ago)
InstallationMedia: Kubuntu 23.10 "Mantic Minotaur" - Release amd64 (20231010)
MachineType: MEDION Crawler E25
ProcEnviron:
 LANG=pl_PL.UTF-8
 LANGUAGE=
 PATH=(custom, no user)
 SHELL=/bin/bash
 XDG_RUNTIME_DIR=<set>
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.8.0-35-generic root=UUID=aa02f2dd-89eb-4b06-a98f-3fe963c633ae ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: Upgraded to noble on 2024-05-23 (20 days ago)
dmi.bios.date: 05/10/2022
dmi.bios.release: 7.5
dmi.bios.vendor: INSYDE Corp.
dmi.bios.version: 1.07.05RME1
dmi.board.asset.tag: Tag 12345
dmi.board.name: NHxxEx
dmi.board.vendor: MEDION
dmi.board.version: 1.0
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: MEDION
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnINSYDECorp.:bvr1.07.05RME1:bd05/10/2022:br7.5:svnMEDION:pnCrawlerE25:pvrNotApplicable:rvnMEDION:rnNHxxEx:rvr1.0:cvnMEDION:ct10:cvrN/A:skuML-21000830031834:
dmi.product.family: Erazer
dmi.product.name: Crawler E25
dmi.product.sku: ML-210008 30031834
dmi.product.version: Not Applicable
dmi.sys.vendor: MEDION
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.120-2build1
version.libgl1-mesa-dri: libgl1-mesa-dri 24.0.5-1ubuntu1
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:21.1.12-1ubuntu1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:22.0.0-1build1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20210115-1build1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-2build1

Tomasz Makara (csowiec) wrote :
Tomasz Makara (csowiec) wrote :

It happens completely at random. I can't record it with a second camera because I can't trigger it. It happens after several hours of using the PC.

summary: - Xorg freeze
+ whole system hangs
Daniel van Vugt (vanvugt) wrote : Re: whole system hangs

In case the problem is bug 2062426, please try the workarounds from bug 2060268. If that does not solve it then please follow these steps:

1. Run these commands:
    journalctl -b0 > journal.txt
    journalctl -b-1 > prevjournal.txt
and attach the resulting text files here.

2. Look in /var/crash for crash files and if found run:
    ubuntu-bug YOURFILE.crash
Then tell us the ID of the newly-created bug.

3. If step 2 failed then look at https://errors.ubuntu.com/user/ID where ID is the content of file /var/lib/whoopsie/whoopsie-id on the machine. Do you find any links to recent problems on that page? If so then please send the links to us.

Please take care to avoid attaching .crash files to bugs as we are unable to process them as file attachments. It would also be a security risk for yourself.
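Steps 1 to 3 above can be gathered in one pass. The following is a minimal Python sketch, assuming `journalctl` is on PATH and that the default paths named in the steps (`/var/crash` and `/var/lib/whoopsie/whoopsie-id`) are used; the helper names `save_journal`, `find_crash_files`, `errors_url` and `collect` are hypothetical. It only saves the journals and prints what to do next; it does not file anything, and `.crash` files still go through `ubuntu-bug`, never as attachments.

```python
import glob
import os
import subprocess
from typing import List

# Default locations named in steps 2 and 3 (assumed: stock Ubuntu with apport/whoopsie).
CRASH_DIR = "/var/crash"
WHOOPSIE_ID_FILE = "/var/lib/whoopsie/whoopsie-id"


def save_journal(boot: str, out_path: str) -> None:
    """Step 1: write `journalctl -b<boot>` to out_path ("0" = this boot, "-1" = previous)."""
    with open(out_path, "w", encoding="utf-8") as out:
        subprocess.run(["journalctl", f"-b{boot}"], stdout=out, check=True)


def find_crash_files(crash_dir: str = CRASH_DIR) -> List[str]:
    """Step 2: list .crash files to hand to `ubuntu-bug` (never attach them directly)."""
    return sorted(glob.glob(os.path.join(crash_dir, "*.crash")))


def errors_url(whoopsie_id: str) -> str:
    """Step 3: build the errors.ubuntu.com page URL from the whoopsie ID."""
    return f"https://errors.ubuntu.com/user/{whoopsie_id.strip()}"


def collect() -> None:
    """Run all three steps and print the remaining manual actions."""
    save_journal("0", "journal.txt")
    save_journal("-1", "prevjournal.txt")
    for path in find_crash_files():
        print("run: ubuntu-bug", path)
    if os.path.exists(WHOOPSIE_ID_FILE):
        with open(WHOOPSIE_ID_FILE, encoding="utf-8") as f:
            print("check:", errors_url(f.read()))
```

Calling `collect()` right after rebooting from a freeze means `prevjournal.txt` holds the boot that hung, which is the one the logs are needed for.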

affects: xorg (Ubuntu) → ubuntu
Changed in ubuntu:
status: New → Incomplete
Tomasz Makara (csowiec) wrote :
Tomasz Makara (csowiec) wrote :
Tomasz Makara (csowiec) wrote :

I did step 2 and sent the report, but got no bug ID.

Daniel van Vugt (vanvugt) wrote :

The (bottom of the) log in comment #5 shows it's the kernel hanging.
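To spot the same signature in a saved journal such as `prevjournal.txt`, a minimal Python sketch can search for the watchdog message. It assumes the usual kernel wording, `watchdog: BUG: soft lockup - CPU#<n> stuck for <s>s! [<task>:<pid>]`; the sample values in the comment and the names `SoftLockup` and `find_soft_lockups` are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed shape of the kernel watchdog report, e.g.
#   "watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [Xorg:1234]"
# The bracketed "[task:pid]" part is optional in case it is missing.
SOFT_LOCKUP_RE = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!"
    r"(?: \[(?P<comm>[^\]]+):(?P<pid>\d+)\])?"
)


class SoftLockup(NamedTuple):
    cpu: int
    seconds: int
    comm: Optional[str]  # task that was running on the stuck CPU
    pid: Optional[int]


def find_soft_lockups(journal_text: str) -> List[SoftLockup]:
    """Return every soft-lockup report found in the journal text."""
    found = []
    for m in SOFT_LOCKUP_RE.finditer(journal_text):
        pid = m.group("pid")
        found.append(SoftLockup(
            cpu=int(m.group("cpu")),
            seconds=int(m.group("secs")),
            comm=m.group("comm"),
            pid=int(pid) if pid is not None else None,
        ))
    return found
```

The task name matters more than the CPU number: it shows which process was stuck in the kernel when the watchdog fired.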

affects: ubuntu → linux (Ubuntu)
Changed in linux (Ubuntu):
status: Incomplete → New
summary: - whole system hangs
+ whole system hangs [kernel: watchdog: BUG: soft lockup - CPU#n stuck]
Tomasz Makara (csowiec) wrote :
Tomasz Makara (csowiec) wrote :
Tomasz Makara (csowiec) wrote :

The kernel is more verbose now and a backtrace is available. I'm editing a video as proof, but blurring parts of it in Kdenlive takes ages.

Tomasz Makara (csowiec) wrote :

It's the NVIDIA driver.

Tomasz Makara (csowiec) wrote :

Done, here is the video: https://youtu.be/sZ2LIwCKcrk
The command prompt stops responding in Discord.

Tomasz Makara (csowiec) wrote :

cze 20 01:13:58 Crawler-E25 kernel: ------------[ cut here ]------------
cze 20 01:13:58 Crawler-E25 kernel: UBSAN: array-index-out-of-bounds in build/nvidia/535.171.04/build/nvidia-uvm/uvm_pmm_gpu.c:2364:28
cze 20 01:13:58 Crawler-E25 kernel: index 0 is out of range for type 'uvm_gpu_chunk_t *[*]'
cze 20 01:13:58 Crawler-E25 kernel: CPU: 3 PID: 4296 Comm: d3ddriverquery6 Tainted: P O 6.8.0-35-generic #35-Ubuntu
cze 20 01:13:58 Crawler-E25 kernel: Hardware name: MEDION Crawler E25/NHxxEx, BIOS 1.07.05RME1 05/10/2022
cze 20 01:13:58 Crawler-E25 kernel: Call Trace:
cze 20 01:13:58 Crawler-E25 kernel: <TASK>
cze 20 01:13:58 Crawler-E25 kernel: dump_stack_lvl+0x48/0x70
cze 20 01:13:58 Crawler-E25 kernel: dump_stack+0x10/0x20
cze 20 01:13:58 Crawler-E25 kernel: __ubsan_handle_out_of_bounds+0xc6/0x110
cze 20 01:13:58 Crawler-E25 kernel: split_gpu_chunk+0x13f/0x410 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: uvm_pmm_gpu_alloc+0x2da/0x6d0 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: phys_mem_allocate+0xac/0x230 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: allocate_directory+0xb4/0x130 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: ? allocate_directory+0xb4/0x130 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: uvm_page_tree_init+0x133/0x450 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: uvm_gpu_retain_by_uuid+0x19df/0x2b80 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: uvm_va_space_register_gpu+0x47/0x740 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
cze 20 01:13:58 Crawler-E25 kernel: ? __mod_memcg_lruvec_state+0xd6/0x1a0
cze 20 01:13:58 Crawler-E25 kernel: uvm_api_register_gpu+0x5a/0x90 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: uvm_ioctl+0x1a26/0x1cd0 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
cze 20 01:13:58 Crawler-E25 kernel: ? next_uptodate_folio+0xa9/0x320
cze 20 01:13:58 Crawler-E25 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
cze 20 01:13:58 Crawler-E25 kernel: ? filemap_map_pages+0x2fe/0x4c0
cze 20 01:13:58 Crawler-E25 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
cze 20 01:13:58 Crawler-E25 kernel: ? _raw_spin_lock_irqsave+0xe/0x20
cze 20 01:13:58 Crawler-E25 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
cze 20 01:13:58 Crawler-E25 kernel: ? thread_context_non_interrupt_add+0x13a/0x250 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: uvm_unlocked_ioctl_entry.part.0+0x7b/0xf0 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
cze 20 01:13:58 Crawler-E25 kernel: ? __seccomp_filter+0xf1/0x570
cze 20 01:13:58 Crawler-E25 kernel: uvm_unlocked_ioctl_entry+0x6b/0x90 [nvidia_uvm]
cze 20 01:13:58 Crawler-E25 kernel: __x64_sys_ioctl+0xa3/0xf0
cze 20 01:13:58 Crawler-E25 kernel: x64_sys_call+0x143b/0x25c0
cze 20 01:13:58 Crawler-E25 kernel: do_syscall_64+0x7f/0x180
cze 20 01:13:58 Crawler-E25 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
cze 20 01:13:58 Crawler-E25 kernel: ? do_user_addr_fault+0x338/0x6b0
cze 20 01:13:58 Crawler-E25 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
cze 20 01:13:58 Crawler-E25 kernel: ? irqentry_exit_to_user_mode+0x7b/0x260
cz...
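To confirm which module a trace like the one above points into, a minimal Python sketch can pull out the UBSAN location and count call-trace frames per module. It assumes the journal line layout shown above (`... kernel: symbol+0xOFF/0xSIZE [module]`); frames without a `[module]` tag are counted as the core kernel (`vmlinux`), and frames prefixed with `?` are skipped because the kernel prints those for stack addresses it could not confirm. The function name `summarize_trace` is hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# "UBSAN: <check> in <file>:<line>:<col>"
UBSAN_RE = re.compile(
    r"UBSAN: (?P<kind>[\w-]+) in (?P<file>\S+):(?P<line>\d+):(?P<col>\d+)"
)
# Call-trace frame as it appears in the journal:
#   "... kernel: [? ]symbol+0xOFF/0xSIZE [module]"
FRAME_RE = re.compile(
    r"kernel: (?P<unreliable>\? )?(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def summarize_trace(log_text: str) -> Tuple[List[Dict[str, str]], Counter]:
    """Return the UBSAN locations and a per-module count of reliable frames."""
    reports = [m.groupdict() for m in UBSAN_RE.finditer(log_text)]
    modules: Counter = Counter()
    for m in FRAME_RE.finditer(log_text):
        if m.group("unreliable"):
            # "? " frames may be stale stack data, not part of the real call chain.
            continue
        # Frames without a [module] tag belong to the core kernel image.
        modules[m.group("module") or "vmlinux"] += 1
    return reports, modules
```

Run over the trace above, most reliable frames below the syscall entry carry the `[nvidia_uvm]` tag, and the UBSAN report itself names a file under `build/nvidia/535.171.04/`.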

Tomasz Makara (csowiec) wrote :

I need someone to update the ubuntu-drivers entry and mark 535.171.04 as unstable/untested.

Tomasz Makara (csowiec) wrote :
Tomasz Makara (csowiec) wrote :

535.183.01

Tomasz Makara (csowiec) wrote :

Since there is no willingness to update the packaged driver to the mainline release, I installed the driver straight from the NVIDIA website and the problem is gone.
Driver 550.90.07. Closing.
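After switching drivers, it is worth confirming which kernel module is actually loaded. The version string can be read from `/proc/driver/nvidia/version`, whose first line has the format shown in the apport data (`NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.171.04 ...`). A minimal Python sketch; the `SUSPECT_VERSIONS` set is a hypothetical local denylist based only on this report, not an NVIDIA or Ubuntu list, and the regex returns None for any other wording of that line.

```python
import re
from typing import Optional, Tuple

# Matches the first line of /proc/driver/nvidia/version, e.g.
#   "NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.171.04  Tue Mar 19 ..."
NVRM_RE = re.compile(r"NVRM version: .*?Kernel Module\s+(\d+(?:\.\d+)+)")

# Hypothetical local denylist, based only on the experience in this report.
SUSPECT_VERSIONS = {"535.171.04"}


def loaded_driver_version(proc_text: str) -> Optional[str]:
    """Extract the kernel-module version string, or None if the format differs."""
    m = NVRM_RE.search(proc_text)
    return m.group(1) if m else None


def version_key(version: str) -> Tuple[int, ...]:
    """'535.171.04' -> (535, 171, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def is_suspect(version: str) -> bool:
    return version in SUSPECT_VERSIONS


def check(path: str = "/proc/driver/nvidia/version") -> None:
    """Print the loaded NVIDIA module version and flag it if it is on the list."""
    with open(path, encoding="utf-8") as f:
        version = loaded_driver_version(f.read())
    if version is None:
        print("could not parse", path)
    elif is_suspect(version):
        print(version, "is on the local suspect list")
    else:
        print("loaded NVIDIA module:", version)
```

Comparing with `version_key` rather than plain strings keeps ordering numeric, so 550.90.07 sorts above both 535.x releases mentioned in this report.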
