Graphics Crashes Frequently Making Machine Unusable
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | New | Undecided | Unassigned |
Bug Description
The issue is very reproducible. It first happened within a minute or two of using my machine after upgrading from Lunar to Mantic. I then upgraded to Noble to see whether a newer kernel had the fix; it didn't. My machine is unusable, and I'm happy to reproduce the crash or test a fix if anyone has one.
Sometimes (as in the logs captured for this report) the GPU is able to reset and I only have to restart my GNOME session and log back in. Far more frequently, however, the reset fails and I have to hard reboot. In all cases I can still ssh into the system to debug.
The crash starts with the desktop environment suddenly freezing for a few seconds, after which the screen becomes pixelated and multi-colored (a typical GPU crash in my experience).
Crashes typically happen under load (Firefox open, switching applications) and are less likely when I'm only using gnome-terminal.
My system has a Vega 56 GPU using the amdgpu driver and has 3 actively used monitors.
[Thu Feb 1 11:22:17 2024] [drm:do_
[Thu Feb 1 11:22:17 2024] [drm:amdgpu_
[Thu Feb 1 11:22:17 2024] [drm:amdgpu_
[Thu Feb 1 11:22:17 2024] amdgpu 0000:1f:00.0: amdgpu: GPU reset begin!
[Thu Feb 1 11:22:18 2024] [drm] psp gfx command UNLOAD_TA(0x2) failed and response status is (0x117)
[Thu Feb 1 11:22:18 2024] amdgpu 0000:1f:00.0: amdgpu: BACO reset
[Thu Feb 1 11:22:18 2024] amdgpu 0000:1f:00.0: amdgpu: GPU reset succeeded, trying to resume
[Thu Feb 1 11:22:18 2024] [drm] PCIE GART of 512M enabled.
[Thu Feb 1 11:22:18 2024] [drm] PTB located at 0x000000F400000000
[Thu Feb 1 11:22:18 2024] [drm] VRAM is lost due to GPU reset!
[Thu Feb 1 11:22:18 2024] [drm] PSP is resuming...
[Thu Feb 1 11:22:19 2024] [drm] reserve 0x400000 from 0xf5fec00000 for PSP TMR
[Thu Feb 1 11:22:19 2024] [drm] kiq ring mec 2 pipe 1 q 0
[Thu Feb 1 11:22:19 2024] [drm] UVD and UVD ENC initialized successfully.
[Thu Feb 1 11:22:19 2024] [drm] VCE initialized successfully.
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring gfx_low uses VM inv eng 1 on hub 0
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring gfx_high uses VM inv eng 4 on hub 0
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 5 on hub 0
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 6 on hub 0
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 7 on hub 0
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 8 on hub 0
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 9 on hub 0
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 10 on hub 0
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 11 on hub 0
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 12 on hub 0
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 13 on hub 0
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 8
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring page0 uses VM inv eng 1 on hub 8
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring sdma1 uses VM inv eng 4 on hub 8
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring page1 uses VM inv eng 5 on hub 8
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring uvd_0 uses VM inv eng 6 on hub 8
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring uvd_enc_0.0 uses VM inv eng 7 on hub 8
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring uvd_enc_0.1 uses VM inv eng 8 on hub 8
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring vce0 uses VM inv eng 9 on hub 8
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring vce1 uses VM inv eng 10 on hub 8
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: ring vce2 uses VM inv eng 11 on hub 8
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: recover vram bo from shadow start
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: recover vram bo from shadow done
[Thu Feb 1 11:22:19 2024] amdgpu 0000:1f:00.0: amdgpu: GPU reset(2) succeeded!
[Thu Feb 1 11:22:19 2024] [drm] Skip scheduling IBs!
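For anyone triaging a longer capture of the same log, the reset sequence above can be summarized mechanically. The following is a minimal sketch, not part of the original report: it assumes the log was saved in the `dmesg -T` format shown here, it matches only the message strings that actually appear in the excerpt above, and the function name `summarize_gpu_resets` is made up for illustration. A reset with no matching "GPU reset(N) succeeded!" line is reported as not completed; the sketch does not try to recognize any specific failure message.

```python
import re

# Splits a `dmesg -T` style line into its bracketed timestamp and the message,
# e.g. "[Thu Feb 1 11:22:17 2024] amdgpu 0000:1f:00.0: amdgpu: GPU reset begin!".
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")

# Final success line as it appears in the excerpt: "GPU reset(2) succeeded!".
# The earlier "GPU reset succeeded, trying to resume" line deliberately does not match.
DONE_RE = re.compile(r"GPU reset\(\d+\) succeeded!")


def summarize_gpu_resets(dmesg_text):
    """Return one dict per 'GPU reset begin!' line, with the outcome seen after it."""
    events = []
    current = None  # the reset attempt currently in progress, if any
    for raw in dmesg_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue
        ts, msg = match.group("ts"), match.group("msg")
        if "GPU reset begin!" in msg:
            current = {"begin": ts, "end": None, "succeeded": False, "vram_lost": False}
            events.append(current)
        elif current is not None:
            if "VRAM is lost due to GPU reset!" in msg:
                current["vram_lost"] = True
            elif DONE_RE.search(msg):
                current["end"] = ts
                current["succeeded"] = True
                current = None
    return events
```

Calling `summarize_gpu_resets` on the text of a saved log returns a list that can be scanned for entries whose `succeeded` field is still `False`, which would correspond to the resets that required a hard reboot.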
Also worth noting is that the system has lots of UBSAN array-index-
[Wed Jan 31 09:47:40 2024] ========
[Wed Jan 31 09:47:40 2024] UBSAN: array-index-
/drm/amd/
[Wed Jan 31 09:47:40 2024] index 1 is out of range for type 'ATOM_Vega10_
[Wed Jan 31 09:47:40 2024] CPU: 4 PID: 191 Comm: (udev-worker) Not tainted 6.6.0-14-generic #14-Ubuntu
[Wed Jan 31 09:47:40 2024] Hardware name: Micro-Star International Co., Ltd. MS-7B79/X470 GAMING PLUS (MS-7B79), BIOS A.40 06/28/2018
[Wed Jan 31 09:47:40 2024] Call Trace:
[Wed Jan 31 09:47:40 2024] <TASK>
[Wed Jan 31 09:47:40 2024] dump_stack_
[Wed Jan 31 09:47:40 2024] dump_stack+
[Wed Jan 31 09:47:40 2024] __ubsan_
[Wed Jan 31 09:47:40 2024] get_vddc_
[Wed Jan 31 09:47:40 2024] vega10_
[Wed Jan 31 09:47:40 2024] hwmgr_hw_
[Wed Jan 31 09:47:40 2024] pp_hw_init+
[Wed Jan 31 09:47:40 2024] amdgpu_
[Wed Jan 31 09:47:40 2024] amdgpu_
[Wed Jan 31 09:47:40 2024] amdgpu_
[Wed Jan 31 09:47:40 2024] amdgpu_
[Wed Jan 31 09:47:40 2024] local_pci_
[Wed Jan 31 09:47:40 2024] pci_call_
[Wed Jan 31 09:47:40 2024] pci_device_
[Wed Jan 31 09:47:40 2024] ? srso_return_
[Wed Jan 31 09:47:40 2024] really_
[Wed Jan 31 09:47:40 2024] __driver_
[Wed Jan 31 09:47:40 2024] driver_
[Wed Jan 31 09:47:40 2024] __driver_
[Wed Jan 31 09:47:40 2024] ? __pfx__
[Wed Jan 31 09:47:40 2024] bus_for_
[Wed Jan 31 09:47:40 2024] driver_
[Wed Jan 31 09:47:40 2024] bus_add_
[Wed Jan 31 09:47:40 2024] driver_
[Wed Jan 31 09:47:40 2024] ? __pfx_amdgpu_
[Wed Jan 31 09:47:40 2024] __pci_register_
[Wed Jan 31 09:47:40 2024] amdgpu_
[Wed Jan 31 09:47:40 2024] ? srso_return_
[Wed Jan 31 09:47:40 2024] do_one_
[Wed Jan 31 09:47:40 2024] do_init_
[Wed Jan 31 09:47:40 2024] load_module+
[Wed Jan 31 09:47:40 2024] ? vfree.part.
[Wed Jan 31 09:47:40 2024] ? srso_return_
[Wed Jan 31 09:47:40 2024] ? kfree+0x78/0x120
[Wed Jan 31 09:47:40 2024] init_module_
[Wed Jan 31 09:47:40 2024] ? srso_return_
[Wed Jan 31 09:47:40 2024] ? init_module_
[Wed Jan 31 09:47:40 2024] idempotent_
[Wed Jan 31 09:47:40 2024] __x64_sys_
[Wed Jan 31 09:47:40 2024] do_syscall_
[Wed Jan 31 09:47:40 2024] ? do_syscall_
[Wed Jan 31 09:47:40 2024] ? srso_return_
[Wed Jan 31 09:47:40 2024] ? ksys_read+
[Wed Jan 31 09:47:40 2024] ? srso_return_
[Wed Jan 31 09:47:40 2024] ? exit_to_
[Wed Jan 31 09:47:40 2024] ? srso_return_
[Wed Jan 31 09:47:40 2024] ? syscall_
[Wed Jan 31 09:47:40 2024] ? srso_return_
[Wed Jan 31 09:47:40 2024] ? do_syscall_
[Wed Jan 31 09:47:40 2024] ? do_syscall_
[Wed Jan 31 09:47:40 2024] ? do_syscall_
[Wed Jan 31 09:47:40 2024] entry_SYSCALL_
[Wed Jan 31 09:47:40 2024] RIP: 0033:0x72d4f7e5ac7d
[Wed Jan 31 09:47:40 2024] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 81 0d 00 f7 d8 64 89 01 48
[Wed Jan 31 09:47:40 2024] RSP: 002b:00007fffd3
[Wed Jan 31 09:47:40 2024] RAX: ffffffffffffffda RBX: 000064f9c0d57670 RCX: 000072d4f7e5ac7d
[Wed Jan 31 09:47:40 2024] RDX: 0000000000000004 RSI: 000072d4f7fd744a RDI: 0000000000000019
[Wed Jan 31 09:47:40 2024] RBP: 000072d4f7fd744a R08: 0000000000000040 R09: fffffffffffffde0
[Wed Jan 31 09:47:40 2024] R10: fffffffffffffe18 R11: 0000000000000246 R12: 0000000000020000
[Wed Jan 31 09:47:40 2024] R13: 000064f9c0d575c0 R14: 0000000000000000 R15: 000064f9c0dde630
[Wed Jan 31 09:47:40 2024] </TASK>
[Wed Jan 31 09:47:40 2024] =======
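To put a number on how many UBSAN reports a boot produces, the log can be tallied by report kind. This is a rough sketch under the assumption that each report has one line containing "UBSAN:" followed by the kind, as in the trace above; the function name `count_ubsan_reports` is hypothetical and not from any tool.

```python
import re
from collections import Counter

# Captures the first whitespace-delimited token after "UBSAN:", which names the check.
UBSAN_RE = re.compile(r"\bUBSAN:\s*(?P<kind>\S+)")


def count_ubsan_reports(dmesg_text):
    """Count UBSAN report header lines in a dmesg capture, grouped by kind."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = UBSAN_RE.search(line)
        if match:
            counts[match.group("kind")] += 1
    return counts
```

Run against a full boot log, the resulting counter gives a quick comparison point between kernels, for example before and after a candidate fix.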
ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-
ProcVersionSign
Uname: Linux 6.6.0-14-generic x86_64
ApportVersion: 2.27.0-0ubuntu6
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/seq: holmanb 7815 F.... pipewire
/dev/snd/
/dev/snd/
/dev/snd/
CRDA: N/A
CasperMD5CheckR
CurrentDesktop: ubuntu:GNOME
Date: Thu Feb 1 11:34:21 2024
InstallationDate: Installed on 2021-10-20 (834 days ago)
InstallationMedia: Ubuntu 21.04 "Hirsute Hippo" - Release amd64 (20210420)
IwConfig:
lo no wireless extensions.
enp24s0 no wireless extensions.
virbr0 no wireless extensions.
MachineType: {report[
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=
RelatedPackageV
linux-
linux-
linux-firmware 20230919.
RfKill:
SourcePackage: linux
UpgradeStatus: Upgraded to noble on 2024-01-30 (2 days ago)
dmi.bios.date: 06/28/2018
dmi.bios.release: 5.13
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: A.40
dmi.board.
dmi.board.name: X470 GAMING PLUS (MS-7B79)
dmi.board.vendor: Micro-Star International Co., Ltd.
dmi.board.version: 2.0
dmi.chassis.
dmi.chassis.type: 3
dmi.chassis.vendor: Micro-Star International Co., Ltd.
dmi.chassis.
dmi.modalias: dmi:bvnAmerican
dmi.product.family: To be filled by O.E.M.
dmi.product.name: MS-7B79
dmi.product.sku: To be filled by O.E.M.
dmi.product.
dmi.sys.vendor: Micro-Star International Co., Ltd.