Kernel crash in amd gpu driver

Bug #2056498 reported by Brad Figg
This bug affects 1 person
Affects: linux-signed-hwe-6.5 (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

Mar 7 19:07:10 ripper kernel: [ 9.873519] UBSAN: array-index-out-of-bounds in /build/linux-hwe-6.5-YpKOvT/linux-hwe-6.5-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/smu7_hwmgr.c:3676:4
Mar 7 19:07:10 ripper kernel: [ 9.873531] index 7 is out of range for type 'ATOM_Polaris_SCLK_Dependency_Record [1]'
Mar 7 19:07:10 ripper kernel: [ 9.873538] CPU: 4 PID: 849 Comm: systemd-udevd Not tainted 6.5.0-17-generic #17~22.04.1-Ubuntu
Mar 7 19:07:10 ripper kernel: [ 9.873542] Hardware name: LENOVO 30E1S3VV00/1046, BIOS S07KT45A 01/20/2022
Mar 7 19:07:10 ripper kernel: [ 9.873544] Call Trace:
Mar 7 19:07:10 ripper kernel: [ 9.873545] <TASK>
Mar 7 19:07:10 ripper kernel: [ 9.873547] dump_stack_lvl+0x48/0x70
Mar 7 19:07:10 ripper kernel: [ 9.873551] dump_stack+0x10/0x20
Mar 7 19:07:10 ripper kernel: [ 9.873554] __ubsan_handle_out_of_bounds+0xc6/0x110
Mar 7 19:07:10 ripper kernel: [ 9.873560] smu7_get_pp_table_entry_callback_func_v1+0x9b7/0xa00 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.873897] ? srso_return_thunk+0x5/0x10
Mar 7 19:07:10 ripper kernel: [ 9.873900] ? vi_pcie_rreg+0x6e/0x90 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.874187] ? __pfx_smu7_get_pp_table_entry_callback_func_v1+0x10/0x10 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.874515] get_powerplay_table_entry_v1_0+0xf8/0x490 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.874842] smu7_get_pp_table_entry_v1+0x41/0x4c0 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.875169] smu7_get_pp_table_entry+0x3d/0x50 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.875495] psm_init_power_state_table+0x161/0x250 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.875826] hwmgr_hw_init+0xe3/0x1e0 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.876150] pp_hw_init+0x16/0x50 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.876484] amdgpu_device_ip_init+0x48d/0x960 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.876749] amdgpu_device_init+0x9a2/0x1150 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.877014] amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.877278] amdgpu_pci_probe+0x182/0x450 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.877541] local_pci_probe+0x47/0xb0
Mar 7 19:07:10 ripper kernel: [ 9.877545] pci_call_probe+0x55/0x190
Mar 7 19:07:10 ripper kernel: [ 9.877550] pci_device_probe+0x84/0x120
Mar 7 19:07:10 ripper kernel: [ 9.877553] ? srso_return_thunk+0x5/0x10
Mar 7 19:07:10 ripper kernel: [ 9.877557] really_probe+0x1cc/0x430
Mar 7 19:07:10 ripper kernel: [ 9.877560] __driver_probe_device+0x8c/0x190
Mar 7 19:07:10 ripper kernel: [ 9.877563] driver_probe_device+0x24/0xd0
Mar 7 19:07:10 ripper kernel: [ 9.877566] __driver_attach+0x10b/0x210
Mar 7 19:07:10 ripper kernel: [ 9.877569] ? __pfx___driver_attach+0x10/0x10
Mar 7 19:07:10 ripper kernel: [ 9.877572] bus_for_each_dev+0x8d/0xf0
Mar 7 19:07:10 ripper kernel: [ 9.877576] driver_attach+0x1e/0x30
Mar 7 19:07:10 ripper kernel: [ 9.877579] bus_add_driver+0x127/0x240
Mar 7 19:07:10 ripper kernel: [ 9.877583] driver_register+0x5e/0x130
Mar 7 19:07:10 ripper kernel: [ 9.877586] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.877849] __pci_register_driver+0x62/0x70
Mar 7 19:07:10 ripper kernel: [ 9.877852] amdgpu_init+0x69/0xff0 [amdgpu]
Mar 7 19:07:10 ripper kernel: [ 9.878111] ? srso_return_thunk+0x5/0x10
Mar 7 19:07:10 ripper kernel: [ 9.878114] do_one_initcall+0x5e/0x340
Mar 7 19:07:10 ripper kernel: [ 9.878120] do_init_module+0x68/0x260
Mar 7 19:07:10 ripper kernel: [ 9.878123] load_module+0xb85/0xcd0
Mar 7 19:07:10 ripper kernel: [ 9.878128] ? srso_return_thunk+0x5/0x10
Mar 7 19:07:10 ripper kernel: [ 9.878131] ? security_kernel_post_read_file+0x75/0x90
Mar 7 19:07:10 ripper kernel: [ 9.878136] init_module_from_file+0x96/0x100
Mar 7 19:07:10 ripper kernel: [ 9.878139] ? srso_return_thunk+0x5/0x10
Mar 7 19:07:10 ripper kernel: [ 9.878142] ? init_module_from_file+0x96/0x100
Mar 7 19:07:10 ripper kernel: [ 9.878149] idempotent_init_module+0x11c/0x2b0
Mar 7 19:07:10 ripper kernel: [ 9.878155] __x64_sys_finit_module+0x64/0xd0
Mar 7 19:07:10 ripper kernel: [ 9.878159] do_syscall_64+0x5b/0x90
Mar 7 19:07:10 ripper kernel: [ 9.878161] ? srso_return_thunk+0x5/0x10
Mar 7 19:07:10 ripper kernel: [ 9.878164] ? ksys_mmap_pgoff+0x120/0x270
Mar 7 19:07:10 ripper kernel: [ 9.878167] ? __secure_computing+0x89/0xf0
Mar 7 19:07:10 ripper kernel: [ 9.878170] ? srso_return_thunk+0x5/0x10
Mar 7 19:07:10 ripper kernel: [ 9.878173] ? srso_return_thunk+0x5/0x10
Mar 7 19:07:10 ripper kernel: [ 9.878176] ? exit_to_user_mode_prepare+0x30/0xb0
Mar 7 19:07:10 ripper kernel: [ 9.878179] ? srso_return_thunk+0x5/0x10
Mar 7 19:07:10 ripper kernel: [ 9.878181] ? syscall_exit_to_user_mode+0x37/0x60
Mar 7 19:07:10 ripper kernel: [ 9.878184] ? srso_return_thunk+0x5/0x10
Mar 7 19:07:10 ripper kernel: [ 9.878187] ? do_syscall_64+0x67/0x90
Mar 7 19:07:10 ripper kernel: [ 9.878189] ? syscall_exit_to_user_mode+0x37/0x60
Mar 7 19:07:10 ripper kernel: [ 9.878192] ? srso_return_thunk+0x5/0x10
Mar 7 19:07:10 ripper kernel: [ 9.878195] ? do_syscall_64+0x67/0x90
Mar 7 19:07:10 ripper kernel: [ 9.878198] ? srso_return_thunk+0x5/0x10
Mar 7 19:07:10 ripper kernel: [ 9.878200] ? do_syscall_64+0x67/0x90
Mar 7 19:07:10 ripper kernel: [ 9.878203] ? do_syscall_64+0x67/0x90
Mar 7 19:07:10 ripper kernel: [ 9.878206] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Mar 7 19:07:10 ripper kernel: [ 9.878208] RIP: 0033:0x7f5989f1e88d
Mar 7 19:07:10 ripper kernel: [ 9.878215] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 01 48
Mar 7 19:07:10 ripper kernel: [ 9.878217] RSP: 002b:00007ffdb49c4aa8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Mar 7 19:07:10 ripper kernel: [ 9.878220] RAX: ffffffffffffffda RBX: 00005590095b55d0 RCX: 00007f5989f1e88d
Mar 7 19:07:10 ripper kernel: [ 9.878221] RDX: 0000000000000000 RSI: 00007f598a0ed441 RDI: 000000000000001b
Mar 7 19:07:10 ripper kernel: [ 9.878223] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000002
Mar 7 19:07:10 ripper kernel: [ 9.878225] R10: 000000000000001b R11: 0000000000000246 R12: 00007f598a0ed441
Mar 7 19:07:10 ripper kernel: [ 9.878226] R13: 00005590094b5180 R14: 00005590094ca990 R15: 00005590095bb790
Mar 7 19:07:10 ripper kernel: [ 9.878232] </TASK>
Mar 7 19:07:10 ripper kernel: [ 9.878239] ================================================================================
Mar 7 19:07:11 ripper kernel: [ 10.146135] [drm] Display Core v3.2.241 initialized on DCE 11.2
Mar 7 19:07:11 ripper kernel: [ 10.148066] snd_hda_intel 0000:61:00.1: bound 0000:61:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Mar 7 19:07:11 ripper kernel: [ 10.225001] [drm] UVD and UVD ENC initialized successfully.
Mar 7 19:07:11 ripper kernel: [ 10.324927] [drm] VCE initialized successfully.
Mar 7 19:07:11 ripper kernel: [ 10.326900] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Mar 7 19:07:11 ripper kernel: [ 10.326912] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
Mar 7 19:07:11 ripper kernel: [ 10.327087] amdgpu: Virtual CRAT table created for GPU
Mar 7 19:07:11 ripper kernel: [ 10.327171] amdgpu: Topology: Add dGPU node [0x67e3:0x1002]
Mar 7 19:07:11 ripper kernel: [ 10.327174] kfd kfd: amdgpu: added device 1002:67e3
Mar 7 19:07:11 ripper kernel: [ 10.327188] amdgpu 0000:61:00.0: amdgpu: SE 2, SH per SE 1, CU per SH 8, active_cu_number 16
Mar 7 19:07:11 ripper kernel: [ 10.330889] amdgpu 0000:61:00.0: amdgpu: Using BACO for runtime pm
Mar 7 19:07:11 ripper kernel: [ 10.331966] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:61:00.0 on minor 0

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-6.5.0-17-generic 6.5.0-17.17~22.04.1
ProcVersionSignature: Ubuntu 6.5.0-17.17~22.04.1-generic 6.5.8
Uname: Linux 6.5.0-17-generic x86_64
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Thu Mar 7 19:16:45 2024
InstallationDate: Installed on 2022-08-03 (582 days ago)
InstallationMedia: Ubuntu 22.04 LTS "Jammy Jellyfish" - Daily amd64 (20220408)
ProcEnviron:
 TERM=tmux-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-signed-hwe-6.5
UpgradeStatus: No upgrade log present (probably fresh install)
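
For triage, the key fields of the UBSAN report in the log above (report kind, source file, line, offending index, and array type) can be extracted mechanically. The following is a minimal Python sketch and not part of the original report: the names `parse_ubsan`, `HEADER`, `DETAIL`, and `SAMPLE` are hypothetical, and the regular expressions assume the message layout seen in the two log lines copied into `SAMPLE`. Other kernels or UBSAN checks may format their messages differently.

```python
import re

# Two lines copied verbatim from the log above; a real run would read syslog or dmesg output.
SAMPLE = """\
Mar 7 19:07:10 ripper kernel: [ 9.873519] UBSAN: array-index-out-of-bounds in /build/linux-hwe-6.5-YpKOvT/linux-hwe-6.5-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/smu7_hwmgr.c:3676:4
Mar 7 19:07:10 ripper kernel: [ 9.873531] index 7 is out of range for type 'ATOM_Polaris_SCLK_Dependency_Record [1]'
"""

# Assumed layout: "UBSAN: <kind> in <path>:<line>:<column>"
HEADER = re.compile(r"UBSAN: (?P<kind>[\w-]+) in (?P<path>\S+):(?P<line>\d+):(?P<col>\d+)")
# Assumed layout: "index <n> is out of range for type '<type>'"
DETAIL = re.compile(r"index (?P<index>-?\d+) is out of range for type '(?P<type>[^']+)'")


def parse_ubsan(text):
    """Return one dict per UBSAN header line found in text."""
    reports = []
    for raw in text.splitlines():
        m = HEADER.search(raw)
        if m:
            reports.append({
                "kind": m["kind"],
                # Keep only the file name; the build path prefix is machine-specific.
                "file": m["path"].rsplit("/", 1)[-1],
                "line": int(m["line"]),
                "column": int(m["col"]),
            })
            continue
        m = DETAIL.search(raw)
        if m and reports:
            # Attach the index/type detail to the most recent header.
            reports[-1]["index"] = int(m["index"])
            reports[-1]["type"] = m["type"]
    return reports


if __name__ == "__main__":
    for report in parse_ubsan(SAMPLE):
        print(report)
```

The extracted type string, `ATOM_Polaris_SCLK_Dependency_Record [1]`, together with index 7, shows that the check fired on an array declared with a single element. Comparing the `file` and `line` fields across boots is a quick way to tell whether repeated reports come from the same source location.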

Brad Figg (brad-figg) wrote:

The crash above occurred during large downloads of img files or git clones of large repositories (Ubuntu kernels) over Wi-Fi. Since switching to wired Ethernet I have not been able to reproduce it. Over Wi-Fi it was very reproducible.
