Error UBSAN: array-index-out-of-bounds amdgpu (drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/smu7_hwmgr.c)

Bug #2039926 reported by msr1985
This bug affects 19 people
Affects                  Status         Importance   Assigned to   Milestone
Linux                    Fix Released   Unknown
linux (Ubuntu)           Confirmed      Undecided    Unassigned
linux-hwe-6.5 (Ubuntu)   Confirmed      Undecided    Unassigned

Bug Description

Error in boot:

[ 8.597520] UBSAN: array-index-out-of-bounds in /build/linux-D15vQj/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/smu7_hwmgr.c:3676:4
[ 8.597527] index 7 is out of range for type 'ATOM_Polaris_SCLK_Dependency_Record [1]'

ProblemType: Bug
DistroRelease: Ubuntu 23.10
Package: linux-image-generic 6.5.0.9.11
ProcVersionSignature: Ubuntu 6.5.0-9.9-generic 6.5.3
Uname: Linux 6.5.0-9-generic x86_64
ApportVersion: 2.27.0-0ubuntu5
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Fri Oct 20 09:28:16 2023
InstallationDate: Installed on 2022-10-12 (373 days ago)
InstallationMedia: Ubuntu 22.04.1 LTS "Jammy Jellyfish" - Release amd64 (20220809.1)
MachineType: MEDION D3F3-EM
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.5.0-9-generic root=UUID=9edc5478-c6c2-4cf3-9de8-01ccb697fb9e ro quiet splash audit=0 mitigations=off amdgpu.ppfeaturemask=0xffffffff vt.global_cursor_default=0 loglevel=2 rd.systemd.show_status=false rd.udev.log-prority=3 sysrq_always_enabled=1 audit=0 vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-6.5.0-9-generic N/A
 linux-backports-modules-6.5.0-9-generic N/A
 linux-firmware 20230919.git3672ccab-0ubuntu2.1
SourcePackage: linux
UpgradeStatus: Upgraded to mantic on 2023-10-13 (6 days ago)
dmi.bios.date: 07/11/2014
dmi.bios.release: 4.6
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: D3EMW08.110
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: D3F3-EM
dmi.board.vendor: MEDION
dmi.board.version: 1.0
dmi.chassis.type: 3
dmi.chassis.vendor: MEDION
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrD3EMW08.110:bd07/11/2014:br4.6:svnMEDION:pnD3F3-EM:pvr1.0:rvnMEDION:rnD3F3-EM:rvr1.0:cvnMEDION:ct3:cvr:skuTobefilledbyO.E.M.:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: D3F3-EM
dmi.product.sku: To be filled by O.E.M.
dmi.product.version: 1.0
dmi.sys.vendor: MEDION

Revision history for this message
msr1985 (mariosantamaria) wrote :
affects: ubuntu → linux (Ubuntu)
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Mario Limonciello (superm1) wrote :
Juerg Haefliger (juergh)
tags: added: kernel-flexible-array
Changed in linux:
status: Unknown → New
Revision history for this message
Olivier Duclos (odc) wrote :

The mentioned patches fix the issue for Polaris and Tonga, but other AMD GPUs are also affected. I have a Radeon Vega 64 and here is a summary of the UBSAN errors I see:

index 2 is out of range for type 'ATOM_Vega10_MM_Dependency_Record [1]'
index 1 is out of range for type 'ATOM_Vega10_CLK_Dependency_Record [1]'
index 5 is out of range for type 'ATOM_Vega10_CLK_Dependency_Record [1]'
index 1 is out of range for type 'ATOM_Vega10_MCLK_Dependency_Record [1]'
index 1 is out of range for type 'ATOM_Vega10_PCIE_Record [1]'
index 1 is out of range for type 'ATOM_Vega10_Voltage_Lookup_Record [1]'

Revision history for this message
Alex Deucher (alexander-deucher) wrote :

FWIW, none of these are actually out-of-bounds accesses. The code just uses the old notation for variable-sized arrays: a trailing array declared with one element.

Revision history for this message
Alex Deucher (alexander-deucher) wrote :
Juerg Haefliger (juergh)
summary: Error UBSAN: array-index-out-of-bounds amdgpu
+ (drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/smu7_hwmgr.c)
Revision history for this message
misiu_mp (misiu-mp) wrote :

I have HAWAII (R9 290X). I get the following:

array-index-out-of-bounds in /build/linux-7T0Dsf/linux-6.6.6/drivers/gpu/drm/radeon/radeon_atombios.c:2717:34
index 18 is out of range for type 'UCHAR [1]'

array-index-out-of-bounds in /build/linux-7T0Dsf/linux-6.6.6/drivers/gpu/drm/radeon/radeon_atombios.c:2715:55
index 1 is out of range for type 'UCHAR [1]'

UBSAN: array-index-out-of-bounds in /build/linux-7T0Dsf/linux-6.6.6/drivers/gpu/drm/radeon/ci_dpm.c:5588:32
index 9 is out of range for type 'UCHAR [1]'

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-hwe-6.5 (Ubuntu):
status: New → Confirmed
Revision history for this message
Muhammed Sabbagh (muhammedsabbagh) wrote :

Similar to the other reports above:

GPU:
    Vendor: AMD (0x1002)
    Device: AMD Radeon R7 M340 (iceland, LLVM 15.0.7, DRM 3.54, 6.5.0-14-generic) (0x6900)
    Version: 23.0.4

Linux: Ubuntu 22.04
Kernel: linux-image-6.5.0-14-generic/jammy-updates,jammy-security,now 6.5.0-14.14~22.04.1 amd64

[ 12.308765] ================================================================================
[ 12.309790] ================================================================================
[ 12.310831] UBSAN: array-index-out-of-bounds in /build/linux-hwe-6.5-q7NZ0T/linux-hwe-6.5-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1279:65
[ 12.312963] index 1 is out of range for type 'ATOM_PPLIB_SAMClk_Voltage_Limit_Record [1]'
[ 12.314267] CPU: 1 PID: 341 Comm: systemd-udevd Not tainted 6.5.0-14-generic #14~22.04.1-Ubuntu
[ 12.314275] Hardware name: LENOVO 80SY/LNVNB161216, BIOS 0ZCN52WW 12/11/2019
[ 12.314278] Call Trace:
[ 12.314281] <TASK>
[ 12.314286] dump_stack_lvl+0x48/0x70
[ 12.314298] dump_stack+0x10/0x20
[ 12.314304] __ubsan_handle_out_of_bounds+0xc6/0x110
[ 12.314314] init_clock_voltage_dependency+0x8ca/0xa60 [amdgpu]
[ 12.315266] pp_tables_initialize+0x116/0x440 [amdgpu]
[ 12.316037] ? amdgpu_ring_test_helper+0x83/0x90 [amdgpu]
[ 12.316822] hwmgr_hw_init+0x78/0x1e0 [amdgpu]
[ 12.317642] pp_hw_init+0x16/0x50 [amdgpu]
[ 12.318433] amdgpu_device_ip_init+0x48a/0x960 [amdgpu]
[ 12.319116] amdgpu_device_init+0x9c8/0x1160 [amdgpu]
[ 12.319779] amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
[ 12.320436] amdgpu_pci_probe+0x182/0x450 [amdgpu]
[ 12.321077] local_pci_probe+0x44/0xb0
[ 12.321087] pci_call_probe+0x55/0x190
[ 12.321092] pci_device_probe+0x84/0x120
[ 12.321098] really_probe+0x1c9/0x430
[ 12.321103] __driver_probe_device+0x8c/0x190
[ 12.321107] driver_probe_device+0x24/0xd0
[ 12.321111] __driver_attach+0x10b/0x210
[ 12.321115] ? __pfx___driver_attach+0x10/0x10
[ 12.321120] bus_for_each_dev+0x8a/0xf0
[ 12.321126] driver_attach+0x1e/0x30
[ 12.321132] bus_add_driver+0x127/0x240
[ 12.321139] driver_register+0x5e/0x130
[ 12.321144] ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[ 12.321817] __pci_register_driver+0x62/0x70
[ 12.321824] amdgpu_init+0x69/0xff0 [amdgpu]
[ 12.322454] do_one_initcall+0x5b/0x340
[ 12.322464] do_init_module+0x68/0x260
[ 12.322471] load_module+0xb85/0xcd0
[ 12.322479] ? security_kernel_post_read_file+0x75/0x90
[ 12.322484] ? security_kernel_post_read_file+0x75/0x90
[ 12.322491] init_module_from_file+0x96/0x100
[ 12.322497] ? init_module_from_file+0x96/0x100
[ 12.322509] idempotent_init_module+0x11c/0x2b0
[ 12.322517] __x64_sys_finit_module+0x64/0xd0
[ 12.322523] do_syscall_64+0x58/0x90
[ 12.322529] ? do_syscall_64+0x67/0x90
[ 12.322534] ? sysvec_call_function+0x4b/0xd0
[ 12.322539] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 12.322545] RIP: 0033:0x7f7bf951e88d
[ 12.322569] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 73 b5 0f 00 f7 d8 64 89 ...


Changed in linux:
status: New → Fix Released
Revision history for this message
StoatWblr (stoatwblr) wrote :

Same issue with VERDE (atombios and si_dpm) on both the 6.5 and 6.8 series kernels, with both the radeon and amdgpu drivers.

Whatever this is, it isn't confined to one or two places in the driver, and it has been occurring for several years.

Ubuntu Noble boot (dual W4100 quad-head cards), radeon driver; the amdgpu messages are similar:

[ 11.030090] UBSAN: array-index-out-of-bounds in /build/linux-zdc93w/linux-6.8.0/drivers/gpu/drm/radeon/radeon_atombios.c:2718:34
[ 11.030095] index 48 is out of range for type 'UCHAR [1]'

[ 11.031839] UBSAN: array-index-out-of-bounds in /build/linux-zdc93w/linux-6.8.0/drivers/gpu/drm/radeon/radeon_atombios.c:2716:55
[ 11.031843] index 1 is out of range for type 'UCHAR [1]'

[ 11.032818] UBSAN: array-index-out-of-bounds in /build/linux-zdc93w/linux-6.8.0/drivers/gpu/drm/radeon/radeon_atombios.c:2706:39
[ 11.032823] index 2 is out of range for type 'ATOM_PPLIB_NONCLOCK_INFO [1]'

[ 11.033887] UBSAN: array-index-out-of-bounds in /build/linux-zdc93w/linux-6.8.0/drivers/gpu/drm/radeon/si_dpm.c:6831:39
[ 11.033891] index 2 is out of range for type 'ATOM_PPLIB_NONCLOCK_INFO [1]'

[ 11.034910] UBSAN: array-index-out-of-bounds in /build/linux-zdc93w/linux-6.8.0/drivers/gpu/drm/radeon/si_dpm.c:6868:32
[ 11.034913] index 16 is out of range for type 'UCHAR [1]'

Revision history for this message
Mario Limonciello (superm1) wrote :

Variable-sized arrays declared with a fixed one-element size have fallen out of fashion.

There are various patches like that which fix this issue every time it crops up.

If you can still reproduce this on the latest 6.9-rc kernels, you should report a bug upstream to get the remaining cases fixed. Or, if you feel comfortable, you can write a patch for any remaining cases you hit. As you can see, it's a trivial fix.
