UBSAN: array-index-out-of-bounds in /build/linux-D15vQj/linux-6.5.0/drivers/md/bcache/bset.c:1098:3

Bug #2039368 reported by Thomas Debesse
This bug affects 5 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Since upgrading from lunar to mantic, I get a lot of these errors (41 on a fresh boot) in dmesg:

```
[ 4.277343] UBSAN: array-index-out-of-bounds in /build/linux-D15vQj/linux-6.5.0/drivers/md/bcache/bset.c:1098:3
[ 4.277728] index 4 is out of range for type 'btree_iter_set [4]'
[ 4.277925] CPU: 7 PID: 247 Comm: kworker/7:1 Not tainted 6.5.0-9-generic #9-Ubuntu
[ 4.278132] Hardware name: Default string Default string/Default string, BIOS WRX80SU8-F6 06/08/2023
[ 4.278531] Workqueue: events register_cache_worker [bcache]
[ 4.278754] Call Trace:
[ 4.278949] <TASK>
[ 4.279143] dump_stack_lvl+0x48/0x70
[ 4.279337] dump_stack+0x10/0x20
[ 4.279526] __ubsan_handle_out_of_bounds+0xc6/0x110
[ 4.279721] bch_btree_iter_push+0x4e6/0x4f0 [bcache]
[ 4.279929] bch_btree_node_read_done+0xcb/0x410 [bcache]
[ 4.280142] bch_btree_node_read+0xf8/0x1e0 [bcache]
[ 4.280349] ? __pfx_closure_sync_fn+0x10/0x10 [bcache]
[ 4.280557] bch_btree_node_get.part.0+0x15c/0x330 [bcache]
[ 4.280764] ? __bch_btree_ptr_invalid+0x66/0xe0 [bcache]
[ 4.280975] ? __pfx_up_write+0x10/0x10
[ 4.281170] bch_btree_node_get+0x16/0x30 [bcache]
[ 4.281375] run_cache_set+0x596/0x850 [bcache]
[ 4.281578] ? srso_return_thunk+0x5/0x10
[ 4.281773] register_cache_set+0x1a2/0x210 [bcache]
[ 4.281984] register_cache+0x11a/0x1a0 [bcache]
[ 4.282187] register_cache_worker+0x22/0x80 [bcache]
[ 4.282387] process_one_work+0x223/0x440
[ 4.282573] worker_thread+0x4d/0x3f0
[ 4.282753] ? srso_return_thunk+0x5/0x10
[ 4.282931] ? _raw_spin_lock_irqsave+0xe/0x20
[ 4.283113] ? __pfx_worker_thread+0x10/0x10
[ 4.283286] kthread+0xf2/0x120
[ 4.283458] ? __pfx_kthread+0x10/0x10
[ 4.283631] ret_from_fork+0x47/0x70
[ 4.283800] ? __pfx_kthread+0x10/0x10
[ 4.283972] ret_from_fork_asm+0x1b/0x30
[ 4.284143] </TASK>
```

This system has 4 bcache backing devices and 4 bcache cache devices, though they are not associated for now and caching is disabled. It was already like that when I upgraded, so the kernel only uses the backing code, not the caching one.
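For triage, the second line of each UBSAN report is the informative one: it gives the index that was used and the declared array type. The sketch below is only an illustration of how that line could be parsed; the names `DETAIL_RE`, `parse_ubsan_detail` and `overshoot` are hypothetical helpers, not part of any kernel tooling, and the regular expression assumes the message format shown in the trace above.

```python
import re

# Matches the UBSAN detail line, for example:
#   index 4 is out of range for type 'btree_iter_set [4]'
# (format assumed from the dmesg output quoted in this report)
DETAIL_RE = re.compile(
    r"index (?P<index>-?\d+) is out of range for type "
    r"'(?P<elem>.+?) \[(?P<bound>\d+)\]'"
)


def parse_ubsan_detail(line):
    """Return (index, element_type, declared_bound), or None if no match."""
    m = DETAIL_RE.search(line)
    if m is None:
        return None
    return int(m["index"]), m["elem"], int(m["bound"])


def overshoot(line):
    """Return how far past the last valid index (bound - 1) the access was."""
    parsed = parse_ubsan_detail(line)
    if parsed is None:
        return None
    index, _elem, bound = parsed
    return index - (bound - 1)


sample = "[ 4.277728] index 4 is out of range for type 'btree_iter_set [4]'"
print(parse_ubsan_detail(sample))  # (4, 'btree_iter_set', 4)
print(overshoot(sample))           # 1
```

Here the index equals the declared length, so the access is to the first slot past the end of an array declared with 4 elements.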

ProblemType: Bug
DistroRelease: Ubuntu 23.10
Package: linux-image-6.5.0-9-generic 6.5.0-9.9
ProcVersionSignature: Ubuntu 6.5.0-9.9-generic 6.5.3
Uname: Linux 6.5.0-9-generic x86_64
ApportVersion: 2.27.0-0ubuntu5
Architecture: amd64
CasperMD5CheckResult: unknown
CurrentDesktop: GNOME
Date: Sat Oct 14 23:16:33 2023
HibernationDevice: RESUME=none
MachineType: {report['dmi.sys.vendor']} {report['dmi.product.name']}
ProcFB:
 0 amdgpudrmfb
 1 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-6.5.0-9-generic root=UUID=f35ecf77-511e-4dde-ac11-c1d848e97315 ro rootflags=subvol=@ amdgpu.si_support=1 radeon.si_support=0 amdgpu.cik_support=1 radeon.cik_support=0 amdgpu.exp_hw_support=1 amdgpu.gpu_recovery=1 amdgpu.ppfeaturemask=0xffffffff delayacct zswap.enabled=1
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-6.5.0-9-generic N/A
 linux-backports-modules-6.5.0-9-generic N/A
 linux-firmware 20230919.git3672ccab-0ubuntu2.1
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 06/08/2023
dmi.bios.release: 5.23
dmi.bios.vendor: American Megatrends International, LLC.
dmi.bios.version: WRX80SU8-F6
dmi.board.asset.tag: Default string
dmi.board.name: Default string
dmi.board.vendor: Default string
dmi.board.version: Default string
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInternational,LLC.:bvrWRX80SU8-F6:bd06/08/2023:br5.23:svnDefaultstring:pnDefaultstring:pvrDefaultstring:rvnDefaultstring:rnDefaultstring:rvrDefaultstring:cvnDefaultstring:ct3:cvrDefaultstring:skuDefaultstring:
dmi.product.family: Default string
dmi.product.name: Default string
dmi.product.sku: Default string
dmi.product.version: Default string
dmi.sys.vendor: Default string
modified.conffile..etc.default.apport: [modified]
mtime.conffile..etc.default.apport: 2018-06-16T17:39:00.798346

Thomas Debesse (illwieckz) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Juerg Haefliger (juergh)
tags: added: kernel-flexible-array
KonishchevDmitry (konishchevdmitry) wrote :

I have similar messages, but with AMD GPU:

```
Oct 16 18:41:04 server kernel: ================================================================================
Oct 16 18:41:04 server kernel: UBSAN: array-index-out-of-bounds in /build/linux-D15vQj/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1249:61
Oct 16 18:41:04 server kernel: index 1 is out of range for type 'ATOM_PPLIB_VCE_Clock_Voltage_Limit_Record [1]'
Oct 16 18:41:04 server kernel: CPU: 3 PID: 128 Comm: (udev-worker) Not tainted 6.5.0-9-generic #9-Ubuntu
Oct 16 18:41:04 server kernel: Hardware name: HPE ProLiant MicroServer Gen10/ProLiant MicroServer Gen10, BIOS 5.12 06/26/2018
Oct 16 18:41:04 server kernel: Call Trace:
Oct 16 18:41:04 server kernel: <TASK>
Oct 16 18:41:04 server kernel: dump_stack_lvl+0x48/0x70
Oct 16 18:41:04 server kernel: dump_stack+0x10/0x20
Oct 16 18:41:04 server kernel: __ubsan_handle_out_of_bounds+0xc6/0x110
Oct 16 18:41:04 server kernel: init_clock_voltage_dependency+0x9bb/0xa60 [amdgpu]
Oct 16 18:41:04 server kernel: pp_tables_initialize+0x116/0x440 [amdgpu]
Oct 16 18:41:04 server kernel: hwmgr_hw_init+0x7b/0x1e0 [amdgpu]
Oct 16 18:41:04 server kernel: pp_hw_init+0x16/0x50 [amdgpu]
Oct 16 18:41:04 server kernel: amdgpu_device_ip_init+0x48e/0x900 [amdgpu]
Oct 16 18:41:04 server kernel: amdgpu_device_init+0x975/0x1160 [amdgpu]
Oct 16 18:41:04 server kernel: amdgpu_driver_load_kms+0x1a/0x1c0 [amdgpu]
Oct 16 18:41:04 server kernel: amdgpu_pci_probe+0x175/0x490 [amdgpu]
Oct 16 18:41:04 server kernel: local_pci_probe+0x47/0xb0
Oct 16 18:41:04 server kernel: pci_call_probe+0x55/0x190
Oct 16 18:41:04 server kernel: pci_device_probe+0x84/0x120
Oct 16 18:41:04 server kernel: really_probe+0x1c7/0x410
Oct 16 18:41:04 server kernel: __driver_probe_device+0x8c/0x180
Oct 16 18:41:04 server kernel: driver_probe_device+0x24/0xd0
Oct 16 18:41:04 server kernel: __driver_attach+0x10b/0x210
Oct 16 18:41:04 server kernel: ? __pfx___driver_attach+0x10/0x10
Oct 16 18:41:04 server kernel: bus_for_each_dev+0x8d/0xf0
Oct 16 18:41:04 server kernel: driver_attach+0x1e/0x30
Oct 16 18:41:04 server kernel: bus_add_driver+0x127/0x240
Oct 16 18:41:04 server kernel: driver_register+0x5e/0x130
Oct 16 18:41:04 server kernel: ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
Oct 16 18:41:04 server kernel: __pci_register_driver+0x62/0x70
Oct 16 18:41:04 server kernel: amdgpu_init+0x69/0xff0 [amdgpu]
Oct 16 18:41:04 server kernel: do_one_initcall+0x5e/0x340
Oct 16 18:41:04 server kernel: do_init_module+0x91/0x290
Oct 16 18:41:04 server kernel: load_module+0xba1/0xcf0
Oct 16 18:41:04 server kernel: ? vfree+0xff/0x2d0
Oct 16 18:41:04 server kernel: init_module_from_file+0x96/0x100
Oct 16 18:41:04 server kernel: ? init_module_from_file+0x96/0x100
Oct 16 18:41:04 server kernel: idempotent_init_module+0x11c/0x2b0
Oct 16 18:41:04 server kernel: __x64_sys_finit_module+0x64/0xd0
Oct 16 18:41:04 server kernel: do_syscall_64+0x5c/0x90
Oct 16 18:41:04 server kernel: ? syscall_exit_to_user_mode+0x37/0x60
Oct 16 18:41:04 server kernel: ? do_syscall_64+0x68/0x90
Oct 16 18:41:04 server kernel: ? syscall_exit_to_user_mode+0x37/0x60
Oct 16 18:41:04 serve...
```

Jonathan Crooke (itsthejb) wrote :

I also have lots of these after upgrading to 23.10, featuring bcache and VirtualBox.

Eduardo-sanchez-mata (eduardo-sanchez-mata) wrote :

Can confirm. Dell Latitude E5570 with AMD Radeon R7 GPU (Topaz XT) running kernel 6.5.0-13-generic with libdrm-amdgpu1 2.4.115-1 in Xorg

Kernel UBSAN errors in amdgpu, apparently in drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c

Eduardo-sanchez-mata (eduardo-sanchez-mata) wrote :

Upon further inspection, the amdgpu traces are:

> journalctl -k --grep UBSAN -ojson --output-fields=MESSAGE |jq -r .MESSAGE |uniq
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1249:61
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1251:44
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1249:46
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1221:34
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1222:44
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1221:19
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1278:45
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1279:65
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1280:9
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1303:44
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1304:64
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1305:9
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:394:34
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:395:4
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:397:19
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1500:42
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1501:42
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1502:42
UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:929:40

I'm afraid they may be related to the kernel KSPP shenanigans landing in amdgpu, and not related at all to bcache (they just happen to be UBSAN errors too). Should I open a separate bug report?
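To help decide whether reports belong in the same bug, the messages can be grouped by in-tree source file. The sketch below is a rough illustration, assuming the message lines come from a command like the `journalctl` one above. It strips the per-build directory (which differs between the traces in this thread, `linux-D15vQj` versus `linux-X4HRYL`) and collapses the `amdgpu/../pm` path segment. `HEADER_RE` and `group_by_source` are hypothetical names chosen for this example.

```python
import posixpath
import re
from collections import defaultdict

# Header line of a UBSAN report. The optional /build/<dir>/<dir>/ prefix is
# the per-build directory and is dropped; the rest is the in-tree path.
HEADER_RE = re.compile(
    r"UBSAN: (?P<kind>[\w-]+) in "
    r"(?:/build/[^/]+/[^/]+/)?(?P<path>\S+?):(?P<line>\d+):(?P<col>\d+)"
)


def group_by_source(messages):
    """Map normalized source path -> sorted list of (line, column) sites."""
    sites = defaultdict(set)
    for msg in messages:
        m = HEADER_RE.search(msg)
        if m:
            # normpath collapses segments such as "amdgpu/../pm"
            path = posixpath.normpath(m["path"])
            sites[path].add((int(m["line"]), int(m["col"])))
    return {path: sorted(locs) for path, locs in sites.items()}


messages = [
    "UBSAN: array-index-out-of-bounds in /build/linux-D15vQj/linux-6.5.0/drivers/md/bcache/bset.c:1098:3",
    "UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1249:61",
    "UBSAN: array-index-out-of-bounds in /build/linux-D15vQj/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1249:61",
    "UBSAN: array-index-out-of-bounds in /build/linux-X4HRYL/linux-6.5.0/drivers/gpu/drm/amd/amdgpu/../pm/powerplay/hwmgr/processpptables.c:1251:44",
]
for path, locs in group_by_source(messages).items():
    print(path, locs)
```

With the sample messages, the bcache and amdgpu reports end up under two different source files, and the same amdgpu site reported from two different package builds is counted once.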
