[mpt3sas, UBSAN] linux 6.5-rc gives error messages at boot

Bug #2028830 reported by Fjodor
This bug affects 13 people
Affects          Status      Importance   Assigned to   Milestone
linux (Ubuntu)   Won't Fix   Undecided    koba

Bug Description

Ubuntu release:

sune@jekaterina:~/src/deb$ lsb_release -rd
No LSB modules are available.
Description: Ubuntu 23.04
Release: 23.04
sune@jekaterina:~/src/deb$

Package:

linux-image-unsigned-6.5.0-060500rc1-generic_6.5.0-060500rc1.202307232333_amd64.deb from https://kernel.ubuntu.com/~kernel-ppa/mainline (rc2 and rc3 also affected)

Expectation: System boots normally

Actual results: See below

At the request of Koba Ko on the <email address hidden> mailing list, I am submitting the following:

Note that two types are mentioned (MPI2_EVENT_SAS_TOPO_PHY_ENTRY and MPI2_SAS_IO_UNIT0_PHY_DATA). I am willing to assist in testing, since I have a system with a controller covered by the mpt3sas driver.

This is the output from an attempted boot of 6.5-rc3 from kernel-ppa; the output for rc1 and rc2 is similar:

UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/scsi/mpt3sas/mpt3sas_scsih.c:4667:12
index 1 is out of range for type 'MPI2_EVENT_SAS_TOPO_PHY_ENTRY [1]'

UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/scsi/mpt3sas/mpt3sas_scsih.c:4023:12
index 1 is out of range for type 'MPI2_EVENT_SAS_TOPO_PHY_ENTRY [1]'

UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/scsi/mpt3sas/mpt3sas_scsih.c:6810:36
index 1 is out of range for type 'MPI2_SAS_IO_UNIT0_PHY_DATA [1]'

UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/scsi/mpt3sas/mpt3sas_scsih.c:6598:38
index 1 is out of range for type 'MPI2_SAS_IO_UNIT0_PHY_DATA [1]'

UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/scsi/mpt3sas/mpt3sas_scsih.c:6602:36
index 1 is out of range for type 'MPI2_SAS_IO_UNIT0_PHY_DATA [1]'

UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/scsi/mpt3sas/mpt3sas_scsih.c:6619:7
index 1 is out of range for type 'MPI2_SAS_IO_UNIT0_PHY_DATA [1]'

UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/scsi/mpt3sas/mpt3sas_scsih.c:6666:21
index 1 is out of range for type 'MPI2_SAS_IO_UNIT0_PHY_DATA [1]'

UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/scsi/mpt3sas/mpt3sas_scsih.c:7649:32
index 1 is out of range for type 'MPI2_EVENT_SAS_TOPO_PHY_ENTRY [1]'

UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/scsi/mpt3sas/mpt3sas_scsih.c:7651:23
index 1 is out of range for type 'MPI2_EVENT_SAS_TOPO_PHY_ENTRY [1]'

UBSAN: array-index-out-of-bounds in /home/kernel/COD/linux/drivers/scsi/mpt3sas/mpt3sas_scsih.c:7655:12
index 1 is out of range for type 'MPI2_EVENT_SAS_TOPO_PHY_ENTRY [1]'

[EDIT] The next error, which repeated for about 1½ hours before the boot finished, was unrelated to mpt2sas and UBSAN.

That error was "Timed out for waiting the udev queue being empty."

Bug subject altered to reflect this.

Fjodor (sune-molgaard) wrote :
koba (kobako)
Changed in linux (Ubuntu):
status: New → In Progress
assignee: nobody → koba (kobako)
koba (kobako) wrote :

+202308111054
@Fjodor,
Would you please try this test kernel? It is the latest vanilla kernel, built with the 6.4-rc3 config.
https://drive.google.com/drive/folders/1cGAWoaXAEWsWKjz92xhSVelDwS9moihK?usp=sharing

Applied patches:
https://<email address hidden>/T/

Fjodor (sune-molgaard) wrote :

@Koba,

I took the liberty of applying the patches myself onto the cod/mainline/v6.5-rc5 branch of git://git.launchpad.net/~ubuntu-kernel-test/ubuntu/+source/linux/+git/mainline-crack, and I can happily report that the UBSAN error messages are all gone!

That said, on my very first attempt to boot an earlier 6.5 RC, I let some time pass, and after several minutes another error appeared, which I assumed was related to the UBSAN warnings.

With the UBSAN warnings gone, the system still does not boot, and that error still appears. It is visible in the attached photo (the first line always appears; the second line is the one of interest).

The message states: "Timed out for waiting the udev queue being empty.[sic!]"

I do not know whether it is related, but it seems to happen consistently, and it does not happen on this machine with 6.4.x or earlier. My laptops boot 6.5 RCs just fine, by the way.

Fjodor (sune-molgaard) wrote :

I am increasingly convinced that the "for waiting the udev" issue is unrelated to what the patch set addresses, so please disregard that part and conclude that the patches work as advertised.

Fjodor (sune-molgaard) wrote :

@Koba,

As just reported on <email address hidden>, the "for waiting the udev"[sic!] matter was completely unrelated.

With the patch set, the UBSAN messages are completely gone, and with the workaround reported to the list, the system boots normally.

Having learned a bit more about UBSAN, I now believe the system would still have booted despite the messages, so I will try to change the title of this report.

summary: - [mpt3sas, UBSAN] ]linux 6.5-rc won't boot
+ [mpt3sas, UBSAN] ]linux 6.5-rc give error messages at boot
description: updated
koba (kobako)
Changed in linux (Ubuntu):
status: In Progress → Incomplete
Keeley Hoek (khoek) wrote :

I have just upgraded to the 23.10 beta, and a large number of these messages are spammed to the console. Does that patch fix the problem? Is there any more information you need? Why didn't that patch go into Linux 6.5?

koba (kobako) wrote :

@Keeley, please check the comments here:
https://lore.kernel.org/lkml/202310101748.5E39C3A@keescook/

Juerg Haefliger (juergh)
tags: added: kernel-flexible-array
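The "kernel-flexible-array" tag points at the pattern behind the messages above: structures ending in a one-element array (for example 'MPI2_SAS_IO_UNIT0_PHY_DATA [1]') that are indexed past element 0 as if the array were variable length. Below is a minimal sketch of that pattern next to the C99 flexible-array-member alternative. The struct, field, and function names are hypothetical, not the real MPI2 definitions, and the sketch assumes a build where the bounds sanitizer takes a trailing [1] array literally (as GCC and Clang do with -fsanitize=bounds plus -fstrict-flex-arrays=3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical PHY record; not the real MPI2 layout. */
struct phy_entry {
    uint8_t attached_dev_handle;
};

/* Old style: a trailing one-element array used as a variable-length tail.
 * With -fsanitize=bounds and -fstrict-flex-arrays=3 the compiler takes the
 * declared size of 1 literally, so an access to entries[1] is reported as
 * out of bounds even if the buffer behind it is large enough. */
struct topo_event_old {
    uint8_t num_entries;
    struct phy_entry entries[1];
};

/* C99 flexible array member: the bound is unknown to the compiler, so the
 * bounds sanitizer has no fixed size to check entries[i] against. */
struct topo_event_new {
    uint8_t num_entries;
    struct phy_entry entries[];
};

/* Allocate a zeroed event with room for n entries. sizeof(*ev) includes no
 * element of the flexible array, so the tail is added explicitly. */
static struct topo_event_new *topo_event_alloc(uint8_t n)
{
    struct topo_event_new *ev =
        calloc(1, sizeof(*ev) + (size_t)n * sizeof(ev->entries[0]));
    if (ev != NULL)
        ev->num_entries = n;
    return ev;
}

/* Sum every entry's handle; in bounds because the allocation covers
 * num_entries elements. */
static unsigned sum_handles(const struct topo_event_new *ev)
{
    unsigned sum = 0;
    assert(ev != NULL);
    for (size_t i = 0; i < ev->num_entries; i++)
        sum += ev->entries[i].attached_dev_handle;
    return sum;
}
```

One side effect of the conversion: sizeof no longer counts a phantom first element (here the old struct is one phy_entry larger than the new one), so allocation sizes computed from sizeof must be adjusted. The kernel wraps the sizeof(*p) + n * sizeof(element) idiom in its struct_size() helper.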
Juerg Haefliger (juergh) wrote :

Mainline builds are unsupported.

Changed in linux (Ubuntu):
status: Incomplete → Won't Fix
Yvo (yvo-vandoorn) wrote :

I am using Ubuntu 22.04's HWE kernel (the linux-generic-hwe-22.04 package). It was recently updated from 6.3.0 to 6.5.0, and I am now getting very odd output in dmesg. It is very similar to what was seen in #2039384, which was closed as a duplicate of this bug.

@juergh, I'm not using a mainline build. Given that this is the HWE kernel, I suspect this issue will eventually make it into 24.04's development.

[ 3.133061] mpt2sas_cm0: hba_port entry: 0000000060104c9a, port: 255 is added to hba_port list
[ 3.133694] ================================================================================
[ 3.134084] UBSAN: array-index-out-of-bounds in /build/linux-hwe-6.5-q7NZ0T/linux-hwe-6.5-6.5.0/drivers/scsi/mpt3sas/mpt3sas_scsih.c:6810:36
[ 3.134823] index 1 is out of range for type 'MPI2_SAS_IO_UNIT0_PHY_DATA [1]'
[ 3.135201] CPU: 9 PID: 112 Comm: kworker/u64:1 Not tainted 6.5.0-14-generic #14~22.04.1-Ubuntu
[ 3.135578] Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS P5.50 10/17/2023
[ 3.135968] Workqueue: fw_event_mpt2sas0 _firmware_event_work [mpt3sas]
[ 3.136380] Call Trace:
[ 3.136769] <TASK>
[ 3.137139] dump_stack_lvl+0x48/0x70
[ 3.137512] dump_stack+0x10/0x20
[ 3.137875] __ubsan_handle_out_of_bounds+0xc6/0x110
[ 3.138244] _scsih_sas_host_add+0x669/0x700 [mpt3sas]
[ 3.138620] _mpt3sas_fw_work+0x753/0xbc0 [mpt3sas]
[ 3.138989] ? srso_alias_return_thunk+0x5/0x7f
[ 3.139332] ? raw_spin_rq_unlock+0x10/0x40
[ 3.139681] ? srso_alias_return_thunk+0x5/0x7f
[ 3.140024] ? finish_task_switch.isra.0+0x85/0x2a0
[ 3.140364] ? srso_alias_return_thunk+0x5/0x7f
[ 3.140701] ? __schedule+0x2d4/0x750
[ 3.141035] _firmware_event_work+0x16/0x20 [mpt3sas]
[ 3.141375] process_one_work+0x240/0x450
[ 3.141705] worker_thread+0x50/0x3f0
[ 3.142033] ? __pfx_worker_thread+0x10/0x10
[ 3.142370] kthread+0xf2/0x120
[ 3.142696] ? __pfx_kthread+0x10/0x10
[ 3.143016] ret_from_fork+0x47/0x70
[ 3.143332] ? __pfx_kthread+0x10/0x10
[ 3.143647] ret_from_fork_asm+0x1b/0x30
[ 3.143960] </TASK>
[ 3.144265] ================================================================================
[ 3.146001] mpt2sas_cm0: host_add: handle(0x0001), sas_addr(0x51866da07302e300), phys(8)
[ 3.146590] ================================================================================
[ 3.146933] UBSAN: array-index-out-of-bounds in /build/linux-hwe-6.5-q7NZ0T/linux-hwe-6.5-6.5.0/drivers/scsi/mpt3sas/mpt3sas_scsih.c:6598:38
[ 3.147648] index 1 is out of range for type 'MPI2_SAS_IO_UNIT0_PHY_DATA [1]'
[ 3.148013] CPU: 9 PID: 112 Comm: kworker/u64:1 Not tainted 6.5.0-14-generic #14~22.04.1-Ubuntu
[ 3.148385] Hardware name: To Be Filled By O.E.M. X570M Pro4/X570M Pro4, BIOS P5.50 10/17/2023
[ 3.148753] Workqueue: fw_event_mpt2sas0 _firmware_event_work [mpt3sas]
[ 3.149122] Call Trace:
[ 3.149470] <TASK>
[ 3.149803] dump_stack_lvl+0x48/0x70
[ 3.150133] dump_stack+0x10/0x20
[ 3.150449] __ubsan_handle_out_of_bounds+0xc6/0x110
[ 3.150767] _scsih_sas_host_refresh+0x51f/0x590 [mpt3sas]
[ 3.151100] _scsih_sas_topology_change_event.isra.0+0x251/0x690 [mpt3sas]
[ ...

MichaelE (michael-eitelwein) wrote :
MichaelE (michael-eitelwein) wrote :

Still persists in:

Linux mothership 6.5.0-25-generic #25~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Feb 20 16:09:15 UTC 2 x86_64 x86_64 x86_64 GNU/Linux

MichaelE (michael-eitelwein) wrote :

Reading the thread at https://lore.kernel.org/lkml/202311150638.3BB079EB@keescook/, will this be fixed in 6.8, then?
