raid1: Fix NULL pointer dereference in process_checks()
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | Fix Released | Undecided | Unassigned | |
| Jammy | Fix Committed | Undecided | Unassigned | |
| Noble | Fix Released | Medium | Matthew Ruffell | |
| Oracular | Fix Released | Medium | Matthew Ruffell | |
| Plucky | Fix Released | Undecided | Unassigned | |
| Questing | Fix Released | Undecided | Unassigned | |
Bug Description
Subject: raid1: Fix NULL pointer de-reference in process_checks()
BugLink: https:/
[Impact]
A NULL pointer dereference was found in raid1 during failure mode testing.
A raid1 array was set up, filled with data, and a check operation was started.
While the check was underway, all underlying iSCSI disks were forcefully
disconnected with --failfast set on the md array, and the following kernel oops
occurred:
md/raid1:: dm-0: unrecoverable I/O read error for block 527744
md/raid1:: dm-1: unrecoverable I/O read error for block 527616
md/raid1:: dm-0: unrecoverable I/O read error for block 527744
md/raid1:: dm-1: unrecoverable I/O read error for block 527616
md/raid1:: dm-1: unrecoverable I/O read error for block 527616
md/raid1:: dm-0: unrecoverable I/O read error for block 527744
md/raid1:: dm-1: unrecoverable I/O read error for block 527616
md/raid1:: dm-0: unrecoverable I/O read error for block 527744
BUG: kernel NULL pointer dereference, address: 0000000000000040
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
SMP NOPTI
CPU: 3 PID: 19372 Comm: md_1t889zmbfni_ Kdump: loaded Not tainted 6.8.0-1029-aws #31-Ubuntu
Hardware name: Amazon EC2 m6a.xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:process_
Code: 8e 19 01 00 00 48 8b 85 78 ff ff ff b9 08 00 00 00 48 8d 7d 90 49 8b 1c c4 49 63 c7 4d 8b 74 c4 50 31 c0 f3 48 ab 48 89 5d 88 <4c> 8b 53 40 45 0f b6 4e 18 49 8b 76 40 49 81 7e 38 a0 04 7c c0 75
RSP: 0018:ffffb39e81
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000004 RDI: ffffb39e8142bd50
RBP: ffffb39e8142bd80 R08: ffff9a2e001ea000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a2e0cd63280
R13: ffff9a2e50d1f800 R14: ffff9a2e50d1f000 R15: 0000000000000000
FS: 000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 00000001035b2004 CR4: 00000000003706f0
Call Trace:
<TASK>
? show_regs+0x6d/0x80
? __die+0x24/0x80
? page_fault_
? do_user_
? exc_page_
? asm_exc_
? process_
? process_
? srso_alias_
? ___ratelimit+
sync_request_
raid1d+0x13a/0x3f0 [raid1]
? srso_alias_
md_thread+
? __pfx_autoremov
? __pfx_md_
kthread+0xda/0x100
? __pfx_kthread+
ret_from_
? __pfx_kthread+
ret_from_
</TASK>
process_checks() loops through all the available disks to find a primary source
with intact data. When all disks are missing, no primary source is found, but
the function continues anyway and dereferences a NULL pointer. We shouldn't
move forward without a valid primary source.
[Fix]
This was fixed in 6.15-rc3 with:
commit b7c178d9e57c8fd
Author: Meir Elisha <email address hidden>
Date: Tue Apr 8 17:38:08 2025 +0300
Subject: md/raid1: Add check for missing source disk in process_checks()
Link: https:/
This has already been applied to focal, jammy and plucky through upstream
-stable. Noble and oracular are lagging behind and have not yet reached the
-stable release that contains the fix.
Bug focal:
https:/
Bug jammy:
https:/
Bug plucky:
https:/
[Testcase]
You don't need to set up a full iSCSI environment. You can make a local VM and
then forcefully remove the underlying disks using libvirt.
Create a VM and attach three scratch disks (vdc, vdd and vde below):
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
vda 253:0 0 10G 0 disk
├─vda1 253:1 0 9G 0 part /
├─vda14 253:14 0 4M 0 part
├─vda15 253:15 0 106M 0 part /boot/efi
└─vda16 259:0 0 913M 0 part /boot
vdb 253:16 0 372K 0 disk
vdc 253:32 0 3G 0 disk
vdd 253:48 0 3G 0 disk
vde 253:64 0 3G 0 disk
Create a raid1 array:
$ sudo mdadm --create --failfast --verbose /dev/md0 --level=1 --raid-devices=3 /dev/vdc /dev/vdd /dev/vde
Make a filesystem:
$ sudo mkfs.xfs /dev/md0
$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk
Fill the array with files, running from a directory on /mnt/disk that your user can write to:
$ for n in {1..1000}; do dd if=/dev/urandom of=file$( printf %03d "$n" ).bin bs=1024 count=$(( RANDOM )); done
Start a check:
$ sudo mdadm --action=check /dev/md0
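Before detaching the disks, it can help to confirm that the check is actually running. The following is a minimal sketch, assuming the array is md0 and that the kernel exposes the standard md sysfs attributes sync_action and sync_completed. The SYSFS_ROOT variable and the md_check_status name are illustrative helpers, not part of mdadm:

```shell
# SYSFS_ROOT is an illustrative override so the helper can be pointed at a
# copy of the sysfs tree; on a real system leave it at /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the current sync action and progress for one md array, e.g. md0.
md_check_status() {
    md_dir="$SYSFS_ROOT/block/$1/md"
    printf 'action=%s completed=%s\n' \
        "$(cat "$md_dir/sync_action")" \
        "$(cat "$md_dir/sync_completed")"
}
```

Running `md_check_status md0` inside the VM should report `action=check` while the check is in progress. `cat /proc/mdstat` shows the same state in a more readable form.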
Use virt-manager or libvirt to detach the disks, then watch dmesg.
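On the libvirt host, the detach step could be scripted roughly as follows. This is a sketch under assumptions: the domain name raid1-test is hypothetical (use the name shown by `virsh list`), the targets vdc, vdd and vde are taken from the lsblk output above, and the VIRSH variable is only an illustrative hook for printing the commands instead of running them. Whether the guest loses the disks immediately can depend on the disk bus and hypervisor configuration.

```shell
# Hypothetical domain name; replace with the name shown by `virsh list`.
DOMAIN="${DOMAIN:-raid1-test}"
# Set VIRSH="echo virsh" to print the commands instead of running them.
VIRSH="${VIRSH:-virsh}"

# Detach the three scratch disks backing /dev/md0 from the running guest.
detach_scratch_disks() {
    for target in vdc vdd vde; do
        $VIRSH detach-disk "$DOMAIN" "$target" --live
    done
}
```

Call `detach_scratch_disks` on the host while the check is running, and follow the kernel log inside the guest with `sudo dmesg --follow`.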
Test kernels are available in the following ppa:
https:/
If you install the test kernel, the null pointer dereference no longer occurs.
[Where problems can occur]
We are changing the logic such that if all the reads fail in process_checks(),
and we have no valid primary source, then we disable recovery mode, record that
an error occurred, free the bio and exit. Previously we would have just
continued onward and run into the NULL pointer dereference.
This really only affects situations where all backing disks are lost. That is
not too uncommon, though, particularly when all disks are network storage and a
network problem cuts off access to them. Behaviour should be unchanged as long
as at least one primary source disk exists.
If a regression were to occur, it would affect raid1 arrays only, and only
during check/repair operations.
A workaround would be to disable check or repair operations on the md array
until the issue is fixed.
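One way to apply that workaround to a check that is already running is to write idle to the array's sync_action attribute. The following is a minimal sketch, assuming the standard md sysfs layout. The SYSFS_ROOT variable and the md_stop_check name are illustrative, not part of mdadm:

```shell
# SYSFS_ROOT is an illustrative override; on a real system leave it at /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Ask md to stop a running check or repair on one array, e.g. md0.
# Other actions (resync, recover) are deliberately left alone.
# Needs root on a real system, since sync_action is writable by root only.
md_stop_check() {
    action_file="$SYSFS_ROOT/block/$1/md/sync_action"
    case "$(cat "$action_file")" in
        check|repair) echo idle > "$action_file" ;;
    esac
}
```

Run `md_stop_check md0` as root. This only interrupts an operation that is already in progress; it does not change any periodic checks that the distribution may have scheduled, which would need to be disabled separately.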
[Other info]
Upstream mailing list discussion:
V1:
https://<email address hidden>/T/
V2:
https://<email address hidden>/T/
| description: | updated |
| Changed in linux (Ubuntu Questing): | |
| status: | New → Fix Released |
| Changed in linux (Ubuntu Plucky): | |
| status: | New → Fix Committed |
| Changed in linux (Ubuntu Oracular): | |
| status: | New → In Progress |
| Changed in linux (Ubuntu Jammy): | |
| status: | New → Fix Committed |
| Changed in linux (Ubuntu Noble): | |
| status: | New → In Progress |
| importance: | Undecided → Medium |
| Changed in linux (Ubuntu Oracular): | |
| importance: | Undecided → Medium |
| Changed in linux (Ubuntu Noble): | |
| assignee: | nobody → Matthew Ruffell (mruffell) |
| Changed in linux (Ubuntu Oracular): | |
| assignee: | nobody → Matthew Ruffell (mruffell) |
| tags: | added: sts |
| description: | updated |
| summary: | - raid1: Fix NULL pointer de-reference in process_checks() |
| | + raid1: Fix NULL pointer dereference in process_checks() |
| Changed in linux (Ubuntu Noble): | |
| status: | In Progress → Fix Committed |
| Changed in linux (Ubuntu Oracular): | |
| status: | In Progress → Fix Committed |

Submitted to the Kernel Team mailing list:
Cover letter:
https://lists.ubuntu.com/archives/kernel-team/2025-June/160217.html
Patch:
https://lists.ubuntu.com/archives/kernel-team/2025-June/160218.html