raid1: Fix NULL pointer dereference in process_checks()

Bug #2112519 reported by Matthew Ruffell
This bug affects 1 person
Affects          Status          Importance   Assigned to       Milestone
linux (Ubuntu)   Fix Released    Undecided    Unassigned
  Jammy          Fix Committed   Undecided    Unassigned
  Noble          Fix Released    Medium       Matthew Ruffell
  Oracular       Fix Released    Medium       Matthew Ruffell
  Plucky         Fix Released    Undecided    Unassigned
  Questing       Fix Released    Undecided    Unassigned

Bug Description

Subject: raid1: Fix NULL pointer de-reference in process_checks()

BugLink: https://bugs.launchpad.net/bugs/2112519

[Impact]

A NULL pointer dereference was found in raid1 during failure-mode testing.
A raid1 array was set up with --failfast, filled with data, and a check
operation was started. While the check was underway, all underlying iSCSI disks
were forcefully disconnected, and the following kernel oops occurred:

md/raid1:: dm-0: unrecoverable I/O read error for block 527744
md/raid1:: dm-1: unrecoverable I/O read error for block 527616
md/raid1:: dm-0: unrecoverable I/O read error for block 527744
md/raid1:: dm-1: unrecoverable I/O read error for block 527616
md/raid1:: dm-1: unrecoverable I/O read error for block 527616
md/raid1:: dm-0: unrecoverable I/O read error for block 527744
md/raid1:: dm-1: unrecoverable I/O read error for block 527616
md/raid1:: dm-0: unrecoverable I/O read error for block 527744
BUG: kernel NULL pointer dereference, address: 0000000000000040
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
SMP NOPTI
CPU: 3 PID: 19372 Comm: md_1t889zmbfni_ Kdump: loaded Not tainted 6.8.0-1029-aws #31-Ubuntu
Hardware name: Amazon EC2 m6a.xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:process_checks+0x25e/0x5e0 [raid1]
Code: 8e 19 01 00 00 48 8b 85 78 ff ff ff b9 08 00 00 00 48 8d 7d 90 49 8b 1c c4 49 63 c7 4d 8b 74 c4 50 31 c0 f3 48 ab 48 89 5d 88 <4c> 8b 53 40 45 0f b6 4e 18 49 8b 76 40 49 81 7e 38 a0 04 7c c0 75
RSP: 0018:ffffb39e8142bcb8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000000004 RDI: ffffb39e8142bd50
RBP: ffffb39e8142bd80 R08: ffff9a2e001ea000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff9a2e0cd63280
R13: ffff9a2e50d1f800 R14: ffff9a2e50d1f000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff9a3128780000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000040 CR3: 00000001035b2004 CR4: 00000000003706f0
Call Trace:
 <TASK>
 ? show_regs+0x6d/0x80
 ? __die+0x24/0x80
 ? page_fault_oops+0x99/0x1b0
 ? do_user_addr_fault+0x2e0/0x660
 ? exc_page_fault+0x83/0x190
 ? asm_exc_page_fault+0x27/0x30
 ? process_checks+0x25e/0x5e0 [raid1]
 ? process_checks+0x125/0x5e0 [raid1]
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? ___ratelimit+0xc7/0x130
 sync_request_write+0x1c8/0x1e0 [raid1]
 raid1d+0x13a/0x3f0 [raid1]
 ? srso_alias_return_thunk+0x5/0xfbef5
 md_thread+0xae/0x190
 ? __pfx_autoremove_wake_function+0x10/0x10
 ? __pfx_md_thread+0x10/0x10
 kthread+0xda/0x100
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x47/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>

process_checks() loops through all the available disks to find a primary source
with intact data. When all disks are missing, no primary source is found, yet
the function carries on and dereferences a NULL pointer. It should not move
forward without a valid primary source.

[Fix]

This was fixed in 6.15-rc3 with:

commit b7c178d9e57c8fd4238ff77263b877f6f16182ba
Author: Meir Elisha <email address hidden>
Date: Tue Apr 8 17:38:08 2025 +0300
Subject: md/raid1: Add check for missing source disk in process_checks()
Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b7c178d9e57c8fd4238ff77263b877f6f16182ba

This has been applied to focal, jammy and plucky already through upstream
-stable. Currently noble and oracular are lagging behind and are not up to the
-stable release with the fix.

Bug focal:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111448
Bug jammy:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111606
Bug plucky:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2111268

[Testcase]

You don't need to set up a full iSCSI environment. A local VM is enough: you can
forcefully remove its underlying disks using libvirt.

Create a VM and attach three scratch disks:

$ lsblk
NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
vda     253:0    0   10G  0 disk
├─vda1  253:1    0    9G  0 part /
├─vda14 253:14   0    4M  0 part
├─vda15 253:15   0  106M  0 part /boot/efi
└─vda16 259:0    0  913M  0 part /boot
vdb     253:16   0  372K  0 disk
vdc     253:32   0    3G  0 disk
vdd     253:48   0    3G  0 disk
vde     253:64   0    3G  0 disk

Create a raid1 array:

$ sudo mdadm --create --failfast --verbose /dev/md0 --level=1 --raid-devices=3 /dev/vdc /dev/vdd /dev/vde

Make a filesystem and mount it:

$ sudo mkfs.xfs /dev/md0

$ sudo mkdir /mnt/disk
$ sudo mount /dev/md0 /mnt/disk

Fill the scratch disks with files by writing to the mounted array:

$ cd /mnt/disk
$ for n in {1..1000}; do sudo dd if=/dev/urandom of=file$( printf %03d "$n" ).bin bs=1024 count=$(( RANDOM )); done

Start a check:

$ sudo mdadm --action=check /dev/md0

Use virt-manager or libvirt to detach the disks, then watch dmesg.
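
The detach step can also be scripted from the libvirt host instead of clicked
through. This is a minimal sketch, assuming the guest is a libvirt domain named
raid1-test (a hypothetical name) and the scratch disks are the targets vdc, vdd
and vde shown in the lsblk output. detach_cmds is an illustrative helper, not
part of libvirt. It only prints the virsh detach-disk commands so they can be
reviewed before anything is run:

```shell
# Print one "virsh detach-disk" command per disk target.
# Nothing is executed here; pipe the output to "sh" on the libvirt host to run it.
detach_cmds() {
    domain="$1"   # libvirt domain name (hypothetical: raid1-test)
    shift
    for target in "$@"; do
        printf 'virsh detach-disk %s %s --live\n' "$domain" "$target"
    done
}

detach_cmds raid1-test vdc vdd vde
```

To actually detach the disks, append "| sh" to the last line, and follow the
kernel log inside the guest with "sudo dmesg --follow" while the check runs.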

Test kernels are available in the following ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/sf411666-test

If you install the test kernel, the null pointer dereference no longer occurs.

[Where problems can occur]

We are changing the logic such that if all the reads fail in process_checks()
and we have no valid primary source, then we disable recovery mode, record the
error, free the bio, and return early. Previously we would have just continued
onward and run into the NULL pointer dereference.

This really only affects situations where all backing disks are lost. This isn't
too uncommon though, particularly if all are network storage and network issues
occur, losing access to the disks. Things should remain as they are if at least
one primary source disk exists.

If a regression were to occur, it would affect raid1 arrays only, and only
during check/repair operations.

A workaround would be to disable check or repair operations on the md array
until the issue is fixed.
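
For a check that is already running, one way to apply this workaround is the md
sysfs interface: writing "idle" to an array's sync_action file stops an active
check or repair. This is a minimal sketch, assuming an array named md0.
md_stop_check and the SYSFS_ROOT override are illustrative names introduced
here, not existing tools:

```shell
# Stop a running check or repair on one md array by writing "idle" to its
# sync_action file. Must run as root on a real system.
# SYSFS_ROOT is a hypothetical override so the function can be tried against
# a scratch directory; leave it unset to use /sys.
md_stop_check() {
    node="${SYSFS_ROOT:-/sys}/block/$1/md/sync_action"
    if [ ! -e "$node" ]; then
        echo "no md sync_action file for $1" >&2
        return 1
    fi
    state=$(cat "$node")
    case "$state" in
        check|repair) echo idle > "$node" ;;   # only interrupt check/repair
    esac
    echo "$1: was ${state}, now $(cat "$node")"
}
```

Run it as root with "md_stop_check md0". It only interrupts the current
operation; it does not prevent a scheduled check from starting later, so any
periodic check configured on the system would need to be disabled separately.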

[Other info]

Upstream mailing list discussion:

V1:
https://<email address hidden>/T/
V2:
https://<email address hidden>/T/

description: updated
Changed in linux (Ubuntu Questing):
status: New → Fix Released
Changed in linux (Ubuntu Plucky):
status: New → Fix Committed
Changed in linux (Ubuntu Oracular):
status: New → In Progress
Changed in linux (Ubuntu Jammy):
status: New → Fix Committed
Changed in linux (Ubuntu Noble):
status: New → In Progress
importance: Undecided → Medium
Changed in linux (Ubuntu Oracular):
importance: Undecided → Medium
Changed in linux (Ubuntu Noble):
assignee: nobody → Matthew Ruffell (mruffell)
Changed in linux (Ubuntu Oracular):
assignee: nobody → Matthew Ruffell (mruffell)
tags: added: sts
description: updated
summary: - raid1: Fix NULL pointer de-reference in process_checks()
+ raid1: Fix NULL pointer dereference in process_checks()
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.8.0-63.66 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux' to 'verification-done-noble-linux'. If the problem still exists, change the tag 'verification-needed-noble-linux' to 'verification-failed-noble-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-v2 verification-needed-noble-linux
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/6.11.0-29.29 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-oracular-linux' to 'verification-done-oracular-linux'. If the problem still exists, change the tag 'verification-needed-oracular-linux' to 'verification-failed-oracular-linux'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-oracular-linux-v2 verification-needed-oracular-linux
Changed in linux (Ubuntu Noble):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Oracular):
status: In Progress → Fix Committed
Matthew Ruffell (mruffell) wrote :

Fix released plucky 6.14.0-20-generic

Changed in linux (Ubuntu Plucky):
status: Fix Committed → Fix Released
Matthew Ruffell (mruffell) wrote :

Performing verification for Oracular.

My user ran 6.11.0-29-generic on an AWS instance, and connected multiple
iSCSI disks, created a raid1 array, copied some data, and then started a check
operation.

They then forcefully disconnected all the iSCSI devices, and the kernel did
not panic.

They are happy with the kernel in -proposed. Marking verified.

tags: added: verification-done-oracular-linux
removed: verification-needed-oracular-linux
Matthew Ruffell (mruffell) wrote :

Performing verification for noble.

My user ran 6.8.0-63-generic on an AWS instance, and connected multiple
iSCSI disks, created a raid1 array, copied some data, and then started a check
operation.

They then forcefully disconnected all the iSCSI devices, and the kernel did
not panic.

They are happy with the kernel in -proposed. Marking verified.

tags: added: verification-done-noble-linux
removed: verification-needed-noble-linux
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 6.11.0-29.29

---------------
linux (6.11.0-29.29) oracular; urgency=medium

  * oracular/linux: 6.11.0-29.29 -proposed tracker (LP: #2114305)

  * Packaging resync (LP: #1786013)
    - [Packaging] update variants
    - [Packaging] update annotations scripts

  * CVE-2025-37890
    - net_sched: hfsc: Fix a UAF vulnerability in class with netem as child
      qdisc
    - sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
    - net_sched: hfsc: Address reentrant enqueue adding class to eltree twice

  * raid1: Fix NULL pointer dereference in process_checks() (LP: #2112519)
    - md/raid1: Add check for missing source disk in process_checks()

  * CVE-2025-37798
    - sch_htb: make htb_qlen_notify() idempotent
    - sch_htb: make htb_deactivate() idempotent
    - sch_drr: make drr_qlen_notify() idempotent
    - sch_hfsc: make hfsc_qlen_notify() idempotent
    - sch_qfq: make qfq_qlen_notify() idempotent
    - sch_ets: make est_qlen_notify() idempotent
    - codel: remove sch->q.qlen check before qdisc_tree_reduce_backlog()

  * CVE-2025-37997
    - netfilter: ipset: fix region locking in hash types

 -- Manuel Diewald <email address hidden> Fri, 13 Jun 2025 18:31:19 +0200

Changed in linux (Ubuntu Oracular):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 6.8.0-63.66

---------------
linux (6.8.0-63.66) noble; urgency=medium

  * noble/linux: 6.8.0-63.66 -proposed tracker (LP: #2114341)

  * Packaging resync (LP: #1786013)
    - [Packaging] update variants
    - [Packaging] update annotations scripts

  * CVE-2025-37798
    - sch_htb: make htb_qlen_notify() idempotent
    - sch_htb: make htb_deactivate() idempotent
    - sch_drr: make drr_qlen_notify() idempotent
    - sch_hfsc: make hfsc_qlen_notify() idempotent
    - sch_qfq: make qfq_qlen_notify() idempotent
    - sch_ets: make est_qlen_notify() idempotent
    - codel: remove sch->q.qlen check before qdisc_tree_reduce_backlog()

  * CVE-2025-37997
    - netfilter: ipset: fix region locking in hash types

  * CVE-2025-22088
    - RDMA/erdma: Prevent use-after-free in erdma_accept_newconn()

  * CVE-2025-37890
    - net_sched: hfsc: Fix a UAF vulnerability in class with netem as child
      qdisc
    - sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()
    - net_sched: hfsc: Address reentrant enqueue adding class to eltree twice

  * raid1: Fix NULL pointer dereference in process_checks() (LP: #2112519)
    - md/raid1: Add check for missing source disk in process_checks()

 -- Manuel Diewald <email address hidden> Fri, 13 Jun 2025 16:50:07 +0200

Changed in linux (Ubuntu Noble):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-6.11/6.11.0-1012.12 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-nvidia-6.11' to 'verification-done-noble-linux-nvidia-6.11'. If the problem still exists, change the tag 'verification-needed-noble-linux-nvidia-6.11' to 'verification-failed-noble-linux-nvidia-6.11'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-nvidia-6.11-v2 verification-needed-noble-linux-nvidia-6.11
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-nvidia-tegra/6.8.0-1007.7 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-nvidia-tegra' to 'verification-done-noble-linux-nvidia-tegra'. If the problem still exists, change the tag 'verification-needed-noble-linux-nvidia-tegra' to 'verification-failed-noble-linux-nvidia-tegra'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-nvidia-tegra-v2 verification-needed-noble-linux-nvidia-tegra
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-azure/6.8.0-1033.38 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-azure' to 'verification-done-noble-linux-azure'. If the problem still exists, change the tag 'verification-needed-noble-linux-azure' to 'verification-failed-noble-linux-azure'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-azure-v2 verification-needed-noble-linux-azure
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-fips/6.8.0-72.72+fips1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-fips' to 'verification-done-noble-linux-fips'. If the problem still exists, change the tag 'verification-needed-noble-linux-fips' to 'verification-failed-noble-linux-fips'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-fips-v2 verification-needed-noble-linux-fips
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-aws-fips/6.8.0-1034.36+fips1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-aws-fips' to 'verification-done-noble-linux-aws-fips'. If the problem still exists, change the tag 'verification-needed-noble-linux-aws-fips' to 'verification-failed-noble-linux-aws-fips'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-aws-fips-v2 verification-needed-noble-linux-aws-fips
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gcp-fips/6.8.0-1035.37+fips1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-gcp-fips' to 'verification-done-noble-linux-gcp-fips'. If the problem still exists, change the tag 'verification-needed-noble-linux-gcp-fips' to 'verification-failed-noble-linux-gcp-fips'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-gcp-fips-v2 verification-needed-noble-linux-gcp-fips
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-xilinx/6.8.0-1017.18 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-noble-linux-xilinx' to 'verification-done-noble-linux-xilinx'. If the problem still exists, change the tag 'verification-needed-noble-linux-xilinx' to 'verification-failed-noble-linux-xilinx'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: kernel-spammed-noble-linux-xilinx-v2 verification-needed-noble-linux-xilinx