5.19.0-50 --- mdadm RAID 5 with write journal segfaults

Bug #2028826 reported by blogten
This bug affects 1 person
Affects: mdadm (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

After upgrading Ubuntu 22.04 LTS from a 5.15-series kernel to 5.19.0-50, the new kernel started hitting a page fault (a NULL pointer dereference in kernel mode) after about 2 hours of uptime. The fault occurs in the md RAID 5 kernel thread, and it relates to a RAID 5 array that has a write-through journal. The RAID 5 array had 4 HDDs, and the journal device was itself a 32 GB mdadm RAID 1 array made up of partitions on SSDs. The failure details from the syslog are below.

Once this crash happens, the affected RAID array becomes unresponsive. The array cannot be stopped, and the reboot process does not complete. After rebooting, mdadm reports that 0 data pages and hundreds of thousands of parity pages have to be recovered from the journal. There appears to be no data loss, but that is hard to verify.

For reference, I had previously tried a write-back journal on the same RAID 5 array. With the earlier 5.15-series kernels, mdadm would periodically hang and also prevent a successful reboot of the machine. On restart, mdadm would hang while trying to start the array until I cleared out the write-back journal and added a fresh one. This is similar to bugs reported on the mdadm mailing list in 2020. From that point on, I used only write-through journals, which appeared to work fine. With the 5.19 kernel, the write-through journal started causing the crash described here. The present situation is similar to bugs reported on the mdadm mailing list in May of this year.
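
For anyone trying to confirm which journal mode an array is actually running in, here is a minimal Python sketch. It assumes the kernel exposes the mode at /sys/block/<md device>/md/journal_mode and prints both modes with the active one in square brackets (for example "[write-through] write-back"); check this against your own kernel. The function names and the "md126" device name in the comment are illustrative only.

```python
import re
from pathlib import Path


def parse_journal_mode(text: str) -> str:
    """Return the active journal mode from the sysfs attribute contents.

    Assumption: the active mode is the one wrapped in square brackets,
    e.g. "[write-through] write-back".
    """
    match = re.search(r"\[([a-z-]+)\]", text)
    if match is None:
        raise ValueError(f"no bracketed journal mode found in {text!r}")
    return match.group(1)


def read_journal_mode(md_name: str) -> str:
    """Read the journal mode for an md device name such as "md126".

    The attribute is only expected to exist for arrays with a journal
    device, so a missing file raises FileNotFoundError.
    """
    path = Path("/sys/block") / md_name / "md" / "journal_mode"
    return parse_journal_mode(path.read_text())


# Example call on a machine that has such an array (not run here):
#   read_journal_mode("md126")
```

Parsing is kept separate from the file read so the string handling can be checked on a machine without an md array.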

I dropped the old RAID 5 array with a write journal and switched to a RAID 6 array with an internal bitmap.

Jul 26 04:02:46 <redacted> kernel: [ 7093.186750] BUG: kernel NULL pointer dereference, address: 0000000000000155
Jul 26 04:02:46 <redacted> kernel: [ 7093.186769] #PF: supervisor read access in kernel mode
Jul 26 04:02:46 <redacted> kernel: [ 7093.186774] #PF: error_code(0x0000) - not-present page
Jul 26 04:02:46 <redacted> kernel: [ 7093.186778] PGD 0 P4D 0
Jul 26 04:02:46 <redacted> kernel: [ 7093.186785] Oops: 0000 [#1] PREEMPT SMP PTI
Jul 26 04:02:46 <redacted> kernel: [ 7093.186793] CPU: 4 PID: 5645 Comm: md126_raid5 Tainted: P OE 5.19.0-50-generic #50-Ubuntu
Jul 26 04:02:46 <redacted> kernel: [ 7093.186800] Hardware name: Supermicro X9DAi/X9DAi, BIOS 3.0 08/05/2013
Jul 26 04:02:46 <redacted> kernel: [ 7093.186804] RIP: 0010:submit_bio_noacct+0x18f/0x620
Jul 26 04:02:46 <redacted> kernel: [ 7093.186819] Code: 8b 9c eb b8 00 00 00 0f 1f 44 00 00 41 80 7c 24 14 00 79 09 f6 83 50 01 00 00 04 74 2f 41 8b 44 24 10 83 e0 01 05 54 01 00 00 <0f> b6 1c 03 80 fb 01 0f 87 2e 56 82 00 83 e3 01 74 10 4c 89 e7 e8
Jul 26 04:02:46 <redacted> kernel: [ 7093.186824] RSP: 0018:ffff9aee4fc27cb0 EFLAGS: 00010206
Jul 26 04:02:46 <redacted> kernel: [ 7093.186830] RAX: 0000000000000155 RBX: 0000000000000000 RCX: 0000000000000000
Jul 26 04:02:46 <redacted> kernel: [ 7093.186834] RDX: 0000000000040001 RSI: 0000000000000000 RDI: ffff8a76132e70b8
Jul 26 04:02:46 <redacted> kernel: [ 7093.186839] RBP: ffff9aee4fc27ce0 R08: 0000000000000000 R09: 0000000000000000
Jul 26 04:02:46 <redacted> kernel: [ 7093.186842] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a76132e70b8
Jul 26 04:02:46 <redacted> kernel: [ 7093.186846] R13: ffff8a6e57d88000 R14: ffff8a6e57bad780 R15: 0000000003fef800
Jul 26 04:02:46 <redacted> kernel: [ 7093.186851] FS: 0000000000000000(0000) GS:ffff8a759fb00000(0000) knlGS:0000000000000000
Jul 26 04:02:46 <redacted> kernel: [ 7093.186856] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 26 04:02:46 <redacted> kernel: [ 7093.186860] CR2: 0000000000000155 CR3: 0000000396a10004 CR4: 00000000001706e0
Jul 26 04:02:46 <redacted> kernel: [ 7093.186865] Call Trace:
Jul 26 04:02:46 <redacted> kernel: [ 7093.186870] <TASK>
Jul 26 04:02:46 <redacted> kernel: [ 7093.186876] submit_bio+0x40/0xf0
Jul 26 04:02:46 <redacted> kernel: [ 7093.186890] r5l_flush_stripe_to_raid+0x103/0x160 [raid456]
Jul 26 04:02:46 <redacted> kernel: [ 7093.186913] handle_active_stripes.constprop.0+0x99/0x2a0 [raid456]
Jul 26 04:02:46 <redacted> kernel: [ 7093.186928] ? md_wakeup_thread+0x2e/0x80
Jul 26 04:02:46 <redacted> kernel: [ 7093.186937] raid5d+0x377/0x5e0 [raid456]
Jul 26 04:02:46 <redacted> kernel: [ 7093.186953] ? schedule_timeout+0x122/0x160
Jul 26 04:02:46 <redacted> kernel: [ 7093.186964] md_thread+0xad/0x170
Jul 26 04:02:46 <redacted> kernel: [ 7093.186971] ? destroy_sched_domains_rcu+0x40/0x40
Jul 26 04:02:46 <redacted> kernel: [ 7093.186982] ? md_set_read_only+0xa0/0xa0
Jul 26 04:02:46 <redacted> kernel: [ 7093.186988] kthread+0xee/0x120
Jul 26 04:02:46 <redacted> kernel: [ 7093.186997] ? kthread_complete_and_exit+0x20/0x20
Jul 26 04:02:46 <redacted> kernel: [ 7093.187006] ret_from_fork+0x22/0x30
Jul 26 04:02:46 <redacted> kernel: [ 7093.187018] </TASK>
Jul 26 04:02:46 <redacted> kernel: [ 7093.187021] Modules linked in: tls intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp nvidia_uvm(POE) coretemp snd_hda_codec_hdmi nvidia_drm(POE) nvidia_modeset(POE) bfq binfmt_misc nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_hda_intel snd_intel_dspcfg kvm_intel snd_intel_sdw_acpi snd_hda_codec nvidia(POE) snd_hda_core kvm snd_hwdep snd_pcm crct10dif_pclmul ghash_clmulni_intel snd_seq_midi aesni_intel snd_seq_midi_event snd_rawmidi crypto_simd cryptd rapl snd_seq drm_kms_helper intel_cstate snd_seq_device fb_sys_fops syscopyarea sysfillrect snd_timer sysimgblt serio_raw joydev input_leds mxm_wmi snd soundcore ioatdma mac_hid sch_fq_codel msr parport_pc ppdev lp drm parport ramoops reed_solomon pstore_blk pstore_zone efi_pstore ip_tables x_tables autofs4 raid10 raid0 multipath linear hid_logitech_hidpp raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 hid_logitech_dj hid_generic
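
To compare this oops with other reports, it helps to reduce it to the faulting address, the RIP symbol, and the definite call-trace frames (the ones without a leading "?"). The Python sketch below does that for syslog lines in the format shown above. The regular expressions are my own assumptions about that format, not an official parser, and the SAMPLE text is an excerpt of the log in this report. In the register dump above, RBX is zero and RAX equals the faulting address 0x155, which looks like a small offset read through a NULL base pointer.

```python
import re

# Patterns are assumptions based on the log format in this report.
FAULT_RE = re.compile(r"BUG: kernel NULL pointer dereference, address: ([0-9a-f]+)")
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")
FRAME_RE = re.compile(
    r"\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?\s*$"
)

SAMPLE = """
Jul 26 04:02:46 <redacted> kernel: [ 7093.186750] BUG: kernel NULL pointer dereference, address: 0000000000000155
Jul 26 04:02:46 <redacted> kernel: [ 7093.186804] RIP: 0010:submit_bio_noacct+0x18f/0x620
Jul 26 04:02:46 <redacted> kernel: [ 7093.186865] Call Trace:
Jul 26 04:02:46 <redacted> kernel: [ 7093.186870] <TASK>
Jul 26 04:02:46 <redacted> kernel: [ 7093.186876] submit_bio+0x40/0xf0
Jul 26 04:02:46 <redacted> kernel: [ 7093.186890] r5l_flush_stripe_to_raid+0x103/0x160 [raid456]
Jul 26 04:02:46 <redacted> kernel: [ 7093.186928] ? md_wakeup_thread+0x2e/0x80
Jul 26 04:02:46 <redacted> kernel: [ 7093.186937] raid5d+0x377/0x5e0 [raid456]
Jul 26 04:02:46 <redacted> kernel: [ 7093.187018] </TASK>
"""


def summarize_oops(lines):
    """Extract fault address, RIP symbol and definite frames from oops lines."""
    summary = {"fault_address": None, "rip_symbol": None, "frames": []}
    in_trace = False
    for raw in lines:
        line = raw.rstrip()
        fault = FAULT_RE.search(line)
        if fault:
            summary["fault_address"] = int(fault.group(1), 16)
        rip = RIP_RE.search(line)
        if rip and summary["rip_symbol"] is None:
            summary["rip_symbol"] = rip.group(1)
        if "Call Trace:" in line:
            in_trace = True
            continue
        if "</TASK>" in line:
            in_trace = False
            continue
        if in_trace:
            frame = FRAME_RE.search(line)
            # Skip frames marked "?", which the kernel flags as unreliable.
            if frame and not frame.group(1):
                summary["frames"].append((frame.group(2), frame.group(3)))
    return summary


summary = summarize_oops(SAMPLE.splitlines())
```

Each frame is stored as a (symbol, module) pair, with module set to None for symbols built into the kernel image, so the raid456 frames are easy to pick out.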

blogten (blogten)
summary: - 5.19.0-50 --- mdadm with write journal segfaults
+ 5.19.0-50 --- mdadm RAID 5 with write journal segfaults