MD RAID 6 Periodic Kernel Panic Stack Overflow Double-Fault
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
mdadm (Ubuntu) | Confirmed | Undecided | Unassigned | | |
Bug Description
Hello:
Every few days I get a kernel panic on my Ubuntu Server 20.10 box, which was recently upgraded to a Ryzen 3700X. I have 7 WD Red Pro HDDs in a RAID 6 array with Linux MD, and they're all attached to an LSI 9211-8ik PCIe card. The motherboard is currently a Gigabyte B550M Aorus Pro. My Ubuntu install is running the latest 5.8.0-53 kernel.
This is the 2nd hardware configuration that produces the exact same kernel panic text. Previously I had these HDDs directly connected to the SATA controller of an ASRock X570 Pro4 ATX mobo with the same 3700X. I was also previously using Ubuntu Server 20.04 LTS -- I upgraded to 20.10 in hopes that the newer kernel would fix it, which it did not.
I posted a whole story on StackOverflow about this journey if you're interested: https:/
However, I am now convinced this is a Linux kernel bug in the MD driver.
Example 1 kernel panic:
[406005.583315] BUG: stack guard page was hit at 000000007cbff150 (stack is 000000003b7072a
[406005.583315] kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
[406005.583315] CPU: 15 PID: 514 Comm: md0_raid6 Tainted: P OE 5.8.0-36-generic #40-Ubuntu
[406005.583316] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS PRO/B550M AORUS PRO, BIOS F1 05/19/2020
[406005.583316] RIP: 0010:slab_
[406005.583316] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[406005.583316] RSP: 0018:ffffa620c0
[406005.583317] RAX: ffff9aaf36f54720 RBX: ffff9ab34b407800 RCX: 0000000000000001
[406005.583317] RDX: ffffa620c06e4040 RSI: ffffa620c06e4038 RDI: ffff9ab34b407800
[406005.583317] RBP: ffffa620c06e4028 R08: 0000000000000001 R09: ffffffffb9c54500
[406005.583318] R10: ffff9aaf36f54fe0 R11: 0000000000000001 R12: ffffa620c06e4038
[406005.583318] R13: ffffa620c06e4040 R14: ffff9aaf36f54720 R15: ffff9ab2925cbd10
[406005.583318] FS: 000000000000000
[406005.583318] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[406005.583318] CR2: ffffa620c06e3fe8 CR3: 00000005d52ac000 CR4: 0000000000340ee0
[406005.583319] Call Trace:
[406005.583319] ? mempool_
[406005.583319] ? kfree+0xb8/0x220
[406005.583319] ? mempool_
[406005.583319] ? mempool_
[406005.583319] ? md_end_io+0x4b/0x70
[406005.583319] ? bio_endio+
Example 2 kernel panic with old mobo:
[161342.301305] BUG: stack guard page was hit at 00000000fc60f228 (stack is 00000000875efe7
[161342.301306] kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
[161342.301306] CPU: 10 PID: 465 Comm: md0_raid6 Tainted: P OE 5.8.0-33-generic #36-Ubuntu
[161342.301307] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570 Pro4, BIOS P3.60 12/01/2020
[161342.301307] RIP: 0010:slab_
[161342.301308] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[161342.301308] RSP: 0018:ffffa86b00
[161342.301309] RAX: ffff98edc21cac40 RBX: ffff98ef0b407800 RCX: 0000000000000001
[161342.301310] RDX: ffffa86b00c70040 RSI: ffffa86b00c70038 RDI: ffff98ef0b407800
[161342.301310] RBP: ffffa86b00c70028 R08: 0000000000000001 R09: ffffffff85854500
[161342.301311] R10: ffff98edc21ca100 R11: 0000000000000001 R12: ffffa86b00c70038
[161342.301311] R13: ffffa86b00c70040 R14: ffff98edc21cac40 R15: ffff98e9b53d74d8
[161342.301311] FS: 000000000000000
[161342.301312] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[161342.301312] CR2: ffffa86b00c6ffe8 CR3: 00000007fa766000 CR4: 0000000000340ee0
[161342.301312] Call Trace:
[161342.301313] ? mempool_
[161342.301313] ? kfree+0xb8/0x220
[161342.301313] ? mempool_
[161342.301313] ? mempool_
[161342.301314] ? md_end_io+0x4b/0x70
[161342.301314] ? bio_endio+
[161342.301314] ? bio_chain_
[161342.301315] ? md_end_io+0x5d/0x70
[161342.301315] ? bio_endio+
[161342.301315] ? bio_chain_
[161342.301315] ? md_end_io+0x5d/0x70
[161342.301316] ? bio_endio+
[161342.301316] ? bio_chain_
[161342.301316] ? md_end_io+0x5d/0x70
[161342.301316] ? bio_endio+
[161342.301317] ? bio_chain_
[161342.301317] ? md_end_io+0x5d/0x70
[161342.301317] ? bio_endio+
[161342.301317] ? bio_chain_
...
[161342.301379] ? md_end_io+0x5d/0x70
[161342.301379] ? bio_endio+
[161342.301380] ? bio_chain_
[161342.301380] ? md_end_io+0x5d/0x70
[161342.301380] ? bio_endio+
[161342.301380] ? bio_ch
[161342.301381] Lost 296 message(s)!
[ 0.000000] Linux version 5.8.0-33-generic (buildd@
I can provide newer kernel panics or other info if needed. Thanks!
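For anyone triaging, the traces above show the same md_end_io / bio_endio / bio_chain_ frames repeating until the stack runs out. Here is a minimal Python sketch that tallies how often each frame appears in a pasted trace. The function name `count_frames` and the sample text are made up for illustration, and it assumes every frame line carries the `?` marker, as it does in these captures. Truncated symbols are counted under whatever prefix survived.

```python
import re
from collections import Counter

# Matches frame lines such as "[161342.301314] ? md_end_io+0x4b/0x70" and
# truncated ones such as "[161342.301314] ? bio_endio+".
# Assumption: every frame is prefixed with "?", as in the captures here.
FRAME_RE = re.compile(r"^\[\s*\d+\.\d+\]\s+\?\s+([A-Za-z_][\w.]*)")


def count_frames(trace: str) -> Counter:
    """Count symbol names that appear after the 'Call Trace:' line."""
    counts = Counter()
    in_trace = False
    for line in trace.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Shortened sample in the same shape as the panics in this report.
SAMPLE = """\
[161342.301312] Call Trace:
[161342.301313] ? kfree+0xb8/0x220
[161342.301314] ? md_end_io+0x4b/0x70
[161342.301314] ? bio_endio+
[161342.301314] ? bio_chain_
[161342.301315] ? md_end_io+0x5d/0x70
[161342.301315] ? bio_endio+
[161342.301381] Lost 296 message(s)!
"""

if __name__ == "__main__":
    for name, n in count_frames(SAMPLE).most_common():
        print(f"{n:4d}  {name}")
```

A tally dominated by the same two or three symbols points at a recursive completion chain rather than one deep but finite call path. The "Lost N message(s)!" line means the real depth is larger than whatever gets counted.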
ProblemType: Bug
DistroRelease: Ubuntu 20.10
Package: mdadm 4.1-5ubuntu5
ProcVersionSign
Uname: Linux 5.8.0-53-generic x86_64
NonfreeKernelMo
ApportVersion: 2.20.11-0ubuntu50.5
Architecture: amd64
CasperMD5CheckR
Date: Tue May 25 12:11:44 2021
InstallationDate: Installed on 2020-11-23 (182 days ago)
InstallationMedia: Ubuntu-Server 20.10 "Groovy Gorilla" - Release amd64 (20201022)
MachineType: Gigabyte Technology Co., Ltd. B550M AORUS PRO
ProcEnviron:
TERM=screen-
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=
SourcePackage: mdadm
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/19/2020
dmi.bios.release: 5.17
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F1
dmi.board.
dmi.board.name: B550M AORUS PRO
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.
dmi.modalias: dmi:bvnAmerican
dmi.product.family: Default string
dmi.product.name: B550M AORUS PRO
dmi.product.sku: Default string
dmi.product.
dmi.sys.vendor: Gigabyte Technology Co., Ltd.
etc.blkid.tab: Error: [Errno 2] No such file or directory: '/etc/blkid.tab'
mtime.conffile.
Hey so this is totally still happening on kernel 5.8.0-53. Just got this serial console capture:
babylon login: [1457468.880947] BUG: stack guard page was hit at 000000007aef1a4a (stack is 00000000af9c61cd..000000007ccda653)
[1457468.880948] kernel stack overflow (double-fault): 0000 [#1] SMP NOPTI
[1457468.880948] CPU: 3 PID: 512 Comm: md0_raid6 Tainted: P OE 5.8.0-53-generic #60-Ubuntu
[1457468.880949] Hardware name: Gigabyte Technology Co., Ltd. B550M AORUS PRO/B550M AORUS PRO, BIOS F13h 04/23/2021
[1457468.880949] RIP: 0010:slab_free_freelist_hook+0x35/0x120
[1457468.880950] Code: 89 d5 41 54 49 89 f4 53 48 89 fb 48 83 ec 08 48 8b 02 4c 8b 36 48 c7 06 00 00 00 00 48 c7 02 00 00 00 00 48 85 c0 49 0f 44 c6 <48> 89 45 d0 eb 06 4c 3b 7d d0 74 5d 8b 53 20 4d 89 f7 49 8d 34 16
[1457468.880951] RSP: 0018:ffffbcda805efff8 EFLAGS: 00010246
[1457468.880952] RAX: ffff9bfb8ccc42a0 RBX: ffff9bfcdb407800 RCX: 0000000000000001
[1457468.880952] RDX: ffffbcda805f0040 RSI: ffffbcda805f0038 RDI: ffff9bfcdb407800
[1457468.880953] RBP: ffffbcda805f0028 R08: 0000000000000001 R09: ffffffff90841600
[1457468.880953] R10: ffff9bfb8ccc4f40 R11: 0000000000000001 R12: ffffbcda805f0038
[1457468.880953] R13: ffffbcda805f0040 R14: ffff9bfb8ccc42a0 R15: ffff9bf766967940
[1457468.880954] FS: 0000000000000000(0000) GS:ffff9bfcdeac0000(0000) knlGS:0000000000000000
[1457468.880954] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1457468.880955] CR2: ffffbcda805effe8 CR3: 00000003a65ee000 CR4: 0000000000340ee0
[1457468.880955] Call Trace:
[1457468.880955] ? mempool_kfree+0xe/0x10
[1457468.880956] ? kfree+0xb8/0x220
[1457468.880956] ? mempool_kfree+0xe/0x10
[1457468.880956] ? mempool_free+0x2f/0x80
[1457468.880956] ? md_end_io+0x4b/0x70
[1457468.880957] ? bio_endio+0xe6/0x150
[1457468.880957] ? bio_chain_endio+0x2d/0x40
[1457468.880957] ? md_end_io+0x5d/0x70
[1457468.880958] ? bio_endio+0xe6/0x150
[1457468.880958] ? bio_chain_endio+0x2d/0x40
[1457468.880958] ? md_end_io+0x5d/0x70
[1457468.880959] ? bio_endio+0xe6/0x150
[1457468.880959] ? bio_chain_endio+0x2d/0x40
[1457468.880959] ? md_end_io+0x5d/0x70
[1457468.880959] ? bio_endio+0xe6/0x150
[1457468.880960] ? bio_chain_endio+0x2d/0x40
[1457468.880960] ? md_end_io+0x5d/0x70
[1457468.880960] ? bio_endio+0xe6/0x150
[1457468.880960] ? bio_chain_endio+0x2d/0x40
[1457468.880961] ? md_end_io+0x5d/0x70
[1457468.880961] ? bio_endio+0xe6/0x150
[1457468.880961] ? bio_chain_endio+0x2d/0x40
[1457468.880962] ? md_end_io+0x5d/0x70
[1457468.880962] ? bio_endio+0xe6/0x150
[1457468.880962] ? bio_chain_endio+0x2d/0x40
[1457468.880962] ? md_end_io+0x5d/0x70
[1457468.880963] ? bio_endio+0xe6/0x150
[1457468.880963] ? bio_chain_endio+0x2d/0x40
[1457468.880963] ? md_end_io+0x5d/0x70
[1457468.880963] ? bio_endio+0xe6/0x150
[1457468.880964] ? bio_chain_endio+0x2d/0x40
[1457468.880964] ? md_end_io+0x5d/0x70
[1457468.880964] ? bio_endio+0xe6/0x150
[1457468.880965] ? bio_chain_endio+0x2d/0x40
[1457468.880965] ? md_end_io+0x5d/0x70
[1457468.880965] ? bio_endio+0xe6/0x150
[1457468.880965] ? bio_chain_endio+0x2d/0x40
[1457468.880966] ? md_end_io+0x5d/0x70
[1457468.880966] ? bio_endio+0xe6/0x150
[1457...