commit 470193ffc9fcee9ca3eb53090cc5001f5f27980c
Author: Peng Zhang <email address hidden>
Date: Sat Dec 17 08:38:58 2022 +0800
kdump-tools: disable AER to fix kdump hung issue
This issue appeared after the kernel was updated from version 5.10.112 to
5.10.152. The offending commit is d83d886e69bd (PCI/ERR: Recover from
RCEC AER errors), which comes from the linux-yocto 5.10 stable tree. It
causes the board to hang after kdump is triggered.
The issue can be reproduced on a Supermicro A2SDi-16C-TP8F board with
BIOS version 1.4 (build date 01/29/2021).
We don't need PCI AER functionality enabled in the kdump kernel, and it
causes some boards to hang in certain situations, as the kernel AER error
log shows. So we just disable it.
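The commit message does not show the change itself. One common way to disable AER for a single kernel is the documented `pci=noaer` boot parameter, added to the command line that kdump-tools passes to the crash kernel. The sketch below assumes that approach; the helper name `append_noaer` and the example value of `KDUMP_CMDLINE_APPEND` are illustrative and are not taken from this commit.

```shell
#!/bin/sh
# Hypothetical helper: append "pci=noaer" to a kernel command-line string
# unless it is already present as a separate word.
append_noaer() {
    case " $1 " in
        *" pci=noaer "*) printf '%s\n' "$1" ;;            # already disabled
        "  ")            printf '%s\n' "pci=noaer" ;;     # empty input
        *)               printf '%s\n' "$1 pci=noaer" ;;  # append the option
    esac
}

# Example value only (an assumption): the real append string lives in the
# kdump-tools configuration and is not shown in this commit message.
KDUMP_CMDLINE_APPEND="reset_devices nr_cpus=1"
KDUMP_CMDLINE_APPEND=$(append_noaer "$KDUMP_CMDLINE_APPEND")
echo "$KDUMP_CMDLINE_APPEND"   # prints: reset_devices nr_cpus=1 pci=noaer
```

Limiting the option to the crash kernel's command line would leave AER enabled in the normal production kernel, which matches the stated intent of disabling it only where it is not needed.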
TEST PLAN:
PASS: build-pkgs -c -p kdump-tools
PASS: build-pkgs -c -p kdump-tools-rt
PASS: boot
PASS: on troublesome and non-troublesome platform
    systemctl enable kdump-tools.service
    systemctl start kdump-tools.service
    echo 1 >/proc/sysrq-trigger
    echo 'c' > /proc/sysrq-trigger
    vmcore has been created successfully
    system boots back up automatically
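The last two checks in the test plan can be sketched as a small post-reboot check. This is only an illustration: the helper name `has_vmcore`, the `/var/crash` location, and the `vmcore*` / `dump.*` file-name patterns are assumptions about a typical kdump-tools setup, not details stated in this commit.

```shell
#!/bin/sh
# Hypothetical helper: succeed if DIR (searched recursively) contains at
# least one non-empty regular file whose name starts with "vmcore" or
# "dump.". Directory and name patterns are assumptions, adjust as needed.
has_vmcore() {
    dir=$1
    [ -d "$dir" ] || return 1
    found=$(find "$dir" -type f \( -name 'vmcore*' -o -name 'dump.*' \) \
                -size +0 2>/dev/null | head -n 1)
    [ -n "$found" ]
}

# Usage after the system has come back up from the crash kernel:
#   has_vmcore /var/crash && echo "vmcore has been created"
```

The block only defines the function, so sourcing it is harmless; it never writes to /proc/sysrq-trigger itself.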
Reviewed: https://review.opendev.org/c/starlingx/integ/+/867738
Committed: https://opendev.org/starlingx/integ/commit/470193ffc9fcee9ca3eb53090cc5001f5f27980c
Submitter: "Zuul (22348)"
Branch: master
KERNEL AER ERROR LOG:
[ 7.409296] pcieport 0000:00:05.0: AER: Multiple Corrected error
received: 0000:00:05.0
[ 7.417311] BUG: kernel NULL pointer dereference, address:
0000000000000028
[ 7.418296] #PF: supervisor read access in kernel mode
[ 7.418296] #PF: error_code(0x0000) - not-present page
[ 7.418296] PGD 0 P4D 0
[ 7.418296] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 7.418296] CPU: 0 PID: 93 Comm: irq/25-aerdrv Not tainted
5.10.0-6-amd64 #1 Debian 5.10.152-1.stx.25
[ 7.418296] Hardware name: Supermicro
SYS-E300-9A-16CN8TP/A2SDi-16C-TP8F, BIOS 1.4 01/29/2021
[ 7.418296] RIP: 0010:pci_walk_bus+0x25/0x90
[ 7.418296] Code: 00 00 00 00 00 0f 1f 44 00 00 41 56 41 55 49 89 fd
48 c7 c7 20 37 9a 99 41 54 49 89 f4 55 48 89 d5 53 4c 89 eb e8 2b 5a 56
00 <49> 8b 7d 28 eb 1f 48 8b 47 18 48 85 c0 74 31 4c 8b 70 28 48 89 c3
[ 7.418296] RSP: 0000:ffffa60040173dc8 EFLAGS: 00010282
[ 7.418296] RAX: ffff8b553fded001 RBX: 0000000000000000 RCX:
0000000000000000
[ 7.418296] RDX: ffff8b553fded000 RSI: ffffffff9833c6e0 RDI:
ffffffff999a3720
[ 7.418296] RBP: ffffa60040173e10 R08: 0000000000000002 R09:
ffffa60040173d74
[ 7.418296] R10: 0000000000000001 R11: 0000000000000000 R12:
ffffffff9833c6e0
[ 7.418296] R13: 0000000000000000 R14: 0000000000000028 R15:
ffff8b555e206328
[ 7.418296] FS: 0000000000000000(0000) GS:ffff8b55bec00000(0000)
knlGS:0000000000000000
[ 7.418296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.418296] CR2: 0000000000000028 CR3: 000000087d80a000 CR4:
00000000003506f0
[ 7.418296] Call Trace:
[ 7.418296] find_source_device+0x34/0x5a
[ 7.418296] aer_isr.cold+0x89/0x9e
[ 7.418296] ? __set_cpus_allowed_ptr+0xb6/0x220
[ 7.418296] ? disable_irq_nosync+0x10/0x10
[ 7.418296] irq_thread_fn+0x20/0x60
[ 7.418296] irq_thread+0x104/0x1b0
[ 7.418296] ? irq_finalize_oneshot.part.0+0xe0/0xe0
[ 7.418296] ? irq_thread_check_affinity+0xa0/0xa0
[ 7.418296] kthread+0x133/0x150
[ 7.418296] ? __kthread_bind_mask+0x60/0x60
[ 7.418296] ret_from_fork+0x22/0x30
[ 7.418296] Modules linked in:
[ 7.418296] CR2: 0000000000000028
Closes-Bug: 1999646
Change-Id: I9ffc6e96d4b7fbd0b29a806d4d96dfc8e89dc4c6
Signed-off-by: Peng Zhang <email address hidden>