KDump boot fails with nr_cpus=1
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
The Ubuntu-power-systems project | Fix Released | High | Canonical Kernel Team |
linux (Ubuntu) | Confirmed | Low | Unassigned |
makedumpfile (Ubuntu) | Fix Released | High | Thadeu Lima de Souza Cascardo |
Bionic | Fix Released | Undecided | Unassigned |
Disco | Fix Released | Undecided | Unassigned |
Eoan | Fix Released | High | Thadeu Lima de Souza Cascardo |
Bug Description
[Impact]
The kdump kernel will crash during its boot if booted on a CPU other than 0.
[Test case]
Trigger a crash using taskset -c X, where X is not 0 and is a present CPU. Check that the dump is successful.
echo c | sudo taskset -c 1 tee /proc/sysrq-trigger
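The test case needs a value for X. A minimal sketch of choosing one follows, assuming the CPU list is read from /sys/devices/system/cpu/online in the kernel's usual list format (for example "0-3" or "0,8-15"). The helper name first_nonzero_cpu is hypothetical, and the sketch only prints the trigger command rather than crashing the machine.

```sh
#!/bin/sh
# first_nonzero_cpu LIST (hypothetical helper)
# Print the first CPU number other than 0 found in a kernel CPU list
# such as "0-3" or "0,8-15". Return non-zero if CPU 0 is the only one.
first_nonzero_cpu() {
    list=$1
    old_ifs=$IFS
    IFS=,
    # Split the list on commas into the positional parameters.
    set -- $list
    IFS=$old_ifs
    for range in "$@"; do
        # "4-7" gives start=4 end=7; a single "5" gives start=5 end=5.
        start=${range%-*}
        end=${range#*-}
        cpu=$start
        while [ "$cpu" -le "$end" ]; do
            if [ "$cpu" -ne 0 ]; then
                echo "$cpu"
                return 0
            fi
            cpu=$((cpu + 1))
        done
    done
    return 1
}

# Usage: show the command for this machine without running it.
if [ -r /sys/devices/system/cpu/online ]; then
    if target=$(first_nonzero_cpu "$(cat /sys/devices/system/cpu/online)"); then
        echo "echo c | sudo taskset -c $target tee /proc/sysrq-trigger"
    else
        echo "only CPU 0 is online; the test case cannot be run here" >&2
    fi
fi
```

On a single-CPU system the helper fails, which matches the requirement that X be a CPU other than 0.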
[Regression potential]
This will cause more memory to be used by the dump kernel, which may cause OOMs during the dump. The fix is restricted to ppc64el.
== Comment: #0 - Hari Krishna Bathini - 2019-05-10 06:38:21 ==
---Problem Description---
kdump boot fails in some environments when nr_cpus=1 is passed
---uname output---
na
Machine Type = na
---Debugger---
A debugger is not configured
---Steps to Reproduce---
1. configure kdump
2. trigger crash on non-boot cpu
Expected result:
Capture dump and reboot
Actual result:
Hang in early kdump boot process after crash
Userspace tool common name: kdump-tools
The userspace tool has the following bit modes: 64-bit
Userspace rpm: kdump-tools
Userspace tool obtained from project website: na
== Comment: #1 - Hari Krishna Bathini - 2019-05-10 06:45:46 ==
Launchpad bug 1560552 added "nr_cpus=1" support on ppc64, though
this change never made it upstream, as the maintainer had a few apprehensions.
With 4.18 kernels, this change was dropped from Ubuntu kernels too.
With nr_cpus=1 support in the kernel, kdump-tools was also updated to
use "nr_cpus=1" by default instead of "maxcpus=1" (see Launchpad
bug 1568952). This kdump-tools change has to be reverted to make
it consistent with the kernel change. Note that the "nr_cpus=1" change had
issues in kdump guest environments even with "nr_cpus=1" support
for kdump in the kernel. So, even notwithstanding the kernel revert, it is
better to default to "maxcpus=1" on all kernel versions. So, please
revert the kdump-tools fix that went in with Launchpad bug 1568952.
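As a rough illustration of the requested default, the sketch below rewrites a kdump kernel command line so that it uses maxcpus instead of nr_cpus. The function name use_maxcpus and the sample command line are hypothetical. Where the command line is actually configured (for instance a variable such as KDUMP_CMDLINE_APPEND in /etc/default/kdump-tools) is an assumption to verify against the installed kdump-tools version. This is not the packaged fix itself.

```sh
#!/bin/sh
# use_maxcpus CMDLINE (hypothetical helper)
# Print CMDLINE with every nr_cpus=N parameter replaced by maxcpus=N.
# Other parameters are left untouched.
use_maxcpus() {
    printf '%s\n' "$1" |
        sed -E 's/(^|[[:space:]])nr_cpus=([0-9]+)/\1maxcpus=\2/g'
}

# Illustrative command line only; real defaults vary by release.
sample="reset_devices nr_cpus=1 irqpoll"
use_maxcpus "$sample"
```

After a test crash, reading /proc/cmdline from inside the dump kernel shows which of the two parameters was actually passed.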
tags: | added: architecture-ppc64le bugnameltc-177552 severity-high targetmilestone-inin--- |
Changed in ubuntu: | |
assignee: | nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) |
affects: | ubuntu → kexec-tools (Ubuntu) |
Changed in ubuntu-power-systems: | |
assignee: | nobody → Canonical Kernel Team (canonical-kernel-team) |
importance: | Undecided → High |
Changed in ubuntu-power-systems: | |
status: | New → Confirmed |
Changed in ubuntu-power-systems: | |
status: | Incomplete → Confirmed |
Changed in makedumpfile (Ubuntu Eoan): | |
status: | Confirmed → Fix Committed |
description: | updated |
description: | updated |
no longer affects: | makedumpfile (Ubuntu Cosmic) |
no longer affects: | linux (Ubuntu Cosmic) |
tags: | added: cscc |
Changed in linux (Ubuntu Disco): | |
status: | Confirmed → Invalid |
Changed in linux (Ubuntu Eoan): | |
status: | Confirmed → Invalid |
Changed in ubuntu-power-systems: | |
status: | Confirmed → Fix Committed |
Changed in makedumpfile (Ubuntu Disco): | |
status: | Fix Committed → In Progress |
Changed in ubuntu-power-systems: | |
status: | Fix Committed → In Progress |
tags: |
added: verification-done-bionic removed: verification-needed-bionic |
Changed in ubuntu-power-systems: | |
status: | In Progress → Fix Committed |
Changed in ubuntu-power-systems: | |
status: | Fix Committed → Fix Released |
tags: |
added: targetmilestone-inin18043 removed: targetmilestone-inin--- |
no longer affects: | kexec-tools (Ubuntu) |
no longer affects: | kexec-tools (Ubuntu Cosmic) |
no longer affects: | kexec-tools (Ubuntu Disco) |
no longer affects: | kexec-tools (Ubuntu Eoan) |
no longer affects: | linux (Ubuntu) |
no longer affects: | linux (Ubuntu Disco) |
no longer affects: | linux (Ubuntu Eoan) |
This is really a bug in the kernel, in 4.18 and later.
This is due to a patch that we have been carrying since forever. When the involved code changed a lot from 4.15 to 4.18, the patch was dropped, as it couldn't be easily fixed up.
Even before that happened, I tried to upstream the patch, resending it to the mailing list, but the PPC maintainers wanted something different. The original author resent it with some modifications, but the maintainers still wouldn't apply it. As far as I remember, that patchset doesn't apply anymore after the referred changes.
I have tried to work on a different solution, considering the new code base, but haven't had much time to get it working.
Cascardo.