KDump boot fails with nr_cpus=1

Bug #1828597 reported by bugproxy on 2019-05-10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
The Ubuntu-power-systems project
High
Canonical Kernel Team
kexec-tools (Ubuntu)
Status tracked in Eoan
Cosmic
Undecided
Unassigned
Disco
Undecided
Unassigned
Eoan
Undecided
Ubuntu on IBM Power Systems Bug Triage
linux (Ubuntu)
Status tracked in Eoan
Disco
High
Thadeu Lima de Souza Cascardo
Eoan
High
Thadeu Lima de Souza Cascardo
makedumpfile (Ubuntu)
Status tracked in Eoan
Bionic
Undecided
Unassigned
Disco
Undecided
Unassigned
Eoan
High
Thadeu Lima de Souza Cascardo

Bug Description

[Impact]
The kdump kernel will crash during its boot if booted on a CPU other than 0.

[Test case]
Trigger a crash using taskset -c X, where X is not 0 and is a present CPU. Check that the dump is successful.

echo c | sudo taskset -c 1 tee /proc/sysrq-trigger
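
The test case needs a CPU number X that is present and is not 0. A minimal sketch of picking one, assuming the online CPU list is read from /sys/devices/system/cpu/online in its usual range format (for example "0-3" or "0,8-15"). The helper name first_nonboot_cpu is hypothetical and not part of kdump-tools:

```shell
# Print the first CPU number other than 0 found in a CPU list such as
# "0-3" or "0,8-15". Returns non-zero if only CPU 0 is listed.
# first_nonboot_cpu is a hypothetical helper name used for illustration.
first_nonboot_cpu() {
    list=$1
    old_ifs=$IFS
    IFS=,
    set -- $list            # split the list on commas into ranges
    IFS=$old_ifs
    for range in "$@"; do
        start=${range%-*}   # "8-15" -> 8, "0" -> 0
        end=${range#*-}     # "8-15" -> 15, "0" -> 0
        cpu=$start
        while [ "$cpu" -le "$end" ]; do
            if [ "$cpu" -ne 0 ]; then
                echo "$cpu"
                return 0
            fi
            cpu=$((cpu + 1))
        done
    done
    return 1
}

# Usage (left commented out, because it crashes the running system):
# cpu=$(first_nonboot_cpu "$(cat /sys/devices/system/cpu/online)") &&
#     echo c | sudo taskset -c "$cpu" tee /proc/sysrq-trigger
```

On a machine where only CPU 0 is online, the helper returns a failure status and the crash is not triggered, which matches the requirement that X be a present CPU other than 0.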

[Regression potential]
This will cause more memory to be used by the dump kernel, which may cause OOMs during the dump. The fix is restricted to ppc64el.

== Comment: #0 - Hari Krishna Bathini - 2019-05-10 06:38:21 ==

---Problem Description---
kdump boot fails in some environments when nr_cpus=1 is passed

---uname output---
na

Machine Type = na

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. configure kdump
2. trigger crash on non-boot cpu

Expected result:
Capture dump and reboot

Actual result:
Hang in early kdump boot process after crash

Userspace tool common name: kdump-tools

The userspace tool has the following bit modes: 64-bit

Userspace rpm: kdump-tools

Userspace tool obtained from project website: na

== Comment: #1 - Hari Krishna Bathini - 2019-05-10 06:45:46 ==
Launchpad bug 1560552 added "nr_cpus=1" support on ppc64, though
this change never made it upstream, as the maintainer had a few apprehensions.

With 4.18 kernels, this change was dropped from Ubuntu kernels too.
With nr_cpus=1 support in the kernel, kdump-tools was also updated to
use "nr_cpus=1" by default instead of "maxcpus=1" (see Launchpad
bug 1568952). This kdump-tools change has to be reverted to make
it consistent with the kernel change. Note that the "nr_cpus=1" change had
issues in the kdump guest environment even with "nr_cpus=1" support
for kdump in the kernel. So, notwithstanding the kernel revert, it is
better to default to "maxcpus=1" on all kernel versions. So, please
revert the kdump-tools fix that went in with Launchpad bug 1568952.
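
As a local workaround until a fixed package is installed, the requested default can be approximated by rewriting the parameter string handed to the kdump kernel. This is only a sketch: it assumes the extra parameters are kept in a single shell string (on Ubuntu this is believed to be KDUMP_CMDLINE_APPEND in /etc/default/kdump-tools, which is an assumption here), the sample string is made up, and use_maxcpus is a hypothetical helper name:

```shell
# Replace a standalone "nr_cpus=1" word with "maxcpus=1" in a kernel
# parameter string. Other values such as "nr_cpus=16" are left alone.
# use_maxcpus is a hypothetical helper name used for illustration.
use_maxcpus() {
    printf '%s\n' "$1" | sed -E 's/(^| )nr_cpus=1( |$)/\1maxcpus=1\2/'
}

# Example with a made-up parameter string:
use_maxcpus "reset_devices nr_cpus=1 irqpoll"
# prints: reset_devices maxcpus=1 irqpoll
```

After changing the parameters, the kdump kernel has to be reloaded for the new command line to take effect.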

bugproxy (bugproxy) on 2019-05-10
tags: added: architecture-ppc64le bugnameltc-177552 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kexec-tools (Ubuntu)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High

This is really a kernel bug, present in 4.18 and later.

This is due to a patch that we have been carrying since forever, and when the involved code changed a lot from 4.15 to 4.18, the patch was dropped, as it couldn't be easily fixed up.

Even before that happened, I tried to upstream the patch, resending it to the mailing list, but the PPC maintainers wanted something different. The original author resent it with some modifications, but the maintainers still wouldn't apply it. As far as I remember, that patchset doesn't apply anymore after the changes mentioned above.

I have tried to work on a different solution, considering the new code base, but didn't have much time to get a working solution.

Cascardo.

Changed in linux (Ubuntu Cosmic):
importance: Undecided → High
Changed in linux (Ubuntu Disco):
importance: Undecided → High
Changed in linux (Ubuntu Eoan):
importance: Undecided → High
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in linux (Ubuntu Disco):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
status: New → Confirmed
Changed in linux (Ubuntu Eoan):
status: New → Confirmed
Changed in linux (Ubuntu Disco):
status: New → Confirmed
Changed in kexec-tools (Ubuntu Eoan):
status: New → Invalid
Changed in kexec-tools (Ubuntu Disco):
status: New → Invalid
Changed in kexec-tools (Ubuntu Cosmic):
status: New → Invalid
Changed in ubuntu-power-systems:
status: New → Confirmed

------- Comment From <email address hidden> 2019-05-10 13:25 EDT-------
Right, Cascardo.

nr_cpus=1 for the KDump case has never worked before on ppc64. Mahesh
tried fixing it, but the maintainer had a few apprehensions, and rightly so.
This fix doesn't work in all cases, for example on a KVM host (baremetal with SMT off).

So, to start with, making "nr_cpus=1" the default parameter instead of "maxcpus=1"
for the KDump kernel on ppc64, even with the kernel fix from Mahesh, was not a
good idea, as it does not always work.

IMHO, with no alternative, foolproof way to fix this under discussion yet, we can
save the "nr_cpus=1" option as default for another day and stick with
"maxcpus=1" for now. That was the intention behind this bug.

Okay, let's consider this option on the makedumpfile side. I'll test it and provide a test package by next week.

Cascardo.

Changed in makedumpfile (Ubuntu Eoan):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
importance: Undecided → High
status: New → Confirmed

I have pushed a fix to my ppa ppa:cascardo/kdump2, for bionic, cosmic, disco and eoan.

Andrew Cloke (andrew-cloke) wrote :

Marking as "incomplete" while awaiting test results from Thadeu's PPA.

Changed in ubuntu-power-systems:
status: Confirmed → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-21 06:29 EDT-------
The change looks good. Thanks!

Changed in ubuntu-power-systems:
status: Incomplete → Confirmed
Andrew Cloke (andrew-cloke) wrote :

Based on the last comment, it looks like IBM's testing was successful and this patch is ready for SRU.
Thanks.

Changed in makedumpfile (Ubuntu Eoan):
status: Confirmed → Fix Committed

This is now in eoan-proposed. Please verify. I will SRU to disco and cosmic soon, after it gets to eoan.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-06-24 07:37 EDT-------
This looks good. With the change, we now default to "maxcpus=1"
instead of "nr_cpus=1" when passing parameters to the kdump kernel.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.5-1ubuntu2

---------------
makedumpfile (1:1.6.5-1ubuntu2) eoan; urgency=medium

  [ Thadeu Lima de Souza Cascardo ]
  * Use maxcpus instead of nr_cpus on ppc64el. (LP: #1828597)
  * Reload kdump when CPU is brought online. (LP: #1828596)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Fri, 14 Jun 2019 10:58:40 -0300
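
The first changelog item makes the CPU-limiting parameter depend on the architecture. The following is a sketch of that idea only, not the code shipped in the package. default_cpu_limit is a hypothetical name, and the architecture string is assumed to come from `dpkg --print-architecture` (ppc64el) or `uname -m` (ppc64le):

```shell
# Choose the CPU-limiting kernel parameter for the kdump kernel based on
# the architecture name: maxcpus=1 on ppc64el, nr_cpus=1 everywhere else.
# default_cpu_limit is a hypothetical helper name used for illustration.
default_cpu_limit() {
    case $1 in
        ppc64el|ppc64le) echo "maxcpus=1" ;;
        *)               echo "nr_cpus=1" ;;
    esac
}

# Usage: default_cpu_limit "$(uname -m)"
```

Keeping nr_cpus=1 on other architectures limits the extra memory use noted under [Regression potential] to ppc64el.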

Changed in makedumpfile (Ubuntu Eoan):
status: Fix Committed → Fix Released
description: updated
description: updated
Manoj Iyer (manjo) on 2019-07-22
no longer affects: makedumpfile (Ubuntu Cosmic)
no longer affects: linux (Ubuntu Cosmic)
Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.5-1ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Disco):
status: New → Fix Committed
tags: added: verification-needed verification-needed-disco
Andy Whitcroft (apw) wrote :

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.5-1ubuntu1~18.04.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Bionic):
status: New → Fix Committed
tags: added: verification-needed-bionic

------- Comment From <email address hidden> 2019-08-29 07:44 EDT-------
With kdump-tools package version 1.6.5-1ubuntu1~18.04.2, the kdump kernel
is loaded with maxcpus=1 instead of nr_cpus=1
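
A check like the one described above can be scripted. The sketch below assumes the kdump kernel's command line is already available as a string (for example from /proc/cmdline inside the kdump kernel, or from the load command that the tooling reports). cpu_limit_in_cmdline is a hypothetical helper name and the sample command line is made up:

```shell
# Print every maxcpus= or nr_cpus= word found in a kernel command line.
# cpu_limit_in_cmdline is a hypothetical helper name used for illustration.
cpu_limit_in_cmdline() {
    for word in $1; do      # intentional word splitting on whitespace
        case $word in
            maxcpus=*|nr_cpus=*) echo "$word" ;;
        esac
    done
}

# Example with a made-up command line:
cpu_limit_in_cmdline "root=/dev/sda2 ro reset_devices maxcpus=1 irqpoll"
# prints: maxcpus=1
```

With the fixed package on ppc64el, the expected result is maxcpus=1 and no nr_cpus= entry.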

tags: added: verification-done-bionic
removed: verification-needed-bionic

All autopkgtests for the newly accepted makedumpfile (1:1.6.5-1ubuntu1.1) for disco have finished running.
The following regressions have been reported in tests triggered by the package:

makedumpfile/1:1.6.5-1ubuntu1.1 (s390x, ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/disco/update_excuses.html#makedumpfile

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

------- Comment From <email address hidden> 2019-08-30 02:19 EDT-------
Resolved with makedumpfile/kdump-tools version 1.6.5-1ubuntu1.1 on Disco

tags: added: verification-done verification-done-disco
removed: verification-needed verification-needed-disco
Manoj Iyer (manjo) 22 hours ago
Changed in linux (Ubuntu Disco):
status: Confirmed → Invalid
Changed in linux (Ubuntu Eoan):
status: Confirmed → Invalid
Changed in ubuntu-power-systems:
status: Confirmed → Fix Committed