KDump boot fails with nr_cpus=1

Bug #1828597 reported by bugproxy on 2019-05-10
This bug affects 1 person
Affects                            Importance  Assigned to
The Ubuntu-power-systems project   High        Canonical Kernel Team
kexec-tools (Ubuntu)               Undecided   Ubuntu on IBM Power Systems Bug Triage
  Cosmic                           Undecided   Unassigned
  Disco                            Undecided   Unassigned
  Eoan                             Undecided   Ubuntu on IBM Power Systems Bug Triage
linux (Ubuntu)                     High        Thadeu Lima de Souza Cascardo
  Disco                            High        Thadeu Lima de Souza Cascardo
  Eoan                             High        Thadeu Lima de Souza Cascardo
makedumpfile (Ubuntu)              High        Thadeu Lima de Souza Cascardo
  Bionic                           Undecided   Unassigned
  Disco                            Undecided   Unassigned
  Eoan                             High        Thadeu Lima de Souza Cascardo

Bug Description

[Impact]
The kdump kernel will crash during its boot if booted on a CPU other than 0.

[Test case]
Trigger a crash using taskset -c X, where X is not 0 and is a present CPU. Check that the dump is successful.

echo c | sudo taskset -c 1 tee /proc/sysrq-trigger
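
The test case hard-codes CPU 1, which may not be online on every machine (for example with SMT off). A minimal sketch for picking some other present CPU follows; the helper name first_nonzero_cpu is hypothetical, and the input is assumed to use the range-list format of /sys/devices/system/cpu/online (such as "0-7" or "0,8,16-23").

```shell
# first_nonzero_cpu LIST: print the first CPU id other than 0 found in LIST,
# where LIST uses the range-list format of /sys/devices/system/cpu/online.
# Hypothetical helper for illustration; returns 1 if only CPU 0 is listed.
first_nonzero_cpu() {
    list=$1
    old_ifs=$IFS
    IFS=,
    for range in $list; do
        start=${range%-*}   # "16-23" -> 16, "8" -> 8
        end=${range#*-}     # "16-23" -> 23, "8" -> 8
        cpu=$start
        while [ "$cpu" -le "$end" ]; do
            if [ "$cpu" -ne 0 ]; then
                IFS=$old_ifs
                echo "$cpu"
                return 0
            fi
            cpu=$((cpu + 1))
        done
    done
    IFS=$old_ifs
    return 1
}
```

On a test machine, the result could be fed to the trigger command above, e.g. cpu=$(first_nonzero_cpu "$(cat /sys/devices/system/cpu/online)") followed by the same taskset invocation with -c "$cpu". Note that running the trigger crashes the system immediately.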

[Regression potential]
This will cause more memory to be used by the dump kernel, which may cause OOMs during the dump. The fix is restricted to ppc64el.

== Comment: #0 - Hari Krishna Bathini - 2019-05-10 06:38:21 ==

---Problem Description---
kdump boot fails in some environments when nr_cpus=1 is passed

---uname output---
na

Machine Type = na

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. Configure kdump
2. Trigger a crash on a non-boot CPU

Expected result:
Capture dump and reboot

Actual result:
Hang in early kdump boot process after crash

Userspace tool common name: kdump-tools

The userspace tool has the following bit modes: 64-bit

Userspace rpm: kdump-tools

Userspace tool obtained from project website: na

== Comment: #1 - Hari Krishna Bathini - 2019-05-10 06:45:46 ==
Launchpad bug 1560552 added "nr_cpus=1" support on ppc64, though
this change never made it upstream, as the maintainer had a few apprehensions.

With 4.18 kernels, this change was dropped from Ubuntu kernels too.
With nr_cpus=1 support in the kernel, kdump-tools was also updated to
use "nr_cpus=1" by default instead of "maxcpus=1" (see Launchpad
bug 1568952). This kdump-tools change has to be reverted to make
it consistent with the kernel change. Note that the "nr_cpus=1" change had
issues in the kdump guest environment even with "nr_cpus=1" support
for kdump in the kernel. So, even notwithstanding the kernel revert, it is
better to default to "maxcpus=1" on all kernel versions. So, please
revert the kdump-tools fix that went in with Launchpad bug 1568952.
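
As a local workaround until a fixed package is installed, the requested change amounts to swapping one token in the parameter string passed to the kdump kernel. The sketch below is only an illustration: use_maxcpus is a hypothetical helper, and the assumption is that the string comes from a setting such as KDUMP_CMDLINE_APPEND in /etc/default/kdump-tools.

```shell
# use_maxcpus STRING: print STRING with any whole-word nr_cpus=N token
# rewritten as maxcpus=N. Hypothetical helper; it only transforms text
# and does not edit any configuration file.
use_maxcpus() {
    printf '%s\n' "$1" |
        sed -E 's/(^|[[:space:]])nr_cpus=([0-9]+)/\1maxcpus=\2/g'
}
```

For example, use_maxcpus "irqpoll nr_cpus=1 nousb" prints "irqpoll maxcpus=1 nousb"; the other parameters are left untouched.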

bugproxy (bugproxy) on 2019-05-10
tags: added: architecture-ppc64le bugnameltc-177552 severity-high targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kexec-tools (Ubuntu)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → High

This is really a bug in the kernel, in 4.18 and later.

This is due to a patch that we have been carrying since forever. When the involved code changed a lot from 4.15 to 4.18, the patch was dropped, as it couldn't be easily fixed up.

Even before that happened, I tried to upstream the patch, resending it to the mailing list, but the PPC maintainers wanted something different. The original author resent it with some modifications, but the maintainers still wouldn't apply it. As far as I remember, that patchset doesn't apply anymore after the referred changes.

I have tried to work on a different solution, considering the new code base, but didn't have enough time to get it working.

Cascardo.

Changed in linux (Ubuntu Cosmic):
importance: Undecided → High
Changed in linux (Ubuntu Disco):
importance: Undecided → High
Changed in linux (Ubuntu Eoan):
importance: Undecided → High
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in linux (Ubuntu Disco):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
status: New → Confirmed
Changed in linux (Ubuntu Eoan):
status: New → Confirmed
Changed in linux (Ubuntu Disco):
status: New → Confirmed
Changed in kexec-tools (Ubuntu Eoan):
status: New → Invalid
Changed in kexec-tools (Ubuntu Disco):
status: New → Invalid
Changed in kexec-tools (Ubuntu Cosmic):
status: New → Invalid
Frank Heimes (fheimes) on 2019-05-10
Changed in ubuntu-power-systems:
status: New → Confirmed

------- Comment From <email address hidden> 2019-05-10 13:25 EDT-------
Right, Cascardo.

nr_cpus=1 for the KDump case has never worked before on ppc64. Mahesh
tried fixing it, but the maintainer had a few apprehensions, and rightly so.
This fix doesn't work in all cases, for example on a KVM host (bare metal with SMT off).

So, to start with, "nr_cpus=1" as the default parameter instead of "maxcpus=1"
for the KDump kernel on ppc64, even with the kernel fix from Mahesh, was not a
good idea, as it does not always work.

IMHO, with no alternative, foolproof way to fix this under discussion yet, we can
save the "nr_cpus=1" option as default for another day and stick with
"maxcpus=1" for now. That was the intention behind this bug.

Okay, let's consider this option on the makedumpfile side. I'll test it and provide a test package by next week.

Cascardo.

Changed in makedumpfile (Ubuntu Eoan):
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
importance: Undecided → High
status: New → Confirmed

I have pushed a fix to my PPA, ppa:cascardo/kdump2, for bionic, cosmic, disco and eoan.

Andrew Cloke (andrew-cloke) wrote :

Marking as "incomplete" while awaiting test results from Thadeu's PPA.

Changed in ubuntu-power-systems:
status: Confirmed → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-21 06:29 EDT-------
The change looks good. Thanks!

Changed in ubuntu-power-systems:
status: Incomplete → Confirmed
Andrew Cloke (andrew-cloke) wrote :

Based on the last comment, it looks like IBM's testing was successful and this patch is ready for SRU.
Thanks.

Changed in makedumpfile (Ubuntu Eoan):
status: Confirmed → Fix Committed

This is now in eoan-proposed. Please verify. I will SRU to disco and cosmic soon after it gets to eoan.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-06-24 07:37 EDT-------
This looks good. With the change, we now default to "maxcpus=1"
instead of "nr_cpus=1" when passing parameters to the kdump kernel.
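
A verification like the one above can be scripted by extracting the CPU-limiting token from a kernel command line. This is a rough sketch: cpu_limit_param is a hypothetical helper that reads text on standard input, and the assumption is that the input is either /proc/cmdline (inside the kdump kernel) or the output of "kdump-config show" (on the running system).

```shell
# cpu_limit_param: read a kernel command line (or a kexec command that
# contains one) on stdin and print any maxcpus=N or nr_cpus=N tokens,
# one per line. Exits non-zero if neither parameter is present.
cpu_limit_param() {
    # Split on spaces, tabs and double quotes so that tokens inside
    # --append="..." are also seen as separate words.
    tr -s ' \t"' '\n' | grep -E '^(maxcpus|nr_cpus)=[0-9]+$'
}
```

With a fixed package, something like kdump-config show | cpu_limit_param would be expected to print maxcpus=1 rather than nr_cpus=1.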

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.5-1ubuntu2

---------------
makedumpfile (1:1.6.5-1ubuntu2) eoan; urgency=medium

  [ Thadeu Lima de Souza Cascardo ]
  * Use maxcpus instead of nr_cpus on ppc64el. (LP: #1828597)
  * Reload kdump when CPU is brought online. (LP: #1828596)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Fri, 14 Jun 2019 10:58:40 -0300

Changed in makedumpfile (Ubuntu Eoan):
status: Fix Committed → Fix Released
description: updated
description: updated
Manoj Iyer (manjo) on 2019-07-22
no longer affects: makedumpfile (Ubuntu Cosmic)
no longer affects: linux (Ubuntu Cosmic)
Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.5-1ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in makedumpfile (Ubuntu Disco):
status: New → Fix Committed
tags: added: verification-needed verification-needed-disco
Andy Whitcroft (apw) wrote :

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.5-1ubuntu1~18.04.2 in a few hours, and then in the -proposed repository.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Changed in makedumpfile (Ubuntu Bionic):
status: New → Fix Committed
tags: added: verification-needed-bionic

------- Comment From <email address hidden> 2019-08-29 07:44 EDT-------
With kdump-tools package version 1.6.5-1ubuntu1~18.04.2, the kdump kernel
is loaded with maxcpus=1 instead of nr_cpus=1

tags: added: verification-done-bionic
removed: verification-needed-bionic

All autopkgtests for the newly accepted makedumpfile (1:1.6.5-1ubuntu1.1) for disco have finished running.
The following regressions have been reported in tests triggered by the package:

makedumpfile/1:1.6.5-1ubuntu1.1 (s390x, ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/disco/update_excuses.html#makedumpfile

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

------- Comment From <email address hidden> 2019-08-30 02:19 EDT-------
Resolved with makedumpfile/kdump-tools version 1.6.5-1ubuntu1.1 on Disco

tags: added: verification-done verification-done-disco
removed: verification-needed verification-needed-disco
Manoj Iyer (manjo) on 2019-09-16
Changed in linux (Ubuntu Disco):
status: Confirmed → Invalid
Changed in linux (Ubuntu Eoan):
status: Confirmed → Invalid
Changed in ubuntu-power-systems:
status: Confirmed → Fix Committed
Andrew Cloke (andrew-cloke) wrote :

Even though the bionic and disco verifications were successful (thanks for verifying), these patches were bundled in a single submission with other patches (from other bugs) that could not be successfully verified. As a result, all patches have had to be removed.

The next step is to re-upload a new version of makedumpfile.

Changed in makedumpfile (Ubuntu Bionic):
status: Fix Committed → In Progress
Frank Heimes (fheimes) on 2019-09-30
Changed in makedumpfile (Ubuntu Disco):
status: Fix Committed → In Progress
Changed in ubuntu-power-systems:
status: Fix Committed → In Progress

A new version of makedumpfile/kdump-tools is now available without the fix that failed verification. It contains the fix for this bug and 3 more.

This is available for disco and bionic at ppa:cascardo/ppa. I will get it into -proposed shortly.

Cascardo.

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.5-1ubuntu1.3 in a few hours, and then in the -proposed repository.

Changed in makedumpfile (Ubuntu Disco):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-disco
removed: verification-done verification-done-disco
Changed in makedumpfile (Ubuntu Bionic):
status: In Progress → Fix Committed
tags: added: verification-needed-bionic
removed: verification-done-bionic
Andy Whitcroft (apw) wrote :

Hello bugproxy, or anyone else affected,

Accepted makedumpfile into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.5-1ubuntu1~18.04.3 in a few hours, and then in the -proposed repository.

bugproxy (bugproxy) on 2019-10-23
tags: added: verification-done-bionic
removed: verification-needed-bionic

All autopkgtests for the newly accepted makedumpfile (1:1.6.5-1ubuntu1~18.04.3) for bionic have finished running.
The following regressions have been reported in tests triggered by the package:

makedumpfile/1:1.6.5-1ubuntu1~18.04.3 (ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/bionic/update_excuses.html#makedumpfile

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

------- Comment From <email address hidden> 2019-10-25 06:58 EDT-------
On disco, with kdump-tools/makedumpfile package version 1:1.6.5-1ubuntu1.3
maxcpus=1 is used instead of nr_cpus=1 by default as expected.

On bionic, with kdump-tools/makedumpfile package version 1:1.6.5-1ubuntu1~18.04.3
maxcpus=1 is used instead of nr_cpus=1 by default as expected.

tags: added: verification-done verification-done-disco
removed: verification-needed verification-needed-disco

All autopkgtests for the newly accepted makedumpfile (1:1.6.5-1ubuntu1.3) for disco have finished running.
The following regressions have been reported in tests triggered by the package:

makedumpfile/1:1.6.5-1ubuntu1.3 (s390x, ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/disco/update_excuses.html#makedumpfile

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Frank Heimes (fheimes) on 2019-10-25
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.5-1ubuntu1.3

---------------
makedumpfile (1:1.6.5-1ubuntu1.3) disco; urgency=medium

  [ Guilherme G. Piccoli ]
  * Add kdump retry/delay mechanism when dumping over network (LP: #1681909)

  [ Thadeu Lima de Souza Cascardo ]
  * Use maxcpus instead of nr_cpus on ppc64el. (LP: #1828597)
  * ppc64: increase MAX_PHYSMEM_BITS to 2PB (LP: #1841288)

  [ Connor Kuehl ]
  * Let the kernel decide the crashkernel offset for ppc64el (LP: #1741860)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Wed, 09 Oct 2019 15:33:57 -0300

Changed in makedumpfile (Ubuntu Disco):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for makedumpfile has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.5-1ubuntu1~18.04.3

---------------
makedumpfile (1:1.6.5-1ubuntu1~18.04.3) bionic; urgency=medium

  [ Guilherme G. Piccoli ]
  * Add kdump retry/delay mechanism when dumping over network (LP: #1681909)

  [ Thadeu Lima de Souza Cascardo ]
  * Use maxcpus instead of nr_cpus on ppc64el. (LP: #1828597)
  * ppc64: increase MAX_PHYSMEM_BITS to 2PB (LP: #1841288)

  [ Connor Kuehl ]
  * Let the kernel decide the crashkernel offset for ppc64el (LP: #1741860)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Wed, 09 Oct 2019 15:38:08 -0300

Changed in makedumpfile (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) on 2019-11-12
tags: added: targetmilestone-inin18043
removed: targetmilestone-inin---