crashkernel offset prevents kernel boot

Bug #1728115 reported by Thadeu Lima de Souza Cascardo on 2017-10-27
This bug affects 1 person
Affects                  Importance   Assigned to
linux (Ubuntu)           Undecided    Unassigned
  Artful                 Undecided    Unassigned
  Bionic                 Undecided    Unassigned
makedumpfile (Ubuntu)    High         Thadeu Lima de Souza Cascardo
  Artful                 High         Thadeu Lima de Souza Cascardo
  Bionic                 High         Thadeu Lima de Souza Cascardo

Bug Description

[Impact]
On Power Systems, the kernel won't boot when given a crashkernel parameter with an offset/start address of 32M. No output will be shown, giving no clue to the user why the system has not booted, or what the problem is.

[Test Case]
Install kdump-tools on artful, then reboot the system. Without the fix, it won't boot. With the fix, it boots and the crash kernel memory is reserved.

[Regression Potential]
Some Power Systems might have problems loading the crash kernel at this address. LP#1567539 does not make clear whether PowerNV systems fail to kdump when using an address other than 32M. However, an IBM contact was asked to test with 128M instead, and no particular problem showed up. It's possible that there was no reason to use 32M in the first place, and that no problems will be found on either PowerNV or other systems in the field. On the other hand, it's possible the change might break kdump on them. But right now, those systems won't even boot the first kernel without this change.

----

The Linux kernel won't boot on ppc64el on artful when the crashkernel parameter tells it to load a crash kernel at 32MiB.

This happens because the artful kernel is too big. In fact, multiple requirements on the architecture lead to that:

Kernel memory at address 0 is reserved.
crashkernel must be in the first RMO, so the architecture puts it at 128MiB. However, kdump-tools currently puts it at 32MiB because of bug https://bugs.launchpad.net/ubuntu/+source/makedumpfile/+bug/1567539.
PACA and LPPACA need to be in the first RMO as well, and with 2048 CPUs, they take more than 5MB and 2MiB, respectively.

With the kernel now taking around 25MB from stext to _end, the kernel can't reserve enough memory for PACA or LPPACA right after boot, and it panics.
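The arithmetic behind that can be sketched roughly as follows. This is a simplified model, not what the kernel actually computes: it assumes the kernel image starts at address 0, that nothing at or above the crashkernel offset is available for these early allocations, and it treats the MB and MiB figures above as the same unit. The names are made up for illustration.

```python
# Figures quoted in this report, all treated as MiB for simplicity
# (the report mixes MB and MiB, so this is an approximation).
KERNEL_MIB = 25   # stext to _end, "around 25MB"
PACA_MIB = 5      # lower bound: "more than 5MB" with 2048 CPUs
LPPACA_MIB = 2    # "2MiB"


def headroom_mib(crashkernel_offset_mib):
    """Space left below the crashkernel offset after the kernel image and
    the lower-bound PACA/LPPACA sizes.

    Assumption: the kernel image starts at address 0 and everything from
    the offset upward is unavailable for these early allocations.
    """
    return crashkernel_offset_mib - KERNEL_MIB - PACA_MIB - LPPACA_MIB


def early_allocations_fit(crashkernel_offset_mib):
    # The PACA figure is a lower bound, so zero headroom already means
    # the real allocations do not fit.
    return headroom_mib(crashkernel_offset_mib) > 0


for offset in (32, 128):
    print(offset, headroom_mib(offset), early_allocations_fit(offset))
```

Under these assumptions there is no headroom left at 32MiB, while an offset of 128MiB leaves plenty of room.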

So, right after installing kdump-tools on artful, and rebooting, the kernel won't boot, with no sign of life as we haven't even started any console. Investigation for this issue took an entire day.

The fix would be to set the loading address to 128MiB, then start reducing the size of PACA and maybe remove some of the requirements on the location of PACA and the crash kernel.

I would not even set the loading address of the crash kernel in the parameter itself, and would leave the decision to the kernel, which already handles that case and would put it at 128MiB.
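To make the distinction concrete, here is a small sketch that tells whether a crashkernel= value carries an explicit offset. It relies only on the documented "size[KMG][@offset[KMG]]" shape of the parameter (the offset is the part after "@"); the helper names and example values are hypothetical, and the kernel's own parser accepts more forms than this one does.

```python
import re

_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a string such as '128M' to bytes (decimal digits plus K/M/G only)."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2).upper()]


def crashkernel_offset(value):
    """Return the explicit offset in bytes, or None when the value has no
    '@offset' part and placement is left to the kernel."""
    _spec, sep, offset = value.partition("@")
    return parse_size(offset) if sep else None


# Hypothetical parameter values, for illustration only.
print(crashkernel_offset("320M@32M"))   # explicit offset
print(crashkernel_offset("320M"))       # no offset: the kernel decides
```

A value without "@" returns None here, which corresponds to the option of letting the kernel pick the address itself.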

Cascardo.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1728115

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Changed in makedumpfile (Ubuntu):
status: New → Triaged
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
description: updated
Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in makedumpfile (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu):
status: In Progress → New

Changed in linux (Ubuntu):
status: New → Incomplete

The attachment "makedumpfile_1.6.1-2ubuntu0.1.diff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

tags: added: patch
Changed in makedumpfile (Ubuntu):
importance: Undecided → High
Changed in makedumpfile (Ubuntu Artful):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Thadeu Lima de Souza Cascardo (cascardo)
Łukasz Zemczak (sil2100) wrote :

I don't see this fix in bionic yet. Could anyone first release it there? Stable updates can only be backported if they're present in the devel series.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.2-1ubuntu1

---------------
makedumpfile (1:1.6.2-1ubuntu1) bionic; urgency=medium

  * KDUMP_CMDLINE_APPEND: add noirqdistrib to default command line. As it's
    only used by ppc64el, it's not required to be conditionally added.
    (LP: #1658733)
  * Set crashkernel for ppc64el to load at 128M instead of 32M. That allows
    larger kernels to boot. (LP: #1728115)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Tue, 07 Nov 2017 12:23:33 +0000

Changed in makedumpfile (Ubuntu Bionic):
status: In Progress → Fix Released

Hello Thadeu, or anyone else affected,

Accepted makedumpfile into artful-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/makedumpfile/1:1.6.1-2ubuntu0.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-artful to verification-done-artful. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-artful. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in makedumpfile (Ubuntu Artful):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-artful
Chris Halse Rogers (raof) wrote :

This package has been waiting for testing in Artful for 90 days and is blocking the release of other makedumpfile fixes for Xenial and Artful. Please test this SRU!

bugproxy (bugproxy) on 2018-03-21
tags: added: architecture-ppc64le bugnameltc-165886 severity-medium targetmilestone-inin---
removed: verification-needed verification-needed-artful

ubuntu@artful:~$ dpkg -s kdump-tools | grep Version
Version: 1:1.6.1-2ubuntu0.1

The system boots fine after installing kdump-tools, and the crash kernel is loaded.

ubuntu@artful:~$ dmesg | grep Reserving
[ 0.000000] Reserving 320MB of memory at 128MB for crashkernel (System RAM: 2048MB)
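For anyone repeating this verification on several machines, the check can be scripted. Below is a minimal sketch that pulls the size and offset out of a line shaped like the one above; the regular expression is written against this exact message and may need adjusting for other kernel versions, and the function name is made up.

```python
import re

# Matches the message format seen in the dmesg output in this report.
RESERVE_RE = re.compile(
    r"Reserving (\d+)MB of memory at (\d+)MB for crashkernel"
)


def parse_reservation(line):
    """Return (size_mb, offset_mb) from a 'Reserving ...' line, or None."""
    match = RESERVE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = ("[ 0.000000] Reserving 320MB of memory at 128MB "
        "for crashkernel (System RAM: 2048MB)")
size_mb, offset_mb = parse_reservation(line)
print(f"size={size_mb}MB offset={offset_mb}MB")
```

An offset of 128 in the result indicates the fixed package is in effect. A line that does not match returns None.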

tags: added: verification-done verification-done-artful
bugproxy (bugproxy) on 2018-03-21
tags: removed: bugnameltc-165886 patch severity-medium verification-done verification-done-artful
tags: added: verification-done verification-done-artful
bugproxy (bugproxy) on 2018-03-21
tags: added: bugnameltc-165886 severity-medium
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package makedumpfile - 1:1.6.1-2ubuntu0.1

---------------
makedumpfile (1:1.6.1-2ubuntu0.1) artful; urgency=medium

  * KDUMP_CMDLINE_APPEND: add noirqdistrib to default command line. As it's
    only used by ppc64el, it's not required to be conditionally added.
    (LP: #1658733)
  * Set crashkernel for ppc64el to load at 128M instead of 32M. That allows
    larger kernels to boot. (LP: #1728115)

 -- Thadeu Lima de Souza Cascardo <email address hidden> Tue, 07 Nov 2017 12:23:33 +0000

Changed in makedumpfile (Ubuntu Artful):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for makedumpfile has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
