kdump fails on big arm64 systems when offset is not specified
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kexec-tools (Ubuntu) | Fix Released | Medium | Ioanna Alifieraki |
Focal | Fix Released | Medium | Ioanna Alifieraki |
Jammy | Fix Released | Medium | Ioanna Alifieraki |
Kinetic | Won't Fix | Undecided | Unassigned |
Lunar | Fix Released | Medium | Ioanna Alifieraki |
Mantic | Fix Released | Medium | Ioanna Alifieraki |
linux (Ubuntu) | Invalid | Undecided | Unassigned |
Focal | Won't Fix | Undecided | Unassigned |
Jammy | Fix Released | Medium | Ioanna Alifieraki |
Kinetic | Won't Fix | Undecided | Unassigned |
linux-hwe-5.15 (Ubuntu) | Invalid | Undecided | Unassigned |
Focal | Fix Released | Medium | Ioanna Alifieraki |
Jammy | Invalid | Undecided | Unassigned |
Kinetic | Invalid | Undecided | Unassigned |
Bug Description
[Impact]
kdump fails on arm64 machines with a lot of memory when the crashkernel offset is not specified,
e.g. when /etc/default/
GRUB_CMDLINE_
If kdump-tools.cfg specifies the offset e.g.:
GRUB_CMDLINE_
it works ok.
The reason is that the kernel needs to allocate memory for the crashkernel in both
low and high memory.
This is addressed in kernel 6.2.
In addition kexec-tools needs to support more than one crash kernel region.
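As a rough illustration of what "more than one crash kernel region" means in practice, the sketch below lists every `Crash kernel` entry found in `/proc/iomem`-style text. It assumes that both the low and the high reservation are labelled `Crash kernel` there; the sample addresses and the helper name `crash_kernel_regions` are made up for illustration and are not taken from this bug.

```python
import re

# Matches /proc/iomem lines such as "  e0000000-efffffff : Crash kernel".
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.+?)\s*$")


def crash_kernel_regions(iomem_text):
    """Return a sorted list of (start, end) for every 'Crash kernel' entry."""
    regions = []
    for line in iomem_text.splitlines():
        m = IOMEM_LINE.match(line)
        if m and m.group(3) == "Crash kernel":
            regions.append((int(m.group(1), 16), int(m.group(2), 16)))
    return sorted(regions)


# Illustrative sample only: hypothetical addresses, one region below 4G
# and one above it.
SAMPLE_IOMEM = """\
80000000-ffffffff : System RAM
  e0000000-efffffff : Crash kernel
100000000-1fffffffff : System RAM
  1f00000000-1fffffffff : Crash kernel
"""

if __name__ == "__main__":
    for start, end in crash_kernel_regions(SAMPLE_IOMEM):
        size_mib = (end - start + 1) // (1024 * 1024)
        print(f"{start:#x}-{end:#x} ({size_mib} MiB)")
```

On a real system the text would come from reading `/proc/iomem` as root, since unprivileged readers see zeroed addresses. A kexec-tools that assumes a single region would only account for one of the two entries.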
[Fix]
To address this issue the following upstream commits are needed:
- From the kernel side:
commit a9ae89df737756d
Author: Zhen Lei <email address hidden>
Date: Wed Nov 16 20:10:44 2022 +0800
arm64: kdump: Support crashkernel=X fall back to reserve region above DMA zones
commit a149cf00b158e17
Author: Zhen Lei <email address hidden>
Date: Wed Nov 16 20:10:43 2022 +0800
arm64: kdump: Provide default size when crashkernel=Y,low is not specified
- From kexec-tools:
commit b5a34a20984c4ad
Author: Chen Zhou <email address hidden>
Date: Mon Jan 10 18:20:08 2022 +0800
arm64: support more than one crash kernel regions
Affected releases:
Jammy, Focal, Bionic
For Bionic we won't fix it as we need to backport a lot of code and the regression potential is too high.
The same applies for the Focal 5.4 kernel.
Only the Focal 5.15 hwe kernel (from Jammy) will be fixed.
[Test Plan]
You need an arm64 machine (can be a VM too) with large memory e.g. 128G.
Install linux-crashdump, configure the crash kernel size, and trigger a crash.
1) Failing scenario (crashkernel >= 4G, without offset "@<address>"):
It won't work unless the offset is specified because the crashkernel memory cannot be allocated.
With the patches applied it works as expected without having to specify the offset.
2) Working scenario (crashkernel < 4G, e.g., 'crashkernel=1G')
This must continue to work with the new patches (i.e., no regressions), including patched kexec-tools on an unpatched kernel (e.g., the 5.4 kernel on Focal).
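To tell which of the two scenarios a given machine falls under, a small helper along these lines could classify the `crashkernel=` value from a kernel command line. This is only a sketch: it handles the simple `crashkernel=size[KMG][@offset[KMG]]` form, deliberately ignores the range and `,high`/`,low` syntaxes, and the function names and the example offset are hypothetical.

```python
import re

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
FOUR_GIB = 4 << 30

# Simple form only: crashkernel=size[KMG][@offset[KMG]]
SIMPLE = re.compile(
    r"(?:^|\s)crashkernel=(\d+)([KMG]?)"
    r"(?:@(0[xX][0-9a-fA-F]+|\d+)([KMG]?))?(?=\s|$)"
)


def parse_simple_crashkernel(cmdline):
    """Return (size_bytes, offset_bytes_or_None), or None if not matched."""
    m = SIMPLE.search(cmdline)
    if m is None:
        return None
    size = int(m.group(1)) * UNITS[m.group(2)]
    offset = None
    if m.group(3) is not None:
        tok = m.group(3)
        base = 16 if tok.lower().startswith("0x") else 10
        offset = int(tok, base) * UNITS[m.group(4)]
    return size, offset


def classify_scenario(cmdline):
    """Map a command line onto the two test-plan scenarios."""
    parsed = parse_simple_crashkernel(cmdline)
    if parsed is None:
        return "unrecognised"  # range syntax, ",high"/",low", or no parameter
    size, offset = parsed
    if size < FOUR_GIB:
        return "scenario 2"  # must keep working, patched or not
    if offset is None:
        return "scenario 1"  # fails without the patches
    return "offset specified"


if __name__ == "__main__":
    # The offset in the last example is a made-up value.
    for example in ("crashkernel=4G", "crashkernel=1G",
                    "crashkernel=4G@0x100000000"):
        print(example, "->", classify_scenario(example))
```

On a running system the input would typically be the contents of `/proc/cmdline`.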
[Regression Potential]
KERNEL 5.15 - Jammy (and Focal via the HWE kernel):
To address this problem in the 5.15 kernel we need to pull in 7 commits (see [Other] section for details).
All the commits are changing code only for arm64 architecture and only the code related to reserving the crashkernel.
This means that any regression potential will affect only the arm64 architecture and in particular the crash/kdump functionality.
However, since the reservation of the crashkernel occurs at boot up, potentially things could go wrong there as well.
kexec-tools - FOCAL:
To fix kexec-tools in Focal we need to pull in 6 commits (see [Other] section for details). They all cherry-pick cleanly.
Four out of six commits touch only arm64 code. Any regression caused by these commits would affect either crashdump or kexec functionality.
Commit cf977b1af9ec67fab adds code without altering current functionality.
Commit f4ce0706d9574aecb7 adds functionality to read ELF notes. In practice it moves the code from vmcore-dmesg.c to elf_info.c so it can be used by other features.
kexec-tools - JAMMY, LUNAR, MANTIC:
Commit b5a34a20984c is pulled in; it cherry-picks cleanly. It changes only arm64 code. It enables kexec to recognise that the reserved crash kernel may use more than one memory region. Things could go wrong when gathering a crashdump.
[Other]
Commits to backport
- MANTIC:
kernel 6.3: not affected
kexec-tools:
b5a34a20984
- LUNAR:
kernel 6.2: not affected
kexec-tools:
b5a34a20984
- KINETIC: WON'T FIX
Kinetic won't be fixed as it is EOL soon.
- JAMMY:
kernel (5.15 kernel):
a9ae89df737
a149cf00b15
4890cc18f94
8f0f104e2ab
5832f1ae506
944a45abfab
d339f1584f0
kexec-tools:
b5a34a20984
- FOCAL:
Kernel 5.4: Won't fix because of high regression potential.
Kernel 5.15 (HWE): Fixed via Jammy.
kexec-tools:
b5a34a20984
2572b8d702e
cf977b1af9e
f736104f533
64c49f27d88
f4ce0706d95
description: | updated |
Changed in linux (Ubuntu Focal): | |
status: | Incomplete → Won't Fix |
Changed in kexec-tools (Ubuntu): | |
importance: | Undecided → Medium |
Changed in kexec-tools (Ubuntu Focal): | |
importance: | Undecided → Medium |
Changed in kexec-tools (Ubuntu Jammy): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Focal): | |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Jammy): | |
importance: | Undecided → Medium |
assignee: | nobody → Ioanna Alifieraki (joalif) |
Changed in linux (Ubuntu Focal): | |
assignee: | nobody → Ioanna Alifieraki (joalif) |
assignee: | Ioanna Alifieraki (joalif) → nobody |
Changed in linux (Ubuntu): | |
assignee: | nobody → Ioanna Alifieraki (joalif) |
Changed in kexec-tools (Ubuntu Jammy): | |
assignee: | nobody → Ioanna Alifieraki (joalif) |
Changed in kexec-tools (Ubuntu Focal): | |
assignee: | nobody → Ioanna Alifieraki (joalif) |
Changed in kexec-tools (Ubuntu): | |
assignee: | nobody → Ioanna Alifieraki (joalif) |
description: | updated |
Changed in linux (Ubuntu Kinetic): | |
status: | New → Won't Fix |
no longer affects: | linux (Ubuntu Mantic) |
no longer affects: | linux (Ubuntu Lunar) |
Changed in kexec-tools (Ubuntu Kinetic): | |
status: | New → Won't Fix |
description: | updated |
description: | updated |
Changed in kexec-tools (Ubuntu Lunar): | |
assignee: | nobody → Ioanna Alifieraki (joalif) |
importance: | Undecided → Medium |
Changed in linux (Ubuntu Jammy): | |
status: | Incomplete → In Progress |
description: | updated |
Changed in linux-hwe-5.15 (Ubuntu Jammy): | |
status: | New → Invalid |
Changed in linux-hwe-5.15 (Ubuntu Kinetic): | |
status: | New → Invalid |
Changed in linux-hwe-5.15 (Ubuntu): | |
status: | New → Invalid |
Changed in linux-hwe-5.15 (Ubuntu Focal): | |
status: | New → In Progress |
assignee: | nobody → Ioanna Alifieraki (joalif) |
Changed in linux (Ubuntu): | |
assignee: | Ioanna Alifieraki (joalif) → nobody |
status: | Incomplete → Invalid |
importance: | Medium → Undecided |
Changed in linux (Ubuntu Focal): | |
importance: | Medium → Undecided |
Changed in linux-hwe-5.15 (Ubuntu Focal): | |
importance: | Undecided → Medium |
description: | updated |
summary: |
- kdump fails on arm64 when offset is not specified
+ kdump fails on big arm64 systems when offset is not specified |
Changed in kexec-tools (Ubuntu Focal): | |
status: | Incomplete → In Progress |
description: | updated |
Changed in linux (Ubuntu Jammy): | |
status: | In Progress → Fix Committed |
Changed in linux (Ubuntu Jammy): | |
status: | Fix Committed → Fix Released |
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 2024479
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.