kdump-gaps: Default 192MB crashkernel reservation is never enough (kdump fails on 20.04)

Bug #1908090 reported by Eric DeVolder
This bug affects 4 people
Affects               Status        Importance  Assigned to  Milestone
kdump-tools (Ubuntu)  Fix Released  Medium      Unassigned
Focal                 New           Undecided   Unassigned
Jammy                 New           Undecided   Unassigned
Mantic                New           Undecided   Unassigned
Noble                 Fix Released  Medium      Unassigned

Bug Description

When linux-crashdump (5.4.0.58.61) is enabled on Ubuntu 20.04 LTS, everything appears to be in good working order, according to "systemctl status kdump-tools" and "kdump-config status". However, upon an actual crash, the system hangs, and no crash files are produced. I've investigated and have learned that the capture kernel does indeed start, but it is unable to unpack the rootfs/initrd, and thus fails and hangs.

[ 1.070469] Trying to unpack rootfs image as initramfs...
[ 1.333182] swapper/0 invoked oom-killer: gfp_mask=0x100cc2(GFP_HIGHUSER), order=0, oom_score_adj=0
[ 1.335074] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.4.0-26-generic #30-Ubuntu
[ 1.336396] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[ 1.336396] Call Trace:
[ 1.336396] dump_stack+0x6d/0x9a
[ 1.336396] dump_header+0x4f/0x1eb
[ 1.336396] out_of_memory.part.0.cold+0x39/0x83
[ 1.336396] out_of_memory+0x6d/0xd0
...
[ 1.413202] ---[ end Kernel panic - not syncing: System is deadlocked on memory ]---

On this system with 8G of memory, the crash memory as specified on the kernel command line is "crashkernel=512M-:192M". I changed the 192M to 256M, and now kdump works.
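
For reference, a minimal sketch of how the range syntax in that parameter is read. It assumes the kernel's documented form `crashkernel=start-[end]:size[,start-[end]:size...]`, where a range applies when system RAM is at least `start` and below `end`, and an empty `end` means no upper bound. The helper names `to_mb` and `crashkernel_for_ram` are hypothetical, only `M` and `G` suffixes are handled, and an `@offset` suffix is not supported.

```shell
# Convert a size such as 512M or 2G to MiB (only M and G suffixes are handled).
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  return 1 ;;
    esac
}

# crashkernel_for_ram SPEC RAM_MB  (hypothetical helper)
# Print the size from the first range in SPEC that covers RAM_MB MiB of RAM.
# Returns non-zero when no range matches, i.e. nothing would be reserved.
crashkernel_for_ram() {
    spec="$1"
    ram_mb="$2"
    # Split the comma-separated list of range:size entries.
    old_ifs="$IFS"
    IFS=,
    set -- $spec
    IFS="$old_ifs"
    for entry in "$@"; do
        range="${entry%%:*}"      # e.g. "512M-" or "512M-2G"
        size="${entry#*:}"        # e.g. "192M"
        start=$(to_mb "${range%%-*}") || return 1
        end="${range#*-}"         # empty means "no upper bound"
        if [ -n "$end" ]; then
            end=$(to_mb "$end") || return 1
        fi
        if [ "$ram_mb" -ge "$start" ] && { [ -z "$end" ] || [ "$ram_mb" -lt "$end" ]; }; then
            echo "$size"
            return 0
        fi
    done
    return 1
}

# The 8G system from this report: 8192 MiB falls in the open-ended 512M- range.
crashkernel_for_ram 512M-:192M 8192
```

Read this way, `512M-:192M` reserves the same 192M on every machine with at least 512M of RAM, regardless of how much more memory it has.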

Not sure how the 192M value is chosen, but it does not work. I think this value used to work for 16.04 and maybe 18.04 (I didn't try), but it is no longer useful for 20.04.

Revision history for this message
norman shen (jshen28) wrote :

Hello, may I ask how you captured the log? My kdump also gets stuck but unfortunately does not print anything; the system looks dead. The VM has 4GiB of memory.

Revision history for this message
Eric DeVolder (edevolde) wrote : Re: [Bug 1908090] Re: ubuntu 20.04 kdump fails

From my notes:

=====
OK, I did the following and made some forward progress:

/etc/default/grub: added
 console=ttyS0
update-grub
reboot
virsh console vm843
echo 8 > /proc/sys/kernel/printk
rm -fr /var/crash/*
sync
echo c > /proc/sysrq-trigger

I now see the capture kernel panic messages.
=====

The important item here is the additional serial console which allowed me to view the messages in an xterm and scroll back (on a vnc console, there is no such ability).
eric

Revision history for this message
Launchpad Janitor (janitor) wrote : Re: ubuntu 20.04 kdump fails

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in kexec-tools (Ubuntu):
status: New → Confirmed
Revision history for this message
Todd Taft (taft) wrote :

Are you sure you have this bug filed against the correct package?

At a glance, it looks like you should file it against linux-meta (see https://packages.ubuntu.com/focal/linux-crashdump )

kexec-tools "provides tools to load a kernel into memory and then "reboot"
 directly into that kernel using the kexec system call, bypassing the normal
 boot process."

Revision history for this message
Eric DeVolder (edevolde) wrote : Re: [Bug 1908090] Re: ubuntu 20.04 kdump fails

I was/am unsure as to which package determines the crashkernel= settings. If it is another package, then by all means please update accordingly.
Thanks,
eric

Revision history for this message
Adam Dorsey (adorsey) wrote : Re: ubuntu 20.04 kdump fails

Same issue here, with the same workaround for me:

Edit /etc/default/grub.d/kdump-tools.cfg

Change the crashkernel part to read
    crashkernel=512M-:256M
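
A minimal sketch of that edit as a script, assuming the file contains a `crashkernel=` token inside a `GRUB_CMDLINE_LINUX_DEFAULT` line and that GNU sed is available. The helper name `set_crashkernel` is hypothetical. The new value only takes effect after `update-grub` and a reboot.

```shell
# set_crashkernel FILE VALUE  (hypothetical helper, needs GNU sed for -i)
# Rewrite the crashkernel= token in FILE so that it reads crashkernel=VALUE.
set_crashkernel() {
    file="$1"
    value="$2"
    # Replace everything after "crashkernel=" up to the next space or quote.
    sed -i -E "s/crashkernel=[^ \"]+/crashkernel=${value}/" "$file"
}

# Usage (as root), followed by update-grub and a reboot:
#   set_crashkernel /etc/default/grub.d/kdump-tools.cfg 512M-:256M
# Afterwards /proc/cmdline should show the new crashkernel= value.
```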

Revision history for this message
Mark Robson (markxr) wrote :

Yes, me too. The same fix fixed it.

affects: kexec-tools (Ubuntu) → kdump-tools (Ubuntu)
Revision history for this message
Simeon Wilkinson (swilki) wrote :

This is still an issue on Ubuntu Server 20.04.4 LTS 5.4.0.122-generic using linux-crashdump-5.4.0.122.123.

The workaround in comment #6 didn't work, but increasing it to 512M-:1024M did.

Revision history for this message
Trent Lloyd (lathiat) wrote :

I ran into this today. 192MB does not seem sufficient at all, even for an LXD VM with only 1G of RAM. 384M seems to work reliably for basic empty VMs, though we often find larger servers need more.

I think we need to revise this default. The crashkernel setting supports choosing a value based on how much RAM the host has. I think we can leverage that to try to set a better default out of the box and increase it for big servers where we won't notice the lost RAM.

Changed in kdump-tools (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Trent Lloyd (lathiat)
summary: - ubuntu 20.04 kdump fails
+ Default 192MB crashkernel reservation is never enough (kdump fails on
+ 20.04)
Revision history for this message
Tyler Stachecki (tstachecki) wrote : Re: Default 192MB crashkernel reservation is never enough (kdump fails on 20.04)

This really, really should be raised.

Not setting the default higher risks a loss of coredumps from production systems due to memory exhaustion when the host boots the kdump kernel. You would think that something like that was tested in all cases, but I did not test it myself and just got bitten on what was fortunately a toy system...

If you really want to be conservative, you could even say kdump with systems <= 2GiB of RAM or something should only use 192M still (though as per Trent's observations -- it seems like the floor needs to be raised, period? or initrd needs to go on a diet?)

Trent Lloyd (lathiat)
Changed in kdump-tools (Ubuntu):
assignee: Trent Lloyd (lathiat) → nobody
Revision history for this message
Heather Lemon (hypothetical-lemon) wrote :

Is the target series Focal+?

Revision history for this message
dann frazier (dannf) wrote :

I'm open to raising the default for future releases if it is no longer reasonable. I'd say if a 512M LXD VM can not crash dump w/ 192M, we should bump it. If 512M VMs still work w/ 192M, but 1G needs more, let's bump it for the 1G: range.

However, I'm not convinced this is something we should SRU, unless you can somehow demonstrate that the default works for no one. We are literally stealing memory away from the OS here, and rebooting a VM after applying updates to find your application now OOMs would be a clear regression.
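
To see how much memory the reservation currently takes from a running system, the kernel exposes the size in bytes in `/sys/kernel/kexec_crash_size`. A minimal sketch that reports it in MiB follows; the helper name `crash_reservation_mib` is hypothetical, and the optional file argument exists only so the function can be tried on a sample file.

```shell
# crash_reservation_mib [FILE]  (hypothetical helper)
# Print the crashkernel reservation in MiB. FILE defaults to the sysfs entry,
# which holds the reserved size in bytes (0 when nothing is reserved).
crash_reservation_mib() {
    file="${1:-/sys/kernel/kexec_crash_size}"
    bytes=$(cat "$file") || return 1
    echo $(( bytes / 1024 / 1024 ))
}

# Usage on a live system:
#   crash_reservation_mib
```

Comparing this number before and after a change to `crashkernel=` shows how much memory the OS gives up to the capture kernel.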

Revision history for this message
Heather Lemon (hypothetical-lemon) wrote :

Hello,

I was able to generate a crash kernel dump with crashkernel=512M-:192M and 1GB of memory.
This was with libvirt/KVM, not LXD. I'm not sure whether containers can be used to generate crash dumps, as I got an error when I tried. I followed the instructions here: https://ubuntu.com/server/docs/kernel-crash-dump.

/var/crash/202310271409/
-rw------- 1 root whoopsie 57K Oct 27 14:09 dmesg.202310271409
-rw------- 1 root whoopsie 37M Oct 27 14:09 dump.202310271409

I will continue testing VM memory sizes against crashkernel= parameters.

Thanks,
Heather Lemon

summary: - Default 192MB crashkernel reservation is never enough (kdump fails on
- 20.04)
+ kdump-gaps: Default 192MB crashkernel reservation is never enough (kdump
+ fails on 20.04)
Revision history for this message
Heather Lemon (hypothetical-lemon) wrote :

As a follow-up after lengthy discussions among multiple teams: the decision to raise the default memory reservation is being put on hold indefinitely. Testing was done following the instructions provided here [1].

Unfortunately, the Ubuntu SRU process does not allow for behavior changes once a stable release is out. In this case, this would not only change the current behavior by reducing memory available to the host OS, but raises regression potential which could lead to OOMs during or after the system (re)boot. This might lead to un-bootable systems if it gets it wrong.

- What we've considered:
There is an auto option, which "works", but is not smart enough to get this right reliably.

Dynamically calculating the value appears to be a promising solution but would be subject to its own issues too, including memory footprint changing due to changes in memory allocation dynamics in the network or storage drivers, which would require repeatedly updating the math used to calculate these.
However these are strongly tied to the total physical memory and memory usage by kernel (varies w/ system devices) and userspace to reach kdump-tools.target and run makedumpfile to completion.

Thank you for the summaries @setuid! @mfo!

Cheers,
Heather Lemon | hlemon | hypothetical-lemon

upstream debian values - https://salsa.debian.org/debian/kdump-tools/-/blob/master/debian/kdump-tools.grub.default
[1] how to crash dump - https://ubuntu.com/server/docs/kernel-crash-dump
previous bump request - https://git.launchpad.net/ubuntu/+source/makedumpfile/commit/?h=applied/ubuntu/focal-updates&id=62949fcafa23dbc71003271d889afbdb441fcb8d

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package kdump-tools - 1:1.10.2ubuntu1

---------------
kdump-tools (1:1.10.2ubuntu1) noble; urgency=medium

  * Merge from Debian unstable. Remaining changes:
    - Update default s390x crashkernel.
      - Install the updated zipl.conf with ucf, so users will be able to
        decide whether to pick any crashkernel changes.
    - Bump grub default (x86 amd64) crashkernel params to those used on
      arm64, ppc64le, s390x.
    - Add riscv64 build

kdump-tools (1:1.10.2) unstable; urgency=medium

  * debian/tests/crash: Support makedumpfile's flattened format,
    which file(1) reports as "Flattened kdump compressed dump" instead
    of the "Kdump compressed dump" string we were looking for.

kdump-tools (1:1.10.1) unstable; urgency=medium

  * Delete the temporary sysctl files on kernel removal. Thanks to
    Guilherme G. Piccoli.

 -- dann frazier <email address hidden> Thu, 22 Feb 2024 15:39:13 -0700

Changed in kdump-tools (Ubuntu Noble):
status: Confirmed → Fix Released