linux-crashdump doesn't actually dump and reboot

Bug #1235616 reported by Igor Galić on 2013-10-05
This bug affects 4 people
Affects: kexec-tools (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

In our infrastructure we have a large number of virtual machines running Ubuntu 12.04 (and some that are running 10.04), as well as a couple of CentOS machines.
From time to time they panic and we're left with nothing but a screenshot, which makes for poor debugging. I pushed to have linux-crashdump (or crash on CentOS) installed by default.
Following the documentation on how to enable crash dumps (https://help.ubuntu.com/lts/serverguide/kernel-crash-dump.html), I've done the testing myself on libvirt/qemu and VirtualBox, and also on actual hardware (my laptop, which is running the as-yet-unreleased Ubuntu 13.04).

The results have been rather sobering. After initiating a panic, the machine shows a backtrace and then just hangs. As the documentation says that the crash dump can take some time, I've left the machines in this state for several hours with no change.
I've followed several cues, documenting my journey in this Ask Ubuntu answer: http://askubuntu.com/questions/310885/kernel-dump-using-linux-crashdump-under-vmware. The testing on my own laptop led to the first bluescreen I've seen in over ten years: http://i.imgur.com/oAwEJZ8.jpg

My final tests were with CentOS. Since we only have a handful of machines running CentOS, I had put a lower priority on testing and implementing it there. However, following RHEL's documentation ( https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/ch-kdump.html ) I got the full crash, dump and reboot cycle, with a crash dump written, on the very first try.
From this I'm assuming it's not entirely my fault or my systems' fault, but is perhaps something more specific to Debian.

Igor Galić (i.galic) wrote :

s/Debian/Ubuntu/ — I haven't actually tested this on Debian! (I don't have any debian systems.)

Igor Galić (i.galic) wrote :

Btw, just for completeness, I have now tested this on Debian too. There, the crash kernel itself crashes: http://i.imgur.com/amCrAZW.png

Andreas Ntaflos (daff) wrote :

Just about the same here on a few generic Ubuntu 12.04.2 VMs (using KVM). I followed the documentation at https://wiki.ubuntu.com/Kernel/CrashdumpRecipe and https://help.ubuntu.com/lts/serverguide/kernel-crash-dump.html, which basically says: install linux-crashdump, reboot, check /proc/cmdline, make sure `/proc/sys/kernel/sysrq` is enabled and then run `echo c > /proc/sysrq-trigger`. This certainly crashes the VM, but it just stays crashed: http://i.imgur.com/amCrAZW.png
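For anyone repeating these steps, the two checks before triggering the crash can be sketched in Python. This is only an illustration, not something taken from the documentation: the function names `crashkernel_param` and `preflight` are made up here, and the sketch assumes that a `crashkernel=` token on the kernel command line and a non-zero sysrq value are what the guides mean by those checks.

```python
def crashkernel_param(cmdline):
    """Return the value of crashkernel= from a kernel command line, or None."""
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            return token.split("=", 1)[1]
    return None


def preflight(cmdline, sysrq_value):
    """Return a list of problems found before triggering a test crash.

    cmdline     -- contents of /proc/cmdline
    sysrq_value -- contents of /proc/sys/kernel/sysrq
    """
    problems = []
    if crashkernel_param(cmdline) is None:
        problems.append("no crashkernel= parameter on the kernel command line")
    # Assumption: a value of 0 is treated as "SysRq disabled".
    if sysrq_value.strip() == "0":
        problems.append("kernel.sysrq is 0 (SysRq disabled)")
    return problems
```

On a real machine the two arguments would be read from /proc/cmdline and /proc/sys/kernel/sysrq. An empty list only means these two preconditions hold; it says nothing about whether the reserved memory is large enough for the crash kernel to boot.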

After forcing a reboot using virt-manager the machine boots fine but /var/crash is empty. No crash dumps, no nothing.

What gives?

Andreas Ntaflos (daff) wrote :

Very interestingly, when setting `crashkernel=128M` instead of the default `crashkernel=384M-2G:64M,2G-:128M`, crashing the machine works correctly and also leaves a crash dump in /var/crash. Unfortunately, setting the crashkernel parameter requires editing /etc/grub.d/10_linux directly; it cannot be overridden by means of /etc/default/grub.
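To make the difference between the two settings concrete, here is a rough Python sketch of how the range syntax selects a reservation size. It assumes the usual reading of `crashkernel=<start>-[<end>]:<size>,...` (start inclusive, end exclusive, an optional `@offset` ignored) and is not the kernel's actual parser; `parse_size` and `reserved_bytes` are names invented for this example.

```python
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a size such as '64M' or '2G' to bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return int(text[:-1]) * UNITS[text[-1].upper()]
    return int(text)


def reserved_bytes(crashkernel, total_ram):
    """Return how many bytes a crashkernel= value would reserve for total_ram."""
    value = crashkernel.split("@", 1)[0]  # drop an optional @offset
    if ":" not in value:
        return parse_size(value)          # fixed form, e.g. "128M"
    for entry in value.split(","):
        mem_range, size = entry.split(":", 1)
        start, _, end = mem_range.partition("-")
        low = parse_size(start)
        high = parse_size(end) if end else None  # open-ended range, e.g. "2G-"
        if total_ram >= low and (high is None or total_ram < high):
            return parse_size(size)
    return 0  # no range matched: nothing is reserved
```

Under this reading, a VM with 1G of RAM gets only 64M from the default setting and a VM with 4G gets 128M, whereas `crashkernel=128M` always reserves 128M. That would be consistent with the fixed value working where the default does not, but it is an inference from the syntax rather than something verified here.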

Andreas Ntaflos (daff) wrote :

Looking further, this memory problem is already tracked in bug #785394, which has been untouched for almost two years now.

So this bug here should serve as a reminder that the official documentation (https://help.ubuntu.com/lts/serverguide/kernel-crash-dump.html) is wrong. It should at least contain links to the bugs affecting linux-crashdump, like https://wiki.ubuntu.com/Kernel/CrashdumpRecipe#Release_specific_notes does, because when just following the documentation the crashdump functionality just does not work.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in kexec-tools (Ubuntu):
status: New → Confirmed