can't kdump in trusty ec2 instance

Bug #1421391 reported by Chris J Arges on 2015-02-12

This bug affects 6 people

Affects          Status       Importance   Assigned to    Milestone
linux (Ubuntu)   Incomplete   Undecided    Stefan Bader

Bug Description

[Impact]

I can't get a crash dump in an ec2 trusty instance. When it kexecs into the crash kernel, I see the following warning and backtrace:

[ 0.813826] ------------[ cut here ]------------
[ 0.817517] WARNING: CPU: 0 PID: 1 at /build/buildd/linux-3.13.0/arch/x86/mm/ioremap.c:102 __ioremap_caller+0x374/0x380()
[ 0.823494] Modules linked in:
[ 0.825807] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-44-generic #73-Ubuntu
[ 0.829917] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/03/2014
[ 0.833266] 0000000000000009 ffff8800362f1c18 ffffffff81720d86 0000000000000000
[ 0.838861] ffff8800362f1c50 ffffffff810677cd ffffea0000ff0640 000000000003fc19
[ 0.844463] 000000003fc19000 000000000003fc19 0000000000001000 ffff8800362f1c60
[ 0.850005] Call Trace:
[ 0.851708] [<ffffffff81720d86>] dump_stack+0x45/0x56
[ 0.854563] [<ffffffff810677cd>] warn_slowpath_common+0x7d/0xa0
[ 0.857735] [<ffffffff810678aa>] warn_slowpath_null+0x1a/0x20
[ 0.860855] [<ffffffff81056ba4>] __ioremap_caller+0x374/0x380
[ 0.864047] [<ffffffff8104b528>] ? copy_oldmem_page+0x48/0xc0
[ 0.867193] [<ffffffff81056be4>] ioremap_cache+0x14/0x20
[ 0.870123] [<ffffffff8104b528>] copy_oldmem_page+0x48/0xc0
[ 0.873223] [<ffffffff81231fd4>] read_from_oldmem.part.0+0xa4/0xe0
[ 0.876534] [<ffffffff8123222b>] elfcorehdr_read_notes+0x1b/0x20
[ 0.879797] [<ffffffff81d66809>] merge_note_headers_elf64.constprop.7+0x71/0x24a
[ 0.883949] [<ffffffff81d67188>] ? vmcore_init.part.4+0x55d/0x55d
[ 0.887380] [<ffffffff81d66dbd>] vmcore_init.part.4+0x192/0x55d
[ 0.890670] [<ffffffff81d67188>] ? vmcore_init.part.4+0x55d/0x55d
[ 0.894075] [<ffffffff81d671b9>] vmcore_init+0x31/0x33
[ 0.897022] [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
[ 0.900121] [<ffffffff81089555>] ? parse_args+0x225/0x3f0
[ 0.903231] [<ffffffff81d360f6>] kernel_init_freeable+0x17b/0x200
[ 0.906683] [<ffffffff81d358e5>] ? do_early_param+0x88/0x88
[ 0.909814] [<ffffffff8170f250>] ? rest_init+0x80/0x80
[ 0.912777] [<ffffffff8170f25e>] kernel_init+0xe/0x130
[ 0.915648] [<ffffffff817317bc>] ret_from_fork+0x7c/0xb0
[ 0.918684] [<ffffffff8170f250>] ? rest_init+0x80/0x80
[ 0.921565] ---[ end trace 8b6e218b41648bbd ]---

[ Test Case ]

boot ec2 trusty instance
sudo apt-get install linux-crashdump
sudo sed -i 's/USE_KDUMP=0/USE_KDUMP=1/' /etc/default/kdump-tools
sudo reboot
sudo kdump-config show
echo c | sudo tee /proc/sysrq-trigger
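
Before the final sysrq step it can help to confirm that a crash kernel is actually armed, so that a missing dump is not confused with a plain configuration problem. The following is only a minimal sketch: the function names are made up for illustration and are not part of kdump-tools, and it assumes the running kernel exposes /sys/kernel/kexec_crash_loaded.

```shell
#!/bin/sh
# Sketch only: pre-flight checks before forcing a crash with sysrq.
# All function names here are hypothetical helpers, not kdump-tools commands.

# Succeeds if the given kernel command line string contains crashkernel=.
has_crashkernel_param() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if the given file (normally /sys/kernel/kexec_crash_loaded)
# is readable and contains 1, i.e. a crash kernel has been loaded.
crash_kernel_loaded() {
    [ -r "$1" ] && [ "$(cat "$1")" = "1" ]
}

# Runs both checks; the file arguments default to the live system paths.
kdump_preflight() {
    cmdline_file=${1:-/proc/cmdline}
    loaded_file=${2:-/sys/kernel/kexec_crash_loaded}
    if ! has_crashkernel_param "$(cat "$cmdline_file")"; then
        echo "no crashkernel= on the kernel command line"
        return 1
    fi
    if ! crash_kernel_loaded "$loaded_file"; then
        echo "crash kernel not loaded"
        return 1
    fi
    echo "ready"
}

# Example (after the reboot step): kdump_preflight && echo c | sudo tee /proc/sysrq-trigger
```

If this prints "ready" and the dump still fails with the trace above, the problem is more likely in the crash kernel boot itself than in the kdump setup.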

Chris J Arges (arges) on 2015-02-12

tags:	added: ec2

Brad Figg (brad-figg) wrote on 2015-02-12: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1421391

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete
tags:	added: trusty

Stefan Bader (smb) wrote on 2015-02-13:

To my knowledge kexec (and thus kexec-based crash dumps) does not work on Xen PVM guests, only on HVM instance types. Someone upstream tried to enable it, but it was complicated and I think it was never finished. One could obtain a dump of a PVM guest using the Xen toolstack, but AWS does not support that.

Stefan Bader (smb) wrote on 2015-02-17:

Hm, I should have looked closer. This seems to be an HVM instance, so in theory it should be able to kdump. I should check whether this works when running on an Ubuntu host.
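
The HVM hint comes from the "Hardware name: Xen HVM domU" line in the trace above. A rough way to run the same check on a kernel log could look like the sketch below; the function name is hypothetical, and it assumes the DMI string is reported exactly as in this report.

```shell
#!/bin/sh
# Sketch only: decide from kernel log text whether the guest reports itself
# as Xen HVM. is_xen_hvm_log is a hypothetical helper name.

# Reads log text on stdin; succeeds if the Xen HVM hardware name line is present.
is_xen_hvm_log() {
    grep -q 'Hardware name: Xen HVM domU'
}

# Example: dmesg | is_xen_hvm_log && echo "looks like an HVM guest"
```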

Chris J Arges (arges) on 2015-02-20

Changed in linux (Ubuntu):
assignee:	Chris J Arges (arges) → Stefan Bader (smb)

Stefan Bader (smb) wrote on 2015-03-11:

I finally had some time to play around with this locally (not on AWS, so things might still differ, as there could also be a dependency on the version of the Xen hypervisor). The default setup I used initially failed due to memory issues. But then, I was using full server installations in an HVM guest (which bring in many more modules). So tweaking the modules to include by setting

/etc/initramfs-tools/initramfs.conf:
MODULES=dep

and

/etc/default/grub.d/kexec-tools.cfg:
crashkernel=256M

may or may not be required for EC2. The main problem seemed to be related to unplugging the emulated devices (in favour of the PV drivers). The only variant that seemed to partially work for me was to use "xen_emul_unplug=never" for the normal boot. Of course this is not ideal, as it impacts performance in normal usage. It also only worked to the extent of creating a dump, and even that took a while because the network interface would not come up.
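
For anyone who wants to try the same three settings, the edits could be scripted roughly as follows. This is a sketch under assumptions, not a tested recipe for EC2: the helper names are made up, the files are passed in as arguments, and it assumes the crashkernel= value and GRUB_CMDLINE_LINUX_DEFAULT already appear in the files being edited. Which grub defaults file carries the command line on a given image is also an assumption to verify.

```shell
#!/bin/sh
# Sketch only: apply the settings mentioned in this comment.
# All function names are hypothetical; sed -i assumes GNU sed.

# Set MODULES=dep in an initramfs.conf-style file.
set_initramfs_modules_dep() {
    sed -i 's/^MODULES=.*/MODULES=dep/' "$1"
}

# Replace an existing crashkernel=... value in a kexec-tools.cfg-style file.
# $1 = file, $2 = new value (for example 256M).
set_crashkernel() {
    sed -i "s/crashkernel=[^ \"]*/crashkernel=$2/" "$1"
}

# Append xen_emul_unplug=never to GRUB_CMDLINE_LINUX_DEFAULT in a grub
# defaults-style file, unless a xen_emul_unplug= setting is already there.
add_xen_emul_unplug_never() {
    grep -q 'xen_emul_unplug=' "$1" && return 0
    sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 xen_emul_unplug=never"/' "$1"
}

# Example (as root), followed by regenerating initramfs and grub config:
#   set_initramfs_modules_dep /etc/initramfs-tools/initramfs.conf
#   set_crashkernel /etc/default/grub.d/kexec-tools.cfg 256M
#   add_xen_emul_unplug_never /etc/default/grub   # file choice is an assumption
#   update-initramfs -u && update-grub
```

Keeping each change in its own function makes it easier to test them one at a time, since it is not yet clear which of them EC2 actually needs.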

A RH bug [1] suggests a slight variation which supposedly avoids using the emulated drivers. But either I misread the instructions or it just does not work in our environment. At least, those attempts hung just like before.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=815785