can't kdump in trusty ec2 instance

Bug #1421391 reported by Chris J Arges
This bug affects 6 people
Affects: linux (Ubuntu)
Status: Incomplete
Importance: Undecided
Assigned to: Stefan Bader
Milestone: (none listed)

Bug Description

[Impact]

I can't get a crash dump in an ec2 trusty instance. When it kexecs, I see the following backtrace:

[ 0.813826] ------------[ cut here ]------------
[ 0.817517] WARNING: CPU: 0 PID: 1 at /build/buildd/linux-3.13.0/arch/x86/mm/ioremap.c:102 __ioremap_caller+0x374/0x380()
[ 0.823494] Modules linked in:
[ 0.825807] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.13.0-44-generic #73-Ubuntu
[ 0.829917] Hardware name: Xen HVM domU, BIOS 4.2.amazon 12/03/2014
[ 0.833266] 0000000000000009 ffff8800362f1c18 ffffffff81720d86 0000000000000000
[ 0.838861] ffff8800362f1c50 ffffffff810677cd ffffea0000ff0640 000000000003fc19
[ 0.844463] 000000003fc19000 000000000003fc19 0000000000001000 ffff8800362f1c60
[ 0.850005] Call Trace:
[ 0.851708] [<ffffffff81720d86>] dump_stack+0x45/0x56
[ 0.854563] [<ffffffff810677cd>] warn_slowpath_common+0x7d/0xa0
[ 0.857735] [<ffffffff810678aa>] warn_slowpath_null+0x1a/0x20
[ 0.860855] [<ffffffff81056ba4>] __ioremap_caller+0x374/0x380
[ 0.864047] [<ffffffff8104b528>] ? copy_oldmem_page+0x48/0xc0
[ 0.867193] [<ffffffff81056be4>] ioremap_cache+0x14/0x20
[ 0.870123] [<ffffffff8104b528>] copy_oldmem_page+0x48/0xc0
[ 0.873223] [<ffffffff81231fd4>] read_from_oldmem.part.0+0xa4/0xe0
[ 0.876534] [<ffffffff8123222b>] elfcorehdr_read_notes+0x1b/0x20
[ 0.879797] [<ffffffff81d66809>] merge_note_headers_elf64.constprop.7+0x71/0x24a
[ 0.883949] [<ffffffff81d67188>] ? vmcore_init.part.4+0x55d/0x55d
[ 0.887380] [<ffffffff81d66dbd>] vmcore_init.part.4+0x192/0x55d
[ 0.890670] [<ffffffff81d67188>] ? vmcore_init.part.4+0x55d/0x55d
[ 0.894075] [<ffffffff81d671b9>] vmcore_init+0x31/0x33
[ 0.897022] [<ffffffff8100214a>] do_one_initcall+0xfa/0x1b0
[ 0.900121] [<ffffffff81089555>] ? parse_args+0x225/0x3f0
[ 0.903231] [<ffffffff81d360f6>] kernel_init_freeable+0x17b/0x200
[ 0.906683] [<ffffffff81d358e5>] ? do_early_param+0x88/0x88
[ 0.909814] [<ffffffff8170f250>] ? rest_init+0x80/0x80
[ 0.912777] [<ffffffff8170f25e>] kernel_init+0xe/0x130
[ 0.915648] [<ffffffff817317bc>] ret_from_fork+0x7c/0xb0
[ 0.918684] [<ffffffff8170f250>] ? rest_init+0x80/0x80
[ 0.921565] ---[ end trace 8b6e218b41648bbd ]---

[ Test Case ]

Boot an EC2 trusty instance, then run:
sudo apt-get install linux-crashdump
sudo sed -i 's/USE_KDUMP=0/USE_KDUMP=1/' /etc/default/kdump-tools
sudo reboot
sudo kdump-config show
echo c | sudo tee /proc/sysrq-trigger
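
Before the final step it can help to confirm that a crash kernel is actually armed, since otherwise the sysrq trigger just panics the instance. The following is a minimal sketch, not part of the original test case: `kdump_ready` is a hypothetical helper name, and the two file arguments exist only so the check can be exercised against sample files. It assumes the standard kernel interfaces /proc/cmdline and /sys/kernel/kexec_crash_loaded.

```shell
#!/bin/sh
# Sketch: check whether a crash kernel is armed before forcing a panic.
# kdump_ready is a hypothetical helper, not something kdump-tools ships.
kdump_ready() {
    # Both paths default to the real kernel interfaces; override for testing.
    cmdline_file="${1:-/proc/cmdline}"
    loaded_file="${2:-/sys/kernel/kexec_crash_loaded}"

    # The running kernel must have been booted with crashkernel=...
    if ! grep -q 'crashkernel=' "$cmdline_file"; then
        echo "no crashkernel= on kernel command line"
        return 1
    fi

    # kexec_crash_loaded reads 1 once a crash kernel has been loaded.
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "crash kernel not loaded"
        return 1
    fi

    echo "ready"
}
```

Called with no arguments after the reboot, the helper inspects the live system; a non-zero return means the dump would not be captured regardless of the backtrace shown above.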

Tags: ec2 trusty
Chris J Arges (arges)
tags: added: ec2
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1421391

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Stefan Bader (smb) wrote :

To my knowledge, kexec (and thus kexec-based crash dumps) does not work on Xen PVM guests, only on HVM instance types. Someone upstream tried to enable that, but it was complicated and I think it was never finished. One could obtain a dump of a PVM guest using the Xen toolstack, but AWS does not support that.

Stefan Bader (smb) wrote :

Hm, I should have looked closer. This seems to be an HVM instance, so in theory it should be able to kdump. I should check whether this works when running on an Ubuntu host.

Chris J Arges (arges)
Changed in linux (Ubuntu):
assignee: Chris J Arges (arges) → Stefan Bader (smb)
Stefan Bader (smb) wrote :

I finally had some time to play around with this locally (not on AWS, so things still might differ, as there could be a dependency on the version of the Xen hypervisor as well). The default setup I used initially failed because of memory issues. But then I used full server installations in an HVM guest (which brings in many more modules). So tweaking the modules to include by setting

/etc/initramfs-tools/initramfs.conf:
MODULES=dep

and

/etc/default/grub.d/kexec-tools.cfg:
crashkernel=256M

may or may not be required for EC2. The main problem seemed to be related to unplugging the emulated devices (in favour of the PV drivers). The only variant that seemed to partially work for me was to use "xen_emul_unplug=never" for the normal boot. Of course this is not really ideal, as it impacts performance in normal usage. Even then it only got as far as creating a dump, and that took a while because the network interface would not come up.
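
When experimenting with these settings, it is easy to boot a kernel that did not pick them up, since changes under /etc/default/grub.d typically only take effect after update-grub and a reboot. A small sketch for reading a parameter back from the kernel command line follows; `cmdline_value` is a hypothetical helper introduced here for illustration, and the second argument is only there so it can be run against a sample string.

```shell
#!/bin/sh
# Sketch: print the value of one kernel command line parameter.
# cmdline_value is a hypothetical helper name, not an existing tool.
cmdline_value() {
    # $1 = parameter name, $2 = command line text (defaults to /proc/cmdline)
    line="${2:-$(cat /proc/cmdline)}"

    # Split on spaces, keep the first "name=value" match, print the value.
    printf '%s\n' "$line" | tr ' ' '\n' | sed -n "s/^$1=//p" | head -n 1
}
```

For example, `cmdline_value xen_emul_unplug` and `cmdline_value crashkernel` run on the booted guest print the values in effect, or nothing if the parameter is absent.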

A RH bug [1] suggests a slight variation which supposedly avoids using the emulated drivers. But either I misread the instructions or it just does not work in our environment. At least those attempts just hung like before.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=815785

Stefan Bader (smb) wrote :

For reference this seems to be a known issue when using PV drivers on HVM:

http://lists.xen.org/archives/html/xen-devel/2015-03/msg01394.html
