kernel crash on EC2 raring

Bug #1160543 reported by Scott Moser
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone:

Bug Description

I was running an EC2 instance of raring beta2, and playing with lxc. The system locked up, and I saw:

[30147690.704899] ------------[ cut here ]------------
[30147690.704912] Kernel BUG at ffffffff816c6edc [verbose debug info unavailable]
[30147690.704919] invalid opcode: 0000 [#1] SMP
[30147690.704925] Modules linked in: veth(F) ipt_MASQUERADE(F) iptable_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack(F) ip_tables(F) x_tables(F) bridge(F) stp(F) llc(F) isofs(F) dm_crypt(F) ghash_clmulni_intel(F) aesni_intel(F) aes_x86_64(F) xts(F) lrw(F) gf128mul(F) ablk_helper(F) cryptd(F) microcode(F) acpiphp(F)
[30147690.704963] CPU 0
[30147690.704969] Pid: 25624, comm: find Tainted: GF 3.8.0-12-generic #21-Ubuntu
[30147690.704975] RIP: e030:[<ffffffff816c6edc>] [<ffffffff816c6edc>] vmalloc_fault+0x1dc/0x20e
[30147690.704990] RSP: e02b:ffff8800056cba30 EFLAGS: 00010046
[30147690.704995] RAX: ffff88000efeeff8 RBX: ffffe8ffffc00980 RCX: 0000000000000000
[30147690.705000] RDX: 00003ffffffff000 RSI: ffff880000000ff8 RDI: 0000000000000000
[30147690.705007] RBP: ffff8800056cba50 R08: ffff88006590a7e8 R09: 0000000000637000
[30147690.705012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88000ef23e88
[30147690.705020] R13: ffff88000efeeff8 R14: ffff880000000ff8 R15: 0000000000000000
[30147690.705030] FS: 00007f51502c4740(0000) GS:ffff880069c00000(0000) knlGS:0000000000000000
[30147690.705037] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[30147690.705042] CR2: ffffe8ffffc00980 CR3: 000000000ef23000 CR4: 0000000000002660
[30147690.705048] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[30147690.705054] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[30147690.705061] Process find (pid: 25624, threadinfo ffff8800056ca000, task ffff880065cfae80)
[30147690.705067] Stack:
[30147690.705070] 0000000000000029 0000000000000002 ffffe8ffffc00980 ffff8800056cbb68
[30147690.705080] ffff8800056cbb48 ffffffff816c720c 0000000000000200 ffff880065cfae80
[30147690.705089] 0000000000000068 ffff8800155d529c ffff8800135e8540 ffff8800056cba98
[30147690.705098] Call Trace:
[30147690.705104] [<ffffffff816c720c>] __do_page_fault+0x1ec/0x4e0
[30147690.705116] [<ffffffff81085c3a>] ? lg_local_unlock+0x1a/0x20
[30147690.705125] [<ffffffff811a888e>] ? prepend_path+0x9e/0x1f0
[30147690.705134] [<ffffffff8114cc89>] ? zone_statistics+0x99/0xc0
[30147690.705141] [<ffffffff8114cc89>] ? zone_statistics+0x99/0xc0
[30147690.705147] [<ffffffff816c750e>] do_page_fault+0xe/0x10
[30147690.705154] [<ffffffff816c3b58>] page_fault+0x28/0x30
[30147690.705162] [<ffffffff81188144>] ? mem_cgroup_charge_statistics.isra.20+0x14/0x50
[30147690.705170] [<ffffffff8118a1f0>] __mem_cgroup_uncharge_common+0xd0/0x2d0
[30147690.705178] [<ffffffff8118d73a>] mem_cgroup_uncharge_page+0x2a/0x30
[30147690.705185] [<ffffffff81162ef9>] page_remove_rmap+0x89/0x160
[30147690.705193] [<ffffffff81156d74>] unmap_page_range+0x4a4/0x750
[30147690.705200] [<ffffffff8100122a>] ? xen_hypercall_xen_version+0xa/0x20
[30147690.705207] [<ffffffff811570aa>] unmap_single_vma+0x8a/0x100
[30147690.705214] [<ffffffff81157909>] unmap_vmas+0x49/0x90
[30147690.705220] [<ffffffff8115fcf8>] exit_mmap+0x98/0x170
[30147690.705229] [<ffffffff810559d4>] mmput+0x64/0x100
[30147690.705236] [<ffffffff8105e302>] do_exit+0x242/0x9d0
[30147690.705243] [<ffffffff8119557e>] ? ____fput+0xe/0x10
[30147690.705249] [<ffffffff8105eb0f>] do_group_exit+0x3f/0xa0
[30147690.705255] [<ffffffff8105eb87>] sys_exit_group+0x17/0x20
[30147690.705263] [<ffffffff816cbb1d>] system_call_fastpath+0x1a/0x1f
[30147690.705268] Code: de 48 89 d7 e8 08 f1 fe ff 48 f7 00 01 01 00 00 49 89 c5 74 3c 4c 89 f7 48 89 de e8 f1 f0 fe ff 48 8b 38 f7 c7 01 01 00 00 75 02 <0f> 0b ff 14 25 a8 35 c2 81 48 89 c2 49 8b 7d 00 ff 14 25 a8 35
[30147690.705329] RIP [<ffffffff816c6edc>] vmalloc_fault+0x1dc/0x20e
[30147690.705336] RSP <ffff8800056cba30>
[30147690.705356] ---[ end trace ba8063f40c0aaa21 ]---
[30147690.705443] Fixing recursive fault but reboot is needed!

full log attached.
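
For triage, the frames in the Call Trace above can be pulled out programmatically. The following is a minimal Python sketch and is not part of the original report; the names FRAME_RE, parse_frames, and SAMPLE are hypothetical, and SAMPLE is a shortened copy of the trace. Entries prefixed with "?" are addresses the kernel found on the stack but could not confirm as part of the call chain, so the sketch marks them as not reliable.

```python
import re

# Matches call-trace entries such as:
#   [30147690.705104] [<ffffffff816c720c>] __do_page_fault+0x1ec/0x4e0
#   [30147690.705116] [<ffffffff81085c3a>] ? lg_local_unlock+0x1a/0x20
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]{16})>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)

# Shortened copy of the trace from this report (illustration only).
SAMPLE = """\
[30147690.705098] Call Trace:
[30147690.705104] [<ffffffff816c720c>] __do_page_fault+0x1ec/0x4e0
[30147690.705116] [<ffffffff81085c3a>] ? lg_local_unlock+0x1a/0x20
[30147690.705147] [<ffffffff816c750e>] do_page_fault+0xe/0x10
[30147690.705154] [<ffffffff816c3b58>] page_fault+0x28/0x30
[30147690.705162] [<ffffffff81188144>] ? mem_cgroup_charge_statistics.isra.20+0x14/0x50
[30147690.705170] [<ffffffff8118a1f0>] __mem_cgroup_uncharge_common+0xd0/0x2d0
[30147690.705268] Code: de 48 89 d7 e8 08 f1 fe ff
"""


def parse_frames(oops_text):
    """Return the frames listed after 'Call Trace:' as a list of dicts."""
    frames = []
    in_trace = False
    for line in oops_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # first non-frame line (for example "Code: ...") ends the trace
        frames.append({
            "address": int(match.group("addr"), 16),
            "symbol": match.group("symbol"),
            "offset": int(match.group("offset"), 16),
            "size": int(match.group("size"), 16),
            "reliable": match.group("unreliable") is None,
        })
    return frames


if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        marker = " " if frame["reliable"] else "?"
        print(marker, frame["symbol"], hex(frame["offset"]))
```

Filtering on the reliable flag leaves the fault-handling path (page_fault, do_page_fault, __do_page_fault) on top of __mem_cgroup_uncharge_common, which is the chain that ends at the vmalloc_fault RIP shown in the oops.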
---
AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Mar 27 14:08 seq
 crw-rw---T 1 root audio 116, 33 Mar 27 14:08 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.9.1-0ubuntu1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
CurrentDmesg: [25557234.872217] init: plymouth-stop pre-start process (902) terminated with status 1
DistroRelease: Ubuntu 13.04
Ec2AMI: ami-8c841ae5
Ec2AMIManifest: ubuntu-us-east-1/images-milestone/ubuntu-raring-13.04-beta1-amd64-server-20130313.manifest.xml
Ec2AvailabilityZone: us-east-1c
Ec2InstanceType: m1.small
Ec2Kernel: aki-88aa75e1
Ec2Ramdisk: unavailable
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
MarkForUpload: True
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
ProcModules:
 acpiphp 23954 0 - Live 0x0000000000000000 (F)
 dm_crypt 22820 0 - Live 0x0000000000000000 (F)
 isofs 39815 0 - Live 0x0000000000000000 (F)
 microcode 22881 0 - Live 0x0000000000000000 (F)
ProcVersionSignature: Ubuntu 3.8.0-12.21-generic 3.8.2
RelatedPackageVersions:
 linux-restricted-modules-3.8.0-12-generic N/A
 linux-backports-modules-3.8.0-12-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory
Tags: raring ec2-images
Uname: Linux 3.8.0-12-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy netdev plugdev video

Scott Moser (smoser) wrote :
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1160543

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: raring
Joseph Salisbury (jsalisbury) wrote :

Can you reproduce the crash, or was it a one-time event?

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Stefan Bader (smb) wrote :

If there is a reasonably reliable reproducer, we could try some patches. I think this might be the same issue for which patches are currently in the process of being pushed upstream [1]:

* [PATCH 1/2] x86: mm: Fix vmalloc_fault oops during lazy MMU updates.
* [PATCH 2/2] mm/x86: Patch out arch_flush_lazy_mmu_mode() when running on bare metal

[1] http://www.spinics.net/lists/stable/msg02839.html

Scott Moser (smoser) wrote : BootDmesg.txt

apport information

tags: added: apport-collected
description: updated
Scott Moser (smoser) wrote : ProcCpuinfo.txt

apport information

Scott Moser (smoser) wrote : ProcInterrupts.txt

apport information

Scott Moser (smoser) wrote : UdevDb.txt

apport information

Scott Moser (smoser) wrote : UdevLog.txt

apport information

Scott Moser (smoser) wrote : WifiSyslog.txt

apport information

Scott Moser (smoser) wrote :

Unrelated to this bug, but likely something Brad's bot should be aware of.
I did:
$ apport-collect 1160543
You need to run 'sudo apt-get install python-apport' for apport-collect to work.
$ sudo apt-get install python-apport -y
$ apport-collect 1160543
ERROR: The launchpadlib Python module is not installed. This functionality is not available.
$ sudo apt-get install python-launchpadlib -y
$ apport-collect 1160543

*This* time it worked (surprisingly, two-factor OAuth even worked through w3m!)
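
The two failed attempts above could be avoided by checking for both Python modules up front. The following is a minimal sketch, not something apport itself provides. It assumes a Python 3.4+ interpreter that shares its module path with the one apport-collect uses, and it assumes the python-apport package provides a module named apport; the launchpadlib module name comes from the error message quoted above. REQUIRED and missing_packages are hypothetical names.

```python
import importlib.util

# Module name -> Ubuntu package named in the messages quoted above.
# The "apport" module name is an assumption; "launchpadlib" is taken
# from the apport-collect error text.
REQUIRED = {
    "apport": "python-apport",
    "launchpadlib": "python-launchpadlib",
}


def missing_packages(required=REQUIRED, find_spec=importlib.util.find_spec):
    """Return the packages whose module cannot be found by this interpreter."""
    return [pkg for module, pkg in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Print the install command rather than running it.
        print("sudo apt-get install -y " + " ".join(missing))
    else:
        print("apport-collect prerequisites appear to be installed")
```

Printing the command instead of executing it keeps the check free of side effects; the install and the apport-collect run stay manual steps, as in the transcript above.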

Changed in linux (Ubuntu):
status: Incomplete → Confirmed