We have a number of small and large instances running the release version of Ubuntu 12.04. The small instances have been completely stable, but every large instance we have has crashed at seemingly random intervals. The crash is repeatable on individual systems, though not within a predictable time period. It appears to be triggered by our half-hourly run of Opscode's chef-client. We tried running the client in a tight loop to reproduce the crash more quickly, but that did not help: it still took two days for the instance to crash again.
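A minimal sketch of the kind of tight loop described above, written in Python. The function name `run_in_loop`, its parameters, and the log format are illustrative assumptions, not the script we actually used. Each record is synced to disk before the command starts, so the last line of the log shows which run was in progress when the instance went down. Nothing here is known to trigger the crash any faster.

```python
import os
import subprocess
import time


def run_in_loop(cmd, max_runs, log_path, pause=0.0):
    """Run cmd up to max_runs times, recording every run in log_path.

    The run number and start time are written and fsynced *before* the
    command starts, so the record survives a kernel crash during the run.
    The exit code is appended once the command returns.
    """
    completed = 0
    with open(log_path, "a") as log:
        for i in range(1, max_runs + 1):
            log.write("run=%d start=%d" % (i, int(time.time())))
            log.flush()
            os.fsync(log.fileno())

            # Output is discarded here; redirect it elsewhere if needed.
            result = subprocess.run(
                cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
            )

            log.write(" rc=%d\n" % result.returncode)
            log.flush()
            os.fsync(log.fileno())
            completed += 1
            if pause:
                time.sleep(pause)
    return completed
```

For example, `run_in_loop(["chef-client"], 100000, "/var/tmp/chef-loop.log")` would run the client back to back; the log path is hypothetical. A line without a trailing `rc=` field marks the run that was interrupted.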
This first affected the 3.2.0-23-virtual kernel, so we updated to the 3.2.0-24-virtual kernel, but we still see the same crash. The only information available in the system logs is:
[17605315.391128] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[17605315.391148] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50
[17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0
[17605315.391172] Oops: 0000 [#1] SMP
[17605315.391179] CPU 1
[17605315.391182] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp
[17605315.391209]
[17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu
[17605315.391223] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50
[17605315.391232] RSP: e02b:ffff8801d2659c18 EFLAGS: 00010046
[17605315.391238] RAX: 0000000000000000 RBX: ffff8801d2eb5a00 RCX: 0000000000000000
[17605315.391244] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010
[17605315.391250] RBP: ffff8801d2659c48 R08: 0000000000000000 R09: 0000000000000000
[17605315.391255] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000
[17605315.391263] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000033b9e28
[17605315.391274] FS: 00007fee8cc10700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000
[17605315.391281] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[17605315.391287] CR2: 0000000000000010 CR3: 00000001d2a0b000 CR4: 0000000000002660
[17605315.391294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17605315.391301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17605315.391308] Process chef-client (pid: 28794, threadinfo ffff8801d2658000, task ffff8801d0870000)
[17605315.391315] Stack:
[17605315.391319] ffff8801d2659c48 ffffffff8104ece9 ffff8801d2eb5a00 ffff8801dffa26c0
[17605315.391331] ffff8801d2eb5200 0000000000000000 ffff8801d2659c78 ffffffff810544b8
[17605315.391343] ffff8801d2659c78 ffff8801dffa26c0 0000000000000001 ffff8801d08703a8
[17605315.391354] Call Trace:
[17605315.391364] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0
[17605315.391373] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70
[17605315.391382] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0
[17605315.391391] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[17605315.391399] [<ffffffff8165344f>] schedule+0x3f/0x60
[17605315.391408] [<ffffffff8117e119>] pipe_wait+0x59/0x80
[17605315.391417] [<ffffffff81089340>] ? add_wait_queue+0x60/0x60
[17605315.391425] [<ffffffff8117e87a>] pipe_read+0x1da/0x330
[17605315.391433] [<ffffffff81174522>] do_sync_read+0xd2/0x110
[17605315.391443] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10
[17605315.391451] [<ffffffff8100aa32>] ? check_events+0x12/0x20
[17605315.391459] [<ffffffff81298d33>] ? security_file_permission+0x93/0xb0
[17605315.391466] [<ffffffff811749a1>] ? rw_verify_area+0x61/0xf0
[17605315.391473] [<ffffffff81174e80>] vfs_read+0xb0/0x180
[17605315.391479] [<ffffffff81174f9a>] sys_read+0x4a/0x90
[17605315.391488] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b
[17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47
[17605315.391577] RIP [<ffffffff8130d7f1>] rb_next+0x1/0x50
[17605315.391583] RSP <ffff8801d2659c18>
[17605315.391587] CR2: 0000000000000010
[17605315.391596] ---[ end trace 586cfae3c9e3e67e ]---
The stack trace is identical between the two kernels. I have been unable to find any reference to this on Ubuntu, Xen, or kernel forums or mailing lists, but it is repeatable even on freshly installed m1.large instances on EC2.
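One way to check that two oops dumps share a stack trace is to compare the function names in their "Call Trace:" sections, since the addresses differ between kernel builds. The following Python sketch does that; the helper names `call_trace_symbols` and `same_trace` are assumptions made for illustration, and the pattern only targets the frame layout shown in the log above.

```python
import re

# Matches frames such as "[<ffffffff810544b8>] pick_next_task_fair+0x38/0x70",
# with an optional "? " marking a frame the kernel is unsure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def call_trace_symbols(oops_text, include_uncertain=False):
    """Return the function names listed under 'Call Trace:' in an oops dump."""
    symbols = []
    in_trace = False
    for line in oops_text.splitlines():
        if "Call Trace:" in line:
            in_trace = True
            continue
        if not in_trace:
            continue
        match = FRAME_RE.search(line)
        if match is None:
            break  # the first non-frame line ends the trace
        if match.group(1) and not include_uncertain:
            continue  # skip "?" frames unless asked to keep them
        symbols.append(match.group(2))
    return symbols


def same_trace(oops_a, oops_b):
    """True when both dumps list the same functions in the same order."""
    return call_trace_symbols(oops_a, True) == call_trace_symbols(oops_b, True)
```

Comparing names only is a deliberate simplification: it ignores offsets, which can shift between builds even when the code path is the same.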
ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-24-virtual 3.2.0-24.37
ProcVersionSignature: Ubuntu 3.2.0-24.37-virtual 3.2.14
Uname: Linux 3.2.0-24-virtual x86_64
AcpiTables:
AlsaDevices:
total 0
crw-rw---T 1 root audio 116, 1 May 7 09:58 seq
crw-rw---T 1 root audio 116, 33 May 7 09:58 timer
AplayDevices: aplay: device_list:252: no soundcards found...
ApportVersion: 2.0.1-0ubuntu7
Architecture: amd64
ArecordDevices: arecord: device_list:252: no soundcards found...
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Date: Tue May 15 15:23:54 2012
Ec2AMI: ami-fd1c2789
Ec2AMIManifest: ubuntu-eu-west-1/images-testing/ubuntu-precise-daily-amd64-desktop-20120420.manifest.xml
Ec2AvailabilityZone: eu-west-1b
Ec2InstanceType: m1.large
Ec2Kernel: aki-62695816
Ec2Ramdisk: unavailable
IwConfig:
lo no wireless extensions.
eth0 no wireless extensions.
Lspci:
Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
PciMultimedia:
ProcEnviron:
TERM=xterm-256color
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
PulseList:
Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/mydrive not ours.
No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
linux-restricted-modules-3.2.0-24-virtual N/A
linux-backports-modules-3.2.0-24-virtual N/A
linux-firmware 1.79
RfKill:
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog: