Activity log for bug #999755

Date | Who | What changed | Old value | New value | Message
2012-05-15 15:33:24 | Karl Matthias | bug |  |  | added bug
2012-05-15 16:00:11 | Brad Figg | linux (Ubuntu): status | New | Incomplete |
2012-05-15 19:10:14 | Joseph Salisbury | linux (Ubuntu): importance | Undecided | Medium |
2012-05-16 06:58:42 | Stefan Bader | bug |  |  | added subscriber Stefan Bader
2012-05-17 09:58:26 | Karl Matthias | linux (Ubuntu): status | Incomplete | Confirmed |
2012-05-19 07:24:54 | Gavin Heavyside | bug |  |  | added subscriber Gavin Heavyside
2012-05-25 14:27:30 | Barrie Bremner | bug |  |  | added subscriber Barrie Bremner
2012-05-28 11:16:12 | Gavin Heavyside | summary | Kernel crash on EC2 m1.large instances | Kernel crash on EC2 & VirtualBox |
2012-05-29 14:50:15 | Stefan Bader | summary | Kernel crash on EC2 & VirtualBox | Kernel crash in rb_next doin ohai loops |
2012-05-29 14:52:30 Stefan Bader description We have a number of small and large instances running the release version of 12.04. The small instances have been completely stable. However, every large instance we have has crashed at a seemingly random interval. This is repeatable on individual systems, though not within a defined time period. It appears to be triggered by our half hourly run of OpsCode's chef-client. We tried running the client in a tight loop to recreate the crash but were unable to get it to do so in a short time period. It still took two days to crash again. This was affecting the 3.2.0-23-virtual kernel, so we updated to the 3.2.0-24-virtual kernel but still have found the same crash. The only information available in the system logs is: [17605315.391128] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [17605315.391148] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0 [17605315.391172] Oops: 0000 [#1] SMP [17605315.391179] CPU 1 [17605315.391182] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp [17605315.391209] [17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu [17605315.391223] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391232] RSP: e02b:ffff8801d2659c18 EFLAGS: 00010046 [17605315.391238] RAX: 0000000000000000 RBX: ffff8801d2eb5a00 RCX: 0000000000000000 [17605315.391244] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010 [17605315.391250] RBP: ffff8801d2659c48 R08: 0000000000000000 R09: 0000000000000000 [17605315.391255] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000 [17605315.391263] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000033b9e28 [17605315.391274] FS: 00007fee8cc10700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000 [17605315.391281] CS: e033 DS: 0000 
ES: 0000 CR0: 000000008005003b [17605315.391287] CR2: 0000000000000010 CR3: 00000001d2a0b000 CR4: 0000000000002660 [17605315.391294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [17605315.391301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [17605315.391308] Process chef-client (pid: 28794, threadinfo ffff8801d2658000, task ffff8801d0870000) [17605315.391315] Stack: [17605315.391319] ffff8801d2659c48 ffffffff8104ece9 ffff8801d2eb5a00 ffff8801dffa26c0 [17605315.391331] ffff8801d2eb5200 0000000000000000 ffff8801d2659c78 ffffffff810544b8 [17605315.391343] ffff8801d2659c78 ffff8801dffa26c0 0000000000000001 ffff8801d08703a8 [17605315.391354] Call Trace: [17605315.391364] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0 [17605315.391373] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70 [17605315.391382] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0 [17605315.391391] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [17605315.391399] [<ffffffff8165344f>] schedule+0x3f/0x60 [17605315.391408] [<ffffffff8117e119>] pipe_wait+0x59/0x80 [17605315.391417] [<ffffffff81089340>] ? add_wait_queue+0x60/0x60 [17605315.391425] [<ffffffff8117e87a>] pipe_read+0x1da/0x330 [17605315.391433] [<ffffffff81174522>] do_sync_read+0xd2/0x110 [17605315.391443] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10 [17605315.391451] [<ffffffff8100aa32>] ? check_events+0x12/0x20 [17605315.391459] [<ffffffff81298d33>] ? security_file_permission+0x93/0xb0 [17605315.391466] [<ffffffff811749a1>] ? 
rw_verify_area+0x61/0xf0 [17605315.391473] [<ffffffff81174e80>] vfs_read+0xb0/0x180 [17605315.391479] [<ffffffff81174f9a>] sys_read+0x4a/0x90 [17605315.391488] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b [17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47 [17605315.391577] RIP [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391583] RSP <ffff8801d2659c18> [17605315.391587] CR2: 0000000000000010 [17605315.391596] ---[ end trace 586cfae3c9e3e67e ]--- The stack trace is identical between the two kernels. I am unable to find any reference to this on Ubuntu, Xen, or kernel forums or mailing lists but it's repeatable even on freshly installed m1.large instances on EC2. ProblemType: Bug DistroRelease: Ubuntu 12.04 Package: linux-image-3.2.0-24-virtual 3.2.0-24.37 ProcVersionSignature: Ubuntu 3.2.0-24.37-virtual 3.2.14 Uname: Linux 3.2.0-24-virtual x86_64 AcpiTables: AlsaDevices: total 0 crw-rw---T 1 root audio 116, 1 May 7 09:58 seq crw-rw---T 1 root audio 116, 33 May 7 09:58 timer AplayDevices: aplay: device_list:252: no soundcards found... ApportVersion: 2.0.1-0ubuntu7 Architecture: amd64 ArecordDevices: arecord: device_list:252: no soundcards found... AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found. Date: Tue May 15 15:23:54 2012 Ec2AMI: ami-fd1c2789 Ec2AMIManifest: ubuntu-eu-west-1/images-testing/ubuntu-precise-daily-amd64-desktop-20120420.manifest.xml Ec2AvailabilityZone: eu-west-1b Ec2InstanceType: m1.large Ec2Kernel: aki-62695816 Ec2Ramdisk: unavailable IwConfig: lo no wireless extensions. eth0 no wireless extensions. 
Lspci: Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99 PciMultimedia: ProcEnviron: TERM=xterm-256color PATH=(custom, no user) LANG=en_US.UTF-8 SHELL=/bin/bash ProcFB: ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0 PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/mydrive not ours. No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions: linux-restricted-modules-3.2.0-24-virtual N/A linux-backports-modules-3.2.0-24-virtual N/A linux-firmware 1.79 RfKill: SourcePackage: linux UpgradeStatus: No upgrade log present (probably fresh install) WifiSyslog: Testcase: 1. apt-get install build-essential ruby-1.9.3 screen 2. gem install chef 3. in screen session: while true; oahi; done --- We have a number of small and large instances running the release version of 12.04. The small instances have been completely stable. However, every large instance we have has crashed at a seemingly random interval. This is repeatable on individual systems, though not within a defined time period. It appears to be triggered by our half hourly run of OpsCode's chef-client. We tried running the client in a tight loop to recreate the crash but were unable to get it to do so in a short time period. It still took two days to crash again. This was affecting the 3.2.0-23-virtual kernel, so we updated to the 3.2.0-24-virtual kernel but still have found the same crash. 
The only information available in the system logs is: [17605315.391128] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [17605315.391148] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0 [17605315.391172] Oops: 0000 [#1] SMP [17605315.391179] CPU 1 [17605315.391182] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp [17605315.391209] [17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu [17605315.391223] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391232] RSP: e02b:ffff8801d2659c18 EFLAGS: 00010046 [17605315.391238] RAX: 0000000000000000 RBX: ffff8801d2eb5a00 RCX: 0000000000000000 [17605315.391244] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010 [17605315.391250] RBP: ffff8801d2659c48 R08: 0000000000000000 R09: 0000000000000000 [17605315.391255] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000 [17605315.391263] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000033b9e28 [17605315.391274] FS: 00007fee8cc10700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000 [17605315.391281] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [17605315.391287] CR2: 0000000000000010 CR3: 00000001d2a0b000 CR4: 0000000000002660 [17605315.391294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [17605315.391301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [17605315.391308] Process chef-client (pid: 28794, threadinfo ffff8801d2658000, task ffff8801d0870000) [17605315.391315] Stack: [17605315.391319] ffff8801d2659c48 ffffffff8104ece9 ffff8801d2eb5a00 ffff8801dffa26c0 [17605315.391331] ffff8801d2eb5200 0000000000000000 ffff8801d2659c78 ffffffff810544b8 [17605315.391343] ffff8801d2659c78 ffff8801dffa26c0 0000000000000001 ffff8801d08703a8 [17605315.391354] Call Trace: 
[17605315.391364] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0 [17605315.391373] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70 [17605315.391382] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0 [17605315.391391] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [17605315.391399] [<ffffffff8165344f>] schedule+0x3f/0x60 [17605315.391408] [<ffffffff8117e119>] pipe_wait+0x59/0x80 [17605315.391417] [<ffffffff81089340>] ? add_wait_queue+0x60/0x60 [17605315.391425] [<ffffffff8117e87a>] pipe_read+0x1da/0x330 [17605315.391433] [<ffffffff81174522>] do_sync_read+0xd2/0x110 [17605315.391443] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10 [17605315.391451] [<ffffffff8100aa32>] ? check_events+0x12/0x20 [17605315.391459] [<ffffffff81298d33>] ? security_file_permission+0x93/0xb0 [17605315.391466] [<ffffffff811749a1>] ? rw_verify_area+0x61/0xf0 [17605315.391473] [<ffffffff81174e80>] vfs_read+0xb0/0x180 [17605315.391479] [<ffffffff81174f9a>] sys_read+0x4a/0x90 [17605315.391488] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b [17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47 [17605315.391577] RIP [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391583] RSP <ffff8801d2659c18> [17605315.391587] CR2: 0000000000000010 [17605315.391596] ---[ end trace 586cfae3c9e3e67e ]--- The stack trace is identical between the two kernels. I am unable to find any reference to this on Ubuntu, Xen, or kernel forums or mailing lists but it's repeatable even on freshly installed m1.large instances on EC2. 
ProblemType: Bug DistroRelease: Ubuntu 12.04 Package: linux-image-3.2.0-24-virtual 3.2.0-24.37 ProcVersionSignature: Ubuntu 3.2.0-24.37-virtual 3.2.14 Uname: Linux 3.2.0-24-virtual x86_64 AcpiTables: AlsaDevices:  total 0  crw-rw---T 1 root audio 116, 1 May 7 09:58 seq  crw-rw---T 1 root audio 116, 33 May 7 09:58 timer AplayDevices: aplay: device_list:252: no soundcards found... ApportVersion: 2.0.1-0ubuntu7 Architecture: amd64 ArecordDevices: arecord: device_list:252: no soundcards found... AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found. Date: Tue May 15 15:23:54 2012 Ec2AMI: ami-fd1c2789 Ec2AMIManifest: ubuntu-eu-west-1/images-testing/ubuntu-precise-daily-amd64-desktop-20120420.manifest.xml Ec2AvailabilityZone: eu-west-1b Ec2InstanceType: m1.large Ec2Kernel: aki-62695816 Ec2Ramdisk: unavailable IwConfig:  lo no wireless extensions.  eth0 no wireless extensions. Lspci: Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99 PciMultimedia: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  LANG=en_US.UTF-8  SHELL=/bin/bash ProcFB: ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0 PulseList:  Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/mydrive not ours.  No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions:  linux-restricted-modules-3.2.0-24-virtual N/A  linux-backports-modules-3.2.0-24-virtual N/A  linux-firmware 1.79 RfKill: SourcePackage: linux UpgradeStatus: No upgrade log present (probably fresh install) WifiSyslog:
2012-05-29 14:53:12 | Stefan Bader | summary | Kernel crash in rb_next doin ohai loops | Kernel crash in rb_next doing ohai loops |
2012-05-29 15:13:41 Karl Matthias description Testcase: 1. apt-get install build-essential ruby-1.9.3 screen 2. gem install chef 3. in screen session: while true; oahi; done --- We have a number of small and large instances running the release version of 12.04. The small instances have been completely stable. However, every large instance we have has crashed at a seemingly random interval. This is repeatable on individual systems, though not within a defined time period. It appears to be triggered by our half hourly run of OpsCode's chef-client. We tried running the client in a tight loop to recreate the crash but were unable to get it to do so in a short time period. It still took two days to crash again. This was affecting the 3.2.0-23-virtual kernel, so we updated to the 3.2.0-24-virtual kernel but still have found the same crash. The only information available in the system logs is: [17605315.391128] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [17605315.391148] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0 [17605315.391172] Oops: 0000 [#1] SMP [17605315.391179] CPU 1 [17605315.391182] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp [17605315.391209] [17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu [17605315.391223] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391232] RSP: e02b:ffff8801d2659c18 EFLAGS: 00010046 [17605315.391238] RAX: 0000000000000000 RBX: ffff8801d2eb5a00 RCX: 0000000000000000 [17605315.391244] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010 [17605315.391250] RBP: ffff8801d2659c48 R08: 0000000000000000 R09: 0000000000000000 [17605315.391255] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000 [17605315.391263] R13: 0000000000000000 R14: 0000000000000000 R15: 
00000000033b9e28 [17605315.391274] FS: 00007fee8cc10700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000 [17605315.391281] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [17605315.391287] CR2: 0000000000000010 CR3: 00000001d2a0b000 CR4: 0000000000002660 [17605315.391294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [17605315.391301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [17605315.391308] Process chef-client (pid: 28794, threadinfo ffff8801d2658000, task ffff8801d0870000) [17605315.391315] Stack: [17605315.391319] ffff8801d2659c48 ffffffff8104ece9 ffff8801d2eb5a00 ffff8801dffa26c0 [17605315.391331] ffff8801d2eb5200 0000000000000000 ffff8801d2659c78 ffffffff810544b8 [17605315.391343] ffff8801d2659c78 ffff8801dffa26c0 0000000000000001 ffff8801d08703a8 [17605315.391354] Call Trace: [17605315.391364] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0 [17605315.391373] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70 [17605315.391382] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0 [17605315.391391] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [17605315.391399] [<ffffffff8165344f>] schedule+0x3f/0x60 [17605315.391408] [<ffffffff8117e119>] pipe_wait+0x59/0x80 [17605315.391417] [<ffffffff81089340>] ? add_wait_queue+0x60/0x60 [17605315.391425] [<ffffffff8117e87a>] pipe_read+0x1da/0x330 [17605315.391433] [<ffffffff81174522>] do_sync_read+0xd2/0x110 [17605315.391443] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10 [17605315.391451] [<ffffffff8100aa32>] ? check_events+0x12/0x20 [17605315.391459] [<ffffffff81298d33>] ? security_file_permission+0x93/0xb0 [17605315.391466] [<ffffffff811749a1>] ? 
rw_verify_area+0x61/0xf0 [17605315.391473] [<ffffffff81174e80>] vfs_read+0xb0/0x180 [17605315.391479] [<ffffffff81174f9a>] sys_read+0x4a/0x90 [17605315.391488] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b [17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47 [17605315.391577] RIP [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391583] RSP <ffff8801d2659c18> [17605315.391587] CR2: 0000000000000010 [17605315.391596] ---[ end trace 586cfae3c9e3e67e ]--- The stack trace is identical between the two kernels. I am unable to find any reference to this on Ubuntu, Xen, or kernel forums or mailing lists but it's repeatable even on freshly installed m1.large instances on EC2. ProblemType: Bug DistroRelease: Ubuntu 12.04 Package: linux-image-3.2.0-24-virtual 3.2.0-24.37 ProcVersionSignature: Ubuntu 3.2.0-24.37-virtual 3.2.14 Uname: Linux 3.2.0-24-virtual x86_64 AcpiTables: AlsaDevices:  total 0  crw-rw---T 1 root audio 116, 1 May 7 09:58 seq  crw-rw---T 1 root audio 116, 33 May 7 09:58 timer AplayDevices: aplay: device_list:252: no soundcards found... ApportVersion: 2.0.1-0ubuntu7 Architecture: amd64 ArecordDevices: arecord: device_list:252: no soundcards found... AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found. Date: Tue May 15 15:23:54 2012 Ec2AMI: ami-fd1c2789 Ec2AMIManifest: ubuntu-eu-west-1/images-testing/ubuntu-precise-daily-amd64-desktop-20120420.manifest.xml Ec2AvailabilityZone: eu-west-1b Ec2InstanceType: m1.large Ec2Kernel: aki-62695816 Ec2Ramdisk: unavailable IwConfig:  lo no wireless extensions.  eth0 no wireless extensions. 
Lspci: Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99 PciMultimedia: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  LANG=en_US.UTF-8  SHELL=/bin/bash ProcFB: ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0 PulseList:  Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/mydrive not ours.  No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions:  linux-restricted-modules-3.2.0-24-virtual N/A  linux-backports-modules-3.2.0-24-virtual N/A  linux-firmware 1.79 RfKill: SourcePackage: linux UpgradeStatus: No upgrade log present (probably fresh install) WifiSyslog: Testcase: 1. apt-get install build-essential ruby-1.9.3 screen 2. gem install chef 3. in screen session: while true; ohai; done --- We have a number of small and large instances running the release version of 12.04. The small instances have been completely stable. However, every large instance we have has crashed at a seemingly random interval. This is repeatable on individual systems, though not within a defined time period. It appears to be triggered by our half hourly run of OpsCode's chef-client. We tried running the client in a tight loop to recreate the crash but were unable to get it to do so in a short time period. It still took two days to crash again. This was affecting the 3.2.0-23-virtual kernel, so we updated to the 3.2.0-24-virtual kernel but still have found the same crash. 
The only information available in the system logs is: [17605315.391128] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [17605315.391148] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0 [17605315.391172] Oops: 0000 [#1] SMP [17605315.391179] CPU 1 [17605315.391182] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp [17605315.391209] [17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu [17605315.391223] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391232] RSP: e02b:ffff8801d2659c18 EFLAGS: 00010046 [17605315.391238] RAX: 0000000000000000 RBX: ffff8801d2eb5a00 RCX: 0000000000000000 [17605315.391244] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010 [17605315.391250] RBP: ffff8801d2659c48 R08: 0000000000000000 R09: 0000000000000000 [17605315.391255] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000 [17605315.391263] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000033b9e28 [17605315.391274] FS: 00007fee8cc10700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000 [17605315.391281] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [17605315.391287] CR2: 0000000000000010 CR3: 00000001d2a0b000 CR4: 0000000000002660 [17605315.391294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [17605315.391301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [17605315.391308] Process chef-client (pid: 28794, threadinfo ffff8801d2658000, task ffff8801d0870000) [17605315.391315] Stack: [17605315.391319] ffff8801d2659c48 ffffffff8104ece9 ffff8801d2eb5a00 ffff8801dffa26c0 [17605315.391331] ffff8801d2eb5200 0000000000000000 ffff8801d2659c78 ffffffff810544b8 [17605315.391343] ffff8801d2659c78 ffff8801dffa26c0 0000000000000001 ffff8801d08703a8 [17605315.391354] Call Trace: 
[17605315.391364] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0 [17605315.391373] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70 [17605315.391382] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0 [17605315.391391] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [17605315.391399] [<ffffffff8165344f>] schedule+0x3f/0x60 [17605315.391408] [<ffffffff8117e119>] pipe_wait+0x59/0x80 [17605315.391417] [<ffffffff81089340>] ? add_wait_queue+0x60/0x60 [17605315.391425] [<ffffffff8117e87a>] pipe_read+0x1da/0x330 [17605315.391433] [<ffffffff81174522>] do_sync_read+0xd2/0x110 [17605315.391443] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10 [17605315.391451] [<ffffffff8100aa32>] ? check_events+0x12/0x20 [17605315.391459] [<ffffffff81298d33>] ? security_file_permission+0x93/0xb0 [17605315.391466] [<ffffffff811749a1>] ? rw_verify_area+0x61/0xf0 [17605315.391473] [<ffffffff81174e80>] vfs_read+0xb0/0x180 [17605315.391479] [<ffffffff81174f9a>] sys_read+0x4a/0x90 [17605315.391488] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b [17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47 [17605315.391577] RIP [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391583] RSP <ffff8801d2659c18> [17605315.391587] CR2: 0000000000000010 [17605315.391596] ---[ end trace 586cfae3c9e3e67e ]--- The stack trace is identical between the two kernels. I am unable to find any reference to this on Ubuntu, Xen, or kernel forums or mailing lists but it's repeatable even on freshly installed m1.large instances on EC2. 
ProblemType: Bug DistroRelease: Ubuntu 12.04 Package: linux-image-3.2.0-24-virtual 3.2.0-24.37 ProcVersionSignature: Ubuntu 3.2.0-24.37-virtual 3.2.14 Uname: Linux 3.2.0-24-virtual x86_64 AcpiTables: AlsaDevices:  total 0  crw-rw---T 1 root audio 116, 1 May 7 09:58 seq  crw-rw---T 1 root audio 116, 33 May 7 09:58 timer AplayDevices: aplay: device_list:252: no soundcards found... ApportVersion: 2.0.1-0ubuntu7 Architecture: amd64 ArecordDevices: arecord: device_list:252: no soundcards found... AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found. Date: Tue May 15 15:23:54 2012 Ec2AMI: ami-fd1c2789 Ec2AMIManifest: ubuntu-eu-west-1/images-testing/ubuntu-precise-daily-amd64-desktop-20120420.manifest.xml Ec2AvailabilityZone: eu-west-1b Ec2InstanceType: m1.large Ec2Kernel: aki-62695816 Ec2Ramdisk: unavailable IwConfig:  lo no wireless extensions.  eth0 no wireless extensions. Lspci: Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99 PciMultimedia: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  LANG=en_US.UTF-8  SHELL=/bin/bash ProcFB: ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0 PulseList:  Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/mydrive not ours.  No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions:  linux-restricted-modules-3.2.0-24-virtual N/A  linux-backports-modules-3.2.0-24-virtual N/A  linux-firmware 1.79 RfKill: SourcePackage: linux UpgradeStatus: No upgrade log present (probably fresh install) WifiSyslog:
2012-05-29 15:55:06 Stefan Bader description Testcase: 1. apt-get install build-essential ruby-1.9.3 screen 2. gem install chef 3. in screen session: while true; ohai; done --- We have a number of small and large instances running the release version of 12.04. The small instances have been completely stable. However, every large instance we have has crashed at a seemingly random interval. This is repeatable on individual systems, though not within a defined time period. It appears to be triggered by our half hourly run of OpsCode's chef-client. We tried running the client in a tight loop to recreate the crash but were unable to get it to do so in a short time period. It still took two days to crash again. This was affecting the 3.2.0-23-virtual kernel, so we updated to the 3.2.0-24-virtual kernel but still have found the same crash. The only information available in the system logs is: [17605315.391128] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [17605315.391148] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0 [17605315.391172] Oops: 0000 [#1] SMP [17605315.391179] CPU 1 [17605315.391182] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp [17605315.391209] [17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu [17605315.391223] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391232] RSP: e02b:ffff8801d2659c18 EFLAGS: 00010046 [17605315.391238] RAX: 0000000000000000 RBX: ffff8801d2eb5a00 RCX: 0000000000000000 [17605315.391244] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010 [17605315.391250] RBP: ffff8801d2659c48 R08: 0000000000000000 R09: 0000000000000000 [17605315.391255] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000 [17605315.391263] R13: 0000000000000000 R14: 0000000000000000 R15: 
00000000033b9e28 [17605315.391274] FS: 00007fee8cc10700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000 [17605315.391281] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [17605315.391287] CR2: 0000000000000010 CR3: 00000001d2a0b000 CR4: 0000000000002660 [17605315.391294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [17605315.391301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [17605315.391308] Process chef-client (pid: 28794, threadinfo ffff8801d2658000, task ffff8801d0870000) [17605315.391315] Stack: [17605315.391319] ffff8801d2659c48 ffffffff8104ece9 ffff8801d2eb5a00 ffff8801dffa26c0 [17605315.391331] ffff8801d2eb5200 0000000000000000 ffff8801d2659c78 ffffffff810544b8 [17605315.391343] ffff8801d2659c78 ffff8801dffa26c0 0000000000000001 ffff8801d08703a8 [17605315.391354] Call Trace: [17605315.391364] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0 [17605315.391373] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70 [17605315.391382] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0 [17605315.391391] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [17605315.391399] [<ffffffff8165344f>] schedule+0x3f/0x60 [17605315.391408] [<ffffffff8117e119>] pipe_wait+0x59/0x80 [17605315.391417] [<ffffffff81089340>] ? add_wait_queue+0x60/0x60 [17605315.391425] [<ffffffff8117e87a>] pipe_read+0x1da/0x330 [17605315.391433] [<ffffffff81174522>] do_sync_read+0xd2/0x110 [17605315.391443] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10 [17605315.391451] [<ffffffff8100aa32>] ? check_events+0x12/0x20 [17605315.391459] [<ffffffff81298d33>] ? security_file_permission+0x93/0xb0 [17605315.391466] [<ffffffff811749a1>] ? 
rw_verify_area+0x61/0xf0 [17605315.391473] [<ffffffff81174e80>] vfs_read+0xb0/0x180 [17605315.391479] [<ffffffff81174f9a>] sys_read+0x4a/0x90 [17605315.391488] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b [17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47 [17605315.391577] RIP [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391583] RSP <ffff8801d2659c18> [17605315.391587] CR2: 0000000000000010 [17605315.391596] ---[ end trace 586cfae3c9e3e67e ]--- The stack trace is identical between the two kernels. I am unable to find any reference to this on Ubuntu, Xen, or kernel forums or mailing lists but it's repeatable even on freshly installed m1.large instances on EC2. ProblemType: Bug DistroRelease: Ubuntu 12.04 Package: linux-image-3.2.0-24-virtual 3.2.0-24.37 ProcVersionSignature: Ubuntu 3.2.0-24.37-virtual 3.2.14 Uname: Linux 3.2.0-24-virtual x86_64 AcpiTables: AlsaDevices:  total 0  crw-rw---T 1 root audio 116, 1 May 7 09:58 seq  crw-rw---T 1 root audio 116, 33 May 7 09:58 timer AplayDevices: aplay: device_list:252: no soundcards found... ApportVersion: 2.0.1-0ubuntu7 Architecture: amd64 ArecordDevices: arecord: device_list:252: no soundcards found... AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found. Date: Tue May 15 15:23:54 2012 Ec2AMI: ami-fd1c2789 Ec2AMIManifest: ubuntu-eu-west-1/images-testing/ubuntu-precise-daily-amd64-desktop-20120420.manifest.xml Ec2AvailabilityZone: eu-west-1b Ec2InstanceType: m1.large Ec2Kernel: aki-62695816 Ec2Ramdisk: unavailable IwConfig:  lo no wireless extensions.  eth0 no wireless extensions. 
Lspci: Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99 PciMultimedia: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  LANG=en_US.UTF-8  SHELL=/bin/bash ProcFB: ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0 PulseList:  Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/mydrive not ours.  No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions:  linux-restricted-modules-3.2.0-24-virtual N/A  linux-backports-modules-3.2.0-24-virtual N/A  linux-firmware 1.79 RfKill: SourcePackage: linux UpgradeStatus: No upgrade log present (probably fresh install) WifiSyslog: Testcase: 1. apt-get install build-essential ruby-1.9.3 screen 2. gem install chef 3. in screen session: while true; do ohai; done --- We have a number of small and large instances running the release version of 12.04. The small instances have been completely stable. However, every large instance we have has crashed at a seemingly random interval. This is repeatable on individual systems, though not within a defined time period. It appears to be triggered by our half hourly run of OpsCode's chef-client. We tried running the client in a tight loop to recreate the crash but were unable to get it to do so in a short time period. It still took two days to crash again. This was affecting the 3.2.0-23-virtual kernel, so we updated to the 3.2.0-24-virtual kernel but still have found the same crash. 
The only information available in the system logs is: [17605315.391128] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [17605315.391148] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0 [17605315.391172] Oops: 0000 [#1] SMP [17605315.391179] CPU 1 [17605315.391182] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp [17605315.391209] [17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu [17605315.391223] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391232] RSP: e02b:ffff8801d2659c18 EFLAGS: 00010046 [17605315.391238] RAX: 0000000000000000 RBX: ffff8801d2eb5a00 RCX: 0000000000000000 [17605315.391244] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010 [17605315.391250] RBP: ffff8801d2659c48 R08: 0000000000000000 R09: 0000000000000000 [17605315.391255] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000 [17605315.391263] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000033b9e28 [17605315.391274] FS: 00007fee8cc10700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000 [17605315.391281] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [17605315.391287] CR2: 0000000000000010 CR3: 00000001d2a0b000 CR4: 0000000000002660 [17605315.391294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [17605315.391301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [17605315.391308] Process chef-client (pid: 28794, threadinfo ffff8801d2658000, task ffff8801d0870000) [17605315.391315] Stack: [17605315.391319] ffff8801d2659c48 ffffffff8104ece9 ffff8801d2eb5a00 ffff8801dffa26c0 [17605315.391331] ffff8801d2eb5200 0000000000000000 ffff8801d2659c78 ffffffff810544b8 [17605315.391343] ffff8801d2659c78 ffff8801dffa26c0 0000000000000001 ffff8801d08703a8 [17605315.391354] Call Trace: 
[17605315.391364] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0 [17605315.391373] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70 [17605315.391382] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0 [17605315.391391] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [17605315.391399] [<ffffffff8165344f>] schedule+0x3f/0x60 [17605315.391408] [<ffffffff8117e119>] pipe_wait+0x59/0x80 [17605315.391417] [<ffffffff81089340>] ? add_wait_queue+0x60/0x60 [17605315.391425] [<ffffffff8117e87a>] pipe_read+0x1da/0x330 [17605315.391433] [<ffffffff81174522>] do_sync_read+0xd2/0x110 [17605315.391443] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10 [17605315.391451] [<ffffffff8100aa32>] ? check_events+0x12/0x20 [17605315.391459] [<ffffffff81298d33>] ? security_file_permission+0x93/0xb0 [17605315.391466] [<ffffffff811749a1>] ? rw_verify_area+0x61/0xf0 [17605315.391473] [<ffffffff81174e80>] vfs_read+0xb0/0x180 [17605315.391479] [<ffffffff81174f9a>] sys_read+0x4a/0x90 [17605315.391488] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b [17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47 [17605315.391577] RIP [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391583] RSP <ffff8801d2659c18> [17605315.391587] CR2: 0000000000000010 [17605315.391596] ---[ end trace 586cfae3c9e3e67e ]--- The stack trace is identical between the two kernels. I am unable to find any reference to this on Ubuntu, Xen, or kernel forums or mailing lists but it's repeatable even on freshly installed m1.large instances on EC2. 
ProblemType: Bug DistroRelease: Ubuntu 12.04 Package: linux-image-3.2.0-24-virtual 3.2.0-24.37 ProcVersionSignature: Ubuntu 3.2.0-24.37-virtual 3.2.14 Uname: Linux 3.2.0-24-virtual x86_64 AcpiTables: AlsaDevices:  total 0  crw-rw---T 1 root audio 116, 1 May 7 09:58 seq  crw-rw---T 1 root audio 116, 33 May 7 09:58 timer AplayDevices: aplay: device_list:252: no soundcards found... ApportVersion: 2.0.1-0ubuntu7 Architecture: amd64 ArecordDevices: arecord: device_list:252: no soundcards found... AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found. Date: Tue May 15 15:23:54 2012 Ec2AMI: ami-fd1c2789 Ec2AMIManifest: ubuntu-eu-west-1/images-testing/ubuntu-precise-daily-amd64-desktop-20120420.manifest.xml Ec2AvailabilityZone: eu-west-1b Ec2InstanceType: m1.large Ec2Kernel: aki-62695816 Ec2Ramdisk: unavailable IwConfig:  lo no wireless extensions.  eth0 no wireless extensions. Lspci: Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99 PciMultimedia: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  LANG=en_US.UTF-8  SHELL=/bin/bash ProcFB: ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0 PulseList:  Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/mydrive not ours.  No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions:  linux-restricted-modules-3.2.0-24-virtual N/A  linux-backports-modules-3.2.0-24-virtual N/A  linux-firmware 1.79 RfKill: SourcePackage: linux UpgradeStatus: No upgrade log present (probably fresh install) WifiSyslog:
2012-06-11 22:15:00 Ken bug added subscriber Ken
2012-07-03 12:42:09 Stefan Bader attachment added 0001-sched-Fix-race-in-task_group.patch https://bugs.launchpad.net/ubuntu/+source/linux/+bug/999755/+attachment/3211939/+files/0001-sched-Fix-race-in-task_group.patch
2012-07-03 12:42:25 Stefan Bader nominated for series Ubuntu Natty
2012-07-03 12:42:25 Stefan Bader bug task added linux (Ubuntu Natty)
2012-07-03 12:42:25 Stefan Bader nominated for series Ubuntu Oneiric
2012-07-03 12:42:25 Stefan Bader bug task added linux (Ubuntu Oneiric)
2012-07-03 12:42:25 Stefan Bader nominated for series Ubuntu Precise
2012-07-03 12:42:25 Stefan Bader bug task added linux (Ubuntu Precise)
2012-07-03 12:43:16 Stefan Bader linux (Ubuntu): status Confirmed Triaged
2012-07-03 12:49:49 Stefan Bader description Testcase: 1. apt-get install build-essential ruby-1.9.3 screen 2. gem install chef 3. in screen session: while true; do ohai; done --- We have a number of small and large instances running the release version of 12.04. The small instances have been completely stable. However, every large instance we have has crashed at a seemingly random interval. This is repeatable on individual systems, though not within a defined time period. It appears to be triggered by our half hourly run of OpsCode's chef-client. We tried running the client in a tight loop to recreate the crash but were unable to get it to do so in a short time period. It still took two days to crash again. This was affecting the 3.2.0-23-virtual kernel, so we updated to the 3.2.0-24-virtual kernel but still have found the same crash. The only information available in the system logs is: [17605315.391128] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [17605315.391148] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0 [17605315.391172] Oops: 0000 [#1] SMP [17605315.391179] CPU 1 [17605315.391182] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp [17605315.391209] [17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu [17605315.391223] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391232] RSP: e02b:ffff8801d2659c18 EFLAGS: 00010046 [17605315.391238] RAX: 0000000000000000 RBX: ffff8801d2eb5a00 RCX: 0000000000000000 [17605315.391244] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010 [17605315.391250] RBP: ffff8801d2659c48 R08: 0000000000000000 R09: 0000000000000000 [17605315.391255] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000 [17605315.391263] R13: 0000000000000000 R14: 0000000000000000 R15: 
00000000033b9e28 [17605315.391274] FS: 00007fee8cc10700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000 [17605315.391281] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [17605315.391287] CR2: 0000000000000010 CR3: 00000001d2a0b000 CR4: 0000000000002660 [17605315.391294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [17605315.391301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [17605315.391308] Process chef-client (pid: 28794, threadinfo ffff8801d2658000, task ffff8801d0870000) [17605315.391315] Stack: [17605315.391319] ffff8801d2659c48 ffffffff8104ece9 ffff8801d2eb5a00 ffff8801dffa26c0 [17605315.391331] ffff8801d2eb5200 0000000000000000 ffff8801d2659c78 ffffffff810544b8 [17605315.391343] ffff8801d2659c78 ffff8801dffa26c0 0000000000000001 ffff8801d08703a8 [17605315.391354] Call Trace: [17605315.391364] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0 [17605315.391373] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70 [17605315.391382] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0 [17605315.391391] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [17605315.391399] [<ffffffff8165344f>] schedule+0x3f/0x60 [17605315.391408] [<ffffffff8117e119>] pipe_wait+0x59/0x80 [17605315.391417] [<ffffffff81089340>] ? add_wait_queue+0x60/0x60 [17605315.391425] [<ffffffff8117e87a>] pipe_read+0x1da/0x330 [17605315.391433] [<ffffffff81174522>] do_sync_read+0xd2/0x110 [17605315.391443] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10 [17605315.391451] [<ffffffff8100aa32>] ? check_events+0x12/0x20 [17605315.391459] [<ffffffff81298d33>] ? security_file_permission+0x93/0xb0 [17605315.391466] [<ffffffff811749a1>] ? 
rw_verify_area+0x61/0xf0 [17605315.391473] [<ffffffff81174e80>] vfs_read+0xb0/0x180 [17605315.391479] [<ffffffff81174f9a>] sys_read+0x4a/0x90 [17605315.391488] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b [17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47 [17605315.391577] RIP [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391583] RSP <ffff8801d2659c18> [17605315.391587] CR2: 0000000000000010 [17605315.391596] ---[ end trace 586cfae3c9e3e67e ]--- The stack trace is identical between the two kernels. I am unable to find any reference to this on Ubuntu, Xen, or kernel forums or mailing lists but it's repeatable even on freshly installed m1.large instances on EC2. ProblemType: Bug DistroRelease: Ubuntu 12.04 Package: linux-image-3.2.0-24-virtual 3.2.0-24.37 ProcVersionSignature: Ubuntu 3.2.0-24.37-virtual 3.2.14 Uname: Linux 3.2.0-24-virtual x86_64 AcpiTables: AlsaDevices:  total 0  crw-rw---T 1 root audio 116, 1 May 7 09:58 seq  crw-rw---T 1 root audio 116, 33 May 7 09:58 timer AplayDevices: aplay: device_list:252: no soundcards found... ApportVersion: 2.0.1-0ubuntu7 Architecture: amd64 ArecordDevices: arecord: device_list:252: no soundcards found... AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found. Date: Tue May 15 15:23:54 2012 Ec2AMI: ami-fd1c2789 Ec2AMIManifest: ubuntu-eu-west-1/images-testing/ubuntu-precise-daily-amd64-desktop-20120420.manifest.xml Ec2AvailabilityZone: eu-west-1b Ec2InstanceType: m1.large Ec2Kernel: aki-62695816 Ec2Ramdisk: unavailable IwConfig:  lo no wireless extensions.  eth0 no wireless extensions. 
Lspci: Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99 PciMultimedia: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  LANG=en_US.UTF-8  SHELL=/bin/bash ProcFB: ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0 PulseList:  Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/mydrive not ours.  No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions:  linux-restricted-modules-3.2.0-24-virtual N/A  linux-backports-modules-3.2.0-24-virtual N/A  linux-firmware 1.79 RfKill: SourcePackage: linux UpgradeStatus: No upgrade log present (probably fresh install) WifiSyslog: SRU Justification: Impact: If tasks use the setsid call a lot (which places them into separate task groups), they may trigger a race that can cause an access violation in the scheduler code that crashes the kernel. Kernels after v3.3 avoid the inconsistencies and do not crash, although the race is still present. Fix: The attached patch resolves the race and should make its way upstream. Proposing to apply it pre-stable due to the potential of crashes and after successful verification locally. Testcase: 1. apt-get install build-essential ruby-1.9.3 screen 2. gem install chef 3. in screen session: while true; do ohai; done --- We have a number of small and large instances running the release version of 12.04. The small instances have been completely stable. However, every large instance we have has crashed at a seemingly random interval. This is repeatable on individual systems, though not within a defined time period. It appears to be triggered by our half hourly run of OpsCode's chef-client. We tried running the client in a tight loop to recreate the crash but were unable to get it to do so in a short time period. It still took two days to crash again. This was affecting the 3.2.0-23-virtual kernel, so we updated to the 3.2.0-24-virtual kernel but still have found the same crash. 
The only information available in the system logs is: [17605315.391128] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [17605315.391148] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0 [17605315.391172] Oops: 0000 [#1] SMP [17605315.391179] CPU 1 [17605315.391182] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp [17605315.391209] [17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu [17605315.391223] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391232] RSP: e02b:ffff8801d2659c18 EFLAGS: 00010046 [17605315.391238] RAX: 0000000000000000 RBX: ffff8801d2eb5a00 RCX: 0000000000000000 [17605315.391244] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010 [17605315.391250] RBP: ffff8801d2659c48 R08: 0000000000000000 R09: 0000000000000000 [17605315.391255] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000 [17605315.391263] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000033b9e28 [17605315.391274] FS: 00007fee8cc10700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000 [17605315.391281] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [17605315.391287] CR2: 0000000000000010 CR3: 00000001d2a0b000 CR4: 0000000000002660 [17605315.391294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [17605315.391301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [17605315.391308] Process chef-client (pid: 28794, threadinfo ffff8801d2658000, task ffff8801d0870000) [17605315.391315] Stack: [17605315.391319] ffff8801d2659c48 ffffffff8104ece9 ffff8801d2eb5a00 ffff8801dffa26c0 [17605315.391331] ffff8801d2eb5200 0000000000000000 ffff8801d2659c78 ffffffff810544b8 [17605315.391343] ffff8801d2659c78 ffff8801dffa26c0 0000000000000001 ffff8801d08703a8 [17605315.391354] Call Trace: 
[17605315.391364] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0 [17605315.391373] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70 [17605315.391382] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0 [17605315.391391] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30 [17605315.391399] [<ffffffff8165344f>] schedule+0x3f/0x60 [17605315.391408] [<ffffffff8117e119>] pipe_wait+0x59/0x80 [17605315.391417] [<ffffffff81089340>] ? add_wait_queue+0x60/0x60 [17605315.391425] [<ffffffff8117e87a>] pipe_read+0x1da/0x330 [17605315.391433] [<ffffffff81174522>] do_sync_read+0xd2/0x110 [17605315.391443] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10 [17605315.391451] [<ffffffff8100aa32>] ? check_events+0x12/0x20 [17605315.391459] [<ffffffff81298d33>] ? security_file_permission+0x93/0xb0 [17605315.391466] [<ffffffff811749a1>] ? rw_verify_area+0x61/0xf0 [17605315.391473] [<ffffffff81174e80>] vfs_read+0xb0/0x180 [17605315.391479] [<ffffffff81174f9a>] sys_read+0x4a/0x90 [17605315.391488] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b [17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47 [17605315.391577] RIP [<ffffffff8130d7f1>] rb_next+0x1/0x50 [17605315.391583] RSP <ffff8801d2659c18> [17605315.391587] CR2: 0000000000000010 [17605315.391596] ---[ end trace 586cfae3c9e3e67e ]--- The stack trace is identical between the two kernels. I am unable to find any reference to this on Ubuntu, Xen, or kernel forums or mailing lists but it's repeatable even on freshly installed m1.large instances on EC2. 
ProblemType: Bug DistroRelease: Ubuntu 12.04 Package: linux-image-3.2.0-24-virtual 3.2.0-24.37 ProcVersionSignature: Ubuntu 3.2.0-24.37-virtual 3.2.14 Uname: Linux 3.2.0-24-virtual x86_64 AcpiTables: AlsaDevices:  total 0  crw-rw---T 1 root audio 116, 1 May 7 09:58 seq  crw-rw---T 1 root audio 116, 33 May 7 09:58 timer AplayDevices: aplay: device_list:252: no soundcards found... ApportVersion: 2.0.1-0ubuntu7 Architecture: amd64 ArecordDevices: arecord: device_list:252: no soundcards found... AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found. Date: Tue May 15 15:23:54 2012 Ec2AMI: ami-fd1c2789 Ec2AMIManifest: ubuntu-eu-west-1/images-testing/ubuntu-precise-daily-amd64-desktop-20120420.manifest.xml Ec2AvailabilityZone: eu-west-1b Ec2InstanceType: m1.large Ec2Kernel: aki-62695816 Ec2Ramdisk: unavailable IwConfig:  lo no wireless extensions.  eth0 no wireless extensions. Lspci: Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99 PciMultimedia: ProcEnviron:  TERM=xterm-256color  PATH=(custom, no user)  LANG=en_US.UTF-8  SHELL=/bin/bash ProcFB: ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0 PulseList:  Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/mydrive not ours.  No PulseAudio daemon running, or not running as session daemon. RelatedPackageVersions:  linux-restricted-modules-3.2.0-24-virtual N/A  linux-backports-modules-3.2.0-24-virtual N/A  linux-firmware 1.79 RfKill: SourcePackage: linux UpgradeStatus: No upgrade log present (probably fresh install) WifiSyslog:
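Note on the SRU justification in the entry above: it attributes the crash to tasks that call setsid frequently. The following is a minimal sketch of that kind of load, not the reporter's testcase (which loops ohai) and not a verified reproducer. The function name spawn_setsid_children and the child count are hypothetical, chosen only for illustration.

```python
import os


def spawn_setsid_children(count):
    """Fork `count` short-lived children; each starts a new session and exits.

    Hypothetical illustration of a setsid-heavy workload. Returns how many
    children reported a successful setsid() call.
    """
    succeeded = 0
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: a freshly forked process is not a process-group leader,
            # so setsid() is permitted here.
            try:
                os.setsid()
            except OSError:
                os._exit(1)
            os._exit(0)
        # Parent: reap the child and record whether setsid() succeeded.
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            succeeded += 1
    return succeeded


if __name__ == "__main__":
    print(spawn_setsid_children(100))
```

Each child is reaped before the next is forked, so the sketch only generates a steady stream of setsid calls; it makes no attempt to hit the scheduler race window, which by the report took days to trigger even under a tight loop.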
2012-07-03 12:50:18 Stefan Bader bug added subscriber Ubuntu Stable Release Updates Team
2012-07-10 08:00:36 Sascha Konietzke bug added subscriber Sascha Konietzke
2012-07-17 19:09:49 Tim Gardner linux (Ubuntu Oneiric): status New Fix Committed
2012-07-17 19:09:54 Tim Gardner linux (Ubuntu Precise): status New Fix Committed
2012-07-17 19:10:16 Tim Gardner linux (Ubuntu Natty): status New In Progress
2012-07-17 19:10:33 Tim Gardner linux (Ubuntu Natty): assignee Stefan Bader (stefan-bader-canonical)
2012-07-19 14:30:01 Armin Bauer bug added subscriber Armin Bauer
2012-07-26 02:28:43 Launchpad Janitor branch linked lp:ubuntu/lucid-proposed/linux-lts-backport-oneiric
2012-07-30 09:23:40 Luis Henriques tags amd64 apport-bug ec2-images precise amd64 apport-bug ec2-images precise verification-needed-oneiric
2012-07-30 09:32:01 Luis Henriques tags amd64 apport-bug ec2-images precise verification-needed-oneiric amd64 apport-bug ec2-images precise verification-needed-oneiric verification-needed-precise
2012-07-30 14:24:24 Launchpad Janitor branch linked lp:ubuntu/precise-proposed/linux-armadaxp
2012-07-31 04:03:13 Launchpad Janitor branch linked lp:ubuntu/oneiric-proposed/linux-ti-omap4
2012-07-31 21:10:18 Karl Matthias tags amd64 apport-bug ec2-images precise verification-needed-oneiric verification-needed-precise amd64 apport-bug ec2-images precise verification-done-precise verification-needed-oneiric
2012-08-01 15:34:56 Launchpad Janitor branch linked lp:ubuntu/precise-proposed/linux-ti-omap4
2012-08-01 16:59:14 Luis Henriques bug added subscriber Luis Henriques
2012-08-03 09:22:52 Luis Henriques tags amd64 apport-bug ec2-images precise verification-done-precise verification-needed-oneiric amd64 apport-bug ec2-images precise verification-done-oneiric verification-done-precise
2012-08-10 00:35:07 Adam Conrad removed subscriber Ubuntu Stable Release Updates Team
2012-08-10 00:35:11 Launchpad Janitor linux (Ubuntu Oneiric): status Fix Committed Fix Released
2012-08-10 00:35:11 Launchpad Janitor cve linked 2012-2136
2012-08-10 00:35:11 Launchpad Janitor cve linked 2012-3375
2012-08-10 00:46:10 Launchpad Janitor linux (Ubuntu Precise): status Fix Committed Fix Released
2012-08-10 00:46:10 Launchpad Janitor cve linked 2012-2372
2012-08-10 00:46:10 Launchpad Janitor cve linked 2012-2669
2012-08-10 04:19:45 Ubuntu Foundations Team Bug Bot tags amd64 apport-bug ec2-images precise verification-done-oneiric verification-done-precise amd64 apport-bug ec2-images patch precise verification-done-oneiric verification-done-precise
2012-10-23 14:14:49 Stefan Bader linux (Ubuntu Natty): status In Progress Won't Fix
2012-10-23 14:15:01 Stefan Bader linux (Ubuntu Natty): assignee Stefan Bader (stefan-bader-canonical)
2012-10-23 14:15:25 Stefan Bader linux (Ubuntu): status Triaged Fix Released
2012-11-14 21:30:44 Launchpad Janitor branch linked lp:ubuntu/precise-proposed/linux-lowlatency
2013-05-07 17:21:21 Launchpad Janitor branch linked lp:ubuntu/lucid-security/linux-lts-backport-oneiric
2014-01-15 21:39:30 Launchpad Janitor branch linked lp:ubuntu/lucid-updates/linux-lts-backport-oneiric
2014-04-28 01:49:39 Angel Olivera bug added subscriber Angel Olivera
2016-11-28 13:35:14 MaxWellxxx bug added subscriber MaxWellxxx