Kernel crash in rb_next doing ohai loops

Bug #999755 reported by Karl Matthias on 2012-05-15
36
This bug affects 4 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Medium
Unassigned
Natty
Undecided
Unassigned
Oneiric
Undecided
Unassigned
Precise
Undecided
Unassigned

Bug Description

SRU Justification:

Impact:
If tasks use the setsid call a lot (which places them into separate tasks groups), they may trigger a race that can cause a access violation in the scheduler code that crashes the kernel. Kernels after v3.3 avoid inconsistencies and do not crash while the race is still present.

Fix:
The attached patch resolves the race and should make its way upstream. Proposing to apply it pre-stable due to the potential of crashes and after successful verification locally.

Testcase:
1. apt-get install build-essential ruby-1.9.3 screen
2. gem install chef
3. in screen session: while true; do ohai; done

---

We have a number of small and large instances running the release version of 12.04. The small instances have been completely stable. However, every large instance we have has crashed at a seemingly random interval. This is repeatable on individual systems, though not within a defined time period. It appears to be triggered by our half hourly run of OpsCode's chef-client. We tried running the client in a tight loop to recreate the crash but were unable to get it to do so in a short time period. It still took two days to crash again.

This was affecting the 3.2.0-23-virtual kernel, so we updated to the 3.2.0-24-virtual kernel but still have found the same crash. The only information available in the system logs is:

[17605315.391128] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[17605315.391148] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50
[17605315.391163] PGD 1d2fdc067 PUD 1d0e3c067 PMD 0
[17605315.391172] Oops: 0000 [#1] SMP
[17605315.391179] CPU 1
[17605315.391182] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp
[17605315.391209]
[17605315.391214] Pid: 28794, comm: chef-client Not tainted 3.2.0-23-virtual #36-Ubuntu
[17605315.391223] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50
[17605315.391232] RSP: e02b:ffff8801d2659c18 EFLAGS: 00010046
[17605315.391238] RAX: 0000000000000000 RBX: ffff8801d2eb5a00 RCX: 0000000000000000
[17605315.391244] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010
[17605315.391250] RBP: ffff8801d2659c48 R08: 0000000000000000 R09: 0000000000000000
[17605315.391255] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000
[17605315.391263] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000033b9e28
[17605315.391274] FS: 00007fee8cc10700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000
[17605315.391281] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[17605315.391287] CR2: 0000000000000010 CR3: 00000001d2a0b000 CR4: 0000000000002660
[17605315.391294] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17605315.391301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[17605315.391308] Process chef-client (pid: 28794, threadinfo ffff8801d2658000, task ffff8801d0870000)
[17605315.391315] Stack:
[17605315.391319] ffff8801d2659c48 ffffffff8104ece9 ffff8801d2eb5a00 ffff8801dffa26c0
[17605315.391331] ffff8801d2eb5200 0000000000000000 ffff8801d2659c78 ffffffff810544b8
[17605315.391343] ffff8801d2659c78 ffff8801dffa26c0 0000000000000001 ffff8801d08703a8
[17605315.391354] Call Trace:
[17605315.391364] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0
[17605315.391373] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70
[17605315.391382] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0
[17605315.391391] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[17605315.391399] [<ffffffff8165344f>] schedule+0x3f/0x60
[17605315.391408] [<ffffffff8117e119>] pipe_wait+0x59/0x80
[17605315.391417] [<ffffffff81089340>] ? add_wait_queue+0x60/0x60
[17605315.391425] [<ffffffff8117e87a>] pipe_read+0x1da/0x330
[17605315.391433] [<ffffffff81174522>] do_sync_read+0xd2/0x110
[17605315.391443] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10
[17605315.391451] [<ffffffff8100aa32>] ? check_events+0x12/0x20
[17605315.391459] [<ffffffff81298d33>] ? security_file_permission+0x93/0xb0
[17605315.391466] [<ffffffff811749a1>] ? rw_verify_area+0x61/0xf0
[17605315.391473] [<ffffffff81174e80>] vfs_read+0xb0/0x180
[17605315.391479] [<ffffffff81174f9a>] sys_read+0x4a/0x90
[17605315.391488] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b
[17605315.391494] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47
[17605315.391577] RIP [<ffffffff8130d7f1>] rb_next+0x1/0x50
[17605315.391583] RSP <ffff8801d2659c18>
[17605315.391587] CR2: 0000000000000010
[17605315.391596] ---[ end trace 586cfae3c9e3e67e ]---

The stack trace is identical between the two kernels. I am unable to find any reference to this on Ubuntu, Xen, or kernel forums or mailing lists but it's repeatable even on freshly installed m1.large instances on EC2.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.2.0-24-virtual 3.2.0-24.37
ProcVersionSignature: Ubuntu 3.2.0-24.37-virtual 3.2.14
Uname: Linux 3.2.0-24-virtual x86_64
AcpiTables:

AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 May 7 09:58 seq
 crw-rw---T 1 root audio 116, 33 May 7 09:58 timer
AplayDevices: aplay: device_list:252: no soundcards found...
ApportVersion: 2.0.1-0ubuntu7
Architecture: amd64
ArecordDevices: arecord: device_list:252: no soundcards found...
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Date: Tue May 15 15:23:54 2012
Ec2AMI: ami-fd1c2789
Ec2AMIManifest: ubuntu-eu-west-1/images-testing/ubuntu-precise-daily-amd64-desktop-20120420.manifest.xml
Ec2AvailabilityZone: eu-west-1b
Ec2InstanceType: m1.large
Ec2Kernel: aki-62695816
Ec2Ramdisk: unavailable
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
PciMultimedia:

ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory /home/mydrive not ours.
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-24-virtual N/A
 linux-backports-modules-3.2.0-24-virtual N/A
 linux-firmware 1.79
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog:

Karl Matthias (0k53-karl) wrote :

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 999755

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium

The crash itself seems to point at a broken task structures tree while scheduling during a pipe read. Though this is rather cause not reason. Right now, nothing obvious strikes. There were a few Xen related patches waiting to come in through stable but I am not sure those would help here.
Meanwhile it would be good if we could gather one or two more crash stack traces along with "Xen version" grabbed from dmesg (which is the version of the hypervisor running on the host).

Karl Matthias (0k53-karl) wrote :
Download full text (3.4 KiB)

Hi Stefan, OK. Here's another stack trace:

[47708.053788] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[47708.053810] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50
[47708.053824] PGD 1d0b83067 PUD 1d0e64067 PMD 0
[47708.053833] Oops: 0000 [#1] SMP
[47708.053840] CPU 1
[47708.053843] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables parport_pc ppdev lp parport isofs acpiphp
[47708.053874]
[47708.053878] Pid: 1190, comm: chef-client Not tainted 3.2.0-24-virtual #37-Ubuntu
[47708.053888] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50
[47708.053896] RSP: e02b:ffff8801d2e99c18 EFLAGS: 00010046
[47708.053901] RAX: 0000000000000000 RBX: ffff8801d2745800 RCX: 0000000000000000
[47708.053907] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010
[47708.053913] RBP: ffff8801d2e99c48 R08: 0000000000000000 R09: 0000000000000000
[47708.053919] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000
[47708.053925] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000240db78
[47708.053935] FS: 00007fc872779700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000
[47708.053943] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[47708.053948] CR2: 0000000000000010 CR3: 00000001d0b3d000 CR4: 0000000000002660
[47708.053955] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[47708.053961] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[47708.053967] Process chef-client (pid: 1190, threadinfo ffff8801d2e98000, task ffff8801d24d2dc0)
[47708.053974] Stack:
[47708.053978] ffff8801d2e99c48 ffffffff8104ece9 ffff8801d2745800 ffff8801dffa26c0
[47708.053990] ffff8801d2744a00 0000000000000000 ffff8801d2e99c78 ffffffff810544b8
[47708.054001] ffff8801d2e99c78 ffff8801dffa26c0 0000000000000001 ffff8801d24d3168
[47708.054013] Call Trace:
[47708.054022] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0
[47708.054030] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70
[47708.054038] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0
[47708.054045] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[47708.054052] [<ffffffff8165344f>] schedule+0x3f/0x60
[47708.054060] [<ffffffff8117e119>] pipe_wait+0x59/0x80
[47708.054067] [<ffffffff81089340>] ? add_wait_queue+0x60/0x60
[47708.054074] [<ffffffff8117e87a>] pipe_read+0x1da/0x330
[47708.054081] [<ffffffff81174522>] do_sync_read+0xd2/0x110
[47708.054088] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10
[47708.054095] [<ffffffff8100aa32>] ? check_events+0x12/0x20
[47708.054103] [<ffffffff81298d33>] ? security_file_permission+0x93/0xb0
[47708.054110] [<ffffffff811749a1>] ? rw_verify_area+0x61/0xf0
[47708.054116] [<ffffffff81174e80>] vfs_read+0xb0/0x180
[47708.054122] [<ffffffff81174f9a>] sys_read+0x4a/0x90
[47708.054129] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b
[47708.054134] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47
[47708.054224] RIP [<fff...

Read more...

Karl Matthias (0k53-karl) wrote :
Download full text (3.3 KiB)

Here's another one if it helps:

[848423.023156] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[848423.023180] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50
[848423.023194] PGD 18ad83067 PUD 18ad82067 PMD 0
[848423.023203] Oops: 0000 [#1] SMP
[848423.023210] CPU 1
[848423.023213] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables parport_pc ppdev lp parport isofs acpiphp
[848423.023245]
[848423.023249] Pid: 18572, comm: chef-client Not tainted 3.2.0-24-virtual #37-Ubuntu
[848423.023258] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50
[848423.023267] RSP: e02b:ffff88018af21c18 EFLAGS: 00010046
[848423.023273] RAX: 0000000000000000 RBX: ffff8801a125ce00 RCX: 0000000000000000
[848423.023278] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010
[848423.023285] RBP: ffff88018af21c48 R08: 0000000000000000 R09: 0000000000000000
[848423.023291] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000
[848423.023299] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000003a5a9d0
[848423.023311] FS: 00007fb6bdc16700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000
[848423.023319] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[848423.023325] CR2: 0000000000000010 CR3: 00000001a12e8000 CR4: 0000000000002660
[848423.023332] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[848423.023339] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[848423.023346] Process chef-client (pid: 18572, threadinfo ffff88018af20000, task ffff8801a12f8000)
[848423.023353] Stack:
[848423.023356] ffff88018af21c48 ffffffff8104ece9 ffff8801a125ce00 ffff8801dffa26c0
[848423.023369] ffff8801d29cc600 0000000000000000 ffff88018af21c78 ffffffff810544b8
[848423.023380] ffff88018af21c78 ffff8801dffa26c0 0000000000000001 ffff8801a12f83a8
[848423.023392] Call Trace:
[848423.023402] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0
[848423.023410] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70
[848423.023419] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0
[848423.023426] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[848423.023433] [<ffffffff8165344f>] schedule+0x3f/0x60
[848423.023441] [<ffffffff8117e119>] pipe_wait+0x59/0x80
[848423.023449] [<ffffffff81089340>] ? add_wait_queue+0x60/0x60
[848423.023456] [<ffffffff8117e87a>] pipe_read+0x1da/0x330
[848423.023463] [<ffffffff81174522>] do_sync_read+0xd2/0x110
[848423.023471] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10
[848423.023478] [<ffffffff8100aa32>] ? check_events+0x12/0x20
[848423.023486] [<ffffffff81298d33>] ? security_file_permission+0x93/0xb0
[848423.023493] [<ffffffff811749a1>] ? rw_verify_area+0x61/0xf0
[848423.023499] [<ffffffff81174e80>] vfs_read+0xb0/0x180
[848423.023505] [<ffffffff81174f9a>] sys_read+0x4a/0x90
[848423.023513] [<ffffffff8165d8c2>] system_call_fastpath+0x16/0x1b
[848423.023518] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48...

Read more...

Karl Matthias (0k53-karl) wrote :

Forgot to post this earlier. Re: Brad's request for crash logs. apport-cli says "No pending crash reports. Try --help for more information."

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Gavin Heavyside (mydrive) wrote :
Download full text (4.8 KiB)

I've reproduced this by running the OHAI command from the OpsCode Chef ohai gem (0.6.12) in a loop, although it took nearly 2 days before it triggered. Basically I ran `gem install ohai; while true; do ohai; done` in a screen session.

 The stack trace is:

[18362917.357055] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[18362917.357079] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50
[18362917.357098] PGD 1d098d067 PUD 1d045b067 PMD 0
[18362917.357110] Oops: 0000 [#1] SMP
[18362917.357122] CPU 0
[18362917.357126] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp
[18362917.357152]
[18362917.357157] Pid: 21217, comm: ohai Not tainted 3.2.0-23-virtual #36-Ubuntu
[18362917.357166] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50
[18362917.357176] RSP: e02b:ffff8801d22f1808 EFLAGS: 00010046
[18362917.357181] RAX: 0000000000000000 RBX: ffff8801d0842600 RCX: 0000000000000000
[18362917.357187] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010
[18362917.357193] RBP: ffff8801d22f1838 R08: 0000000000000000 R09: 0000000000000000
[18362917.357199] R10: ffff8801dffa26c0 R11: 0000000000000001 R12: 0000000000000000
[18362917.357207] R13: 0000000000000000 R14: 0000000000000008 R15: ffff8801d0dc2d00
[18362917.357218] FS: 00007fcdb3810700(0000) GS:ffff8801dff73000(0000) knlGS:0000000000000000
[18362917.357225] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[18362917.357232] CR2: 0000000000000010 CR3: 00000001d2641000 CR4: 0000000000002660
[18362917.357240] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[18362917.357246] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[18362917.357253] Process ohai (pid: 21217, threadinfo ffff8801d22f0000, task ffff8801d0ad44a0)
[18362917.357261] Stack:
[18362917.357265] ffff8801d22f1838 ffffffff8104ece9 ffff8801d0842600 ffff8801dff866c0
[18362917.357277] ffff8801d0842e00 0000000000000008 ffff8801d22f1868 ffffffff810544b8
[18362917.357289] ffff8801d22f1868 ffff8801dff866c0 0000000000000000 ffff8801d0ad4848
[18362917.357301] Call Trace:
[18362917.357314] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0
[18362917.357322] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70
[18362917.357331] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0
[18362917.357341] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10
[18362917.357348] [<ffffffff8165344f>] schedule+0x3f/0x60
[18362917.357355] [<ffffffff8165456d>] schedule_hrtimeout_range_clock+0x14d/0x170
[18362917.357365] [<ffffffff8100aa1f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[18362917.357373] [<ffffffff816554ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[18362917.357383] [<ffffffff8108932d>] ? add_wait_queue+0x4d/0x60
[18362917.357389] [<ffffffff816545a3>] schedule_hrtimeout_range+0x13/0x20
[18362917.357400] [<ffffffff811877a9>] poll_schedule_timeout+0x49/0x70
[18362917.357408] [<ffffffff81188326>] do_select+0x4d6/0x600
[18362917.357414] [<ffffffff811878b0>] ? poll_freewait+0xe0/0xe0
[18362917.357422] [<ffffffff811879a0>] ? __pollwait+0xf0/0xf0
[18362917.357431] [<fffffff...

Read more...

Gavin Heavyside (mydrive) wrote :

BTW Xen version from dmesg is:

 Xen version: 3.4.3-2.6.18 (preserve-AD)

This is on EC2 so we have no control over this.

Gavin Heavyside (mydrive) wrote :
Download full text (4.7 KiB)

Triggered this again by running ohai in a continuous loop, took about 24 hours to occur:

[18438803.627371] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[18438803.627388] IP: [<ffffffff8130d7f1>] rb_next+0x1/0x50
[18438803.627402] PGD 1d0efa067 PUD 1d232d067 PMD 0
[18438803.627411] Oops: 0000 [#1] SMP
[18438803.627419] CPU 1
[18438803.627422] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables isofs acpiphp
[18438803.627447]
[18438803.627452] Pid: 29083, comm: ohai Not tainted 3.2.0-23-virtual #36-Ubuntu
[18438803.627460] RIP: e030:[<ffffffff8130d7f1>] [<ffffffff8130d7f1>] rb_next+0x1/0x50
[18438803.627469] RSP: e02b:ffff8801d225d808 EFLAGS: 00010046
[18438803.627473] RAX: 0000000000000000 RBX: ffff8801d2232400 RCX: 0000000000000000
[18438803.627479] RDX: fffffffffffffff0 RSI: ffff8801dffa2760 RDI: 0000000000000010
[18438803.627485] RBP: ffff8801d225d838 R08: 0000000000000000 R09: 0000000000000000
[18438803.627490] R10: ffff8801dff866c0 R11: 0000000000000000 R12: 0000000000000000
[18438803.627497] R13: 0000000000000000 R14: 0000000000000280 R15: ffff8801d0992300
[18438803.627508] FS: 00007f34206c2700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000
[18438803.627515] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[18438803.627521] CR2: 0000000000000010 CR3: 00000001d0e9e000 CR4: 0000000000002660
[18438803.627527] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[18438803.627534] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[18438803.627541] Process ohai (pid: 29083, threadinfo ffff8801d225c000, task ffff8801d260adc0)
[18438803.627547] Stack:
[18438803.627551] ffff8801d225d838 ffffffff8104ece9 ffff8801d2232400 ffff8801dffa26c0
[18438803.627562] ffff8801d0f8fc00 0000000000000280 ffff8801d225d868 ffffffff810544b8
[18438803.627573] ffff8801d225d868 ffff8801dffa26c0 0000000000000001 ffff8801d260b168
[18438803.627584] Call Trace:
[18438803.627596] [<ffffffff8104ece9>] ? pick_next_entity+0xb9/0xe0
[18438803.627604] [<ffffffff810544b8>] pick_next_task_fair+0x38/0x70
[18438803.627861] [<ffffffff81652ddc>] __schedule+0x14c/0x6f0
[18438803.627874] [<ffffffff8111d335>] ? prep_new_page+0x145/0x1e0
[18438803.627881] [<ffffffff8165344f>] schedule+0x3f/0x60
[18438803.627889] [<ffffffff8165454c>] schedule_hrtimeout_range_clock+0x12c/0x170
[18438803.627901] [<ffffffff8108c890>] ? update_rmtp+0x70/0x70
[18438803.627908] [<ffffffff8108d684>] ? hrtimer_start_range_ns+0x14/0x20
[18438803.627916] [<ffffffff816545a3>] schedule_hrtimeout_range+0x13/0x20
[18438803.627927] [<ffffffff811877a9>] poll_schedule_timeout+0x49/0x70
[18438803.627934] [<ffffffff81188326>] do_select+0x4d6/0x600
[18438803.627942] [<ffffffff811878b0>] ? poll_freewait+0xe0/0xe0
[18438803.627949] [<ffffffff811879a0>] ? __pollwait+0xf0/0xf0
[18438803.627956] [<ffffffff811879a0>] ? __pollwait+0xf0/0xf0
[18438803.627966] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10
[18438803.627974] [<ffffffff8100aa32>] ? check_events+0x12/0x20
[18438803.627981] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10
[1843880...

Read more...

Stefan Bader (smb) wrote :

The Xen version is mostly to check for correlations. So far both times those have been reported it seemed to be a 3.4.3 version (not sure whether one can still hit other versions, but if then it would be great to know whether those are affected the same way). I know its not possible to know what one gets beforehand.

At least in all cases it is the same fatal condition. I suspect some pointer is unexpectedly NULL and then trying to reference an element at offset 0x10. It is in the scheduling code but at least there were two paths that seemed to lead there. So maybe it is not so much a specific action than scheduling new processes a lot...

This likely will be a beast to find as whatever is the cause is not the place it breaks. I'll try to isolate the exact instruction that breaks. Maybe it is possible to find out more by detecting the corruption and dumping more information. And also try to re-create this on a local system hopefully close enough to ec2...

Meanwhile if anybody can confirm breakage (or not) with other versions of Xen that may or may not be seen, please add to this report.

Stefan Bader (smb) wrote :

Also I see that a new kernel has just been pushed out. It cannot hurt doing tests running that (3.2.0-24.38).

Stefan Bader (smb) wrote :

Writing that I noticed that the traces differ based on different kernels. One set has -24.37 and the other -23.36. So that may just cause a slightly different timing or location of certain things...

Karl Matthias (0k53-karl) wrote :

Thanks, we're running tests on the 3.2.0-24.38 kernel now to see if we can get it to crash in the same way. Yes, this affected both the previous versions: -24.37 and -23.36.

Stefan Bader (smb) wrote :

So from the disassembly and the registers of the crash, it is clear that both variations at some point do a schedule which calls on to pick_next_task_fair(). As that calls into pick_next_entity() it can be assumed that (struct cfs_rq *)->nr_running is not 0.
But then (struct cfs_rq *)->rb_leftmost seem to be NULL (as well as (struct cfs_rq *)->skip).
The code in pick_next_entity() never checks for rb_leftmost to be NULL but I assume that this should never happen as long as the number of running entities is not 0, too.

This does not really help in understanding what goes wrong, but it should allow to create a special kernel that would dump the memory of the affected structure before dying. Might be interesting to see whether the areas around that pointer seem wrong as well. I have not heard, yet, of similar issues when not running as PVM guests on EC2. That points to some Xen interference but how I cannot explain.

Karl Matthias (0k53-karl) wrote :

Stefan, if you have a kernel build with memory dumps to help debug this, I'm happy to try it.

Stefan Bader (smb) wrote :

A first attempt to get more information can be found at http://people.canonical.com/~smb/lp999755/ (I took the older -24.37 version as that at least is confirmed to be broken). Up to now I was not able to cause a similar crash (or in fact any crash) locally. But then I am not sure I how to get the same environment (lots of things I never used before).

Karl Matthias (0k53-karl) wrote :

Excellent, thanks for that. I'll drop it on a box tomorrow (UK standard time).

Gavin Heavyside (mydrive) wrote :
Download full text (3.3 KiB)

We've also seen this on the -24.38 and -24.39 kernels now:

[56843.390534] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[56843.390551] IP: [<ffffffff8130d901>] rb_next+0x1/0x50
[56843.390566] PGD 1d20a7067 PUD 1d29a2067 PMD 0
[56843.390575] Oops: 0000 [#1] SMP
[56843.390583] CPU 1
[56843.390585] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables parport_pc ppdev lp parport isofs acpiphp
[56843.390616]
[56843.390621] Pid: 1340, comm: chef-client Not tainted 3.2.0-24-virtual #39-Ubuntu
[56843.390629] RIP: e030:[<ffffffff8130d901>] [<ffffffff8130d901>] rb_next+0x1/0x50
[56843.390638] RSP: e02b:ffff8801d050dc18 EFLAGS: 00010046
[56843.390642] RAX: 0000000000000000 RBX: ffff8801d24c8e00 RCX: 0000000000000000
[56843.390648] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010
[56843.390654] RBP: ffff8801d050dc48 R08: 0000000000000000 R09: 0000000000000000
[56843.390659] R10: ffff8801dff866c0 R11: 0000000000000001 R12: 0000000000000000
[56843.390664] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000011168a8
[56843.390674] FS: 00007fda5feaa700(0000) GS:ffff8801dff8f000(0000) knlGS:0000000000000000
[56843.390681] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[56843.390686] CR2: 0000000000000010 CR3: 00000001d28cc000 CR4: 0000000000002660
[56843.390692] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[56843.390698] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[56843.390704] Process chef-client (pid: 1340, threadinfo ffff8801d050c000, task ffff8801d2405b80)
[56843.390711] Stack:
[56843.390714] ffff8801d050dc48 ffffffff8104ee19 ffff8801d24c8e00 ffff8801dffa26c0
[56843.390726] ffff8801d24c9400 0000000000000000 ffff8801d050dc78 ffffffff810545e8
[56843.390737] ffff8801d050dc78 ffff8801dffa26c0 0000000000000001 ffff8801d2405f28
[56843.390748] Call Trace:
[56843.390758] [<ffffffff8104ee19>] ? pick_next_entity+0xb9/0xe0
[56843.390765] [<ffffffff810545e8>] pick_next_task_fair+0x38/0x70
[56843.390774] [<ffffffff816530dc>] __schedule+0x14c/0x6f0
[56843.390781] [<ffffffff816557ee>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[56843.390789] [<ffffffff8165374f>] schedule+0x3f/0x60
[56843.390797] [<ffffffff8117e209>] pipe_wait+0x59/0x80
[56843.390804] [<ffffffff81089470>] ? add_wait_queue+0x60/0x60
[56843.390810] [<ffffffff8117e96a>] pipe_read+0x1da/0x330
[56843.390817] [<ffffffff81174612>] do_sync_read+0xd2/0x110
[56843.390824] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10
[56843.390831] [<ffffffff8100aa32>] ? check_events+0x12/0x20
[56843.390838] [<ffffffff81298dc3>] ? security_file_permission+0x93/0xb0
[56843.390845] [<ffffffff81174a91>] ? rw_verify_area+0x61/0xf0
[56843.390851] [<ffffffff81174f70>] vfs_read+0xb0/0x180
[56843.390857] [<ffffffff8117508a>] sys_read+0x4a/0x90
[56843.390863] [<ffffffff8165dbc2>] system_call_fastpath+0x16/0x1b
[56843.390869] Code: 89 06 48 8b 47 08 48 89 46 08 48 8b 47 10 48 89 46 10 c3 0f 1f 80 00 00 00 00 48 89 32 eb b2 0f 1f 00 48 89 70 10 eb a9 66 90 55 <48> 8b 17 48 89 e5 48 89 d0 48 83 e0 fc 48 39 c7 74 34 48 8b 47
[56843.390...

Read more...

Karl Matthias (0k53-karl) wrote :

We have the debugging kernel installed on a box that has been happily crashing for the last few days. We'll see if we can get a good debug for you from it. What would you like us to gather the next time it crashes?

Stefan Bader (smb) wrote :

It hopefully prints out the hex content of the data structure I want to check. But that should all be in the crash console output. Apart from that just the xen version is still the same 3.4.3-2.6.18 (preserve-AD) as before.

Gavin Heavyside (mydrive) wrote :
Download full text (4.0 KiB)

I've just reproduced this crash using the stock 3.2.0-24-39 kernel on VirtualBox on OS X (Lion).

I created a 2-CPU VM using the latest VirtualBox (4.1.16 r78094), for Ubuntu 64-bit, default 8GB disk. The steps I followed were:

* Install 64-bit 12.04 Server LTS, minimal install from ISO downloaded from ubuntu.com/download/server
* apt-get dist-upgrade and reboot
* apt-get install ruby1.9.3 screen
* gem install chef
* In a screen session:
  * while true; do ohai; done

After a while (less than an hour) I had reproduced the stack trace from this bug. I added "console=ttyS0" to the startup parameters and had VirtualBox log the serial output to a file so I could capture the stack trace (below).

This means this bug is nothing to do with Xen, and hopefully can be reproduced by developers more easily.

Stefan: If you could build a regular kernel I would be happy to try and capture more info for you.

Thanks,

Gavin.

[ 1119.141743] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[ 1119.145307] IP: [<ffffffff81311c81>] rb_next+0x1/0x50
[ 1119.145307] PGD 1be80067 PUD 1f17d067 PMD 0
[ 1119.145307] Oops: 0000 [#1] SMP
[ 1119.145307] CPU 0
[ 1119.145307] Modules linked in: vesafb ext2 snd_intel8x0 psmouse snd_ac97_codec ac97_bus snd_pcm snd_timer ppdev snd soundcore joydev snd_page_alloc serio_raw i2c_piix4 parport_pc mac_hid lp parport usbhid hid e1000
[ 1119.145307]
[ 1119.145307] Pid: 6768, comm: ohai Not tainted 3.2.0-24-generic #39-Ubuntu innotek GmbH VirtualBox
[ 1119.145307] RIP: 0010:[<ffffffff81311c81>] [<ffffffff81311c81>] rb_next+0x1/0x50
[ 1119.145307] RSP: 0018:ffff88001bd8dc18 EFLAGS: 00010046
[ 1119.145307] RAX: 0000000000000000 RBX: ffff88001f232400 RCX: 0000000000000000
[ 1119.145307] RDX: fffffffffffffff0 RSI: 0000000000000000 RDI: 0000000000000010
[ 1119.145307] RBP: ffff88001bd8dc48 R08: 0000000000000001 R09: 0000000000000000
[ 1119.145307] R10: ffff88001fd13780 R11: 0000000000000001 R12: 0000000000000000
[ 1119.145307] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000015010f8
[ 1119.145307] FS: 00007f6a721d7700(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
[ 1119.145307] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1119.145307] CR2: 0000000000000010 CR3: 000000001ce72000 CR4: 00000000000006f0
[ 1119.145307] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1119.145307] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1119.145307] Process ohai (pid: 6768, threadinfo ffff88001bd8c000, task ffff88001cafdbc0)
[ 1119.145307] Stack:
[ 1119.145307] ffff88001bd8dc48 ffffffff8104ff39 ffff88001f232400 ffff88001fc13780
[ 1119.145307] ffff88001f233a00 0000000000000000 ffff88001bd8dc78 ffffffff810561d8
[ 1119.145307] ffff88001bd8dc78 ffff88001fc13780 0000000000000000 ffff88001cafdf80
[ 1119.145307] Call Trace:
[ 1119.145307] [<ffffffff8104ff39>] ? pick_next_entity+0xb9/0xe0
[ 1119.145307] [<ffffffff810561d8>] pick_next_task_fair+0x38/0x70
[ 1119.145307] [<ffffffff81659fbc>] __schedule+0x14c/0x6f0
[ 1119.145307] [<ffffffff81045d18>] ? flush_tlb_page+0x48/0xb0
[ 1119.145307] [<ffffffff8165a88f>] schedule+0x3f/0x60
[ 1119.145307] [<ffffffff8118...

Read more...

We've got small EC2 instances (single processor) that haven't exhibited this behaviour, but we get it with large EC2 instances (2 CPUs); the VirtualBox machine I just reproduced it with was specifically set to 2 CPUS.

It seems to me that this bug might only occur on multi-cpu boxes?

summary: - Kernel crash on EC2 m1.large instances
+ Kernel crash on EC2 & VirtualBox
Gavin Heavyside (mydrive) wrote :
Download full text (3.2 KiB)

And we've just reproduced on EC2 with the debug kernel:

[248587.286290] ------------[ cut here ]------------
[248587.286765] kernel BUG at /home/smb/precise-amd64/ubuntu-2.6/kernel/sched_fair.c:1239!
[248587.286775] invalid opcode: 0000 [#1] SMP
[248587.286783] CPU 0
[248587.286786] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm bluetooth parport_pc ppdev lp parport isofs acpiphp
[248587.286822]
[248587.286827] Pid: 18805, comm: ohai Not tainted 3.2.0-24-virtual #37+lp999755dbg1
[248587.286836] RIP: e030:[<ffffffff81050cd5>] [<ffffffff81050cd5>] pick_next_entity+0x105/0x110
[248587.286849] RSP: e02b:ffff8801d02d7c28 EFLAGS: 00010096
[248587.286854] RAX: 000000000000002d RBX: ffff8801d20df800 RCX: 000000000003ffff
[248587.286860] RDX: 0000000000000000 RSI: ffffffff81e000a0 RDI: 0000000000000004
[248587.286866] RBP: ffff8801d02d7c48 R08: 000000000000000a R09: 0000000000000000
[248587.286872] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801dff866c0
[248587.286878] R13: ffff8801d20de600 R14: 0000000000000000 R15: 0000000001f41018
[248587.286889] FS: 00007f5ec2010700(0000) GS:ffff8801dff73000(0000) knlGS:0000000000000000
[248587.286895] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[248587.286901] CR2: 0000000001f09f30 CR3: 00000001cec3e000 CR4: 0000000000002660
[248587.286908] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[248587.286914] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[248587.286921] Process ohai (pid: 18805, threadinfo ffff8801d02d6000, task ffff8801cf10db80)
[248587.286928] Stack:
[248587.286931] ffff8801d20df800 ffff8801dff866c0 ffff8801d20de600 0000000000000000
[248587.286944] ffff8801d02d7c78 ffffffff810544e8 ffff8801d02d7c78 ffff8801dff866c0
[248587.286956] 0000000000000000 ffff8801cf10df28 ffff8801d02d7cf8 ffffffff81652f3c
[248587.286968] Call Trace:
[248587.286976] [<ffffffff810544e8>] pick_next_task_fair+0x38/0x70
[248587.286984] [<ffffffff81652f3c>] __schedule+0x14c/0x6f0
[248587.286992] [<ffffffff8165564e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[248587.286999] [<ffffffff816535af>] schedule+0x3f/0x60
[248587.287006] [<ffffffff8117e149>] pipe_wait+0x59/0x80
[248587.287014] [<ffffffff81089370>] ? add_wait_queue+0x60/0x60
[248587.287021] [<ffffffff8117e8aa>] pipe_read+0x1da/0x330
[248587.287028] [<ffffffff81174552>] do_sync_read+0xd2/0x110
[248587.287036] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10
[248587.287043] [<ffffffff8100aa32>] ? check_events+0x12/0x20
[248587.287051] [<ffffffff81298d63>] ? security_file_permission+0x93/0xb0
[248587.287057] [<ffffffff811749d1>] ? rw_verify_area+0x61/0xf0
[248587.287063] [<ffffffff81174eb0>] vfs_read+0xb0/0x180
[248587.287069] [<ffffffff81174fca>] sys_read+0x4a/0x90
[248587.287076] [<ffffffff8165da42>] system_call_fastpath+0x16/0x1b
[248587.287081] Code: 89 df e8 69 c1 5e 00 48 8b 73 38 48 c7 c7 7b ef 9f 81 31 c0 e8 98 c9 5e 00 48 8b 73 10 48 c7 c7 98 ef 9f 81 31 c0 e8 86 c9 5e 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89
[248587.287170] RIP [<ffffffff81050cd5>] pick_next_en...

Read more...

Karl Matthias (0k53-karl) wrote :
Download full text (3.3 KiB)

We have had a crash with Stefan's debugging kernel running. Here is the output (doesn't look like it contains any more information).

[248587.286290] ------------[ cut here ]------------
[248587.286765] kernel BUG at /home/smb/precise-amd64/ubuntu-2.6/kernel/sched_fair.c:1239!
[248587.286775] invalid opcode: 0000 [#1] SMP
[248587.286783] CPU 0
[248587.286786] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm bluetooth parport_pc ppdev lp parport isofs acpiphp
[248587.286822]
[248587.286827] Pid: 18805, comm: ohai Not tainted 3.2.0-24-virtual #37+lp999755dbg1
[248587.286836] RIP: e030:[<ffffffff81050cd5>] [<ffffffff81050cd5>] pick_next_entity+0x105/0x110
[248587.286849] RSP: e02b:ffff8801d02d7c28 EFLAGS: 00010096
[248587.286854] RAX: 000000000000002d RBX: ffff8801d20df800 RCX: 000000000003ffff
[248587.286860] RDX: 0000000000000000 RSI: ffffffff81e000a0 RDI: 0000000000000004
[248587.286866] RBP: ffff8801d02d7c48 R08: 000000000000000a R09: 0000000000000000
[248587.286872] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801dff866c0
[248587.286878] R13: ffff8801d20de600 R14: 0000000000000000 R15: 0000000001f41018
[248587.286889] FS: 00007f5ec2010700(0000) GS:ffff8801dff73000(0000) knlGS:0000000000000000
[248587.286895] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[248587.286901] CR2: 0000000001f09f30 CR3: 00000001cec3e000 CR4: 0000000000002660
[248587.286908] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[248587.286914] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[248587.286921] Process ohai (pid: 18805, threadinfo ffff8801d02d6000, task ffff8801cf10db80)
[248587.286928] Stack:
[248587.286931] ffff8801d20df800 ffff8801dff866c0 ffff8801d20de600 0000000000000000
[248587.286944] ffff8801d02d7c78 ffffffff810544e8 ffff8801d02d7c78 ffff8801dff866c0
[248587.286956] 0000000000000000 ffff8801cf10df28 ffff8801d02d7cf8 ffffffff81652f3c
[248587.286968] Call Trace:
[248587.286976] [<ffffffff810544e8>] pick_next_task_fair+0x38/0x70
[248587.286984] [<ffffffff81652f3c>] __schedule+0x14c/0x6f0
[248587.286992] [<ffffffff8165564e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[248587.286999] [<ffffffff816535af>] schedule+0x3f/0x60
[248587.287006] [<ffffffff8117e149>] pipe_wait+0x59/0x80
[248587.287014] [<ffffffff81089370>] ? add_wait_queue+0x60/0x60
[248587.287021] [<ffffffff8117e8aa>] pipe_read+0x1da/0x330
[248587.287028] [<ffffffff81174552>] do_sync_read+0xd2/0x110
[248587.287036] [<ffffffff8100a25d>] ? xen_force_evtchn_callback+0xd/0x10
[248587.287043] [<ffffffff8100aa32>] ? check_events+0x12/0x20
[248587.287051] [<ffffffff81298d63>] ? security_file_permission+0x93/0xb0
[248587.287057] [<ffffffff811749d1>] ? rw_verify_area+0x61/0xf0
[248587.287063] [<ffffffff81174eb0>] vfs_read+0xb0/0x180
[248587.287069] [<ffffffff81174fca>] sys_read+0x4a/0x90
[248587.287076] [<ffffffff8165da42>] system_call_fastpath+0x16/0x1b
[248587.287081] Code: 89 df e8 69 c1 5e 00 48 8b 73 38 48 c7 c7 7b ef 9f 81 31 c0 e8 98 c9 5e 00 48 8b 73 10 48 c7 c7 98 ef 9f 81 31 c0 e8 86 c9 5e 00 <0f> 0b 66 0f 1f 84 00 00 00 00 00 55 48 8...

Read more...

Gavin Heavyside (mydrive) wrote :
Download full text (4.4 KiB)

And another one from the debug kernel on EC2, with a slightly different call stack:

[ 4389.480352] ------------[ cut here ]------------
[ 4389.480884] kernel BUG at /home/smb/precise-amd64/ubuntu-2.6/kernel/sched_fair.c:1239!
[ 4389.480894] invalid opcode: 0000 [#1] SMP
[ 4389.480902] CPU 0
[ 4389.480905] Modules linked in: ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack iptable_filter ip_tables x_tables bnep rfcomm bluetooth parport_pc ppdev lp parport isofs acpiphp
[ 4389.480945]
[ 4389.480949] Pid: 24612, comm: ohai Not tainted 3.2.0-24-virtual #37+lp999755dbg1
[ 4389.480958] RIP: e030:[<ffffffff81050cd5>] [<ffffffff81050cd5>] pick_next_entity+0x105/0x110
[ 4389.480972] RSP: e02b:ffff8801d0cdd818 EFLAGS: 00010092
[ 4389.480977] RAX: 000000000000002c RBX: ffff8801d1106e00 RCX: 000000000003ffff
[ 4389.480983] RDX: 0000000000000000 RSI: ffffffff81e000a0 RDI: 0000000000000004
[ 4389.480988] RBP: ffff8801d0cdd838 R08: 000000000000000a R09: 0000000000000000
[ 4389.480994] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801dff866c0
[ 4389.481000] R13: ffff8801d1107a00 R14: 0000000000000008 R15: ffff8801d1e97800
[ 4389.481009] FS: 00007f09fcc8a700(0000) GS:ffff8801dff73000(0000) knlGS:0000000000000000
[ 4389.481015] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 4389.481021] CR2: 00000000012dbf30 CR3: 00000001d0702000 CR4: 0000000000002660
[ 4389.481028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4389.481035] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 4389.481041] Process ohai (pid: 24612, threadinfo ffff8801d0cdc000, task ffff8801d2905b80)
[ 4389.481047] Stack:
[ 4389.481051] ffff8801d1106e00 ffff8801dff866c0 ffff8801d1107a00 0000000000000008
[ 4389.481063] ffff8801d0cdd868 ffffffff810544e8 ffff8801d0cdd868 ffff8801dff866c0
[ 4389.481075] 0000000000000000 ffff8801d2905f28 ffff8801d0cdd8e8 ffffffff81652f3c
[ 4389.481086] Call Trace:
[ 4389.481093] [<ffffffff810544e8>] pick_next_task_fair+0x38/0x70
[ 4389.481101] [<ffffffff81652f3c>] __schedule+0x14c/0x6f0
[ 4389.481108] [<ffffffff816535af>] schedule+0x3f/0x60
[ 4389.481114] [<ffffffff816546cd>] schedule_hrtimeout_range_clock+0x14d/0x170
[ 4389.481124] [<ffffffff8100aa1f>] ? xen_restore_fl_direct_reloc+0x4/0x4
[ 4389.481131] [<ffffffff8165564e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[ 4389.481139] [<ffffffff8108935d>] ? add_wait_queue+0x4d/0x60
[ 4389.481146] [<ffffffff81654703>] schedule_hrtimeout_range+0x13/0x20
[ 4389.481154] [<ffffffff811877d9>] poll_schedule_timeout+0x49/0x70
[ 4389.481160] [<ffffffff81188356>] do_select+0x4d6/0x600
[ 4389.481167] [<ffffffff811878e0>] ? poll_freewait+0xe0/0xe0
[ 4389.481173] [<ffffffff811879d0>] ? __pollwait+0xf0/0xf0
[ 4389.481179] [<ffffffff81005191>] ? __raw_callee_save_xen_pte_val+0x11/0x1e
[ 4389.481186] [<ffffffff81054c3e>] ? update_curr+0x21e/0x230
[ 4389.481192] [<ffffffff8103cc65>] ? pvclock_clocksource_read+0x55/0xf0
[ 4389.481199] [<ffffffff810553db>] ? check_preempt_wakeup+0x15b/0x230
[ 4389.481207] [<ffffffff8104ed74>] ? check_preempt_curr+0x84/0xa0
[ 4389.481214] [<ffffffff8104edcd>] ? ttwu_do_wakeup+0x3d/0x120
[ 4389.481222] ...

Read more...

Stefan Bader (smb) wrote :

Bugger, I made the severity of the messages too low... :/ If at all they might be in /var/log/syslog on instance reboot... I better make a better version. But its interesting that this appears on a completely different virtualization platform. I wonder whether it could even appear on real hardware. Thanks a lot for the detailed instructions for the test case. I was running a test program doing lots of forks and pipe communication over two days without hitting this, so it helps a lot to know exactly what the reproducing steps are.

Stefan Bader (smb) wrote :

So with those exact steps I managed to crash a real dual-core machine. So we can rule out any virtualization. This merely is related to whatever that command does and likely having more than one cpu. (It also worked / crashed on a local xen guest, so I can reproduce things locally). Next step will be to test latest upstream kernels to see whether this persists.

summary: - Kernel crash on EC2 & VirtualBox
+ Kernel crash in rb_next doin ohai loops
description: updated
summary: - Kernel crash in rb_next doin ohai loops
+ Kernel crash in rb_next doing ohai loops
Karl Matthias (0k53-karl) wrote :

Ok, that is "good" news. Happy to help with further testing.

description: updated
Stefan Bader (smb) on 2012-05-29
description: updated
Stefan Bader (smb) wrote :

I did repeat the crash on a local HVM guest with the current precise-proposed kernel. That way I was able to dump the crashed VM. A run with the same kernel in a different guest that had 4 CPUs did not crash for several hours. 2 CPUs seems to make the issue less likely and could be a reason this is not observed more often. Currently running the loop with a 3.4 mainline kernel and 2 CPUs.

Stefan Bader (smb) wrote :

Doh! I meant 4 CPUs makes it less likely and 2 more likely.

Karl Matthias (0k53-karl) wrote :

Stefan, our experience is that it also does not ever happen on just a single core so it's definitely some scheduler interaction on a multi-core box. You probably could get a 4 core to do it as well if the process list were more heavily loaded.

Stefan Bader (smb) wrote :

Unfortunately I was not yet able to pinpoint exactly the code that is wrong. But I think I know a way that can be used as a work-around until I find out the real fix. It looks very much like the problem is related to the autogroup feature which will automatically put tasks in separate task groups based on the session ids. The ohai process is doing this a lot, so I think compared to other work-loads there is a high volume of coming and going task groups, which might be the problem.

At the cost of loosing the fairer scheduling which isolation of high cpu using tasks in their own group brings, it is possible to disable the autogroup feature by either using "noautogroup" as pvgrub kernel argument, or changing
/proc/sys/kernel/sched_autogroup_enabled to 0. I guess that can be a sysctl entry, too. That way the ohai reproducer seems to survive, so production systems won't crash all the time (hopefully).

Ken (kenshi) wrote :

Any chance this is related to this issue? https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1007082

Stefan Bader (smb) wrote :

@Ken, I do not think so. This one is very like related to a race creating and removing task groups from the scheduler. Yours rather seems to have issues with the flags of a page it tries to use. I don't know why but I have the feeling of having seen this before, but maybe it was just looking at your bug. I think I saw some memory related patches being queued up into upstream stable but iirc those are not yet pushed into any next proposed. The log itself is somewhat helpful but with those kind of bugs hard to link to a specific place as the crash just is the result of something else going wrong before. If you could add to your bug more information on generic details. Like what kind of workload is running, is it possible to get any relation on when it happens (what range is occasionally).

Ken (kenshi) wrote :

@Stefan - thanks for the tips, will update the ticket with more info.

Stefan Bader (smb) wrote :

Sorry for not updating this before. So finally the problem has been identified. It is somewhat papered over by a change that went into linux-3.3. Though the potential race was still not fixed. What happens is that when a process calls setsid it is placed into a sperate schduling task group. However that group (a pointer) is changed while holding a different lock than when moving task to another CPU to balance the load. And even worse, while doing so (moving the task). the code before was looking up the task group pointer 4 times in a row while not being protected against changes. So whenever this crash happened, the pointer changed between two lookups to assign values to cfq scheduling elements. There could also be a inconsistency between cfq elements and rt scheduling element or between the two rt elements. Not sure which exact effects this would yield.

There is now a patch making its way upstream that will close the race. While Quantal is not affected that badly, Natty, Oneiric and Precise should be fixed.

Stefan Bader (smb) wrote :

This is the patch that was made upstream and should be queued for mainline.

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Stefan Bader (smb) on 2012-07-03
description: updated
Karl Matthias (0k53-karl) wrote :

Awesome, thanks for the nice explanation and the dedication to working this out. We'll try out the patch on our several test setups as well.

Tim Gardner (timg-tpi) on 2012-07-17
Changed in linux (Ubuntu Oneiric):
status: New → Fix Committed
Changed in linux (Ubuntu Precise):
status: New → Fix Committed
Changed in linux (Ubuntu Natty):
status: New → In Progress
assignee: nobody → Stefan Bader (stefan-bader-canonical)
Armin Bauer (armin-bauer) wrote :

Is there any estimate when this will be released? This bug is causing a lot of troubles for our setup due to random crashed instances.

Stefan Bader (smb) wrote :

This went into the repo yesterday. The current cycle is in testing phase this week and given nothing goes bad there would get released next week. So kernels with this patch should go into the proposed pocket about next week. If there are no problems this should be in the officially released kernels in 3 to 4 weeks. But you could start using the proposed kernels as soon as they are available. Or as I wrote earlier, disable the autogroup feature until then (see comment #32).

Armin Bauer (armin-bauer) wrote :

Thanks for the info. I think I will try a patched ohai version first, with the setsid call removed. The side effects of this change are probably better than changing the sched_autogroup_enabled on our production instances. If this doesnt help, I will try the sched_autogroup_enabled workaround. Thanks for your help and the work on this bug.

Gavin Heavyside (mydrive) wrote :

We think that this seems to be an interplay between the kernel bug, ohai, and Ruby 1.9.3. I was unable to reproduce the crash using the opscode omnibus installer (http://www.opscode.com/blog/2012/06/29/omnibus-chef-packaging/) which uses a bundled ruby 1.9.2, so you could potentially try that as an alternative to patching OHAI.

Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel for Oneiric in -proposed solves the problem (3.0.0-24.40). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-oneiric' to 'verification-done-oneiric'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-oneiric
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel for Precise in -proposed solves the problem (3.2.0-29.46). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-precise' to 'verification-done-precise'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-precise
tags: added: verification-done-precise
removed: verification-needed-precise
Karl Matthias (0k53-karl) wrote :

We verified the patched kernel does not exhibit the crashing behavior over a week-long run of Ohai loops on an EC2 large instance. It also behaves correctly on 2-core a VirtualBox running Ohai loops.

Stefan Bader (smb) wrote :

I could confirm the verification for Precise. Though for Oneiric unfortunately, the same ohai loops do not seem to trigger the issue. From the scheduler code it has the same race but than maybe ohai internally is slightly different back there. And it seems to be a very specific pattern of fork and setsid that results in the process getting pulled over to a different CPU right when it moves groups.
So I tend to let Oneiric pass without definite verification and possible ignore the Natty kernel (it will be out of support soon and also has the problem that we have no way to cause the crash right no).

Luis Henriques (henrix) on 2012-08-03
tags: added: verification-done-oneiric
removed: verification-needed-oneiric
Luis Henriques (henrix) wrote :

Tagged as verified in Oneiric as per comment #46 and IRC chat.

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regresssions.

Launchpad Janitor (janitor) wrote :
Download full text (7.5 KiB)

This bug was fixed in the package linux - 3.0.0-24.40

---------------
linux (3.0.0-24.40) oneiric-proposed; urgency=low

  [Luis Henriques]

  * Release Tracking Bug
    - LP: #1027841

  [ Andy Whitcroft ]

  * [Packaging] getabis should be extracting all packages
    - LP: #1021174
  * [Config] getabis -- series uses linux-image-extra
    - LP: #1021174

  [ Stefan Bader ]

  * SAUCE: sched: Fix race in task_group()
    - LP: #999755

  [ Tim Gardner ]

  * Extract firmware module info during getabi
    - LP: #1021174

  [ Upstream Kernel Changes ]

  * net: sock: validate data_len before allocating skb in
    sock_alloc_send_pskb()
    - LP: #1006622
    - CVE-2012-2136
  * epoll: clear the tfile_check_list on -ELOOP
    - LP: #1021811
    - CVE-2012-3375
  * ALSA: hda - Add Realtek ALC280 codec support
    - LP: #1017017, #1025406
  * powerpc/xmon: Use cpumask iterator to avoid warning
    - LP: #1025406
  * media: smsusb: add autodetection support for USB ID 2040:f5a0
    - LP: #1025406
  * ARM: fix rcu stalls on SMP platforms
    - LP: #1025406
  * cipso: handle CIPSO options correctly when NetLabel is disabled
    - LP: #1025406
  * net: l2tp_eth: fix kernel panic on rmmod l2tp_eth
    - LP: #1025406
  * dummy: fix rcu_sched self-detected stalls
    - LP: #1025406
  * ethtool: allow ETHTOOL_GSSET_INFO for users
    - LP: #1025406
  * bridge: Assign rtnl_link_ops to bridge devices created via ioctl (v2)
    - LP: #1025406
  * bonding: Fix corrupted queue_mapping
    - LP: #1025406
  * ipv6: Move ipv6 proc file registration to end of init order
    - LP: #1025406
  * sky2: fix checksum bit management on some chips
    - LP: #1025406
  * be2net: fix a race in be_xmit()
    - LP: #1025406
  * netpoll: fix netpoll_send_udp() bugs
    - LP: #1025406
  * hwmon: (applesmc) Limit key length in warning messages
    - LP: #1025406
  * nilfs2: ensure proper cache clearing for gc-inodes
    - LP: #1025406
  * udf: Use 'ret' instead of abusing 'i' in udf_load_logicalvol()
    - LP: #1025406
  * udf: Avoid run away loop when partition table length is corrupted
    - LP: #1025406
  * udf: Fortify loading of sparing table
    - LP: #1025406
  * ath9k: Fix softlockup in AR9485
    - LP: #1025406
  * ath9k_hw: avoid possible infinite loop in ar9003_get_pll_sqsum_dvc
    - LP: #1025406
  * ath9k: enable serialize_regmode for non-PCIE AR9287
    - LP: #1025406
  * ASoC: tlv320aic3x: Fix codec pll configure bug
    - LP: #1025406
  * Btrfs: run delayed directory updates during log replay
    - LP: #1025406
  * drm/edid: don't return stack garbage from supports_rb
    - LP: #1025406
  * drm/nouveau/fbcon: using nv_two_heads is not a good idea
    - LP: #1025406
  * drm/i915: Fix eDP blank screen after S3 resume on HP desktops
    - LP: #1025406
  * acpi_pad: fix power_saving thread deadlock
    - LP: #1025406
  * ACPI: Add a quirk for "AMILO PRO V2030" to ignore the timer overriding
    - LP: #1025406
  * ACPI, x86: fix Dell M6600 ACPI reboot regression via DMI
    - LP: #1025406
  * ACPI sysfs.c strlen fix
    - LP: #1025406
  * stable: Allow merging of backports for serious user-visible performance
    issues
    - LP: #1025406
  * USB: option: add id for...

Read more...

Changed in linux (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (16.5 KiB)

This bug was fixed in the package linux - 3.2.0-29.46

---------------
linux (3.2.0-29.46) precise-proposed; urgency=low

  [ Tim Gardner ]

  * No change upload to fix armel/armhf FTBS caused by
    'UBUNTU: [Config] SND_OMAP_SOC, SND_OMAP_SOC_MCBSP and SND_OMAP_SOC_OMAP3_BEAGLE =y'
    Added ignore and ignore.module files to ABI directories.

  [Luis Henriques]

  * Release Tracking Bug
    - LP: #1029507

linux (3.2.0-29.45) precise-proposed; urgency=low

  [Luis Henriques]

  * Release Tracking Bug
    - LP: #1029507

  [ Andy Whitcroft ]

  * SAUCE: rds_ib_send() -- prevent local pings triggering BUG_ON()
    - LP: #1016299
    - CVE-2012-2372

  [ Upstream Kernel Changes ]

  * Revert "samsung-laptop: make the dmi check less strict"
    - LP: #1029431
  * samsung-laptop: make the dmi check less strict
    - LP: #1029431
  * raid5: delayed stripe fix
    - LP: #1029431
  * tcp: drop SYN+FIN messages
    - LP: #1029431
  * tg3: Apply short DMA frag workaround to 5906
    - LP: #1029431
  * rtl8187: ->brightness_set can not sleep
    - LP: #1029431
  * net/wireless: ipw2x00: add supported cipher suites to wiphy
    initialization
    - LP: #1029431
  * kbuild: do not check for ancient modutils tools
    - LP: #1029431
  * brcmsmac: "INTERMEDIATE but not AMPDU" only when tracing
    - LP: #1029431
  * ext4: Report max_batch_time option correctly
    - LP: #1029431
  * NFSv4: Reduce the footprint of the idmapper
    - LP: #1029431
  * NFSv4: Further reduce the footprint of the idmapper
    - LP: #1029431
  * macvtap: zerocopy: fix offset calculation when building skb
    - LP: #1029431
  * macvtap: zerocopy: fix truesize underestimation
    - LP: #1029431
  * macvtap: zerocopy: put page when fail to get all requested user pages
    - LP: #1029431
  * macvtap: zerocopy: set SKBTX_DEV_ZEROCOPY only when skb is built
    successfully
    - LP: #1029431
  * macvtap: zerocopy: validate vectors before building skb
    - LP: #1029431
  * KVM: Fix buffer overflow in kvm_set_irq()
    - LP: #1029431
  * scsi: Silence unnecessary warnings about ioctl to partition
    - LP: #1029431
  * iommu/amd: Fix missing iommu_shutdown initialization in passthrough
    mode
    - LP: #1029431
  * iommu/amd: Initialize dma_ops for hotplug and sriov devices
    - LP: #1029431
  * usb: Add support for root hub port status CAS
    - LP: #1029431
  * gpiolib: wm8994: Pay attention to the value set when enabling as output
    - LP: #1029431
  * sched/nohz: Rewrite and fix load-avg computation -- again
    - LP: #1029431
  * USB: option: add ZTE MF60
    - LP: #1029431
  * USB: option: Add MEDIATEK product ids
    - LP: #1029431
  * USB: cdc-wdm: fix lockup on error in wdm_read
    - LP: #1029431
  * mtd: nandsim: don't open code a do_div helper
    - LP: #1029431
  * dvb-core: Release semaphore on error path dvb_register_device()
    - LP: #1029431
  * hwspinlock/core: use global ID to register hwspinlocks on multiple
    devices
    - LP: #1029431
  * libsas: fix taskfile corruption in sas_ata_qc_fill_rtf
    - LP: #1029431
  * md/raid1: fix use-after-free bug in RAID1 data-check code.
    - LP: #1029431
  * PCI: EHCI: fix crash during suspend on ASUS computers
    - L...

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
tags: added: patch
Stefan Bader (smb) on 2012-10-23
Changed in linux (Ubuntu Natty):
status: In Progress → Won't Fix
assignee: Stefan Bader (stefan-bader-canonical) → nobody
Changed in linux (Ubuntu):
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers