Kernel lockup running 3.0.0 and 3.2.0 on multiple EC2 instance types

Bug #1011792 reported by Matt Wilson
This bug affects 16 people
Affects                               Status        Importance  Assigned to  Milestone
linux (Ubuntu)                        Fix Released  High        Unassigned
  Lucid                               Fix Released  Medium      Unassigned
  Oneiric                             Fix Released  Critical    Unassigned
  Precise                             Fix Released  Critical    Unassigned
  Quantal                             Fix Released  High        Unassigned
  Raring                              Fix Released  High        Unassigned
linux-lts-backport-oneiric (Ubuntu)   Invalid       Undecided   Unassigned
  Lucid                               Fix Released  Undecided   Unassigned
  Oneiric                             Won't Fix     Critical    Unassigned
  Precise                             Invalid       Undecided   Unassigned
  Quantal                             Invalid       Undecided   Unassigned
  Raring                              Invalid       Undecided   Unassigned

Bug Description

SRU Justification:

Impact: When running many threads that make heavy use of spinlocks (the pgslam testcase triggers this quite reliably), we reach a state where a spinlock is still locked but none of the CPUs appear to be actively holding it. The reason for this is not yet completely understood. From the evidence it looks like at least one CPU tries to take the lock, fails, and enters the pv-spinlocks slow path, which makes a hypervisor call. This call seems to return, but apparently not to where it left off, so the lock is never taken and other CPUs are sometimes left stuck in the hypervisor call.

Fix: The problem can be avoided by not re-enabling the CPU's interrupts/event channel before doing the hypervisor call. This may carry some performance penalty, but in testing the results did not seem to be much worse, and the test run actually survives.
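
For illustration only, a minimal sketch of what this workaround amounts to in the Xen pv-spinlock slow path. This is not the actual patch; the helpers with a _sketch suffix are made up for this example, while xen_clear_irq_pending(), xen_poll_irq() and xen_test_irq_pending() are the existing event-channel helpers the slow path is built on:

/*
 * Sketch only -- NOT the Ubuntu patch. It shows the idea described above:
 * block in the hypervisor for a contended pv spinlock WITHOUT re-enabling
 * guest interrupts first (the step the workaround removes).
 */
static void xen_spin_lock_slow_sketch(struct xen_spinlock *xl, int irq)
{
        do {
                /* Clear any stale wakeup event on this CPU's lock IRQ. */
                xen_clear_irq_pending(irq);

                /* The lock may have been freed meanwhile; try again first. */
                if (xen_spin_trylock_sketch(xl))
                        return;

                /*
                 * The original code could re-enable interrupts here (when the
                 * lock had been taken with interrupts enabled) before issuing
                 * the poll hypercall. The workaround skips that re-enable,
                 * trading a little interrupt latency for not losing the wakeup.
                 */

                /* Block in the hypervisor until the unlocker kicks this IRQ. */
                xen_poll_irq(irq);
        } while (!xen_test_irq_pending(irq));  /* tolerate spurious wakeups */
}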

Testcase: pgslam test script below.

---

Scheduler deadlocks have been observed on c1.xlarge EC2 instances running 10.04.3 LTS with the 3.0.0-20-virtual Oneiric backport kernel. The symptoms appear similar to bug 929941, where multiple CPUs are waiting on scheduler runqueue locks. But in this case, only a few CPUs are stuck.

A typical set of stack traces from the guest state looks like:

VCPU0
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001202 i nz
rsp: ffff8801b3c27910
rax: 0000000000000000 rcx: ffffffff810013aa rdx: ffff8801b3c27954
rbx: ffff88000265cb30 rsi: ffff8801b3c27938 rdi: 0000000000000003
rbp: ffff8801b3c27958 r8: 0000000000000001 r9: 0000000000000001
r10: 0000000000000000 r11: 0000000000000202 r12: 0000000000000011
r13: 0000000000000001 r14: 0000000000000001 r15: 0000000000000000
 cs: e033 ss: e02b ds: 0000 es: 0000
 fs: 0000 @ 00007f4ce223f700
 gs: 0000 @ ffff8801bfed4000/0000000000000000

cr0: 80050033
cr2: 0061ade0
cr3: 0e93d000
cr4: 00002660

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr ffffffff810013aa)
cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc

Stack:
 0000000000000246 0000000000000000 ffffffff81394b42 ffff8801b3c27938
 0000000000000000 ffff8801b3c27954 ffffffff00000001 0000000000000000
 0000000481394ad6 ffff8801b3c27968 ffffffff81394b60 ffff8801b3c279b8
 ffffffff8100933f ffff8801b3c27a48 0000000000000000 ffff8801b3c27998

Call Trace:
  [<ffffffff810013aa>] hypercall_page+0x3aa <--
  [<ffffffff81394b42>] xen_poll_irq_timeout+0x42
  [<ffffffff81394b60>] xen_poll_irq+0x10
  [<ffffffff8100933f>] xen_spin_lock_slow+0x7f
  [<ffffffff81009435>] xen_spin_lock_flags+0x75
  [<ffffffff8160365f>] _raw_spin_lock_irqsave+0x2f
  [<ffffffff8104ee50>] task_rq_lock+0x40
  [<ffffffff8104f069>] task_sched_runtime+0x29
  [<ffffffff81085f38>] thread_group_cputime+0x88
  [<ffffffff812c4dc9>] apparmor_ptrace_access_check+0x39
  [<ffffffff81051593>] thread_group_times+0x33
  [<ffffffff811d4192>] do_task_stat+0x6d2
  [<ffffffff8160361e>] _raw_spin_lock+0xe
  [<ffffffff8119165f>] seq_open+0x4f
  [<ffffffff811d0380>] sched_autogroup_show+0x70
  [<ffffffff811d0380>] sched_autogroup_show+0x70
  [<ffffffff8119173a>] single_open+0x7a
  [<ffffffff811cdbf0>] sched_open+0x20
  [<ffffffff811cdc0b>] proc_single_open+0x1b
  [<ffffffff8118e460>] mntput_no_expire+0x60
  [<ffffffff8118e5ad>] mntput+0x1d
  [<ffffffff811d4624>] proc_tgid_stat+0x14
  [<ffffffff811d03e1>] proc_single_show+0x61
  [<ffffffff81191be2>] seq_read+0xf2
  [<ffffffff8116fe35>] vfs_read+0xc5
  [<ffffffff81170001>] sys_read+0x51
  [<ffffffff8160ba02>] system_call_fastpath+0x16

VCPU1
rip: ffffffff8105a777 try_to_wake_up+0xd7
flags: 00001202 i nz
rsp: ffff8801bfef28f0
rax: 0000000000000003 rcx: 0000000000000000 rdx: 0000000000000001
rbx: 0000000000012980 rsi: ffff8801b1990078 rdi: 0000000000000000
rbp: ffff8801bfef2950 r8: 0000000000000000 r9: 0000000000000000
r10: 0000000000000000 r11: 00000000fb981853 r12: ffff88000265c530
r13: 0000000000000000 r14: ffff88000265cb30 r15: 0000000000000000
 cs: e033 ss: e02b ds: 0000 es: 0000
 fs: 0000 @ 00007ff68926c700
 gs: 0000 @ ffff8801bfeef000/0000000000000000

cr0: 8005003b
cr2: 00441d80
cr3: 1174cb000
cr4: 00002660

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr ffffffff8105a777)
00 00 eb 0c 66 2e 0f 1f 84 00 00 00 00 00 f3 90 41 8b 54 24 28 <85> d2 75 f5 49 8b 14 24 31 c0 83

Stack:
 0000000000000000 ffff8801b41d2858 ffff8801bfef2950 ffffffff8153e51e
 0000000300000004 ffff8801b1990078 ffff8801bfef2930 ffff8800026f6c18
 0000000000000001 ffff8800026f6c30 0000000000000000 0000000000000000
 ffff8801bfef2960 ffffffff8105a962 ffff8801bfef29b0 ffffffff81049709

Call Trace:
  [<ffffffff8105a777>] try_to_wake_up+0xd7 <--
  [<ffffffff8153e51e>] ip_finish_output+0x16e
  [<ffffffff8105a962>] default_wake_function+0x12
  [<ffffffff81049709>] __wake_up_common+0x59
  [<ffffffff81049758>] __wake_up_locked+0x18
  [<ffffffff811afcf4>] ep_poll_callback+0xa4
  [<ffffffff81049709>] __wake_up_common+0x59
  [<ffffffff8104ed53>] __wake_up_sync_key+0x53
  [<ffffffff814f3d2e>] sock_def_readable+0x3e
  [<ffffffff8155255a>] tcp_rcv_established+0x26a
  [<ffffffff8100742d>] xen_force_evtchn_callback+0xd
  [<ffffffff81007b72>] check_events+0x12
  [<ffffffff8155a9a5>] tcp_v4_do_rcv+0x125
  [<ffffffff8155c169>] tcp_v4_rcv+0x5a9
  [<ffffffff8153854d>] ip_local_deliver_finish+0xdd
  [<ffffffff81538790>] ip_local_deliver+0x80
  [<ffffffff81537db9>] ip_rcv_finish+0x119
  [<ffffffff815383a8>] ip_rcv+0x228
  [<ffffffff815d0afd>] packet_rcv_spkt+0x4d
  [<ffffffff815036b0>] __netif_receive_skb+0x1e0
  [<ffffffff81506790>] netif_receive_skb+0x80
  [<ffffffff814614f4>] handle_incoming_queue+0x134
  [<ffffffff81461fd7>] xennet_poll+0x277
  [<ffffffff81506ff8>] net_rx_action+0x108
  [<ffffffff8160361e>] _raw_spin_lock+0xe
  [<ffffffff8106889f>] __do_softirq+0xbf
  [<ffffffff810d3f7d>] handle_edge_irq+0x9d
  [<ffffffff8160cc1c>] call_softirq+0x1c
  [<ffffffff8100d3d5>] do_softirq+0x65
  [<ffffffff8106869d>] irq_exit+0xbd
  [<ffffffff81394ee5>] xen_evtchn_do_upcall+0x35
  [<ffffffff8160cc6e>] xen_do_hypervisor_callback+0x1e

VCPU2
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001202 i nz
rsp: ffff8801bff0da00
rax: 0000000000000000 rcx: ffffffff810013aa rdx: ffff8801bff0da44
rbx: ffff8800026f6c00 rsi: ffff8801bff0da28 rdi: 0000000000000003
rbp: ffff8801bff0da48 r8: 00000000000000c3 r9: 000000000000c110
r10: 0000000000100000 r11: 0000000000000202 r12: 000000000000001d
r13: 0000000000000001 r14: 0000000000000001 r15: 0000000000000000
 cs: e033 ss: e02b ds: 0000 es: 0000
 fs: 0000 @ 00007f408128b700
 gs: 0000 @ ffff8801bff0a000/0000000000000000

cr0: 80050033
cr2: 0061ade0
cr3: 289f2000
cr4: 00002660

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr ffffffff810013aa)
cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc

Stack:
 0000000000112000 0000000000000000 ffffffff81394b42 ffff8801bff0da28
 0000000000000000 ffff8801bff0da44 ffffffff00000001 0000000000000000
 0000001081394ad6 ffff8801bff0da58 ffffffff81394b60 ffff8801bff0daa8
 ffffffff8100933f ffff8801bff0da88 0000000000000000 ffff8801b3d14530

Call Trace:
  [<ffffffff810013aa>] hypercall_page+0x3aa <--
  [<ffffffff81394b42>] xen_poll_irq_timeout+0x42
  [<ffffffff81394b60>] xen_poll_irq+0x10
  [<ffffffff8100933f>] xen_spin_lock_slow+0x7f
  [<ffffffff81009435>] xen_spin_lock_flags+0x75
  [<ffffffff8160365f>] _raw_spin_lock_irqsave+0x2f
  [<ffffffff811afca9>] ep_poll_callback+0x59
  [<ffffffff816036ae>] _raw_spin_unlock_irqrestore+0x1e
  [<ffffffff81049709>] __wake_up_common+0x59
  [<ffffffff8104ed53>] __wake_up_sync_key+0x53
  [<ffffffff814f3d2e>] sock_def_readable+0x3e
  [<ffffffff8154ef50>] tcp_data_queue+0x300
  [<ffffffff81552639>] tcp_rcv_established+0x349
  [<ffffffff8155a9a5>] tcp_v4_do_rcv+0x125
  [<ffffffff8155c169>] tcp_v4_rcv+0x5a9
  [<ffffffff81054e18>] enqueue_sleeper+0x188
  [<ffffffff8153854d>] ip_local_deliver_finish+0xdd
  [<ffffffff81538790>] ip_local_deliver+0x80
  [<ffffffff81537db9>] ip_rcv_finish+0x119
  [<ffffffff815383a8>] ip_rcv+0x228
  [<ffffffff815036b0>] __netif_receive_skb+0x1e0
  [<ffffffff81007b5f>] xen_restore_fl_direct_reloc+0x4
  [<ffffffff810d7af4>] rcu_enter_nohz+0x44
  [<ffffffff81503aeb>] process_backlog+0x10b
  [<ffffffff810d10b5>] handle_irq_event_percpu+0xb5
  [<ffffffff81506ff8>] net_rx_action+0x108
  [<ffffffff8100742d>] xen_force_evtchn_callback+0xd
  [<ffffffff8106889f>] __do_softirq+0xbf
  [<ffffffff81007b5f>] xen_restore_fl_direct_reloc+0x4
  [<ffffffff8160cc1c>] call_softirq+0x1c

VCPU3
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001202 i nz
rsp: ffff8801b1993b10
rax: 0000000000000000 rcx: ffffffff810013aa rdx: ffff8801b1993b54
rbx: ffff8801bff01980 rsi: ffff8801b1993b38 rdi: 0000000000000003
rbp: ffff8801b1993b58 r8: ffff8801bf004760 r9: 0000000000000040
r10: 000000000000001d r11: 0000000000000202 r12: 0000000000000023
r13: 0000000000000001 r14: 0000000000000001 r15: 0000000000000000
 cs: e033 ss: e02b ds: 0000 es: 0000
 fs: 0000 @ 00007f1a15c06700
 gs: 0000 @ ffff8801bff25000/0000000000000000

cr0: 80050033
cr2: 7f78b037de20
cr3: 0e9c9000
cr4: 00002660

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr ffffffff810013aa)
cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc

Stack:
 0000000000000001 0000000000000000 ffffffff81394b42 ffff8801b1993b38
 0000000000000000 ffff8801b1993b54 ffffffff00000001 0000000000000000
 0000001681394ad6 ffff8801b1993b68 ffffffff81394b60 ffff8801b1993bb8
 ffffffff8100933f 0000000000000000 0000000000000000 0000000000000000

Call Trace:
  [<ffffffff810013aa>] hypercall_page+0x3aa <--
  [<ffffffff81394b42>] xen_poll_irq_timeout+0x42
  [<ffffffff81394b60>] xen_poll_irq+0x10
  [<ffffffff8100933f>] xen_spin_lock_slow+0x7f
  [<ffffffff81009496>] xen_spin_lock+0x56
  [<ffffffff8160361e>] _raw_spin_lock+0xe
  [<ffffffff8104d90d>] double_rq_lock+0x2d
  [<ffffffff81059f15>] load_balance+0x1f5
  [<ffffffff81601576>] __schedule+0x7e6
  [<ffffffff8160165f>] schedule+0x3f
  [<ffffffff816025b5>] schedule_hrtimeout_range_clock+0xc5
  [<ffffffff810881c0>] update_rmtp+0x80
  [<ffffffff81089524>] hrtimer_start_range_ns+0x14
  [<ffffffff81602663>] schedule_hrtimeout_range+0x13
  [<ffffffff811af9f8>] ep_poll+0x2d8
  [<ffffffff8105a950>] try_to_wake_up+0x2b0
  [<ffffffff81290c2b>] security_file_permission+0x8b
  [<ffffffff8116fee0>] vfs_read+0x170
  [<ffffffff811afb35>] sys_epoll_wait+0xc5
  [<ffffffff8160ba02>] system_call_fastpath+0x16

VCPU4
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001246 i z p
rsp: ffff8801b45f5ed8
rax: 0000000000000000 rcx: ffffffff810013aa rdx: 0000000000000000
rbx: ffff8801b45f4010 rsi: 0000000000000000 rdi: 0000000000000001
rbp: ffff8801b45f5ef0 r8: 0000000000000000 r9: 0000000000000000
r10: 0000000000000000 r11: 0000000000000246 r12: 0000000000000004
r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000
 cs: e033 ss: e02b ds: 002b es: 002b
 fs: 0000 @ 00007fe085364700
 gs: 0000 @ ffff8801bff40000/0000000000000000

cr0: 8005003b
cr2: 7f7127c10000
cr3: 29c76000
cr4: 00002660

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr ffffffff810013aa)
cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc

Stack:
 0000000000000000 0000000000000000 ffffffff81007490 ffff8801b45f5f10
 ffffffff81013c3d ffff8801b45f4010 ffffffff81cc7f48 ffff8801b45f5f40
 ffffffff8100b0c7 ffffffff81007b19 6811f30c74562efe 0000000000000000
 0000000000000000 ffff8801b45f5f50 ffffffff815f523d 0000000000000000

Call Trace:
  [<ffffffff810013aa>] hypercall_page+0x3aa <--
  [<ffffffff81007490>] xen_safe_halt+0x10
  [<ffffffff81013c3d>] default_idle+0x5d
  [<ffffffff8100b0c7>] cpu_idle+0xb7
  [<ffffffff81007b19>] xen_irq_enable_direct_reloc+0x4
  [<ffffffff815f523d>] cpu_bringup_and_idle+0xe

VCPU5
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001246 i z p
rsp: ffff8801b45f7ed8
rax: 0000000000000000 rcx: ffffffff810013aa rdx: 0000000000000000
rbx: ffff8801b45f6010 rsi: 0000000000000000 rdi: 0000000000000001
rbp: ffff8801b45f7ef0 r8: 0000000000000000 r9: 0000000000000000
r10: 0000000000000000 r11: 0000000000000246 r12: 0000000000000005
r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000
 cs: e033 ss: e02b ds: 002b es: 002b
 fs: 0000 @ 00007fcf9cf7a700
 gs: 0000 @ ffff8801bff5b000/0000000000000000

cr0: 8005003b
cr2: 7faf54aac360
cr3: 325e8000
cr4: 00002660

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr ffffffff810013aa)
cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc

Stack:
 0000000000000000 0000000000000000 ffffffff81007490 ffff8801b45f7f10
 ffffffff81013c3d ffff8801b45f6010 ffffffff81cc7f48 ffff8801b45f7f40
 ffffffff8100b0c7 ffffffff81007b19 afec8dc6140892fe 0000000000000000
 0000000000000000 ffff8801b45f7f50 ffffffff815f523d 0000000000000000

Call Trace:
  [<ffffffff810013aa>] hypercall_page+0x3aa <--
  [<ffffffff81007490>] xen_safe_halt+0x10
  [<ffffffff81013c3d>] default_idle+0x5d
  [<ffffffff8100b0c7>] cpu_idle+0xb7
  [<ffffffff81007b19>] xen_irq_enable_direct_reloc+0x4
  [<ffffffff815f523d>] cpu_bringup_and_idle+0xe

VCPU6
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001246 i z p
rsp: ffff8801b4601ed8
rax: 0000000000000000 rcx: ffffffff810013aa rdx: 0000000000000000
rbx: ffff8801b4600010 rsi: 0000000000000000 rdi: 0000000000000001
rbp: ffff8801b4601ef0 r8: 0000000000000000 r9: 0000000000000000
r10: 0000000000000000 r11: 0000000000000246 r12: 0000000000000006
r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000
 cs: e033 ss: e02b ds: 002b es: 002b
 fs: 0000 @ 00007ff5deb9b700
 gs: 0000 @ ffff8801bff76000/0000000000000000

cr0: 8005003b
cr2: 7faf5345dff8
cr3: 10160000
cr4: 00002660

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr ffffffff810013aa)
cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc

Stack:
 0000000000000000 0000000000000000 ffffffff81007490 ffff8801b4601f10
 ffffffff81013c3d ffff8801b4600010 ffffffff81cc7f48 ffff8801b4601f40
 ffffffff8100b0c7 ffffffff81007b19 f6fcdef4e1be86f4 0000000000000000
 0000000000000000 ffff8801b4601f50 ffffffff815f523d 0000000000000000

Call Trace:
  [<ffffffff810013aa>] hypercall_page+0x3aa <--
  [<ffffffff81007490>] xen_safe_halt+0x10
  [<ffffffff81013c3d>] default_idle+0x5d
  [<ffffffff8100b0c7>] cpu_idle+0xb7
  [<ffffffff81007b19>] xen_irq_enable_direct_reloc+0x4
  [<ffffffff815f523d>] cpu_bringup_and_idle+0xe

VCPU7
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001246 i z p
rsp: ffff8801b4603ed8
rax: 0000000000000000 rcx: ffffffff810013aa rdx: 0000000000000000
rbx: ffff8801b4602010 rsi: 0000000000000000 rdi: 0000000000000001
rbp: ffff8801b4603ef0 r8: 0000000000000000 r9: 0000000000000000
r10: 0000000000000000 r11: 0000000000000246 r12: 0000000000000007
r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000000
 cs: e033 ss: e02b ds: 002b es: 002b
 fs: 0000 @ 00007fe081751700
 gs: 0000 @ ffff8801bff91000/0000000000000000

cr0: 8005003b
cr2: 7faf54bdfde0
cr3: 29c76000
cr4: 00002660

dr0: 00000000
dr1: 00000000
dr2: 00000000
dr3: 00000000
dr6: ffff0ff0
dr7: 00000400
Code (instr addr ffffffff810013aa)
cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc

Stack:
 0000000000000000 0000000000000000 ffffffff81007490 ffff8801b4603f10
 ffffffff81013c3d ffff8801b4602010 ffffffff81cc7f48 ffff8801b4603f40
 ffffffff8100b0c7 ffffffff81007b19 0c1d63da394a80a3 0000000000000000
 0000000000000000 ffff8801b4603f50 ffffffff815f523d 0000000000000000

Call Trace:
  [<ffffffff810013aa>] hypercall_page+0x3aa <--
  [<ffffffff81007490>] xen_safe_halt+0x10
  [<ffffffff81013c3d>] default_idle+0x5d
  [<ffffffff8100b0c7>] cpu_idle+0xb7
  [<ffffffff81007b19>] xen_irq_enable_direct_reloc+0x4
  [<ffffffff815f523d>] cpu_bringup_and_idle+0xe

Revision history for this message
Matt Wilson (msw-amazon) wrote :

vCPUs 0, 2 and 3 are stuck waiting on a spinlock. vCPU 1 is running with the EIP showing various values inside try_to_wake_up()

Revision history for this message
Matt Wilson (msw-amazon) wrote :
tags: added: oneiric
summary: - Scheduler deadlock running 3.0.0-20-virtual on c1.xlarge EC2 instance
+ Scheduler deadlock running 3.0.0 on multiple EC2 instance types
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1011792

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Matt Wilson (msw-amazon) wrote : Re: Scheduler deadlock running 3.0.0 on multiple EC2 instance types

Due to the nature of the issue encountered, we cannot run this command.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Matt Wilson (msw-amazon) wrote :

CPU stack traces from a hi1.4xlarge PV instance running ami-8baa73e2:

CPU 0 is the only running CPU. The others are blocked.

CPU0
rip: ffffffff8105711a try_to_wake_up+0xca
flags: 00001202 i nz
rsp: ffff880f22dfc870
rax: 0000000000000008 rcx: 0000000000000000 rdx: 0000000000000002
rbx: ffff880012ca8000 rsi: ffff880e6af80048 rdi: 0000000000000001
rbp: ffff880f22dfc8c0 r8: 0000000000000000 r9: 0000000000000000
r10: 0000000000000000 r11: 0000000000000000 r12: ffff880012ca85f8
r13: 0000000000000000 r14: ffff880e6af80048 r15: 0000000000000003
 cs: e033 ss: e02b ds: 0000 es: 0000
 fs: 0000 @ 00007fd01a7e7700
 gs: 0000 @ ffff880f22df9000/0000000000000000
Code (instr addr ffffffff8105711a)
66 90 45 85 ff 74 0e 44 8b 7d cc e9 a8 00 00 00 0f 1f 00 f3 90 <8b> 7b 28 85 ff 75 f7 48 8b 13 31

Stack:
 ffff880ec61086c0 ffff8801a1f5dde8 ffff880f22dfc890 0000000881079a45
 0000000000000000 ffff880e6af81648 0000000000000001 ffff880ec6058708
 0000000000000000 0000000000000000 ffff880f22dfc8d0 ffffffff81057262
 ffff880f22dfc8f0 ffffffff810813c6 ffffea0015215a40 ffff880ec60586f0

Call Trace:
  [<ffffffff8105711a>] try_to_wake_up+0xca <--
  [<ffffffff81057262>] default_wake_function+0x12
  [<ffffffff810813c6>] autoremove_wake_function+0x16
  [<ffffffff81044be8>] __wake_up_common+0x58
  [<ffffffff81007aef>] xen_restore_fl_direct_reloc+0x4
  [<ffffffff810466d8>] __wake_up+0x48
  [<ffffffff812ceee6>] __freed_request+0x66
  [<ffffffff812ceffa>] freed_request+0x3a
  [<ffffffff812cf083>] __blk_put_request.part.53+0x63
  [<ffffffff812cf100>] __blk_put_request+0x50
  [<ffffffff812cf1d1>] blk_finish_request+0xb1
  [<ffffffff812cf26b>] __blk_end_request_all+0x4b
  [<ffffffff813e3677>] blkif_interrupt+0x167
  [<ffffffff810ccb65>] handle_irq_event_percpu+0x55
  [<ffffffff812ec66b>] radix_tree_lookup+0xb
  [<ffffffff810ccd7b>] handle_irq_event+0x4b
  [<ffffffff810cf53c>] handle_edge_irq+0x7c
  [<ffffffff81384a69>] __xen_evtchn_do_upcall+0x199
  [<ffffffff813869ef>] xen_evtchn_do_upcall+0x2f
  [<ffffffff8160f62e>] xen_do_hypervisor_callback+0x1e
  [<ffffffff810013aa>] hypercall_page+0x3aa
  [<ffffffff810013aa>] hypercall_page+0x3aa
  [<ffffffff8138512e>] xen_poll_irq_timeout+0x3e
  [<ffffffff81386b80>] xen_poll_irq+0x10
  [<ffffffff815ecdcd>] xen_spin_lock_slow+0x98
  [<ffffffff81008e23>] xen_spin_lock_flags+0x63
  [<ffffffff8160619e>] _raw_spin_lock_irqsave+0x2e
  [<ffffffff8104e662>] update_shares+0x92
  [<ffffffff81056a68>] rebalance_domains+0x48
  [<ffffffff81056d08>] run_rebalance_domains+0x48
  [<ffffffff81065bf8>] __do_softirq+0xa8
  [<ffffffff81384ad7>] __xen_evtchn_do_upcall+0x207
  [<ffffffff8160f5dc>] call_softirq+0x1c
  [<ffffffff8100c265>] do_softirq+0x65
  [<ffffffff81065fde>] irq_exit+0x8e
  [<ffffffff813869f5>] xen_evtchn_do_upcall+0x35
  [<ffffffff8160f62e>] xen_do_hypervisor_callback+0x1e
CPU1
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001206 i nz p
rsp: ffff88004e92dbf0
rax: 0000000000000000 rcx: ffffffff810013aa rdx: 0000000000000000
rbx: 0000000000000000 rsi: ffff88004e92dc08 rdi: 0000000000000003
rbp: ffff88004e92dc38 r8: ffff880ec85a3000 r9: ffff880f21c000d0
r10: 0000000000000001 r11: 00...

Revision history for this message
Matt Wilson (msw-amazon) wrote :

Stack traces from a second hi1.4xlarge running ami-8baa73e2:

CPU0
rip: ffffffff8105711a try_to_wake_up+0xca
flags: 00001202 i nz
rsp: ffff880f22dfc740
rax: 0000000000000006 rcx: ffff880f22dfc870 rdx: 0000000000000082
rbx: ffff880b13d90000 rsi: 0000000000000000 rdi: 0000000000000001
rbp: ffff880f22dfc790 r8: ffffea0008b0b118 r9: 0000000000000000
r10: 57ffc73d4450b118 r11: 0000000000000202 r12: ffff880b13d905f8
r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000003
 cs: e033 ss: e02b ds: 0000 es: 0000
 fs: 0000 @ 00007f6132beb700
 gs: 0000 @ ffff880f22df9000/0000000000000000
Code (instr addr ffffffff8105711a)
66 90 45 85 ff 74 0e 44 8b 7d cc e9 a8 00 00 00 0f 1f 00 f3 90 <8b> 7b 28 85 ff 75 f7 48 8b 13 31

Stack:
 ffffffff81007b02 0000000000012940 0000000000012940 0000000621c005e8
 ffffffff8110b7f7 ffff880b13dbfc30 0000000000000000 ffff880f22fc47d0
 ffff880f22dfc870 0000000000000000 ffff880f22dfc7a0 ffffffff81057262
 ffff880f22dfc7c0 ffffffff810813c6 ffffffff810073ed ffff880f22fc47b8

Call Trace:
  [<ffffffff8105711a>] try_to_wake_up+0xca <--
  [<ffffffff81007b02>] check_events+0x12
  [<ffffffff8110b7f7>] mempool_free_slab+0x17
  [<ffffffff81057262>] default_wake_function+0x12
  [<ffffffff810813c6>] autoremove_wake_function+0x16
  [<ffffffff810073ed>] xen_force_evtchn_callback+0xd
  [<ffffffff8108142b>] wake_bit_function+0x3b
  [<ffffffff81044be8>] __wake_up_common+0x58
  [<ffffffff8110b7f7>] mempool_free_slab+0x17
  [<ffffffff810466d8>] __wake_up+0x48
  [<ffffffff81081061>] __wake_up_bit+0x31
  [<ffffffff81108bda>] unlock_page+0x2a
  [<ffffffff8119f726>] mpage_end_io+0x46
  [<ffffffff8119996d>] bio_endio+0x1d
  [<ffffffff814b24d7>] dec_pending+0x87
  [<ffffffff814b2833>] clone_endio+0xa3
  [<ffffffff8119996d>] bio_endio+0x1d
  [<ffffffff812cd413>] req_bio_endio.isra.45+0xa3
  [<ffffffff812cdcf5>] blk_update_request+0xf5
  [<ffffffff812ce041>] blk_update_bidi_request+0x31
  [<ffffffff812cf257>] __blk_end_request_all+0x37
  [<ffffffff813e3677>] blkif_interrupt+0x167
  [<ffffffff810ccb65>] handle_irq_event_percpu+0x55
  [<ffffffff812ec66b>] radix_tree_lookup+0xb
  [<ffffffff810ccd7b>] handle_irq_event+0x4b
  [<ffffffff810cf53c>] handle_edge_irq+0x7c
  [<ffffffff81384a69>] __xen_evtchn_do_upcall+0x199
  [<ffffffff813869ef>] xen_evtchn_do_upcall+0x2f
  [<ffffffff8160f62e>] xen_do_hypervisor_callback+0x1e
  [<ffffffff810013aa>] hypercall_page+0x3aa
  [<ffffffff810013aa>] hypercall_page+0x3aa
  [<ffffffff8138512e>] xen_poll_irq_timeout+0x3e
  [<ffffffff81386b80>] xen_poll_irq+0x10
  [<ffffffff815ecdcd>] xen_spin_lock_slow+0x98
  [<ffffffff81008e23>] xen_spin_lock_flags+0x63
  [<ffffffff8160619e>] _raw_spin_lock_irqsave+0x2e
  [<ffffffff8104e662>] update_shares+0x92
  [<ffffffff81544540>] tcp_init_xmit_timers+0x30
  [<ffffffff81056a68>] rebalance_domains+0x48
  [<ffffffff810076ac>] xen_timer_interrupt+0x2c
  [<ffffffff81056d08>] run_rebalance_domains+0x48
  [<ffffffff81065bf8>] __do_softirq+0xa8
  [<ffffffff81384ad7>] __xen_evtchn_do_upcall+0x207
  [<ffffffff8160f5dc>] call_softirq+0x1c
  [<ffffffff8100c265>] do_softirq+0x65
  [<ffffffff81065fde>] irq_exit+0x8e
  [<ffffffff813869f5>] xen_evtchn_...

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-lts-backport-oneiric (Ubuntu):
status: New → Confirmed
Revision history for this message
Matt Wilson (msw-amazon) wrote :

We've observed this on another instance running 12.04, ami-3c994355, with a read-heavy postgresql load.

CPU0
rip: ffffffff8105e51a try_to_wake_up+0xca
flags: 00001202 i nz
rsp: ffff880f22deb7d0
rax: 0000000000000004 rcx: ffff880f22deb900 rdx: 0000000000000082
rbx: ffff880c633196e0 rsi: 0000000000000000 rdi: ffff880c63319cc0
rbp: ffff880f22deb820 r8: ffffea0024739ec0 r9: 0000000000000001
r10: 57ff5831a9739ec0 r11: 0000000000000000 r12: ffff880c63319cc0
r13: 0000000000000000 r14: 0000000000000000 r15: 0000000000000003
 cs: e033 ss: e02b ds: 0000 es: 0000
 fs: 0000 @ 00007ff790365740
 gs: 0000 @ ffff880f22de8000/0000000000000000
Code (instr addr ffffffff8105e51a)
66 90 45 85 ff 74 0e 44 8b 7d cc e9 aa 00 00 00 0f 1f 00 f3 90 <44> 8b 4b 28 45 85 c9 75 f5 48 8b

Stack:
 ffffffff81c70004 ffffffff81c70004 0000000000000001 000000041e7b3969
 0000000000000000 ffff880a931f9c70 0000000000000001 ffff880f22fbcaf0
 ffff880f22deb900 0000000000000000 ffff880f22deb830 ffffffff8105e662
 ffff880f22deb850 ffffffff81089356 0000007fff570000 ffff880f22fbcad8

Call Trace:
  [<ffffffff8105e51a>] try_to_wake_up+0xca <--
  [<ffffffff8105e662>] default_wake_function+0x12
  [<ffffffff81089356>] autoremove_wake_function+0x16
  [<ffffffff810893bb>] wake_bit_function+0x3b
  [<ffffffff8104b0a8>] __wake_up_common+0x58
  [<ffffffff8104cc78>] __wake_up+0x48
  [<ffffffff81088ff1>] __wake_up_bit+0x31
  [<ffffffff811147ca>] unlock_page+0x2a
  [<ffffffff811b05d6>] mpage_end_io+0x46
  [<ffffffff811a928d>] bio_endio+0x1d
  [<ffffffff812ebf43>] req_bio_endio.isra.45+0xa3
  [<ffffffff812ec695>] blk_update_request+0xf5
  [<ffffffff812ec9d1>] blk_update_bidi_request+0x31
  [<ffffffff812eea90>] __blk_end_bidi_request+0x20
  [<ffffffff812eeadf>] __blk_end_request_all+0x1f
  [<ffffffff814093dc>] blkif_interrupt+0x1cc
  [<ffffffff810d6925>] handle_irq_event_percpu+0x55
  [<ffffffff8130bbdb>] radix_tree_lookup+0xb
  [<ffffffff810d6b3e>] handle_irq_event+0x4e
  [<ffffffff810d9ae4>] handle_edge_irq+0x84
  [<ffffffff813a3eb9>] __xen_evtchn_do_upcall+0x199
  [<ffffffff813a5f9f>] xen_evtchn_do_upcall+0x2f
  [<ffffffff8165fb7e>] xen_do_hypervisor_callback+0x1e
  [<ffffffff810013aa>] hypercall_page+0x3aa
  [<ffffffff810013aa>] hypercall_page+0x3aa
  [<ffffffff813a464e>] xen_poll_irq_timeout+0x3e
  [<ffffffff813a6130>] xen_poll_irq+0x10
  [<ffffffff8163a7a1>] xen_spin_lock_slow+0x97
  [<ffffffff81011b33>] xen_spin_lock_flags+0x63
  [<ffffffff8165556e>] _raw_spin_lock_irqsave+0x2e
  [<ffffffff8105507e>] update_shares+0x9e
  [<ffffffff8105ddd8>] rebalance_domains+0x48
  [<ffffffff8105e078>] run_rebalance_domains+0x48
  [<ffffffff8106d318>] __do_softirq+0xa8
  [<ffffffff813a3f27>] __xen_evtchn_do_upcall+0x207
  [<ffffffff8165fb2c>] call_softirq+0x1c
  [<ffffffff81015295>] do_softirq+0x65
  [<ffffffff8106d6fe>] irq_exit+0x8e
  [<ffffffff813a5fa5>] xen_evtchn_do_upcall+0x35
  [<ffffffff8165fb7e>] xen_do_hypervisor_callback+0x1e
CPU1
rip: ffffffff810013aa hypercall_page+0x3aa
flags: 00001246 i z p
rsp: ffff880ec1225ec8
rax: 0000000000000000 rcx: ffffffff810013aa rdx: 0000000000000000
rbx: ffff880ec1225fd8 rsi: 0000000000000000 rdi: 0000000000000001
rbp: ...

summary: - Scheduler deadlock running 3.0.0 on multiple EC2 instance types
+ Kernel lockup running 3.0.0 and 3.2.0 on multiple EC2 instance types
Changed in linux-lts-backport-oneiric (Ubuntu):
importance: Undecided → Critical
Changed in linux (Ubuntu):
importance: Undecided → Critical
Revision history for this message
Ben Howard (darkmuggle-deactivatedaccount) wrote :

Matt,
Can you have the customer follow the instructions here: https://wiki.ubuntu.com/Kernel/Bugs to provide information about the kernel and packages? We really have nothing to go on here.

Also, can you have the customer try out the latest AMI -- preferably ami-895e7acc (64-bit instance store daily build from today) -- and see if they still have the problem?

Without more information, we have only a contextless stack trace.

Thanks,
Ben

Revision history for this message
Stefan Bader (smb) wrote :

For any confirmation, I would like affected parties to make sure their kernels are updated to at least these versions:

- 3.0.0-24.40 (for Oneiric)
- 3.2.0-28.44 (for Precise)

These would be the first kernel versions that have "sched: Fix race in task_group()" applied. While the symptom reported here is not the one described in the associated bug report, it is likely that the same race could cause other symptoms.

Revision history for this message
Mike Krieger (mike-nw2wga2s5w6wnovqwws3u9) wrote : apport information

AcpiTables: sudo: unable to resolve host moonshine4-readslave-ssdtest2
AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Aug 17 00:39 seq
 crw-rw---T 1 root audio 116, 33 Aug 17 00:39 timer
AplayDevices: aplay: device_list:252: no soundcards found...
ApportVersion: 2.0.1-0ubuntu12
Architecture: amd64
ArecordDevices: arecord: device_list:252: no soundcards found...
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
DistroRelease: Ubuntu 12.04
Ec2AMI: ami-7dae1b14
Ec2AMIManifest: ubuntu-us-east-1/images-testing/ubuntu-precise-daily-amd64-desktop-20120815.manifest.xml
Ec2AvailabilityZone: us-east-1e
Ec2InstanceType: hi1.4xlarge
Ec2Kernel: aki-825ea7eb
Ec2Ramdisk: unavailable
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
Lspci:

Lsusb: Error: command ['lsusb'] failed with exit code 1: unable to initialize libusb: -99
Package: linux-lts-backport-oneiric
PciMultimedia:

ProcEnviron:
 LC_CTYPE=en_US.UTF-8
 TERM=xterm-256color
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: root=LABEL=cloudimg-rootfs ro console=hvc0
ProcVersionSignature: User Name 3.2.0-29.46-virtual 3.2.24
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.2.0-29-virtual N/A
 linux-backports-modules-3.2.0-29-virtual N/A
 linux-firmware 1.79
RfKill:

Tags: precise ec2-images
Uname: Linux 3.2.0-29-virtual x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm admin audio cdrom dialout dip floppy netdev plugdev video
WifiSyslog:

tags: added: apport-collected ec2-images precise
Revision history for this message
Mike Krieger (mike-nw2wga2s5w6wnovqwws3u9) wrote : BootDmesg.txt

apport information

Revision history for this message
Mike Krieger (mike-nw2wga2s5w6wnovqwws3u9) wrote : CurrentDmesg.txt

apport information

Revision history for this message
Mike Krieger (mike-nw2wga2s5w6wnovqwws3u9) wrote : ProcCpuinfo.txt

apport information

Revision history for this message
Mike Krieger (mike-nw2wga2s5w6wnovqwws3u9) wrote : ProcInterrupts.txt

apport information

Revision history for this message
Mike Krieger (mike-nw2wga2s5w6wnovqwws3u9) wrote : ProcModules.txt

apport information

Revision history for this message
Mike Krieger (mike-nw2wga2s5w6wnovqwws3u9) wrote : UdevDb.txt

apport information

Revision history for this message
Mike Krieger (mike-nw2wga2s5w6wnovqwws3u9) wrote : UdevLog.txt

apport information

Revision history for this message
Mike Krieger (mike-nw2wga2s5w6wnovqwws3u9) wrote :

Hi Stefan,

Thanks. We just tried with ami-7dae1b14 and it froze up within a few minutes of having traffic thrown at the instance.

Are there any other diagnostics we can run either right before or after it freezes? Thanks.

Revision history for this message
Stefan Bader (smb) wrote :

Thanks, Mike, for the details. Just to make sure: you collected the info from the same instance that locked up (either before or after a reboot)? That would ensure the collected host information really belongs to the host where the problem happened.

As for more details, not right now, as we first need to understand more about the problem. But to get a general feeling:
- looking at the same instance type, does it happen on all of them sooner or later, or are there exceptions?
- did the same kind of workload run without issues in a previous Ubuntu release, or were those new projects
  starting with Oneiric/Precise?
- Probably more a question for Matt: are there issues with other Linux distros running comparable kernel versions?
- It might be worth trying a kernel from mainline (http://kernel.ubuntu.com/~kernel-ppa/mainline/).
  Right now I would probably go for a generic 64-bit 3.5.2 and maybe a 3.6-rc2 kernel. Not sure whether
  update-grub in Precise already picks up generic kernels, so one might need to fiddle with /boot/grub/menu.cfg
  manually after installing the packages.

As Matt wrote above, looking at the traces in a bit more detail, some CPUs are stuck entering the hypervisor call to wait for a spinlock, and others seem to have come out of that and are trying to wake up waiters.
@Matt, when you produce those CPU stack traces, how do you do that? Is that from a dump, or by somehow tapping into the still-running instance?
Right now it is hard to say whether this is a real deadlock (the types of locks involved can probably be derived from the backtrace of every CPU, though that gets hard for locks of individual structures/devices) or a problem with delivery of the spinlock wakeup event (either the wrong CPU was notified or for some reason the event never happened). Neither is easy to get hold of.

The best chance we would have is if it were possible to re-create this on an isolated test system. For that I would need some relatively simple-to-follow steps that let me create the workload that is causing the issue. Still, I only have an 8-core box, which I would have to overcommit to 16, and whether that gives the same results...

Revision history for this message
Mike Krieger (mike-nw2wga2s5w6wnovqwws3u9) wrote :

Hi Stefan,

Yes, the same instance that froze (collected it after a reboot).

- looking at the same instance type, does it happen on all of them sooner or later or are there exceptions?

There is one instance of that type that is under the same load but hasn't frozen in weeks. Since they're part of the same sharded cluster, they receive a very similar amount of queries; not sure why that one is different. We've rolled out about 20 instances of this type, and other than that one, they all freeze. Things that we've tried: LVM vs MDADM (still freezes), XFS vs ext4, not raiding the ephemeral storage at all; all of them led to freezes. The one that has only frozen once is non-RAIDed and uses XFS, but that same combination led to freezes on other instances.

- did the same kind of workload run without issues in a previous Ubuntu release or were those new projects starting with Onerirc/Precise

We usually run Natty; we only upgraded to Precise to see if it would fix the issue (it actually happens much more quickly on Precise). We did similar workloads at lower rates on Natty, but with the SSD instances we're throwing 3x the traffic at each instance.

Revision history for this message
Mike Krieger (mike-nw2wga2s5w6wnovqwws3u9) wrote :

One more note--I tried stressing out the same instance using bonnie++ and some CPU burning "yes" processes, but it stood up okay. It does seem somehow related to throwing heavy read traffic at PostgreSQL on these instances.

Revision history for this message
Stefan Bader (smb) wrote :

Hi Mike,

could you post the dmesg of that instance? Actually, if it has been running for a while, the boot messages may already be gone from the ring buffer; probably "sudo grep -r . /sys/hypervisor" in the guest is good enough.

So the issue was already there with Natty (2.6.38) but happens more often since Oneiric (3.0)/Precise (3.2). It sounds a bit problematic to create the same setup. At least I would not really know how to configure PostgreSQL and then create the necessary clients to hammer it.

Generally, from the traces provided by Matt in comment #8, there seems to be quite a bit of process management going on, like balancing the CPU load. I have not gone through all 16 CPU traces (it is a bit tedious), but a few of them seem to be waiting on task structures. I will post info when I find something.

Meanwhile, even though the latest kernels do not solve this, it would be interesting to find out whether it might be something (else) related to autogrouping. Put the following either into /etc/sysctl.conf or into a new /etc/sysctl.d/60-autogroup.conf:

kernel.sched_autogroup_enabled = 0

Then reboot and check that cat /proc/sys/kernel/sched_autogroup_enabled is indeed 0 before running the workload. That should show whether autogrouping has an effect on the issue.

Revision history for this message
Stefan Bader (smb) wrote :

In the hope that this catches something, I put some Precise kernels at http://people.canonical.com/~smb/lp1011792/. Those have lock debugging enabled. If you could install the virtual packages (the extras package is not required) on one Precise AMI and let it run. If that locks up without giving any hints, it would still be helpful to get another set of stack traces from Matt, as I would at least have matching debug-enabled kernel images to compare the assembly against. The official debug packages tend to get thrown away due to their size whenever new kernels hit -updates, so that is a bit of a never-ending chase.

Revision history for this message
Rick Branson (rbranson) wrote :

We see the same clock jump issues and lockup issues as in this thread:

http://lists.xen.org/archives/html/xen-devel/2012-04/msg00888.html

Could it be related?

Revision history for this message
Stefan Bader (smb) wrote :

It is hard to say for sure. Indeed we see the same jump in time. However, the problem in that report seemed more like one task going into schedule and never really getting scheduled again, while here it looks like a real locking issue. All CPUs (except CPU1) appear to be waiting on some spinlock, several of them on runqueue locks (in paths that try to avoid potential a-b lock ordering). CPU0 waits on a block request queue lock. From the traces alone it is not possible to say, but it feels like one of the runqueue locks might not have been released on some rarely used special path.
I hope that the kernels with lock debugging enabled may show something here. At least new stack traces taken with those would be known not to have the task_group race, and it will help in case we need to think of a special way to catch or print information on the problem. For any further debug kernels I will try to include something like what was mentioned on lkml, to see whether this too avoids the time jumps and the issue. Another suspect could be a higher rate of looking at /proc task info (which one of the CPUs is doing); that might be something a database does more often than other workloads.

tags: added: kernel-da-key
tags: added: kernel-key
Revision history for this message
Matt Wilson (msw-amazon) wrote :

"@Matt, when you produce those cpu stacktraces, how do you do that? Is that from a dump or somehow tapping into the still running instance?"

@smb, these are traces from running, but unresponsive, instances. I pull the traces from the vCPU context in the hypervisor, then resolve symbols from the System.map of the kernel running in the guest.

Revision history for this message
Stefan Bader (smb) wrote :

This has been quiet for a while. Has there been a chance to try the debug kernels, or a run with autogroup disabled?

Revision history for this message
Rick Branson (rbranson) wrote :

(I work with Mike Krieger.)

Our "quick fix" for this was to use HVM guests.

Unfortunately the only way for us to reproduce this bug at the moment is to put a database box under production load, so we have to be judicious in our approach. It also takes several hours for us to bring a new guest up to sync and ready for the load, so it's not something that's easy to iterate on. We've tried running several different torture tests and I/O benchmarks with no success.

Our next step is to run a PV guest with a sched_clock_stable=0 kernel, and if that still reproduces the issue we'll try turning autogroups off.

Revision history for this message
Rick Branson (rbranson) wrote :

We've tried both autogroups disabled and a kernel with sched_clock_stable=0 forced. Both crashed. Going to try a lock-debugging kernel next.

Revision history for this message
Rick Branson (rbranson) wrote :

We've been able to reproduce the bug in a more isolated environment.

I wrote a Python script (pgslam.py) that generates a close-enough approximation of our production traffic. In addition, I wrote a bash script that sets up a hi1.4xlarge EC2 instance to reproduce the issue. During the tests, I launched the pgslam.py script from another instance and pointed it at the instance prepared with the bash script.

This command results in the EC2 instance built with that script locking up in under a minute:

$ python pgslam.py 'host=10.10.10.10 user=pgslam password=pgslam' 800

These messages appear in the console log:

[706342.844192] BUG: soft lockup - CPU#7 stuck for 23s! [postgres:9266]
[706342.844272] Stack:
[706342.844296] Call Trace:
[706342.844409] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[706370.844190] BUG: soft lockup - CPU#7 stuck for 23s! [postgres:9266]
[706370.844519] Stack:
[706370.844549] Call Trace:
[706370.844916] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[706371.320186] INFO: rcu_sched detected stalls on CPUs/tasks: { 0 11 13} (detected by 7, t=15002 jiffies)
[706406.844191] BUG: soft lockup - CPU#7 stuck for 24s! [postgres:9266]
[706406.844293] Stack:
[706406.844330] Call Trace:
[706406.844461] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[706434.844191] BUG: soft lockup - CPU#7 stuck for 22s! [postgres:9266]
[706434.844273] Stack:
[706434.844297] Call Trace:
[706434.844411] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[706462.844192] BUG: soft lockup - CPU#7 stuck for 22s! [postgres:9266]
[706462.844273] Stack:
[706462.844297] Call Trace:
[706462.844412] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc

Revision history for this message
Rick Branson (rbranson) wrote :
Revision history for this message
Rick Branson (rbranson) wrote :
Revision history for this message
Rick Branson (rbranson) wrote :

FYI, the above repro was done with ami-3c994355.

Revision history for this message
Rick Branson (rbranson) wrote :

Stefan -- we are trying to repro with the debug kernel image you linked. So far it has survived past the initial under-a-minute lockup, but we'll run the test for several more hours and see if it still crashes. Right now the database isn't very large, so it's not doing any read I/O. Once the database grows larger it will have to start doing more read I/O.

Revision history for this message
Stefan Bader (smb) wrote :

Rick, thanks for the scripts. The good news is that I seem to be able to reproduce the lockup on a local machine with less memory and fewer CPUs (not having a really big box at hand). Now I should be able to get more info out of that (given a bit of time).

Revision history for this message
Matt Wilson (msw-amazon) wrote :

For what it's worth, I started running this test case on the Amazon Linux AMI (ami-aecd60c7) yesterday. It hasn't crashed. The DB is now >96 GiB.

Revision history for this message
Rick Branson (rbranson) wrote :

We've run it all the way up to 334GB on the debug kernel and it's fine. Early next week we're going to deploy the debug kernel onto a production host.

Revision history for this message
Stefan Bader (smb) wrote :

Matt, and what kernel version would the Amazon Linux AMI be?

Rick, that would, in the most annoying way, confirm a race somewhere, as we suspected: add some code that makes things slower and the problem becomes rare or goes away completely.

Revision history for this message
Rick Branson (rbranson) wrote :

Stefan,

We threw the debug kernel into production and it ended up dying after a few hours, so it's highly likely you're right about it just slowing down the race.

What was the setup you used to repro the bug on your side with the scripts I gave you?

Rick

tags: removed: kernel-key
Revision history for this message
Steven Noonan (steven-valvesoftware) wrote :

Stefan,

I did our internal SSD and network performance testing qualifications on hi1.4xlarge with CONFIG_PARAVIRT_SPINLOCKS=n in the kernel build. There's very little discernible difference in performance (within statistical noise ranges), and the benefits of disabling it are pretty clear. Can CONFIG_PARAVIRT_SPINLOCKS be disabled in the Ubuntu -virtual kernel so we can close out this bug, please?

Revision history for this message
Stefan Bader (smb) wrote :

Steven, as I said before, that option affects both Xen and KVM behaviour, so I want to avoid changing it as much as possible. If you could please give the test1 kernel a try, which "only" does not re-enable the guest's interrupts while doing the hypercall. In my testing this also avoided the hang, and it would only affect Xen, which is where the issue occurs in the first place.

Revision history for this message
Konrad Rzeszutek Wilk (konrad-wilk) wrote : Re: [Bug 1011792] Re: Kernel lockup running 3.0.0 and 3.2.0 on multiple EC2 instance types

On Mon, Nov 12, 2012 at 08:47:11PM -0000, Steven Noonan wrote:
> Stefan,
>
> I did our internal SSD and network performance testing qualifications on
> hi1.4xlarge with CONFIG_PARAVIRT_SPINLOCKS=n in the kernel build.
> There's very little discernible difference in performance (within
> statistical noise ranges), and the benefits of disabling it are pretty
> clear. Can CONFIG_PARAVIRT_SPINLOCKS be disabled in the Ubuntu -virtual
> kernel so we can close out this bug, please?

Yikes, that would be quite bad for overcommitted scenarios and
for running the guest on non-PLE hardware.

I am looking at the bug now to get an idea of what is happening
and how to reproduce it.

Revision history for this message
Steven Noonan (steven-valvesoftware) wrote :

I've got 'test1' running on 19 hi1.4xlarge instances using the pgslam.py workload. They're all up to 40G of capacity used right now, still no lockups.

I'll leave this running for a while longer and see what happens (9 hours so far).

Revision history for this message
Steven Noonan (steven-valvesoftware) wrote :

Well, the hosts are still running fine. pgslam.py eventually bombed out on all of them with "InternalError: could not open relation with OID". Not sure why, but either way there were no lockups. It's possible "test1" resolves the issue. Konrad?

Revision history for this message
Rick Branson (rbranson) wrote :

FWIW -- we have dozens of SSD instances running the nopvspinlock patched kernel that each serve ~1000 connections doing 10k+ QPS / 10k+ IOPS and have uptimes in the weeks range now. I'll talk to the team about getting some time to slot in the "test1" kernel.

Revision history for this message
Konrad Rzeszutek Wilk (konrad-wilk) wrote :

Stefan,
Is your patch somewhere accessible? Thx

Revision history for this message
Stefan Bader (smb) wrote :

It was circulated on the mailing list, but for simpler reference I am adding it here (ok, I did clean up the comment section a bit).

tags: added: patch
Revision history for this message
Jeffrey Gelens (jgelens) wrote :

Any idea when this patch will arrive in the Precise kernel packages?

Revision history for this message
Stefan Bader (smb) wrote :

Somehow I had hoped to find some way to understand why the fix works (or what exactly goes wrong without it). But then other things and my bad long-term memory (sort of) came into play, and this has not really progressed much. So I will try to actually get this into Precise at least (since no testing seems to have found any bad side effects). I still need to follow up on this for current kernels (to see whether the testcase still fails there).

Changed in linux (Ubuntu Precise):
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Stefan Bader (stefan-bader-canonical)
Changed in linux-lts-backport-oneiric (Ubuntu Precise):
status: New → Invalid
Stefan Bader (smb)
description: updated
Revision history for this message
Jeffrey Gelens (jgelens) wrote :

Thanks for the update; I hope this works, as this is a big problem for our servers. We're thinking about downgrading to Lucid, but I'm eager to try this out first. When do you think it can be available as a normal package update?

Revision history for this message
Stefan Bader (smb) wrote :

Started to push this for Precise now. Depending on whether it lands in this cycle or the next, it could be 3 or 6 weeks. Requests for testing and release notifications will be posted to this bug when that happens.

Stefan Bader (smb)
Changed in linux (Ubuntu Precise):
status: In Progress → Fix Committed
Revision history for this message
Steven Noonan (steven-valvesoftware) wrote :

Stefan, let's be sure this fix gets upstream as well.

Revision history for this message
Stefan Bader (smb) wrote :

After finally having a breakthrough in understanding the source of the lockup, and after further discussions upstream, the proper fix turns out to be to change the way waiters are woken when a spinlock gets freed. A slightly more verbose explanation of this is in the attached patch, which will likely go upstream, so there is a chance that the work-around gets replaced relatively soon. I pre-compiled a current version of the kernels with that change and uploaded them to http://people.canonical.com/~smb/lp1011792/. I have been running the pgslam testcase on those without experiencing any hangs. If anybody wants to give them an early try in production, that would be appreciated.
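
To make that description a bit more concrete, here is a rough sketch of one way to read "change the way waiters are woken": kick every CPU still registered as spinning on the freed lock rather than only the first one found. The attached patch is the authoritative version; the names below (lock_spinners, xen_send_IPI_one, XEN_SPIN_UNLOCK_VECTOR) follow the pv-spinlock code of that era and should be treated as assumptions:

/*
 * Sketch only -- see the attached patch for the real change. If only the
 * first spinner found is kicked and that kick is lost or consumed without
 * the lock being handed over, the remaining waiters stay blocked in the
 * hypervisor even though the lock is free. Kicking all of them avoids that.
 */
static void xen_spin_unlock_slow_sketch(struct xen_spinlock *xl)
{
        int cpu;

        for_each_online_cpu(cpu) {
                if (per_cpu(lock_spinners, cpu) == xl) {
                        /* old behaviour: send one IPI and break out here   */
                        /* new behaviour: keep looping and kick every waiter */
                        xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
                }
        }
}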

Revision history for this message
Mike Heffner (mikeh-fesnel) wrote :

What is the best way to test if we are impacted by this bug? We run the following image on c1.xlarge's and see nodes die about every 1-2 days now under a continuous >50% CPU load. The nodes will fail the EC2 instance status check and all monitoring daemons on the node will stop reporting.

However, there is no backtrace or any information in the logs after a reboot to diagnose what failed. Is there a change required to dump the stacktraces seen above in order to tell if we are impacted by this same bug?

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 11.10
Release: 11.10
Codename: oneiric

# uname -a
Linux ip-10-191-55-199 3.0.0-30-virtual #47-Ubuntu SMP Wed Jan 2 23:54:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

IMAGE ami-57bd2a3e 099720109477/ubuntu/images/ubuntu-oneiric-11.10-amd64-server-20130203 099720109477 available public x86_64 machine aki-88aa75e1 instance-store paravirtual xen

Revision history for this message
Stefan Bader (smb) wrote :

The change from comment #79 is now upstream and is also on its way back into the releases via upstream stable, so one of the future uploads will carry it. Note that Oneiric is quite close to running out of support; it might be wise to think about upgrading to an LTS. That said, I compiled a 3.0 kernel package for 64-bit and placed it on the same people page, so you can try to find out whether this helps in your case as well.

Stefan Bader (smb)
Changed in linux-lts-backport-oneiric (Ubuntu Lucid):
status: New → Invalid
Changed in linux-lts-backport-oneiric (Ubuntu Quantal):
status: New → Invalid
Changed in linux-lts-backport-oneiric (Ubuntu Raring):
status: Confirmed → Invalid
Changed in linux-lts-backport-oneiric (Ubuntu Oneiric):
importance: Undecided → Critical
status: New → Confirmed
Changed in linux-lts-backport-oneiric (Ubuntu Raring):
importance: Critical → Undecided
Changed in linux (Ubuntu Quantal):
importance: Undecided → Critical
status: New → Confirmed
importance: Critical → High
Changed in linux (Ubuntu Oneiric):
importance: Undecided → Critical
status: New → Confirmed
Changed in linux (Ubuntu Lucid):
importance: Undecided → Medium
status: New → Triaged
Changed in linux (Ubuntu Raring):
importance: Critical → High
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.8.0-9.18

---------------
linux (3.8.0-9.18) raring; urgency=low

  [Tim Gardner]

  * Release Tracking Bug
    - LP: #1135937
  * [Config] CONFIG_PATA_ACPI=m
    - LP: #1084783

  [ Upstream Kernel Changes ]

  * intel_idle: stop using driver_data for static flags
    - LP: #1083993
  * intel_idle: support Haswell
    - LP: #1083993
  * i2c: i801: SMBus patch for Intel Avoton DeviceIDs
    - LP: #1083965
  * ahci: AHCI-mode SATA patch for Intel Avoton DeviceIDs
    - LP: #1083965
  * ata_piix: IDE-mode SATA patch for Intel Avoton DeviceIDs
    - LP: #1083965

  [ Upstream Kernel Changes ]

  * rebase to v3.8.1
    - LP: #1011792
 -- Tim Gardner <email address hidden> Wed, 27 Feb 2013 06:10:52 -0700

Changed in linux (Ubuntu Raring):
status: Confirmed → Fix Released
Brad Figg (brad-figg)
tags: added: verification-needed-precise
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed' to 'verification-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Revision history for this message
Mike Heffner (mikeh-fesnel) wrote :

Stefan: Thanks for building the oneiric backport kernel. We have taken your advice, though, and updated to precise, running the 3.2.0-38 patched kernel you uploaded.

Revision history for this message
Fleish (lasnchpad) wrote :

I've been having similar lockups on dom0s running Ubuntu 12.04 LTS with the linux-image-3.2.0-32-generic kernel. Below is output from the last one, which shows the same stack trace seen with the virtual kernel image. I took the pgslam/setup_hi1_4xlarge_for_crash_test.sh test scripts and modified the setup to skip creating md0, using local storage instead. On 3.2.0-32-generic, the system locked up repeatedly when pgslam was run. I then updated the kernel to 3.2.0-39-generic from precise-proposed and re-tested without any issues. I wanted to report my findings here since I haven't seen anyone confirm the fix for precise. I'm not sure if I can do that based on my findings, but I would be happy to if it helps move the fix along to precise-updates.

[70987.896664] BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[70987.896822] Stack:
[70987.896857] Call Trace:
[70987.896880] <IRQ>
[70987.897138] <EOI>
[70987.897185] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[71015.896695] BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[71015.896940] Stack:
[71015.896985] Call Trace:
[71015.897012] <IRQ>
[71015.897420] <EOI>
[71015.897487] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[71043.464982] igb 0000:01:00.0: eth0: Reset adapter
[71043.896750] BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[71043.896981] Stack:
[71043.897026] Call Trace:
[71043.897053] <IRQ>
[71043.897443] <EOI>
[71043.897507] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[71048.180847] INFO: task kworker/u:2:19718 blocked for more than 120 seconds.
[71048.180893] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[71071.896790] BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]
[71071.897023] Stack:
[71071.897068] Call Trace:
[71071.897095] <IRQ>
[71071.897491] <EOI>
[71071.897555] Code: cc 51 41 53 b8 1c 00 00 00 0f 05 41 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 51 41 53 b8 1d 00 00 00 0f 05 <41> 5b 59 c3 cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
[71088.072808] INFO: rcu_sched detected stalls on CPUs/tasks: { 4} (detected by 3, t=60034 jiffies)

Revision history for this message
Stefan Bader (smb) wrote :

It is not completely surprising, as dom0 is actually a PV guest, albeit one with special privileges. But it is good to have confirmation that this also affects dom0 and is also fixed by the same change.
As said in comment #83, there is currently a Precise kernel in -proposed that contains the first approach to fixing this (not enabling interrupts during the hv call).
This should get replaced by the upstream fix (waking up all spinners rather than only the first one found). That fix has already made it into Raring and will sooner or later land in all supported releases. It is a bit unfortunate that Precise goes through both stages, but it was impossible to tell in advance whether and how quickly an upstream fix would materialize, and given the cadence of the process I wanted Precise working as soon as possible.
Right now, since the first version is in the verification phase, it would be great if anybody could confirm that this kernel also prevents the hang (a simplified sketch of that interim change follows). Then, with the next round of stable updates, I will make sure the old fix is replaced by the new one.
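The sketch below shows roughly what the interim change ("SAUCE: xen/pv-spinlock: Never enable interrupts in xen_spin_lock_slow()") amounts to in the 3.x slow path. Helper names follow arch/x86/xen/spinlock.c; statistics and the nested-spinning bookkeeping are left out, so this is an illustration under those simplifications, not the actual SAUCE patch:

/*
 * Simplified xen_spin_lock_slow(): the interim fix is to keep
 * interrupts disabled across the poll hypercall instead of honouring
 * the irq_enable argument.
 */
static noinline int xen_spin_lock_slow(struct arch_spinlock *lock,
                                       bool irq_enable)
{
        struct xen_spinlock *xl = (struct xen_spinlock *)lock;
        int irq = __this_cpu_read(lock_kicker_irq);
        int ret;

        /* advertise which lock this CPU is blocking on (the real code
         * uses spinning_lock()/unspinning_lock() to handle nesting) */
        __this_cpu_write(lock_spinners, xl);

        do {
                xen_clear_irq_pending(irq);

                /* the lock may have been freed while we were not looking */
                ret = xen_spin_trylock(lock);
                if (ret)
                        break;

                /*
                 * Interim fix: do NOT re-enable interrupts here, even
                 * when irq_enable is set.  Re-enabling them before the
                 * poll hypercall is where the wakeup appeared to get
                 * lost, leaving the CPU stuck in the hypervisor.
                 */
                xen_poll_irq(irq);      /* block until the unlocker kicks us */
        } while (!xen_test_irq_pending(irq));

        __this_cpu_write(lock_spinners, NULL);
        return ret;
}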

Revision history for this message
Jeffrey Gelens (jgelens) wrote :

Testing the fix in precise. I'll update this bug report with my findings later this week to be sure it works.

Revision history for this message
Jeffrey Gelens (jgelens) wrote :

I'm testing 4 machines with pgslam with the kernel in precise-proposed. There have been no issues for several days; it would be great if someone could change the tag 'verification-needed' to 'verification-done' so that it can land in the main repos.

Thanks!

Stefan Bader (smb)
tags: added: verification-done-precise
removed: verification-needed-precise
Revision history for this message
Jérôme Petazzoni (jerome-petazzoni) wrote :

For the record: we also experienced this issue on EC2 PV instances (m2.2xlarge in us-east-1), using a very intensive workload (~1000 LXC containers running a mix of web apps and databases). Running any kind of 3.X kernel (3.2, 3.4, 3.6) causes our test workload to crash in less than one hour.

The same workload, with the same kernel, on bare metal, works fine.
The same workload, with the same kernel, on HVM instances, works fine.
The same workload, with the 3.2.0-39-virtual kernel in "proposed", on PV instances (m3.2xlarge), works fine.
The same workload, with a custom 3.2.40+grsec kernel including the spinlock IPI patch, on PV instances, works fine.

We'll update this ticket if we see other problems appear, but it looks like this (1 line!) patch did the trick.

Congratulations to those involved in the debugging. This issue has kept us stuck on pre-3.0 kernels on our old fleet of EC2 instances, and seeing it go away is a huge relief!

Revision history for this message
Adam Conrad (adconrad) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates, please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.2.0-39.62

---------------
linux (3.2.0-39.62) precise-proposed; urgency=low

  [Brad Figg]

  * Release Tracking Bug
    - LP: #1134424

  [ Herton Ronaldo Krzesinski ]

  * Revert "SAUCE: samsung-laptop: disable in UEFI mode"
    - LP: #1117693
  * d-i: Add mellanox ethernet drivers to nic-modules
    - LP: #1015339

  [ Ian Campbell ]

  * SAUCE: xen/netback: shutdown the ring if it contains garbage.
    - LP: #1117325
    - CVE-2013-0216
  * SAUCE: netback: correct netbk_tx_err to handle wrap around.
    - LP: #1117325
    - CVE-2013-0216
  * SAUCE: xen/netback: don't leak pages on failure in
    xen_netbk_tx_check_gop.
    - LP: #1117331
    - CVE-2013-0217
  * SAUCE: xen/netback: free already allocated memory on failure in
    xen_netbk_get_requests
    - LP: #1117331
    - CVE-2013-0217

  [ Jan Beulich ]

  * SAUCE: xen-pciback: rate limit error messages from
    xen_pcibk_enable_msi{, x}()
    - LP: #1117336
    - CVE-2013-0231

  [ Tim Gardner ]

  * [debian] Remove dangling symlink from headers package
    - LP: #1112442
  * SAUCE: rt2x00: rt2x00pci_regbusy_read() - only print register access
    failure once
    - LP: #1128840

  [ Upstream Kernel Changes ]

  * Revert "ALSA: hda - Shut up pins at power-saving mode with Conexnat
    codecs"
    - LP: #1117693, #886975
  * Revert "drm/i915: no lvds quirk for Zotac ZDBOX SD ID12/ID13"
    - LP: #1117693
  * Revert "ptrace: ensure arch_ptrace/ptrace_request can never race with
    SIGKILL"
    - LP: #1131218
  * SAUCE: xen/pv-spinlock: Never enable interrupts in xen_spin_lock_slow()
    - LP: #1011792
  * usb: gadget: dummy: fix enumeration with g_multi
    - LP: #1117693
  * usb: musb: core: print new line in the driver banner again
    - LP: #1117693
  * virtio-blk: Don't free ida when disk is in use
    - LP: #1117693
  * mac80211: use del_timer_sync for final sta cleanup timer deletion
    - LP: #1117693
  * xhci: Handle HS bulk/ctrl endpoints that don't NAK.
    - LP: #1117693
  * USB: Handle auto-transition from hot to warm reset.
    - LP: #1117693
  * USB: Ignore xHCI Reset Device status.
    - LP: #1117693
  * USB: Allow USB 3.0 ports to be disabled.
    - LP: #1117693
  * USB: Increase reset timeout.
    - LP: #1117693
  * USB: Ignore port state until reset completes.
    - LP: #1117693
  * USB: Handle warm reset failure on empty port.
    - LP: #1117693
  * xhci: Avoid "dead ports", add roothub port polling.
    - LP: #1117693
  * ASoC: wm5100: Remove DSP B and left justified formats
    - LP: #1117693
  * mwifiex: handle association failure case correctly
    - LP: #1117693
  * mwifiex: check wait_event_interruptible return value
    - LP: #1117693
  * ASoC: wm2000: Fix sense of speech clarity enable
    - LP: #1117693
  * ioat: Fix DMA memory sync direction correct flag
    - LP: #1117693
  * drm/i915; Only increment the user-pin-count after successfully pinning
    the bo
    - LP: #1117693
  * staging: r8712u: Add new device ID
    - LP: #1117693
  * staging: speakup: avoid out-of-range access in synth_init()
    - LP: #1117693
  * staging: speakup: avoid out-of-range access in synth_add()
    - LP: #1117693
  * staging: come...

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Revision history for this message
Nathan O'Sullivan (nathan-mammoth) wrote :

Can confirm this affects dom0: with kernel 3.2.0-38, the attached pgslam test, and Xen configured with
GRUB_CMDLINE_XEN="dom0_mem=7000M dom0_max_vcpus=24 dom0_vcpus_pin"

I can get a crash within two minutes. Still testing 3.2.0-39, but it certainly gets past the two-minute mark.

Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed' to 'verification-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-quantal
Revision history for this message
Stefan Bader (smb) wrote :

Verified that quantal-proposed passes the pgslam testcase.

tags: added: verification-done-quantal
removed: verification-needed-quantal
Changed in linux (Ubuntu Quantal):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Precise):
assignee: Stefan Bader (stefan-bader-canonical) → nobody
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed' to 'verification-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-oneiric
Stefan Bader (smb)
Changed in linux (Ubuntu Oneiric):
status: Confirmed → Fix Committed
Revision history for this message
Stefan Bader (smb) wrote :

Verified on Oneiric.

tags: added: verification-done-oneiric
removed: verification-needed-oneiric
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.0.0-32.51

---------------
linux (3.0.0-32.51) oneiric-proposed; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #1158340

  [ Upstream Kernel Changes ]

  * printk: fix buffer overflow when calling log_prefix function from
    call_console_drivers
    - LP: #1145196
  * 3.0.66
    - LP: #1145196
  * x86-32, mm: Remove reference to resume_map_numa_kva()
    - LP: #1145196
  * mm: fix pageblock bitmap allocation
    - LP: #1145196
  * timeconst.pl: Eliminate Perl warning
    - LP: #1145196
  * genirq: Avoid deadlock in spurious handling
    - LP: #1145196
  * posix-cpu-timers: Fix nanosleep task_struct leak
    - LP: #1145196
  * hrtimer: Prevent hrtimer_enqueue_reprogram race
    - LP: #1145196
  * ALSA: ali5451: remove irq enabling in pointer callback
    - LP: #1145196
  * ALSA: rme32.c irq enabling after spin_lock_irq
    - LP: #1145196
  * tty: set_termios/set_termiox should not return -EINTR
    - LP: #1145196
  * xen: Send spinlock IPI to all waiters
    - LP: #1011792, #1145196
  * Driver core: treat unregistered bus_types as having no devices
    - LP: #1145196
  * mm: mmu_notifier: have mmu_notifiers use a global SRCU so they may
    safely schedule
    - LP: #1145196
  * mm: mmu_notifier: make the mmu_notifier srcu static
    - LP: #1145196
  * mmu_notifier_unregister NULL Pointer deref and multiple ->release()
    callouts
    - LP: #1145196
  * KVM: s390: Handle hosts not supporting s390-virtio.
    - LP: #1145196
  * s390/kvm: Fix store status for ACRS/FPRS
    - LP: #1145196
  * inotify: remove broken mask checks causing unmount to be EINVAL
    - LP: #1145196
  * ocfs2: unlock super lock if lockres refresh failed
    - LP: #1145196
  * drivers/video/backlight/adp88?0_bl.c: fix resume
    - LP: #1145196
  * tmpfs: fix use-after-free of mempolicy object
    - LP: #1145196
  * mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to
    discard all pages
    - LP: #1145196
  * NLM: Ensure that we resend all pending blocking locks after a reclaim
    - LP: #1145196
  * p54usb: corrected USB ID for T-Com Sinus 154 data II
    - LP: #1145196
  * ALSA: usb-audio: fix Roland A-PRO support
    - LP: #1145196
  * ALSA: usb: Fix Processing Unit Descriptor parsers
    - LP: #1145196
  * ext4: Free resources in some error path in ext4_fill_super
    - LP: #1145196
  * ext4: add missing kfree() on error return path in add_new_gdb()
    - LP: #1145196
  * sunvdc: Fix off-by-one in generic_request().
    - LP: #1145196
  * drm/usb: bind driver to correct device
    - LP: #1145196
  * drm/i915: disable shared panel fitter for pipe
    - LP: #1145196
  * staging: comedi: disallow COMEDI_DEVCONFIG on non-board minors
    - LP: #1145196
  * staging: vt6656: Fix URB submitted while active warning.
    - LP: #1145196
  * ARM: PXA3xx: program the CSMSADRCFG register
    - LP: #1145196
  * powerpc/kexec: Disable hard IRQ before kexec
    - LP: #1145196
  * Purge existing TLB entries in set_pte_at and ptep_set_wrprotect
    - LP: #1145196
  * pcmcia/vrc4171: Add missing spinlock init
    - LP: #1145196
  * fbcon: don't lose the console font across generic->chip driver switch
    - L...


Changed in linux (Ubuntu Oneiric):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-lts-backport-oneiric - 3.0.0-32.51~lucid1

---------------
linux-lts-backport-oneiric (3.0.0-32.51~lucid1) lucid-proposed; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #1158541

  [ Upstream Kernel Changes ]

  * printk: fix buffer overflow when calling log_prefix function from
    call_console_drivers
    - LP: #1145196
  * 3.0.66
    - LP: #1145196
  * x86-32, mm: Remove reference to resume_map_numa_kva()
    - LP: #1145196
  * mm: fix pageblock bitmap allocation
    - LP: #1145196
  * timeconst.pl: Eliminate Perl warning
    - LP: #1145196
  * genirq: Avoid deadlock in spurious handling
    - LP: #1145196
  * posix-cpu-timers: Fix nanosleep task_struct leak
    - LP: #1145196
  * hrtimer: Prevent hrtimer_enqueue_reprogram race
    - LP: #1145196
  * ALSA: ali5451: remove irq enabling in pointer callback
    - LP: #1145196
  * ALSA: rme32.c irq enabling after spin_lock_irq
    - LP: #1145196
  * tty: set_termios/set_termiox should not return -EINTR
    - LP: #1145196
  * xen: Send spinlock IPI to all waiters
    - LP: #1011792, #1145196
  * Driver core: treat unregistered bus_types as having no devices
    - LP: #1145196
  * mm: mmu_notifier: have mmu_notifiers use a global SRCU so they may
    safely schedule
    - LP: #1145196
  * mm: mmu_notifier: make the mmu_notifier srcu static
    - LP: #1145196
  * mmu_notifier_unregister NULL Pointer deref and multiple ->release()
    callouts
    - LP: #1145196
  * KVM: s390: Handle hosts not supporting s390-virtio.
    - LP: #1145196
  * s390/kvm: Fix store status for ACRS/FPRS
    - LP: #1145196
  * inotify: remove broken mask checks causing unmount to be EINVAL
    - LP: #1145196
  * ocfs2: unlock super lock if lockres refresh failed
    - LP: #1145196
  * drivers/video/backlight/adp88?0_bl.c: fix resume
    - LP: #1145196
  * tmpfs: fix use-after-free of mempolicy object
    - LP: #1145196
  * mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to
    discard all pages
    - LP: #1145196
  * NLM: Ensure that we resend all pending blocking locks after a reclaim
    - LP: #1145196
  * p54usb: corrected USB ID for T-Com Sinus 154 data II
    - LP: #1145196
  * ALSA: usb-audio: fix Roland A-PRO support
    - LP: #1145196
  * ALSA: usb: Fix Processing Unit Descriptor parsers
    - LP: #1145196
  * ext4: Free resources in some error path in ext4_fill_super
    - LP: #1145196
  * ext4: add missing kfree() on error return path in add_new_gdb()
    - LP: #1145196
  * sunvdc: Fix off-by-one in generic_request().
    - LP: #1145196
  * drm/usb: bind driver to correct device
    - LP: #1145196
  * drm/i915: disable shared panel fitter for pipe
    - LP: #1145196
  * staging: comedi: disallow COMEDI_DEVCONFIG on non-board minors
    - LP: #1145196
  * staging: vt6656: Fix URB submitted while active warning.
    - LP: #1145196
  * ARM: PXA3xx: program the CSMSADRCFG register
    - LP: #1145196
  * powerpc/kexec: Disable hard IRQ before kexec
    - LP: #1145196
  * Purge existing TLB entries in set_pte_at and ptep_set_wrprotect
    - LP: #1145196
  * pcmcia/vrc4171: Add missing spinlock init
    - LP: #1145196
  * fbcon: don't lose the c...

Changed in linux-lts-backport-oneiric (Ubuntu Lucid):
status: Invalid → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.5.0-27.46

---------------
linux (3.5.0-27.46) quantal-proposed; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #1159991

  [ Steve Conklin ]

  * Start New Release

  [ Upstream Kernel Changes ]

  * crypto: user - fix info leaks in report API
    - LP: #1156790, #1156795, #1156799
    - CVE-2013-2546
  * brcmsmac: fix mismatch in number of custom regulatory rules
    - LP: #1156769

  [ Xiangliang Yu ]

  * SAUCE: PCI: define macro for marvell vendor ID
    - LP: #1159863
  * SAUCE: PCI: fix system hang issue of Marvell SATA host controller
    - LP: #1159863

linux (3.5.0-27.45) quantal-proposed; urgency=low

  [Steve Conklin]

  * no change

linux (3.5.0-27.44) quantal-proposed; urgency=low

  [Steve Conklin]

  * no change

linux (3.5.0-27.43) quantal-proposed; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #1156821

  [ Adam Lee ]

  * SAUCE: Bluetooth: Add support for 04ca:2007
    - LP: #1153448

  [ Andy Whitcroft ]

  * [Config] re-disable CONFIG_SOUND_OSS_PRECLAIM
    - LP: #1105230

  [ Brad Figg ]

  * Start new release

  [ Kamal Mostafa ]

  * SAUCE: alx: use github.com/qca/alx repo
    - LP: #1154238

  [ Luis Henriques ]

  * [Config] CONFIG_NFS_V4_1=y
    - LP: #1111416

  [ Qualcomm Atheros, Inc ]

  * SAUCE: alx: Update to heads/master
    - LP: #1154238

  [ Upstream Kernel Changes ]

  * Revert "drm: Add EDID_QUIRK_FORCE_REDUCED_BLANKING for ASUS VW222S"
    - LP: #1144961
  * genirq: Avoid deadlock in spurious handling
    - LP: #1144961
  * KVM: s390: Handle hosts not supporting s390-virtio.
    - LP: #1144961
  * v4l: Reset subdev v4l2_dev field to NULL if registration fails
    - LP: #1144961
  * powerpc/eeh: Fix crash when adding a device in a slot with DDW
    - LP: #1144961
  * pcmcia/vrc4171: Add missing spinlock init
    - LP: #1144961
  * USB: DWC3: Generate interrupt on each TRB as default option
    - LP: #1144961
  * usb: dwc3: gadget: fix skip LINK_TRB on ISOC
    - LP: #1144961
  * usb: dwc3: gadget: req->queued must be forced to false in cleanup
    - LP: #1144961
  * Purge existing TLB entries in set_pte_at and ptep_set_wrprotect
    - LP: #1144961
  * ARM: PXA3xx: program the CSMSADRCFG register
    - LP: #1144961
  * ab8500-chargalg: Only root should have write permission on sysfs file
    - LP: #1144961
  * USB: option: add and update Alcatel modems
    - LP: #1144961
  * uprobes/powerpc: Add dependency on single step emulation
    - LP: #1144961
  * drivers/video: fsl-diu-fb: fix pixel formats for 24 and 16 bpp
    - LP: #1144961
  * s390/kvm: Fix store status for ACRS/FPRS
    - LP: #1144961
  * staging: comedi: disallow COMEDI_DEVCONFIG on non-board minors
    - LP: #1144961
  * ALSA: usb-audio: fix Roland A-PRO support
    - LP: #1144961
  * x86-32, mm: Rip out x86_32 NUMA remapping code
    - LP: #1144961
  * x86-32, mm: Remove reference to resume_map_numa_kva()
    - LP: #1144961
  * x86-32, mm: Remove reference to alloc_remap()
    - LP: #1144961
  * ALSA: hda - Release assigned pin/cvt at error path of hdmi_pcm_open()
    - LP: #1144961
  * ab8500_btemp: Demote initcall sequence
    - LP: #1144961
  * Driver co...

Changed in linux (Ubuntu Quantal):
status: Fix Committed → Fix Released
Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed' to 'verification-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-lucid
Revision history for this message
Stefan Bader (smb) wrote :

I did test runs for the current EC2 kernel (which we actually do not expect to be affected at all) and for the proposed virtual/server flavour, manually inserted into a Xen PV guest based on the cloud images. Both passed.

tags: added: verification-done-lucid
removed: verification-needed-lucid
Revision history for this message
Brandon (lordnynex-6) wrote :

Am I the only one experiencing this issue on HVM machines? Also, does anyone happen to know if there's a precise AMI that fixes this issue?

Revision history for this message
Stefan Bader (smb) wrote :

No, you are guaranteed *not* to experience _this_ issue on an HVM guest, because HVM guests use a completely different spinlock implementation. It is possible that you are seeing hangs/lockups, but please open a new bug report for those, because it is a different issue.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.32-48.110

---------------
linux (2.6.32-48.110) lucid; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #1186340

  [ Stefan Bader ]

  * (config) Import Xen specific config options from ec2
    - LP: #1177431
  * SAUCE: xen: Send spinlock IPI to all waiters
    - LP: #1011792, #1177431

  [ Upstream Kernel Changes ]

  * ax25: fix info leak via msg_name in ax25_recvmsg()
    - LP: #1172366
    - CVE-2013-3223
  * Bluetooth: fix possible info leak in bt_sock_recvmsg()
    - LP: #1172368
    - CVE-2013-3224
  * tipc: fix info leaks via msg_name in recv_msg/recv_stream
    - LP: #1172403
    - CVE-2013-3235
  * rose: fix info leak via msg_name in rose_recvmsg()
    - LP: #1172394
    - CVE-2013-3234
  * Bluetooth: RFCOMM - Fix missing msg_namelen update in
    rfcomm_sock_recvmsg()
    - LP: #1172369
    - CVE-2013-3225
  * atm: update msg_namelen in vcc_recvmsg()
    - LP: #1172365
    - CVE-2013-3222
  * KVM: Fix bounds checking in ioapic indirect register reads
    (CVE-2013-1798)
    - LP: #1158262
    - CVE-2013-1798
  * llc: Fix missing msg_namelen update in llc_ui_recvmsg()
    - LP: #1172385
    - CVE-2013-3231
  * netrom: fix info leak via msg_name in nr_recvmsg()
    - LP: #1172386
    - CVE-2013-3232
  * irda: Fix missing msg_namelen update in irda_recvmsg_dgram()
    - LP: #1172380
    - CVE-2013-3228
  * iucv: Fix missing msg_namelen update in iucv_sock_recvmsg()
    - LP: #1172381
    - CVE-2013-3229
 -- Steve Conklin <email address hidden> Fri, 31 May 2013 12:03:46 -0500

Changed in linux (Ubuntu Lucid):
status: Triaged → Fix Released
Revision history for this message
Rolf Leggewie (r0lf) wrote :

oneiric has seen the end of its life and is no longer receiving any updates. Marking the oneiric task for this ticket as "Won't Fix".

Changed in linux-lts-backport-oneiric (Ubuntu Oneiric):
status: Confirmed → Won't Fix