Activity log for bug #1168350

Date Who What changed Old value New value Message
2013-04-12 10:24:01 Munehisa Kamata bug added bug
2013-04-12 10:24:01 Munehisa Kamata attachment added apport.linux-image-3.2.0-40-virtual.9wl95P.apport https://bugs.launchpad.net/bugs/1168350/+attachment/3642038/+files/apport.linux-image-3.2.0-40-virtual.9wl95P.apport
2013-04-12 10:30:24 Brad Figg linux (Ubuntu): status New Incomplete
2013-04-12 11:46:50 Munehisa Kamata linux (Ubuntu): status Incomplete Confirmed
2013-04-12 16:52:11 Joseph Salisbury linux (Ubuntu): importance Undecided Medium
2013-04-12 16:58:09 Joseph Salisbury tags precise bot-stop-nagging kernel-da-key precise
2013-04-12 18:24:38 Joseph Salisbury nominated for series Ubuntu Precise
2013-04-12 18:24:38 Joseph Salisbury bug task added linux (Ubuntu Precise)
2013-04-12 18:24:45 Joseph Salisbury linux (Ubuntu Precise): status New Confirmed
2013-04-12 18:24:48 Joseph Salisbury linux (Ubuntu Precise): importance Undecided Medium
2013-04-18 14:11:05 Ben Howard bug added subscriber Antonio Rosales
2013-04-18 14:11:24 Ben Howard bug added subscriber Ben Howard
2013-04-23 08:29:17 Stefan Bader linux (Ubuntu): status Confirmed Fix Released
2013-04-23 08:29:23 Stefan Bader linux (Ubuntu Precise): assignee Stefan Bader (stefan-bader-canonical)
2013-04-23 08:29:34 Stefan Bader linux (Ubuntu Precise): status Confirmed In Progress
2013-05-01 19:56:20 Cristian Gafton bug added subscriber Cristian Gafton
2013-05-08 14:03:24 Stefan Bader description The arch_trigger_all_cpu_backtrace() tries to send NMI to all CPUs via IPI for getting stacktraces from them. But NMI vector is not implemented on virtualized environment(Xen PV) and the function results in Oops. [4746854.099062] INFO: rcu_sched detected stall on CPU 3 (t=15001 jiffies) [4746854.099091] BUG: unable to handle kernel paging request at ffffffffff5fb310 [4746854.099100] IP: [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0 [4746854.099116] PGD 1c07067 PUD 1c08067 PMD 1dd4067 PTE 0 [4746854.099126] Oops: 0002 [#1] SMP [4746854.099134] CPU 3 [4746854.099137] Modules linked in: stallmod(O+) isofs acpiphp [4746854.099150] [4746854.099157] Pid: 4752, comm: insmod Tainted: G O 3.2.0-40-virtual #64-Ubuntu [4746854.099174] RIP: e030:[<ffffffff81037cf8>] [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0 [4746854.099189] RSP: e02b:ffff8803bfd83c68 EFLAGS: 00010046 [4746854.099198] RAX: 0000000000000000 RBX: ffffffff81cd0060 RCX: 000000000003ffff [4746854.099208] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002 [4746854.099219] RBP: ffff8803bfd83c88 R08: 000000000003ffff R09: 0000000000000000 [4746854.099229] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000800 [4746854.099240] R13: 000000000f000000 R14: ffff8803bfd8e700 R15: 0000000000000000 [4746854.099256] FS: 00007f456d441700(0000) GS:ffff8803bfd80000(0000) knlGS:0000000000000000 [4746854.099270] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [4746854.099279] CR2: ffffffffff5fb310 CR3: 00000003a4180000 CR4: 0000000000002660 [4746854.099290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [4746854.099301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [4746854.099312] Process insmod (pid: 4752, threadinfo ffff8803a48d4000, task ffff8803a6b5c4a0) [4746854.099323] Stack: [4746854.099328] 0000000000000000 0000000000002710 ffffffff81c31000 ffffffff81c31100 [4746854.099346] ffff8803bfd83ca8 
ffffffff8103333a ffff8803a6e17b00 ffffffff81c31000 [4746854.099363] ffff8803bfd83cc8 ffffffff810df347 ffff8803bfd8e250 ffff8803bfd8eb80 [4746854.099382] Call Trace: [4746854.099387] <IRQ> [4746854.099401] [<ffffffff8103333a>] arch_trigger_all_cpu_backtrace+0x5a/0x90 [4746854.099416] [<ffffffff810df347>] check_cpu_stall.isra.35+0x97/0xf0 [4746854.099429] [<ffffffff810df3d8>] __rcu_pending+0x38/0x1d0 [4746854.099439] [<ffffffff810df869>] rcu_check_callbacks+0x79/0x1e0 [4746854.099453] [<ffffffff81078098>] update_process_times+0x48/0x90 [4746854.099466] [<ffffffff8109b864>] tick_sched_timer+0x64/0xc0 [4746854.099480] [<ffffffff8108dfe8>] __run_hrtimer+0x78/0x1f0 [4746854.099491] [<ffffffff8109b800>] ? tick_nohz_handler+0x100/0x100 [4746854.099506] [<ffffffff8105e748>] ? load_balance+0x78/0x370 [4746854.099520] [<ffffffff8108e917>] hrtimer_interrupt+0xf7/0x230 [4746854.099535] [<ffffffff8100a817>] xen_timer_interrupt+0x27/0x40 [4746854.099547] [<ffffffff810d7bb5>] handle_irq_event_percpu+0x55/0x210 [4746854.099561] [<ffffffff813a6f7e>] ? info_for_irq+0xe/0x30 [4746854.099572] [<ffffffff810dae67>] handle_percpu_irq+0x47/0x60 [4746854.099583] [<ffffffff813a6de9>] __xen_evtchn_do_upcall+0x199/0x250 [4746854.099596] [<ffffffff813a8ecf>] xen_evtchn_do_upcall+0x2f/0x50 [4746854.099610] [<ffffffff81661b7e>] xen_do_hypervisor_callback+0x1e/0x30 [4746854.099619] <EOI> [4746854.099632] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000 [4746854.099645] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000 [4746854.099659] [<ffffffff813a757e>] ? xen_poll_irq_timeout+0x3e/0x50 [4746854.099671] [<ffffffff813a9060>] ? xen_poll_irq+0x10/0x20 [4746854.099683] [<ffffffff8163c200>] ? xen_spin_lock_slow+0x97/0xf2 [4746854.099695] [<ffffffffa000c000>] ? 0xffffffffa000bfff [4746854.099709] [<ffffffff810121da>] ? xen_spin_lock+0x4a/0x50 [4746854.099722] [<ffffffff816572ce>] ? _raw_spin_lock+0xe/0x20 [4746854.099734] [<ffffffffa000702b>] ? 
stall+0x2b/0x44 [stallmod] [4746854.099746] [<ffffffffa000c009>] ? init_module+0x9/0x1000 [stallmod] [4746854.099758] [<ffffffff81002040>] ? do_one_initcall+0x40/0x180 [4746854.099771] [<ffffffff810a7abe>] ? sys_init_module+0xbe/0x230 [4746854.099783] [<ffffffff8165f8c2>] ? system_call_fastpath+0x16/0x1b In this case, the function is invoked by RCU based stall detector when it detects stalled CPU(i.e. lockup) in an interrupt context. Oops in an interrupt context always causes a kernel panic, so this bug sometimes makes debugging a kernel lockup issue difficult. The function is also invoked from sysrq_handle_showallcpus() that is for getting traces from all active CPUs anytime we want. # echo l > /proc/sysrq-trigger This is the easiest way to reproduce this. [How to fix] As far as I see, one possible solution is to backport the following patch. This patch is already included in Quantal's kernel. http://lists.xen.org/archives/html/xen-devel/2012-04/msg01023.html Another solution is to disable arch_trigger_all_cpu_backtrace() at compile time but I'm still investigating what config is for that. If you need any other information, please feel free to ask me. SRU Justification: Impact: The arch_trigger_all_cpu_backtrace tries to notify all other cpus via ipi. For that it looks up an ipi hook from the apic structure without verifying whether that pointer is NULL or not. Fix: Upstream fixed this by implementing the apic IPI hooks interface. Although some pieces seem to be unclear, this is not changed in upstream kernels since then. So either it does not matter or those pieces are not used. So for now backport the patch introducing the apic interface from upstream (only dropping one unnecessary declaration). This only affects PVM as HVM emulates flat apic completely. Testcase: Cause a call to arch_trigger_all_cpu_backtrace (Munehisa, can you provide a simple trigger?). 
--- The arch_trigger_all_cpu_backtrace() tries to send NMI to all CPUs via IPI for getting stacktraces from them. But NMI vector is not implemented on virtualized environment(Xen PV) and the function results in Oops. [4746854.099062] INFO: rcu_sched detected stall on CPU 3 (t=15001 jiffies) [4746854.099091] BUG: unable to handle kernel paging request at ffffffffff5fb310 [4746854.099100] IP: [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0 [4746854.099116] PGD 1c07067 PUD 1c08067 PMD 1dd4067 PTE 0 [4746854.099126] Oops: 0002 [#1] SMP [4746854.099134] CPU 3 [4746854.099137] Modules linked in: stallmod(O+) isofs acpiphp [4746854.099150] [4746854.099157] Pid: 4752, comm: insmod Tainted: G O 3.2.0-40-virtual #64-Ubuntu [4746854.099174] RIP: e030:[<ffffffff81037cf8>] [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0 [4746854.099189] RSP: e02b:ffff8803bfd83c68 EFLAGS: 00010046 [4746854.099198] RAX: 0000000000000000 RBX: ffffffff81cd0060 RCX: 000000000003ffff [4746854.099208] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002 [4746854.099219] RBP: ffff8803bfd83c88 R08: 000000000003ffff R09: 0000000000000000 [4746854.099229] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000800 [4746854.099240] R13: 000000000f000000 R14: ffff8803bfd8e700 R15: 0000000000000000 [4746854.099256] FS: 00007f456d441700(0000) GS:ffff8803bfd80000(0000) knlGS:0000000000000000 [4746854.099270] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [4746854.099279] CR2: ffffffffff5fb310 CR3: 00000003a4180000 CR4: 0000000000002660 [4746854.099290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [4746854.099301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [4746854.099312] Process insmod (pid: 4752, threadinfo ffff8803a48d4000, task ffff8803a6b5c4a0) [4746854.099323] Stack: [4746854.099328] 0000000000000000 0000000000002710 ffffffff81c31000 ffffffff81c31100 [4746854.099346] ffff8803bfd83ca8 ffffffff8103333a ffff8803a6e17b00 ffffffff81c31000 
[4746854.099363] ffff8803bfd83cc8 ffffffff810df347 ffff8803bfd8e250 ffff8803bfd8eb80 [4746854.099382] Call Trace: [4746854.099387] <IRQ> [4746854.099401] [<ffffffff8103333a>] arch_trigger_all_cpu_backtrace+0x5a/0x90 [4746854.099416] [<ffffffff810df347>] check_cpu_stall.isra.35+0x97/0xf0 [4746854.099429] [<ffffffff810df3d8>] __rcu_pending+0x38/0x1d0 [4746854.099439] [<ffffffff810df869>] rcu_check_callbacks+0x79/0x1e0 [4746854.099453] [<ffffffff81078098>] update_process_times+0x48/0x90 [4746854.099466] [<ffffffff8109b864>] tick_sched_timer+0x64/0xc0 [4746854.099480] [<ffffffff8108dfe8>] __run_hrtimer+0x78/0x1f0 [4746854.099491] [<ffffffff8109b800>] ? tick_nohz_handler+0x100/0x100 [4746854.099506] [<ffffffff8105e748>] ? load_balance+0x78/0x370 [4746854.099520] [<ffffffff8108e917>] hrtimer_interrupt+0xf7/0x230 [4746854.099535] [<ffffffff8100a817>] xen_timer_interrupt+0x27/0x40 [4746854.099547] [<ffffffff810d7bb5>] handle_irq_event_percpu+0x55/0x210 [4746854.099561] [<ffffffff813a6f7e>] ? info_for_irq+0xe/0x30 [4746854.099572] [<ffffffff810dae67>] handle_percpu_irq+0x47/0x60 [4746854.099583] [<ffffffff813a6de9>] __xen_evtchn_do_upcall+0x199/0x250 [4746854.099596] [<ffffffff813a8ecf>] xen_evtchn_do_upcall+0x2f/0x50 [4746854.099610] [<ffffffff81661b7e>] xen_do_hypervisor_callback+0x1e/0x30 [4746854.099619] <EOI> [4746854.099632] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000 [4746854.099645] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000 [4746854.099659] [<ffffffff813a757e>] ? xen_poll_irq_timeout+0x3e/0x50 [4746854.099671] [<ffffffff813a9060>] ? xen_poll_irq+0x10/0x20 [4746854.099683] [<ffffffff8163c200>] ? xen_spin_lock_slow+0x97/0xf2 [4746854.099695] [<ffffffffa000c000>] ? 0xffffffffa000bfff [4746854.099709] [<ffffffff810121da>] ? xen_spin_lock+0x4a/0x50 [4746854.099722] [<ffffffff816572ce>] ? _raw_spin_lock+0xe/0x20 [4746854.099734] [<ffffffffa000702b>] ? stall+0x2b/0x44 [stallmod] [4746854.099746] [<ffffffffa000c009>] ? 
init_module+0x9/0x1000 [stallmod] [4746854.099758] [<ffffffff81002040>] ? do_one_initcall+0x40/0x180 [4746854.099771] [<ffffffff810a7abe>] ? sys_init_module+0xbe/0x230 [4746854.099783] [<ffffffff8165f8c2>] ? system_call_fastpath+0x16/0x1b In this case, the function is invoked by RCU based stall detector when it detects stalled CPU(i.e. lockup) in an interrupt context. Oops in an interrupt context always causes a kernel panic, so this bug sometimes makes debugging a kernel lockup issue difficult. The function is also invoked from sysrq_handle_showallcpus() that is for getting traces from all active CPUs anytime we want.  # echo l > /proc/sysrq-trigger This is the easiest way to reproduce this. [How to fix] As far as I see, one possible solution is to backport the following patch. This patch is already included in Quantal's kernel.  http://lists.xen.org/archives/html/xen-devel/2012-04/msg01023.html Another solution is to disable arch_trigger_all_cpu_backtrace() at compile time but I'm still investigating what config is for that. If you need any other information, please feel free to ask me.
2013-05-08 16:31:59 Munehisa Kamata description SRU Justification: Impact: The arch_trigger_all_cpu_backtrace tries to notify all other cpus via ipi. For that it looks up an ipi hook from the apic structure without verifying whether that pointer is NULL or not. Fix: Upstream fixed this by implementing the apic IPI hooks interface. Although some pieces seem to be unclear, this is not changed in upstream kernels since then. So either it does not matter or those pieces are not used. So for now backport the patch introducing the apic interface from upstream (only dropping one unnecessary declaration). This only affects PVM as HVM emulates flat apic completely. Testcase: Cause a call to arch_trigger_all_cpu_backtrace (Munehisa, can you provide a simple trigger?). --- The arch_trigger_all_cpu_backtrace() tries to send NMI to all CPUs via IPI for getting stacktraces from them. But NMI vector is not implemented on virtualized environment(Xen PV) and the function results in Oops. [4746854.099062] INFO: rcu_sched detected stall on CPU 3 (t=15001 jiffies) [4746854.099091] BUG: unable to handle kernel paging request at ffffffffff5fb310 [4746854.099100] IP: [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0 [4746854.099116] PGD 1c07067 PUD 1c08067 PMD 1dd4067 PTE 0 [4746854.099126] Oops: 0002 [#1] SMP [4746854.099134] CPU 3 [4746854.099137] Modules linked in: stallmod(O+) isofs acpiphp [4746854.099150] [4746854.099157] Pid: 4752, comm: insmod Tainted: G O 3.2.0-40-virtual #64-Ubuntu [4746854.099174] RIP: e030:[<ffffffff81037cf8>] [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0 [4746854.099189] RSP: e02b:ffff8803bfd83c68 EFLAGS: 00010046 [4746854.099198] RAX: 0000000000000000 RBX: ffffffff81cd0060 RCX: 000000000003ffff [4746854.099208] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002 [4746854.099219] RBP: ffff8803bfd83c88 R08: 000000000003ffff R09: 0000000000000000 [4746854.099229] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000800 [4746854.099240] 
R13: 000000000f000000 R14: ffff8803bfd8e700 R15: 0000000000000000 [4746854.099256] FS: 00007f456d441700(0000) GS:ffff8803bfd80000(0000) knlGS:0000000000000000 [4746854.099270] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [4746854.099279] CR2: ffffffffff5fb310 CR3: 00000003a4180000 CR4: 0000000000002660 [4746854.099290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [4746854.099301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [4746854.099312] Process insmod (pid: 4752, threadinfo ffff8803a48d4000, task ffff8803a6b5c4a0) [4746854.099323] Stack: [4746854.099328] 0000000000000000 0000000000002710 ffffffff81c31000 ffffffff81c31100 [4746854.099346] ffff8803bfd83ca8 ffffffff8103333a ffff8803a6e17b00 ffffffff81c31000 [4746854.099363] ffff8803bfd83cc8 ffffffff810df347 ffff8803bfd8e250 ffff8803bfd8eb80 [4746854.099382] Call Trace: [4746854.099387] <IRQ> [4746854.099401] [<ffffffff8103333a>] arch_trigger_all_cpu_backtrace+0x5a/0x90 [4746854.099416] [<ffffffff810df347>] check_cpu_stall.isra.35+0x97/0xf0 [4746854.099429] [<ffffffff810df3d8>] __rcu_pending+0x38/0x1d0 [4746854.099439] [<ffffffff810df869>] rcu_check_callbacks+0x79/0x1e0 [4746854.099453] [<ffffffff81078098>] update_process_times+0x48/0x90 [4746854.099466] [<ffffffff8109b864>] tick_sched_timer+0x64/0xc0 [4746854.099480] [<ffffffff8108dfe8>] __run_hrtimer+0x78/0x1f0 [4746854.099491] [<ffffffff8109b800>] ? tick_nohz_handler+0x100/0x100 [4746854.099506] [<ffffffff8105e748>] ? load_balance+0x78/0x370 [4746854.099520] [<ffffffff8108e917>] hrtimer_interrupt+0xf7/0x230 [4746854.099535] [<ffffffff8100a817>] xen_timer_interrupt+0x27/0x40 [4746854.099547] [<ffffffff810d7bb5>] handle_irq_event_percpu+0x55/0x210 [4746854.099561] [<ffffffff813a6f7e>] ? 
info_for_irq+0xe/0x30 [4746854.099572] [<ffffffff810dae67>] handle_percpu_irq+0x47/0x60 [4746854.099583] [<ffffffff813a6de9>] __xen_evtchn_do_upcall+0x199/0x250 [4746854.099596] [<ffffffff813a8ecf>] xen_evtchn_do_upcall+0x2f/0x50 [4746854.099610] [<ffffffff81661b7e>] xen_do_hypervisor_callback+0x1e/0x30 [4746854.099619] <EOI> [4746854.099632] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000 [4746854.099645] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000 [4746854.099659] [<ffffffff813a757e>] ? xen_poll_irq_timeout+0x3e/0x50 [4746854.099671] [<ffffffff813a9060>] ? xen_poll_irq+0x10/0x20 [4746854.099683] [<ffffffff8163c200>] ? xen_spin_lock_slow+0x97/0xf2 [4746854.099695] [<ffffffffa000c000>] ? 0xffffffffa000bfff [4746854.099709] [<ffffffff810121da>] ? xen_spin_lock+0x4a/0x50 [4746854.099722] [<ffffffff816572ce>] ? _raw_spin_lock+0xe/0x20 [4746854.099734] [<ffffffffa000702b>] ? stall+0x2b/0x44 [stallmod] [4746854.099746] [<ffffffffa000c009>] ? init_module+0x9/0x1000 [stallmod] [4746854.099758] [<ffffffff81002040>] ? do_one_initcall+0x40/0x180 [4746854.099771] [<ffffffff810a7abe>] ? sys_init_module+0xbe/0x230 [4746854.099783] [<ffffffff8165f8c2>] ? system_call_fastpath+0x16/0x1b In this case, the function is invoked by RCU based stall detector when it detects stalled CPU(i.e. lockup) in an interrupt context. Oops in an interrupt context always causes a kernel panic, so this bug sometimes makes debugging a kernel lockup issue difficult. The function is also invoked from sysrq_handle_showallcpus() that is for getting traces from all active CPUs anytime we want.  # echo l > /proc/sysrq-trigger This is the easiest way to reproduce this. [How to fix] As far as I see, one possible solution is to backport the following patch. This patch is already included in Quantal's kernel.  
http://lists.xen.org/archives/html/xen-devel/2012-04/msg01023.html Another solution is to disable arch_trigger_all_cpu_backtrace() at compile time but I'm still investigating what config is for that. If you need any other information, please feel free to ask me. SRU Justification: Impact: The arch_trigger_all_cpu_backtrace tries to notify all other cpus via ipi. For that it looks up an ipi hook from the apic structure without verifying whether that pointer is NULL or not. Fix: Upstream fixed this by implementing the apic IPI hooks interface. Although some pieces seem to be unclear, this is not changed in upstream kernels since then. So either it does not matter or those pieces are not used. So for now backport the patch introducing the apic interface from upstream (only dropping one unnecessary declaration). This only affects PVM as HVM emulates flat apic completely. Testcase: To cause a call to arch_trigger_all_cpu_backtrace by: # echo l > /proc/sysrq-trigger --- The arch_trigger_all_cpu_backtrace() tries to send NMI to all CPUs via IPI for getting stacktraces from them. But NMI vector is not implemented on virtualized environment(Xen PV) and the function results in Oops. 
[4746854.099062] INFO: rcu_sched detected stall on CPU 3 (t=15001 jiffies) [4746854.099091] BUG: unable to handle kernel paging request at ffffffffff5fb310 [4746854.099100] IP: [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0 [4746854.099116] PGD 1c07067 PUD 1c08067 PMD 1dd4067 PTE 0 [4746854.099126] Oops: 0002 [#1] SMP [4746854.099134] CPU 3 [4746854.099137] Modules linked in: stallmod(O+) isofs acpiphp [4746854.099150] [4746854.099157] Pid: 4752, comm: insmod Tainted: G O 3.2.0-40-virtual #64-Ubuntu [4746854.099174] RIP: e030:[<ffffffff81037cf8>] [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0 [4746854.099189] RSP: e02b:ffff8803bfd83c68 EFLAGS: 00010046 [4746854.099198] RAX: 0000000000000000 RBX: ffffffff81cd0060 RCX: 000000000003ffff [4746854.099208] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002 [4746854.099219] RBP: ffff8803bfd83c88 R08: 000000000003ffff R09: 0000000000000000 [4746854.099229] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000800 [4746854.099240] R13: 000000000f000000 R14: ffff8803bfd8e700 R15: 0000000000000000 [4746854.099256] FS: 00007f456d441700(0000) GS:ffff8803bfd80000(0000) knlGS:0000000000000000 [4746854.099270] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b [4746854.099279] CR2: ffffffffff5fb310 CR3: 00000003a4180000 CR4: 0000000000002660 [4746854.099290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [4746854.099301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [4746854.099312] Process insmod (pid: 4752, threadinfo ffff8803a48d4000, task ffff8803a6b5c4a0) [4746854.099323] Stack: [4746854.099328] 0000000000000000 0000000000002710 ffffffff81c31000 ffffffff81c31100 [4746854.099346] ffff8803bfd83ca8 ffffffff8103333a ffff8803a6e17b00 ffffffff81c31000 [4746854.099363] ffff8803bfd83cc8 ffffffff810df347 ffff8803bfd8e250 ffff8803bfd8eb80 [4746854.099382] Call Trace: [4746854.099387] <IRQ> [4746854.099401] [<ffffffff8103333a>] arch_trigger_all_cpu_backtrace+0x5a/0x90 
[4746854.099416] [<ffffffff810df347>] check_cpu_stall.isra.35+0x97/0xf0 [4746854.099429] [<ffffffff810df3d8>] __rcu_pending+0x38/0x1d0 [4746854.099439] [<ffffffff810df869>] rcu_check_callbacks+0x79/0x1e0 [4746854.099453] [<ffffffff81078098>] update_process_times+0x48/0x90 [4746854.099466] [<ffffffff8109b864>] tick_sched_timer+0x64/0xc0 [4746854.099480] [<ffffffff8108dfe8>] __run_hrtimer+0x78/0x1f0 [4746854.099491] [<ffffffff8109b800>] ? tick_nohz_handler+0x100/0x100 [4746854.099506] [<ffffffff8105e748>] ? load_balance+0x78/0x370 [4746854.099520] [<ffffffff8108e917>] hrtimer_interrupt+0xf7/0x230 [4746854.099535] [<ffffffff8100a817>] xen_timer_interrupt+0x27/0x40 [4746854.099547] [<ffffffff810d7bb5>] handle_irq_event_percpu+0x55/0x210 [4746854.099561] [<ffffffff813a6f7e>] ? info_for_irq+0xe/0x30 [4746854.099572] [<ffffffff810dae67>] handle_percpu_irq+0x47/0x60 [4746854.099583] [<ffffffff813a6de9>] __xen_evtchn_do_upcall+0x199/0x250 [4746854.099596] [<ffffffff813a8ecf>] xen_evtchn_do_upcall+0x2f/0x50 [4746854.099610] [<ffffffff81661b7e>] xen_do_hypervisor_callback+0x1e/0x30 [4746854.099619] <EOI> [4746854.099632] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000 [4746854.099645] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000 [4746854.099659] [<ffffffff813a757e>] ? xen_poll_irq_timeout+0x3e/0x50 [4746854.099671] [<ffffffff813a9060>] ? xen_poll_irq+0x10/0x20 [4746854.099683] [<ffffffff8163c200>] ? xen_spin_lock_slow+0x97/0xf2 [4746854.099695] [<ffffffffa000c000>] ? 0xffffffffa000bfff [4746854.099709] [<ffffffff810121da>] ? xen_spin_lock+0x4a/0x50 [4746854.099722] [<ffffffff816572ce>] ? _raw_spin_lock+0xe/0x20 [4746854.099734] [<ffffffffa000702b>] ? stall+0x2b/0x44 [stallmod] [4746854.099746] [<ffffffffa000c009>] ? init_module+0x9/0x1000 [stallmod] [4746854.099758] [<ffffffff81002040>] ? do_one_initcall+0x40/0x180 [4746854.099771] [<ffffffff810a7abe>] ? sys_init_module+0xbe/0x230 [4746854.099783] [<ffffffff8165f8c2>] ? 
system_call_fastpath+0x16/0x1b In this case, the function is invoked by RCU based stall detector when it detects stalled CPU(i.e. lockup) in an interrupt context. Oops in an interrupt context always causes a kernel panic, so this bug sometimes makes debugging a kernel lockup issue difficult. The function is also invoked from sysrq_handle_showallcpus() that is for getting traces from all active CPUs anytime we want.  # echo l > /proc/sysrq-trigger This is the easiest way to reproduce this. [How to fix] As far as I see, one possible solution is to backport the following patch. This patch is already included in Quantal's kernel.  http://lists.xen.org/archives/html/xen-devel/2012-04/msg01023.html Another solution is to disable arch_trigger_all_cpu_backtrace() at compile time but I'm still investigating what config is for that. If you need any other information, please feel free to ask me.
2013-05-09 14:34:26 Tim Gardner linux (Ubuntu Precise): status In Progress Fix Committed
2013-06-04 15:22:42 Brad Figg tags bot-stop-nagging kernel-da-key precise bot-stop-nagging kernel-da-key precise verification-needed-precise
2013-06-05 16:04:05 Steve Conklin tags bot-stop-nagging kernel-da-key precise verification-needed-precise bot-stop-nagging kernel-da-key precise verification-done-precise
2013-06-13 18:22:10 Launchpad Janitor linux (Ubuntu Precise): status Fix Committed Fix Released
2013-06-13 18:22:10 Launchpad Janitor cve linked 2013-3076
2013-06-13 18:22:10 Launchpad Janitor cve linked 2013-3222
2013-06-13 18:22:10 Launchpad Janitor cve linked 2013-3223
2013-06-13 18:22:10 Launchpad Janitor cve linked 2013-3224
2013-06-13 18:22:10 Launchpad Janitor cve linked 2013-3225
2013-06-13 18:22:10 Launchpad Janitor cve linked 2013-3234
2013-06-13 18:22:10 Launchpad Janitor cve linked 2013-3235