arch_trigger_all_cpu_backtrace() results in Oops on virtualized guest
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Fix Released | Medium | Unassigned |
Precise | Fix Released | Medium | Stefan Bader |
Bug Description
SRU Justification:
Impact: The arch_trigger_
Fix: Upstream fixed this by implementing the APIC IPI hooks interface. Although some parts of that change are unclear, they have not been modified in upstream kernels since, so either they do not matter or they are not used. For now, backport the upstream patch that introduces the APIC interface (dropping only one unnecessary declaration). This affects only PVM guests, because under HVM the flat APIC is fully emulated.
Testcase: To cause a call to arch_trigger_
# echo l > /proc/sysrq-trigger
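The Fix item above describes giving the PV guest its own APIC IPI hooks. Roughly, and as the Oops below suggests, without such hooks the default send-IPI routine writes APIC registers through a memory mapping the PV guest does not have. The C sketch below models the idea in userspace only: callers go through a hook table selected once at startup, and a PV guest installs its own send routine. Everything in it is hypothetical (`struct apic_ipi_ops`, `native_send_IPI_all`, `pv_send_IPI_all`, `apic_install_ops`, `trigger_all_cpu_backtrace_model`); the kernel's real `struct apic` and the upstream Xen patch differ in detail and are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical tag telling which delivery path handled an IPI. */
enum ipi_path { IPI_NATIVE_MMIO, IPI_PV_HYPERVISOR };

/* Hypothetical, heavily simplified stand-in for an APIC driver's
 * IPI hooks; the kernel's real struct apic has many more members. */
struct apic_ipi_ops {
    const char *name;
    enum ipi_path (*send_IPI_all)(unsigned int vector);
};

static enum ipi_path native_send_IPI_all(unsigned int vector)
{
    /* Bare metal: would write the ICR through the mapped APIC page.
     * A PV guest has no such mapping, so this path must not run there. */
    printf("native: ICR write, vector 0x%02x\n", vector);
    return IPI_NATIVE_MMIO;
}

static enum ipi_path pv_send_IPI_all(unsigned int vector)
{
    /* PV guest: would ask the hypervisor to deliver the IPI instead. */
    printf("pv: hypervisor-delivered IPI, vector 0x%02x\n", vector);
    return IPI_PV_HYPERVISOR;
}

static const struct apic_ipi_ops native_ops = { "native", native_send_IPI_all };
static const struct apic_ipi_ops pv_ops = { "pv", pv_send_IPI_all };

/* Selected once during startup; callers only ever use this pointer. */
static const struct apic_ipi_ops *apic_ops;

static void apic_install_ops(bool pv_guest)
{
    apic_ops = pv_guest ? &pv_ops : &native_ops;
}

/* Model of a backtrace trigger: it knows only the hook, not the
 * hardware, so it works unchanged on bare metal and in a PV guest. */
static enum ipi_path trigger_all_cpu_backtrace_model(unsigned int vector)
{
    assert(apic_ops != NULL && "apic_install_ops() must run first");
    return apic_ops->send_IPI_all(vector);
}
```

Calling `apic_install_ops(true)` before `trigger_all_cpu_backtrace_model(2)` selects the PV routine and never reaches the native register write. The design point is that the bare-metal/PV decision is made once, in one place, instead of in every caller that sends an IPI.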
---
The arch_trigger_
[4746854.099062] INFO: rcu_sched detected stall on CPU 3 (t=15001 jiffies)
[4746854.099091] BUG: unable to handle kernel paging request at ffffffffff5fb310
[4746854.099100] IP: [<ffffffff81037
[4746854.099116] PGD 1c07067 PUD 1c08067 PMD 1dd4067 PTE 0
[4746854.099126] Oops: 0002 [#1] SMP
[4746854.099134] CPU 3
[4746854.099137] Modules linked in: stallmod(O+) isofs acpiphp
[4746854.099150]
[4746854.099157] Pid: 4752, comm: insmod Tainted: G O 3.2.0-40-virtual #64-Ubuntu
[4746854.099174] RIP: e030:[<
[4746854.099189] RSP: e02b:ffff8803bf
[4746854.099198] RAX: 0000000000000000 RBX: ffffffff81cd0060 RCX: 000000000003ffff
[4746854.099208] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
[4746854.099219] RBP: ffff8803bfd83c88 R08: 000000000003ffff R09: 0000000000000000
[4746854.099229] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000800
[4746854.099240] R13: 000000000f000000 R14: ffff8803bfd8e700 R15: 0000000000000000
[4746854.099256] FS: 00007f456d44170
[4746854.099270] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[4746854.099279] CR2: ffffffffff5fb310 CR3: 00000003a4180000 CR4: 0000000000002660
[4746854.099290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4746854.099301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4746854.099312] Process insmod (pid: 4752, threadinfo ffff8803a48d4000, task ffff8803a6b5c4a0)
[4746854.099323] Stack:
[4746854.099328] 0000000000000000 0000000000002710 ffffffff81c31000 ffffffff81c31100
[4746854.099346] ffff8803bfd83ca8 ffffffff8103333a ffff8803a6e17b00 ffffffff81c31000
[4746854.099363] ffff8803bfd83cc8 ffffffff810df347 ffff8803bfd8e250 ffff8803bfd8eb80
[4746854.099382] Call Trace:
[4746854.099387] <IRQ>
[4746854.099401] [<ffffffff81033
[4746854.099416] [<ffffffff810df
[4746854.099429] [<ffffffff810df
[4746854.099439] [<ffffffff810df
[4746854.099453] [<ffffffff81078
[4746854.099466] [<ffffffff8109b
[4746854.099480] [<ffffffff8108d
[4746854.099491] [<ffffffff8109b
[4746854.099506] [<ffffffff8105e
[4746854.099520] [<ffffffff8108e
[4746854.099535] [<ffffffff8100a
[4746854.099547] [<ffffffff810d7
[4746854.099561] [<ffffffff813a6
[4746854.099572] [<ffffffff810da
[4746854.099583] [<ffffffff813a6
[4746854.099596] [<ffffffff813a8
[4746854.099610] [<ffffffff81661
[4746854.099619] <EOI>
[4746854.099632] [<ffffffff81001
[4746854.099645] [<ffffffff81001
[4746854.099659] [<ffffffff813a7
[4746854.099671] [<ffffffff813a9
[4746854.099683] [<ffffffff8163c
[4746854.099695] [<ffffffffa000c
[4746854.099709] [<ffffffff81012
[4746854.099722] [<ffffffff81657
[4746854.099734] [<ffffffffa0007
[4746854.099746] [<ffffffffa000c
[4746854.099758] [<ffffffff81002
[4746854.099771] [<ffffffff810a7
[4746854.099783] [<ffffffff8165f
In this case, the function is invoked by the RCU-based stall detector, in interrupt context, when it detects a stalled CPU (i.e. a lockup).
An Oops in interrupt context always causes a kernel panic, so this bug can make debugging a kernel lockup difficult.
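The first log line reports the stall as `t=15001 jiffies`. Turning that into wall-clock time requires the kernel's `CONFIG_HZ`, which the report does not state. The helper below is a hypothetical name (the kernel's own `jiffies_to_msecs()` uses the compiled-in HZ rather than taking it as a parameter). Assuming HZ=250, 15001 jiffies is just over 60 s; with HZ=100 it would be about 150 s, so the HZ value matters when reading a stall report.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a jiffies count to milliseconds for an
 * explicitly given tick rate (CONFIG_HZ). Integer division truncates. */
static uint64_t jiffies_to_ms_at(uint64_t jiffies, unsigned int hz)
{
    assert(hz > 0);
    return jiffies * 1000u / hz;
}
```

For example, `jiffies_to_ms_at(15001, 250)` returns 60004, and `jiffies_to_ms_at(15001, 100)` returns 150010.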
The function is also invoked from sysrq_handle_
# echo l > /proc/sysrq-trigger
This is the easiest way to reproduce the problem.
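The header of the Oops above already points at the APIC. The x86 page-fault error code `0002` (printed in hex) means a kernel-mode write to a page that is not present, which matches `PTE 0`. The faulting address in CR2 sits at offset `0x310` within its 4 KiB page, and 0x310 is the offset of the local APIC's ICR high word (`APIC_ICR2` in the kernel's `apicdef.h`). Reading the faulting page as the guest's APIC mapping is an interpretation, not something the log states outright. The small decoder below is a sketch; its names (`decode_pf_error`, `page_offset_4k`, `lapic_reg_name`, `struct pf_info`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error-code bits. */
#define PF_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1) /* 0: read access, 1: write access */
#define PF_USER  (1u << 2) /* 0: kernel (supervisor) mode, 1: user mode */

/* Local APIC register offsets (same values as APIC_ICR / APIC_ICR2). */
#define LAPIC_ICR_LOW  0x300u
#define LAPIC_ICR_HIGH 0x310u

struct pf_info {
    bool not_present;
    bool write;
    bool kernel_mode;
};

/* Split the error code printed after "Oops:" into its meaning. */
static struct pf_info decode_pf_error(unsigned int code)
{
    struct pf_info info = {
        .not_present = !(code & PF_PROT),
        .write = (code & PF_WRITE) != 0,
        .kernel_mode = !(code & PF_USER),
    };
    return info;
}

/* Offset of an address within its 4 KiB page. */
static unsigned int page_offset_4k(uint64_t addr)
{
    return (unsigned int)(addr & 0xfffu);
}

/* Name a few local APIC registers by their offset in the APIC page. */
static const char *lapic_reg_name(unsigned int offset)
{
    assert(offset < 0x1000u); /* the register window is one 4 KiB page */
    switch (offset) {
    case LAPIC_ICR_LOW:  return "ICR low (command)";
    case LAPIC_ICR_HIGH: return "ICR high (destination)";
    default:             return "other";
    }
}
```

For this Oops, `decode_pf_error(0x0002)` reports `not_present`, `write` and `kernel_mode` all true, and `lapic_reg_name(page_offset_4k(0xffffffffff5fb310))` returns `"ICR high (destination)"`.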
[How to fix]
As far as I can see, one possible solution is to backport the following patch, which is already included in the Quantal kernel.
http://
Another solution is to disable arch_trigger_
If you need any other information, please feel free to ask me.
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
tags: | added: bot-stop-nagging kernel-da-key |
Changed in linux (Ubuntu Precise): | |
status: | New → Confirmed |
importance: | Undecided → Medium |
Changed in linux (Ubuntu): | |
status: | Confirmed → Fix Released |
Changed in linux (Ubuntu Precise): | |
assignee: | nobody → Stefan Bader (stefan-bader-canonical) |
status: | Confirmed → In Progress |
description: | updated |
Changed in linux (Ubuntu Precise): | |
status: | In Progress → Fix Committed |
tags: | added: verification-done-precise removed: verification-needed-precise |
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1168350
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.