arch_trigger_all_cpu_backtrace() results in Oops on virtualized guest

Bug #1168350 reported by Munehisa Kamata
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Fix Released
Medium
Unassigned
Precise
Fix Released
Medium
Stefan Bader

Bug Description

SRU Justification:

Impact: The arch_trigger_all_cpu_backtrace tries to notify all other cpus via ipi. For that it looks up an ipi hook from the apic structure without verifying whether that pointer is NULL or not.

Fix: Upstream fixed this by implementing the apic IPI hooks interface. Although some pieces seem to be unclear, this is not changed in upstream kernels since then. So either it does not matter or those pieces are not used. So for now backport the patch introducing the apic interface from upstream (only dropping one unnecessary declaration). This only affects PVM as HVM emulates flat apic completely.

Testcase: To cause a call to arch_trigger_all_cpu_backtrace by:

  # echo l > /proc/sysrq-trigger

---

The arch_trigger_all_cpu_backtrace() tries to send NMI to all CPUs via IPI for getting stacktraces from them. But NMI vector is not implemented on virtualized environment(Xen PV) and the function results in Oops.

[4746854.099062] INFO: rcu_sched detected stall on CPU 3 (t=15001 jiffies)
[4746854.099091] BUG: unable to handle kernel paging request at ffffffffff5fb310
[4746854.099100] IP: [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0
[4746854.099116] PGD 1c07067 PUD 1c08067 PMD 1dd4067 PTE 0
[4746854.099126] Oops: 0002 [#1] SMP
[4746854.099134] CPU 3
[4746854.099137] Modules linked in: stallmod(O+) isofs acpiphp
[4746854.099150]
[4746854.099157] Pid: 4752, comm: insmod Tainted: G O 3.2.0-40-virtual #64-Ubuntu
[4746854.099174] RIP: e030:[<ffffffff81037cf8>] [<ffffffff81037cf8>] flat_send_IPI_all+0x98/0xd0
[4746854.099189] RSP: e02b:ffff8803bfd83c68 EFLAGS: 00010046
[4746854.099198] RAX: 0000000000000000 RBX: ffffffff81cd0060 RCX: 000000000003ffff
[4746854.099208] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000002
[4746854.099219] RBP: ffff8803bfd83c88 R08: 000000000003ffff R09: 0000000000000000
[4746854.099229] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000800
[4746854.099240] R13: 000000000f000000 R14: ffff8803bfd8e700 R15: 0000000000000000
[4746854.099256] FS: 00007f456d441700(0000) GS:ffff8803bfd80000(0000) knlGS:0000000000000000
[4746854.099270] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
[4746854.099279] CR2: ffffffffff5fb310 CR3: 00000003a4180000 CR4: 0000000000002660
[4746854.099290] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[4746854.099301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[4746854.099312] Process insmod (pid: 4752, threadinfo ffff8803a48d4000, task ffff8803a6b5c4a0)
[4746854.099323] Stack:
[4746854.099328] 0000000000000000 0000000000002710 ffffffff81c31000 ffffffff81c31100
[4746854.099346] ffff8803bfd83ca8 ffffffff8103333a ffff8803a6e17b00 ffffffff81c31000
[4746854.099363] ffff8803bfd83cc8 ffffffff810df347 ffff8803bfd8e250 ffff8803bfd8eb80
[4746854.099382] Call Trace:
[4746854.099387] <IRQ>
[4746854.099401] [<ffffffff8103333a>] arch_trigger_all_cpu_backtrace+0x5a/0x90
[4746854.099416] [<ffffffff810df347>] check_cpu_stall.isra.35+0x97/0xf0
[4746854.099429] [<ffffffff810df3d8>] __rcu_pending+0x38/0x1d0
[4746854.099439] [<ffffffff810df869>] rcu_check_callbacks+0x79/0x1e0
[4746854.099453] [<ffffffff81078098>] update_process_times+0x48/0x90
[4746854.099466] [<ffffffff8109b864>] tick_sched_timer+0x64/0xc0
[4746854.099480] [<ffffffff8108dfe8>] __run_hrtimer+0x78/0x1f0
[4746854.099491] [<ffffffff8109b800>] ? tick_nohz_handler+0x100/0x100
[4746854.099506] [<ffffffff8105e748>] ? load_balance+0x78/0x370
[4746854.099520] [<ffffffff8108e917>] hrtimer_interrupt+0xf7/0x230
[4746854.099535] [<ffffffff8100a817>] xen_timer_interrupt+0x27/0x40
[4746854.099547] [<ffffffff810d7bb5>] handle_irq_event_percpu+0x55/0x210
[4746854.099561] [<ffffffff813a6f7e>] ? info_for_irq+0xe/0x30
[4746854.099572] [<ffffffff810dae67>] handle_percpu_irq+0x47/0x60
[4746854.099583] [<ffffffff813a6de9>] __xen_evtchn_do_upcall+0x199/0x250
[4746854.099596] [<ffffffff813a8ecf>] xen_evtchn_do_upcall+0x2f/0x50
[4746854.099610] [<ffffffff81661b7e>] xen_do_hypervisor_callback+0x1e/0x30
[4746854.099619] <EOI>
[4746854.099632] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[4746854.099645] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[4746854.099659] [<ffffffff813a757e>] ? xen_poll_irq_timeout+0x3e/0x50
[4746854.099671] [<ffffffff813a9060>] ? xen_poll_irq+0x10/0x20
[4746854.099683] [<ffffffff8163c200>] ? xen_spin_lock_slow+0x97/0xf2
[4746854.099695] [<ffffffffa000c000>] ? 0xffffffffa000bfff
[4746854.099709] [<ffffffff810121da>] ? xen_spin_lock+0x4a/0x50
[4746854.099722] [<ffffffff816572ce>] ? _raw_spin_lock+0xe/0x20
[4746854.099734] [<ffffffffa000702b>] ? stall+0x2b/0x44 [stallmod]
[4746854.099746] [<ffffffffa000c009>] ? init_module+0x9/0x1000 [stallmod]
[4746854.099758] [<ffffffff81002040>] ? do_one_initcall+0x40/0x180
[4746854.099771] [<ffffffff810a7abe>] ? sys_init_module+0xbe/0x230
[4746854.099783] [<ffffffff8165f8c2>] ? system_call_fastpath+0x16/0x1b

In this case, the function is invoked by RCU based stall detector when it detects stalled CPU(i.e. lockup) in an interrupt context.
Oops in an interrupt context always causes a kernel panic, so this bug sometimes makes debugging a kernel lockup issue difficult.

The function is also invoked from sysrq_handle_showallcpus() that is for getting traces from all active CPUs anytime we want.

 # echo l > /pros/sysrq-trigger

This is the easiest way to reproduce this.

[How to fix]
As far as I see, one possible solution is to backport the following patch. This patch is already included in Quantal's kernel.

 http://lists.xen.org/archives/html/xen-devel/2012-04/msg01023.html

Another solution is to disable arch_trigger_all_cpu_backtrace() at compile time but I'm still investigating what config is for that.

If you need any other information, please feel free to ask me.

Revision history for this message
Munehisa Kamata (kamatam-amazon) wrote :
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1168350

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Munehisa Kamata (kamatam-amazon) wrote :

I attached apport output but it is not meaningful due to the nature of this issue.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: bot-stop-nagging kernel-da-key
Changed in linux (Ubuntu Precise):
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I backported commit f447d56 to Precise and built a test kernel, which can be downloaded from:
http://people.canonical.com/~jsalisbury/lp1168350/

Can you test that kernel and report back if it resolves this bug or not?

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Actually, I'm rebuilding the kernels now. They will be posted shortly.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

A test kernel is now available at:

http://people.canonical.com/~jsalisbury/lp1168350/

Can you test that kernel and report back if it resolves this bug or not?

Revision history for this message
Munehisa Kamata (kamatam-amazon) wrote :

Hi Joseph,

It seems that the test kernel resolves this bug.

root@ip-10-121-11-51:~# uname -r
3.2.0-41-generic
root@ip-10-121-11-51:~# dmesg -c > /dev/null
root@ip-10-121-11-51:~# echo l > /proc/sysrq-trigger
root@ip-10-121-11-51:~# dmesg
[4817826.775851] SysRq : Show backtrace of all active CPUs
[4817826.775870] sending NMI to all CPUs:
[4817826.775873] xen: vector 0x2 is not implemented <-------

Same behavior as Quantal's kernel.

Revision history for this message
Munehisa Kamata (kamatam-amazon) wrote :

Just in case, the packages should be linux-image-<version>-virtual, not -generic.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I'm glad to hear that commit fixes this bug. However, there are allot of changes in commit f447d56, so it may not be accepted into a stable kernel.

One other option would be for you to use the Quantal LTS Enablement kernel[0]. Would it be possible for you to test the LTS Enablement kernel to see if its and option for you?

[0] https://wiki.ubuntu.com/KernelTeam/Specs/QuantalLTSEnablementStack

Revision history for this message
Ben Howard (darkmuggle-deactivatedaccount) wrote :

Joseph -- re Comment #9, this stack trace is seen in on the AWS Cloud Images. Per the kernel team's request, the default kernel for the Cloud Images is based on 3.2. Since the majority of our usage for the Cloud Images is on AWS, I think we have a compelling reason to at least heavily consider this change.

Revision history for this message
Munehisa Kamata (kamatam-amazon) wrote :

As Ben mentioned, we found this on EC2 instance and this bug report is intended to 3.2 based LTS kernel.

If smaller and more-specific patch is preferred, please let me know.

Stefan Bader (smb)
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Changed in linux (Ubuntu Precise):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
status: Confirmed → In Progress
Revision history for this message
Stefan Bader (smb) wrote :

Looking at the proposed patch, the main portion is to implement the various IPI functions that get added to the APIC interface. There is on bit in xen_send_IPI_mask_allbutself() which I don't understand, but that is still unchanged in upstream code (I will ask about that on xen-devel). But otherwise either not many other places seem to use the IPI indirection via the APIC structure. Or handle NULL pointers more gracefully. There does not seem to be any related fixup changes in the kernel up today either. So I would think it should be safe for acceptance.

Just as a note, the proposed patch was part 1 of a bundle of two which was claimed to fix a certain hang with perf top. If this is observed on Precise EC2 kernels we should probably track it in another bug report. For that patch, there actually was a fixup, so that would be the following two:
- 1ff2b0c303698e486f1e0886b4d9876200ef8ca5
  xen: implement IRQ_WORK_VECTOR handler
- 2f1bd67d544d3c086fb5101513f4b6c8f4291b43
  xen/smp: unbind irqworkX when unplugging vCPUs

Stefan Bader (smb)
description: updated
Revision history for this message
Munehisa Kamata (kamatam-amazon) wrote :

I've updated the description with the simplest testcase.

description: updated
Revision history for this message
Stefan Bader (smb) wrote :

Thanks. Quickly trying this here with just that one patch applied does not cause an oops, but neither does it seem to produce the requested backtrace. Is that sufficient/expected?

Revision history for this message
Munehisa Kamata (kamatam-amazon) wrote :

Sounds like that is expected behavior. Sysrq 'l' shows stacktraces only for active CPUs. So it does not show anything if your system do nothing.

Revision history for this message
Stefan Bader (smb) wrote :

Actually it seems that no IPI actually was sent. In my local PV domUs as well as on a EC2 64bit micro instance, it looks like the problem does not exist. In dmesg I see:

[ 0.000000] No local APIC present
[ 0.000000] APIC: disable apic facility
[ 0.000000] APIC: switched to apic NOOP

And verifying with a coredump locally, all apic elements are set to the noop_apic. The only PV domU this seems to be relevant is dom0. Was this for dom0 or do I miss something?

Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Precise):
status: In Progress → Fix Committed
Revision history for this message
Stefan Bader (smb) wrote :

Ok, I can verify this on dom0. Though right now, this results in "vector 0x2 not implemented" (this is the NMI_VECTOR). And that is not implemented yet in upstream either. So it does not get you the stack traces but at least prevents the oops.
On another note, the two additional patches I mentioned in comment #12 would fix the "perf top" usecase in dom0. Right now I get "vector 0xf6 not implemented" . But that probably should go into a different bug.

Revision history for this message
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed' to 'verification-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-precise
Steve Conklin (sconklin)
tags: added: verification-done-precise
removed: verification-needed-precise
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (17.0 KiB)

This bug was fixed in the package linux - 3.2.0-48.74

---------------
linux (3.2.0-48.74) precise; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #1188206

  [ Upstream Kernel Changes ]

  * iwlwifi: dvm: fix zero LQ CMD sending avoidance
    - LP: #1186932

linux (3.2.0-47.72) precise; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #1187066

  [ Upstream Kernel Changes ]

  * Revert "drm/i915: Fix detection of base of stolen memory"
    - LP: #1186572
  * mmc: at91/avr32/atmel-mci: fix DMA-channel leak on module unload
    - LP: #1186572
  * mmc: core: Fix bit width test failing on old eMMC cards
    - LP: #1186572
  * mfd: adp5520: Restore mode bits on resume
    - LP: #1186572
  * mmc: atmel-mci: pio hang on block errors
    - LP: #1186572
  * x86: Eliminate irq_mis_count counted in arch_irq_stat
    - LP: #1186572
  * ASoC: wm8994: missing break in wm8994_aif3_hw_params()
    - LP: #1186572
  * ath9k: fix key allocation error handling for powersave keys
    - LP: #1186572
  * nfsd4: don't allow owner override on 4.1 CLAIM_FH opens
    - LP: #1186572
  * net/eth/ibmveth: Fixup retrieval of MAC address
    - LP: #1186572
  * ext4: limit group search loop for non-extent files
    - LP: #1186572
  * xen/vcpu/pvhvm: Fix vcpu hotplugging hanging.
    - LP: #1186572
  * autofs - remove autofs dentry mount check
    - LP: #1186572
  * ALSA: HDA: Fix Oops caused by dereference NULL pointer
    - LP: #1186572
  * iscsi-target: Fix processing of OOO commands
    - LP: #1186572
  * ACPICA: Fix possible buffer overflow during a field unit read operation
    - LP: #1186572
  * B43: Handle DMA RX descriptor underrun
    - LP: #1186572
  * mwifiex: clear is_suspended flag when interrupt is received early
    - LP: #1186572
  * mwifiex: fix memory leak issue when driver unload
    - LP: #1186572
  * mwifiex: fix setting of multicast filter
    - LP: #1186572
  * cifs: only set ops for inodes in I_NEW state
    - LP: #1186572
  * hp_accel: Ignore the error from lis3lv02d_poweron() at resume
    - LP: #1186572
  * KVM: VMX: fix halt emulation while emulating invalid guest sate
    - LP: #1186572
  * dm snapshot: fix error return code in snapshot_ctr
    - LP: #1186572
  * dm bufio: avoid a possible __vmalloc deadlock
    - LP: #1186572
  * tick: Cleanup NOHZ per cpu data on cpu down
    - LP: #1186572
  * ACPI / EC: Restart transaction even when the IBF flag set
    - LP: #1186572
  * drm/radeon: check incoming cliprects pointer
    - LP: #1186572
  * staging: vt6656: use free_netdev instead of kfree
    - LP: #1186572
  * hwmon: fix error return code in abituguru_probe()
    - LP: #1186572
  * Kirkwood: Enable PCIe port 1 on QNAP TS-11x/TS-21x
    - LP: #1186572
  * avr32: fix relocation check for signed 18-bit offset
    - LP: #1186572
  * powerpc/pseries: Fix partition migration hang in stop_topology_update
    - LP: #1186572
  * powerpc: Bring all threads online prior to migration/hibernation
    - LP: #1186572
  * timer: Don't reinitialize the cpu base lock during CPU_UP_PREPARE
    - LP: #1186572
  * tg3: Skip powering down function 0 on certain serdes devices
    - LP: #1186572
  * USB: xHCI: override bogus bu...

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Revision history for this message
Adam Conrad (adconrad) wrote : Update Released

The verification of this Stable Release Update has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regresssions.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.