[Hyper-V] PCI Passthrough kernel hang and explicit barriers

Bug #1581243 reported by Joshua R. Poulson on 2016-05-12
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Medium
Joseph Salisbury
Xenial
Medium
Joseph Salisbury
Yakkety
Medium
Joseph Salisbury

Bug Description

Two upstream commits (right now in Bjorn Helgaas's PCI tree, and heading to Linus's tree) address potential hangs in PCI passthrough. Please consider these upstream items for 16.10 and 16.04 (and HWE kernels based on lts-xenial).

https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=pci/host-hv&id=deb22e5c84c884a129d801cf3bfde7411536998d

PCI: hv: Report resources release after stopping the bus
Kernel hang is observed when pci-hyperv module is release with device
drivers still attached. E.g., when I do 'rmmod pci_hyperv' with BCM5720
device pass-through-ed (tg3 module) I see the following:

 NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [rmmod:2104]
 ...
 Call Trace:
  [<ffffffffa0641487>] tg3_read_mem+0x87/0x100 [tg3]
  [<ffffffffa063f000>] ? 0xffffffffa063f000
  [<ffffffffa0644375>] tg3_poll_fw+0x85/0x150 [tg3]
  [<ffffffffa0649877>] tg3_chip_reset+0x357/0x8c0 [tg3]
  [<ffffffffa064ca8b>] tg3_halt+0x3b/0x190 [tg3]
  [<ffffffffa0657611>] tg3_stop+0x171/0x230 [tg3]
  ...
  [<ffffffffa064c550>] tg3_remove_one+0x90/0x140 [tg3]
  [<ffffffff813bee59>] pci_device_remove+0x39/0xc0
  [<ffffffff814a3201>] __device_release_driver+0xa1/0x160
  [<ffffffff814a32e3>] device_release_driver+0x23/0x30
  [<ffffffff813b794a>] pci_stop_bus_device+0x8a/0xa0
  [<ffffffff813b7ab6>] pci_stop_root_bus+0x36/0x60
  [<ffffffffa02c3f38>] hv_pci_remove+0x238/0x260 [pci_hyperv]

The problem seems to be that we report local resources release before
stopping the bus and removing devices from it and device drivers may try to
perform some operations with these resources on shutdown. Move resources
release report after we do pci_stop_root_bus().

Signed-off-by: Vitaly Kuznetsov <email address hidden>
Signed-off-by: Bjorn Helgaas <email address hidden>
Acked-by: Jake Oshins <email address hidden>

https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=pci/host-hv&id=bdd74440d9e887b1fa648eefa17421def5f5243c

PCI: hv: Add explicit barriers to config space accesspci/host-hv
I'm trying to pass-through Broadcom BCM5720 NIC (Dell device 1f5b) on a
Dell R720 server. Everything works fine when the target VM has only one
CPU, but SMP guests reboot when the NIC driver accesses PCI config space
with hv_pcifront_read_config()/hv_pcifront_write_config(). The reboot
appears to be induced by the hypervisor and no crash is observed. Windows
event logs are not helpful at all ('Virtual machine ... has quit
unexpectedly'). The particular access point is always different and
putting debug between them (printk/mdelay/...) moves the issue further
away. The server model affects the issue as well: on Dell R420 I'm able to
pass-through BCM5720 NIC to SMP guests without issues.

While I'm obviously failing to reveal the essence of the issue I was able
to come up with a (possible) solution: if explicit barriers are added to
hv_pcifront_read_config()/hv_pcifront_write_config() the issue goes away.
The essential minimum is rmb() at the end on _hv_pcifront_read_config() and
wmb() at the end of _hv_pcifront_write_config() but I'm not confident it
will be sufficient for all hardware. I suggest the following barriers:

1) wmb()/mb() between choosing the function and writing to its space.
2) mb() before releasing the spinlock in both _hv_pcifront_read_config()/
   _hv_pcifront_write_config() to ensure that consecutive reads/writes to
  the space won't get re-ordered as drivers may count on that.

Config space access is not supposed to be performance-critical so these
explicit barriers should not cause any slowdown.

[bhelgaas: use Linux "barriers" terminology]
Signed-off-by: Vitaly Kuznetsov <email address hidden>
Signed-off-by: Bjorn Helgaas <email address hidden>
Acked-by: Jake Oshins <email address hidden>

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1581243

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joshua R. Poulson (jrp) wrote :

No log files required, upstream commits cited.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key kernel-hyper-v xenial yakkety
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Changed in linux (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → Medium
Joseph Salisbury (jsalisbury) wrote :

I built a Xenial test kernel with the two requested commits. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1581243/

Please test the kernel and let us know your results. If we can receive positive test feedback by May 27th, we can proceed to submit the patches for kernel SRU review in an attempt to make the upcoming kernel SRU cycle starting on May 27th. Assuming the patches are accepted for SRU and land in the upcoming SRU cycle, the fix should be available from the -updates pocket by June 20th.

Changed in linux (Ubuntu Xenial):
status: Triaged → In Progress
Changed in linux (Ubuntu Yakkety):
status: Triaged → In Progress
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Joseph Salisbury (jsalisbury)
Long Li (longli) wrote :

Tested. Everything worked as expected.

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Kamal Mostafa (kamalmostafa) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Launchpad Janitor (janitor) wrote :
Download full text (24.1 KiB)

This bug was fixed in the package linux - 4.4.0-25.44

---------------
linux (4.4.0-25.44) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1591289

  * Xenial update to v4.4.13 stable release (LP: #1590455)
    - MIPS64: R6: R2 emulation bugfix
    - MIPS: math-emu: Fix jalr emulation when rd == $0
    - MIPS: MSA: Fix a link error on `_init_msa_upper' with older GCC
    - MIPS: Don't unwind to user mode with EVA
    - MIPS: Avoid using unwind_stack() with usermode
    - MIPS: Fix siginfo.h to use strict posix types
    - MIPS: Fix uapi include in exported asm/siginfo.h
    - MIPS: Fix watchpoint restoration
    - MIPS: Flush highmem pages in __flush_dcache_page
    - MIPS: Handle highmem pages in __update_cache
    - MIPS: Sync icache & dcache in set_pte_at
    - MIPS: ath79: make bootconsole wait for both THRE and TEMT
    - MIPS: Reserve nosave data for hibernation
    - MIPS: Loongson-3: Reserve 32MB for RS780E integrated GPU
    - MIPS: Use copy_s.fmt rather than copy_u.fmt
    - MIPS: Fix MSA ld_*/st_* asm macros to use PTR_ADDU
    - MIPS: Prevent "restoration" of MSA context in non-MSA kernels
    - MIPS: Disable preemption during prctl(PR_SET_FP_MODE, ...)
    - MIPS: ptrace: Fix FP context restoration FCSR regression
    - MIPS: ptrace: Prevent writes to read-only FCSR bits
    - MIPS: Fix sigreturn via VDSO on microMIPS kernel
    - MIPS: Build microMIPS VDSO for microMIPS kernels
    - MIPS: lib: Mark intrinsics notrace
    - MIPS: VDSO: Build with `-fno-strict-aliasing'
    - affs: fix remount failure when there are no options changed
    - ASoC: ak4642: Enable cache usage to fix crashes on resume
    - Input: uinput - handle compat ioctl for UI_SET_PHYS
    - ARM: mvebu: fix GPIO config on the Linksys boards
    - ARM: dts: at91: fix typo in sama5d2 PIN_PD24 description
    - ARM: dts: exynos: Add interrupt line to MAX8997 PMIC on exynos4210-trats
    - ARM: dts: imx35: restore existing used clock enumeration
    - ath9k: Add a module parameter to invert LED polarity.
    - ath9k: Fix LED polarity for some Mini PCI AR9220 MB92 cards.
    - ath10k: fix debugfs pktlog_filter write
    - ath10k: fix firmware assert in monitor mode
    - ath10k: fix rx_channel during hw reconfigure
    - ath10k: fix kernel panic, move arvifs list head init before htt init
    - ath5k: Change led pin configuration for compaq c700 laptop
    - hwrng: exynos - Fix unbalanced PM runtime put on timeout error path
    - rtlwifi: rtl8723be: Add antenna select module parameter
    - rtlwifi: btcoexist: Implement antenna selection
    - rtlwifi: Fix logic error in enter/exit power-save mode
    - rtlwifi: pci: use dev_kfree_skb_irq instead of kfree_skb in
      rtl_pci_reset_trx_ring
    - aacraid: Relinquish CPU during timeout wait
    - aacraid: Fix for aac_command_thread hang
    - aacraid: Fix for KDUMP driver hang
    - hwmon: (ads7828) Enable internal reference
    - mfd: intel-lpss: Save register context on suspend
    - mfd: intel_soc_pmic_core: Terminate panel control GPIO lookup table
      correctly
    - PM / Runtime: Fix error path in pm_runtime_force_resume()
    - cpuidle: Indicate when a device has been unregiste...

Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (26.1 KiB)

This bug was fixed in the package linux - 4.4.0-28.47

---------------
linux (4.4.0-28.47) xenial; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1595874

  * Linux netfilter local privilege escalation issues (LP: #1595350)
    - netfilter: x_tables: don't move to non-existent next rule
    - netfilter: x_tables: validate targets of jumps
    - netfilter: x_tables: add and use xt_check_entry_offsets
    - netfilter: x_tables: kill check_entry helper
    - netfilter: x_tables: assert minimum target size
    - netfilter: x_tables: add compat version of xt_check_entry_offsets
    - netfilter: x_tables: check standard target size too
    - netfilter: x_tables: check for bogus target offset
    - netfilter: x_tables: validate all offsets and sizes in a rule
    - netfilter: x_tables: don't reject valid target size on some architectures
    - netfilter: arp_tables: simplify translate_compat_table args
    - netfilter: ip_tables: simplify translate_compat_table args
    - netfilter: ip6_tables: simplify translate_compat_table args
    - netfilter: x_tables: xt_compat_match_from_user doesn't need a retval
    - netfilter: x_tables: do compat validation via translate_table
    - netfilter: x_tables: introduce and use xt_copy_counters_from_user

  * Linux netfilter IPT_SO_SET_REPLACE memory corruption (LP: #1555338)
    - netfilter: x_tables: validate e->target_offset early
    - netfilter: x_tables: make sure e->next_offset covers remaining blob size
    - netfilter: x_tables: fix unconditional helper

linux (4.4.0-27.46) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1594906

  * Support Edge Gateway's Bluetooth LED (LP: #1512999)
    - Revert "UBUNTU: SAUCE: Bluetooth: Support for LED on Marvell modules"

linux (4.4.0-26.45) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1594442

  * linux: Implement secure boot state variables (LP: #1593075)
    - SAUCE: UEFI: Add secure boot and MOK SB State disabled sysctl

  * failures building userspace packages that include ethtool.h (LP: #1592930)
    - ethtool.h: define INT_MAX for userland

linux (4.4.0-25.44) xenial; urgency=low

  [ Kamal Mostafa ]

  * Release Tracking Bug
    - LP: #1591289

  * Xenial update to v4.4.13 stable release (LP: #1590455)
    - MIPS64: R6: R2 emulation bugfix
    - MIPS: math-emu: Fix jalr emulation when rd == $0
    - MIPS: MSA: Fix a link error on `_init_msa_upper' with older GCC
    - MIPS: Don't unwind to user mode with EVA
    - MIPS: Avoid using unwind_stack() with usermode
    - MIPS: Fix siginfo.h to use strict posix types
    - MIPS: Fix uapi include in exported asm/siginfo.h
    - MIPS: Fix watchpoint restoration
    - MIPS: Flush highmem pages in __flush_dcache_page
    - MIPS: Handle highmem pages in __update_cache
    - MIPS: Sync icache & dcache in set_pte_at
    - MIPS: ath79: make bootconsole wait for both THRE and TEMT
    - MIPS: Reserve nosave data for hibernation
    - MIPS: Loongson-3: Reserve 32MB for RS780E integrated GPU
    - MIPS: Use copy_s.fmt rather than copy_u.fmt
    - MIPS: Fix MSA ld_*/st_* asm macros to use PTR_ADDU
    - MIPS: Prevent "restoration" of MSA c...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers