[Hyper-V][SAUCE] pci-hyperv: Use only 16 bit integer for PCI domain

Bug #1684971 reported by Joshua R. Poulson on 2017-04-20
This bug affects 2 people
Affects                  Importance   Assigned to
linux (Ubuntu)           High         Joseph Salisbury
  Xenial                 High         Joseph Salisbury
  Yakkety                High         Joseph Salisbury
  Zesty                  High         Joseph Salisbury
  Artful                 High         Joseph Salisbury
  Cosmic                 High         Marcelo Cerri
linux-azure (Ubuntu)     Undecided    Unassigned
  Xenial                 Undecided    Unassigned
  Yakkety                Undecided    Unassigned
  Zesty                  Undecided    Unassigned
  Artful                 Undecided    Unassigned
  Cosmic                 Undecided    Marcelo Cerri

Bug Description

The following patch fixes a problem with "[PATCH] pci-hyperv: Use device serial number as PCI domain": that change sets the PCI domain from the u32 device serial number, but some drivers expect the PCI domain to fit in a u16. The mismatch was observed as Oopses and hangs in Azure on NC and NV GPU instances.

From: Haiyang Zhang <email address hidden>

This patch uses the lower 16 bits of the serial number as PCI
domain, otherwise some drivers may not be able to handle it.

Signed-off-by: Haiyang Zhang <email address hidden>
---
 drivers/pci/host/pci-hyperv.c | 4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index e73880c..b18dff3 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -1334,9 +1334,11 @@ static void put_pcichild(struct hv_pci_dev *hpdev,
 	 * can have shorter names than based on the bus instance UUID.
 	 * Only the first device serial number is used for domain, so the
 	 * domain number will not change after the first device is added.
+	 * The lower 16 bits of the serial number is used, otherwise some
+	 * drivers may not be able to handle it.
 	 */
 	if (list_empty(&hbus->children))
-		hbus->sysdata.domain = desc->ser;
+		hbus->sysdata.domain = desc->ser & 0xFFFF;
 	list_add_tail(&hpdev->list_entry, &hbus->children);
 	spin_unlock_irqrestore(&hbus->device_list_lock, flags);
 	return hpdev;
--
1.7.1
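
As a rough illustration of why the mask matters, the sketch below models the patched assignment next to a consumer that keeps the PCI domain in a 16-bit field. The helper names (domain_from_serial, survives_u16_storage) are hypothetical and are not part of pci-hyperv.c. The 16-bit consumer is only an assumption about how the affected drivers behave, based on the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the patched assignment
 * (desc->ser & 0xFFFF): keep only the low 16 bits of the serial. */
static uint32_t domain_from_serial(uint32_t serial)
{
	return serial & 0xFFFF;
}

/* Hypothetical model of a driver that stores the PCI domain in a
 * 16-bit field (an assumption, not taken from any real driver):
 * the value survives only if no high bits are lost on storage. */
static bool survives_u16_storage(uint32_t domain)
{
	uint16_t stored = (uint16_t)domain;

	return (uint32_t)stored == domain;
}
```

With a made-up serial such as 0x1234C4D9, survives_u16_storage() reports a mismatch for the raw value but not for domain_from_serial() of it, because the masked domain and the 16-bit copy agree.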

Joshua R. Poulson (jrp) wrote :

The patch has been submitted upstream and should apply to 4.4, 4.8, and 4.10.

Changed in linux (Ubuntu):
status: New → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key kernel-hyper-v xenial yakkety zesty
Changed in linux (Ubuntu Xenial):
importance: Undecided → High
Changed in linux (Ubuntu Yakkety):
importance: Undecided → High
Changed in linux (Ubuntu Zesty):
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Yakkety):
status: New → In Progress
Changed in linux (Ubuntu Zesty):
status: New → In Progress
Changed in linux (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Zesty):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

Test kernels with this patch are now available at:
http://kernel.ubuntu.com/~jsalisbury/lp1684971/

tags: added: patch
Joshua R. Poulson (jrp) wrote :

Verified by Long to work. My own instance had a problem, but I think that was caused by a different patch in my test kernel, since it hung while shutting down rather than while coming up.

4.4.0-75, 4.8.0-49, and 4.10.0-20 are all exhibiting the problems in the wild and I'm getting reports. This may require a fast turnaround.

Changed in linux (Ubuntu Artful):
status: In Progress → Fix Committed
Stefan Bader (smb) on 2017-04-26
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

Josh,

There is a -proposed kernel ready for testing with this fix.

Brad Figg (brad-figg) wrote :

Josh,

There is a -proposed kernel with this fix. Please test asap.

Joshua R. Poulson (jrp) wrote :

In progress.

Joshua R. Poulson (jrp) wrote :

Succeeds.

Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-77-generic x86_64)

 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

0 packages can be updated.
0 updates are security updates.

Last login: Wed Apr 26 19:33:01 2017 from 167.220.1.17
jrp@jrpcudau:~$ dmesg | grep [Nn][Vv]
[ 0.000000] BIOS-e820: [mem 0x000000001ffff000-0x000000001fffffff] ACPI NVS
[ 0.108095] smpboot: APIC(0) Converting physical 0 to logical package 0
[ 0.200051] PM: Registering ACPI NVS region [mem 0x1ffff000-0x1fffffff] (4096 bytes)
[ 1.150948] rtc_cmos 00:00: alarms up to one month, 114 bytes nvram
[ 19.933256] sd 2:0:0:0: [storvsc] Add. Sense: Invalid command operation code
[ 19.933268] sd 3:0:1:0: [storvsc] Add. Sense: Invalid command operation code
[ 19.933304] sd 2:0:0:0: [storvsc] Add. Sense: Invalid command operation code
[ 19.933312] sd 3:0:1:0: [storvsc] Add. Sense: Invalid command operation code
[ 21.402336] nvidia: module license 'NVIDIA' taints kernel.
[ 21.406984] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 21.411864] nvidia c4d9:00:00.0: can't derive routing for PCI INT A
[ 21.411867] nvidia c4d9:00:00.0: PCI INT A: no GSI
[ 21.414385] nvidia-nvlink: Nvlink Core is being initialized, major device number 246
[ 21.414395] NVRM: loading NVIDIA UNIX x86_64 Kernel Module 375.51 Wed Mar 22 10:26:12 PDT 2017 (using threaded interrupts)
[ 21.530031] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 375.51 Wed Mar 22 09:00:58 PDT 2017
[ 21.531290] [drm] [nvidia-drm] [GPU ID 0xc4d90000] Loading driver
[ 21.594837] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 245
jrp@jrpcudau:~$ lspci
0000:00:00.0 Host bridge: Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge (AGP disabled) (rev 03)
0000:00:07.0 ISA bridge: Intel Corporation 82371AB/EB/MB PIIX4 ISA (rev 01)
0000:00:07.1 IDE interface: Intel Corporation 82371AB/EB/MB PIIX4 IDE (rev 01)
0000:00:07.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 02)
0000:00:08.0 VGA compatible controller: Microsoft Corporation Hyper-V virtual VGA
c4d9:00:00.0 3D controller: NVIDIA Corporation GK210GL [Tesla K80] (rev a1)
jrp@jrpcudau:~$ nvidia-smi
Wed Apr 26 19:50:27 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.51                 Driver Version: 375.51                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | C4D9:00:00.0     Off |                    0 |
| N/A   41C    P0    60W / 149W |      0MiB / 11439MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+---...
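
The lspci listing in this comment shows the GPU at c4d9:00:00.0, a four-hex-digit domain. A check like that could be scripted along the following lines. This is only a sketch: the function names are hypothetical, and it assumes addresses in the domain:bus:device.function form that lspci prints here.

```c
#include <stdio.h>

/* Hypothetical helper: parse "domain:bus:dev.fn" (all hex) as printed
 * by lspci above. Returns 1 and stores the domain on success, or 0 if
 * the string does not match that form. */
static int pci_domain_of(const char *addr, unsigned int *domain)
{
	unsigned int bus, dev, fn;

	return sscanf(addr, "%x:%x:%x.%x", domain, &bus, &dev, &fn) == 4;
}

/* Returns 1 only if the address parses and its domain fits in 16 bits. */
static int domain_fits_u16(const char *addr)
{
	unsigned int domain;

	if (!pci_domain_of(addr, &domain))
		return 0;
	return domain <= 0xFFFF;
}
```

For the address above, domain_fits_u16("c4d9:00:00.0") returns 1, while a made-up wider domain such as "1234c4d9:00:00.0" would return 0.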

Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-77.98

---------------
linux (4.4.0-77.98) xenial; urgency=low

  * linux: 4.4.0-77.98 -proposed tracker (LP: #1686040)

  * [Hyper-V][SAUCE] pci-hyperv: Use only 16 bit integer for PCI domain
    (LP: #1684971)
    - SAUCE: pci-hyperv: Use only 16 bit integer for PCI domain

  * Upgrade Redpine WLAN/BT driver to ver. 1.2.RC4 (LP: #1669672)
    - SAUCE: sdhci: use PCI ID to identify Dell IoT gateways
    - SAUCE: Redpine: Upgrade to ver. 1.2.RC4
    - [Config] Update CONFIG_VEN_RSI_* configs
    - SAUCE: Redpine: add copyright to kernel packages

  * Fix RX fail issue on Exar USB serial driver after resume from S3/S4
    (LP: #1685133)
    - SAUCE: xr-usb-serial: Update driver for Exar USB serial ports

  * Miscellaneous Ubuntu changes
    - [Config] updating configs to match redpine driver changes

 -- Kleber Sacilotto de Souza <email address hidden> Tue, 25 Apr 2017 19:32:01 +0200

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-51.54

---------------
linux (4.8.0-51.54) yakkety; urgency=low

  * linux: 4.8.0-51.54 -proposed tracker (LP: #1686070)

  * [Hyper-V][SAUCE] pci-hyperv: Use only 16 bit integer for PCI domain
    (LP: #1684971)
    - SAUCE: pci-hyperv: Use only 16 bit integer for PCI domain

linux (4.8.0-50.53) yakkety; urgency=low

  * linux: 4.8.0-50.53 -proposed tracker (LP: #1685847)

  * ubuntu 4.8 kernel, virtio_net error causes NAT packets to be lost
    (LP: #1683947)
    - virtio_net: Simplify call sites for virtio_net_hdr_{from, to}_skb().
    - virtio: don't set VIRTIO_NET_HDR_F_DATA_VALID on xmit
    - virtio-net: restore VIRTIO_HDR_F_DATA_VALID on receiving

 -- Kleber Sacilotto de Souza <email address hidden> Tue, 25 Apr 2017 13:08:56 +0200

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-zesty' to 'verification-done-zesty'. If the problem still exists, change the tag 'verification-needed-zesty' to 'verification-failed-zesty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-zesty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-21.23

---------------
linux (4.10.0-21.23) zesty; urgency=low

  * linux: 4.10.0-21.23 -proposed tracker (LP: #1686414)

  * Need to stop using bzip2 compression in packages for zesty onward
    (LP: #1686782)
    - [Debian] Use default compression for all packages

  * [Hyper-V][SAUCE] pci-hyperv: Use only 16 bit integer for PCI domain
    (LP: #1684971)
    - SAUCE: pci-hyperv: Use only 16 bit integer for PCI domain

  * CVE-2017-7477: macsec: avoid heap overflow in skb_to_sgvec (LP: #1685892)
    - macsec: avoid heap overflow in skb_to_sgvec
    - macsec: dynamically allocate space for sglist

  * Zesty update to 4.10.11 stable release (LP: #1685140)
    - drm/i915: Fix forcewake active domain tracking
    - drm/i915: Move updating color management to before vblank evasion
    - drm/i915/fbdev: Stop repeating tile configuration on stagnation
    - drm/i915: Squelch any ktime/jiffie rounding errors for wait-ioctl
    - drm/i915/gen9: Increase PCODE request timeout to 50ms
    - drm/i915: Store a permanent error in obj->mm.pages
    - drm/i915: Nuke debug messages from the pipe update critical section
    - drm/i915: Avoid tweaking evaluation thresholds on Baytrail v3
    - drm/i915: Reject HDMI 12bpc if the sink doesn't indicate support
    - drm/i915: Only enable hotplug interrupts if the display interrupts are
      enabled
    - drm/i915: Drop support for I915_EXEC_CONSTANTS_* execbuf parameters.
    - drm/i915: Stop using RP_DOWN_EI on Baytrail
    - drm/i915: Avoid rcu_barrier() from reclaim paths (shrinker)
    - drm/i915: Do .init_clock_gating() earlier to avoid it clobbering watermarks
    - orangefs: Dan Carpenter influenced cleanups...
    - orangefs: fix buffer size mis-match between kernel space and user space.
    - nfs: flexfiles: fix kernel OOPS if MDS returns unsupported DS type
    - rt2x00usb: fix anchor initialization
    - rt2x00usb: do not anchor rx and tx urb's
    - MIPS: Introduce irq_stack
    - MIPS: Stack unwinding while on IRQ stack
    - MIPS: Only change $28 to thread_info if coming from user mode
    - MIPS: Switch to the irq_stack in interrupts
    - MIPS: Select HAVE_IRQ_EXIT_ON_IRQ_STACK
    - MIPS: IRQ Stack: Fix erroneous jal to plat_irq_dispatch
    - crypto: caam - fix RNG deinstantiation error checking
    - crypto: caam - fix invalid dereference in caam_rsa_init_tfm()
    - dma-buf: add support for compat ioctl
    - Linux 4.10.11

  * Zesty update to v4.10.10 stable release (LP: #1682130)
    - drm/vmwgfx: Type-check lookups of fence objects
    - drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl()
    - drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl()
    - drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces
    - drm/vmwgfx: Remove getparam error message
    - drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl()
    - PCI: thunder-pem: Add legacy firmware support for Cavium ThunderX host
      controller
    - PCI: thunder-pem: Fix legacy firmware PEM-specific resources
    - sysfs: be careful of error returns from ops->show()
    - staging: android: ashmem: lseek failed due to no FM...

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Artful):
status: Fix Committed → Fix Released
Marcelo Cerri (mhcerri) on 2019-02-28
Changed in linux (Ubuntu Cosmic):
assignee: nobody → Marcelo Cerri (mhcerri)
status: New → In Progress
importance: Undecided → High
Marcelo Cerri (mhcerri) on 2019-02-28
Changed in linux (Ubuntu Cosmic):
status: In Progress → Invalid
Changed in linux-azure (Ubuntu Xenial):
status: New → Invalid
Changed in linux-azure (Ubuntu Yakkety):
status: New → Invalid
Changed in linux-azure (Ubuntu Zesty):
status: New → Invalid
Changed in linux-azure (Ubuntu Artful):
status: New → Invalid
Changed in linux-azure (Ubuntu Cosmic):
status: New → In Progress
assignee: nobody → Marcelo Cerri (mhcerri)
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 4.18.0-1013.13~18.04.1

---------------
linux-azure (4.18.0-1013.13~18.04.1) bionic; urgency=medium

  * linux-azure: 4.18.0-1013.13~18.04.1 -proposed tracker (LP: #1818126)

  [ Ubuntu: 4.18.0-1013.13 ]

  * linux-azure: 4.18.0-1013.13 -proposed tracker (LP: #1818128)
  * linux-azure - Add the same 4.15 InfiniBand configuration settings to the
    4.18 kernel (LP: #1818141)
    - [Config] linux-azure: CONFIG_INFINIBAND_{USER_MAD,IPOIB,IPOIB_DEBUG}=y
  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts
  * [Hyper-V][SAUCE] pci-hyperv: Use only 16 bit integer for PCI domain
    (LP: #1684971)
    - SAUCE: pci-hyperv: Use only 16 bit integer for PCI domain

linux-azure (4.18.0-1012.12~18.04.1) bionic; urgency=medium

  * linux-azure: 4.18.0-1012.12~18.04.1 -proposed tracker (LP: #1816782)

  * Packaging resync (LP: #1786013)
    - [Packaging] update update.conf

  [ Ubuntu: 4.18.0-1012.12 ]

  * linux-azure: 4.18.0-1012.12 -proposed tracker (LP: #1816783)
  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts
  * linux: 4.18.0-16.17 -proposed tracker (LP: #1814749)
  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts
  * CVE-2018-16880
    - vhost: fix OOB in get_rx_bufs()
  * RTL8822BE WiFi Disabled in Kernel 4.18.0-12 (LP: #1806472)
    - SAUCE: staging: rtlwifi: allow RTLWIFI_DEBUG_ST to be disabled
    - [Config] CONFIG_RTLWIFI_DEBUG_ST=n
    - SAUCE: Add r8822be to signature inclusion list
  * kernel oops in bcache module (LP: #1793901)
    - SAUCE: bcache: never writeback a discard operation
  * CVE-2018-18397
    - userfaultfd: use ENOENT instead of EFAULT if the atomic copy user fails
    - userfaultfd: shmem: allocate anonymous memory for MAP_PRIVATE shmem
    - userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas
    - userfaultfd: shmem: add i_size checks
    - userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
  * Ignore "incomplete report" from Elan touchpanels (LP: #1813733)
    - HID: i2c-hid: Ignore input report if there's no data present on Elan
      touchpanels
  * Vsock connect fails with ENODEV for large CID (LP: #1813934)
    - vhost/vsock: fix vhost vsock cid hashing inconsistent
  * Fix non-working pinctrl-intel (LP: #1811777)
    - pinctrl: intel: Do pin translation in other GPIO operations as well
  * ip6_gre: fix tunnel list corruption for x-netns (LP: #1812875)
    - ip6_gre: fix tunnel list corruption for x-netns
  * Backported commit breaks audio (fixed upstream) (LP: #1811566)
    - ASoC: intel: cht_bsw_max98090_ti: Add quirk for boards using pmc_plt_clk_0
    - ASoC: intel: cht_bsw_max98090_ti: Add pmc_plt_clk_0 quirk for Chromebook
      Clapper
    - ASoC: intel: cht_bsw_max98090_ti: Add pmc_plt_clk_0 quirk for Chromebook
      Gnawty
  * kvm_stat : missing python dependency (LP: #1798776)
    - tools/kvm_stat: switch to python3
  * [SRU] Fix Xorg crash with nomodeset when BIOS enable 64-bit fb addr
    (LP: #1812797)
    - vgaarb: Add support for 64-bit frame buffer address
    - vgaarb: Keep adding VGA device in queue
  * F...

Changed in linux-azure (Ubuntu):
status: New → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 4.18.0-1013.13

---------------
linux-azure (4.18.0-1013.13) cosmic; urgency=medium

  * linux-azure: 4.18.0-1013.13 -proposed tracker (LP: #1818128)

  * linux-azure - Add the same 4.15 InfiniBand configuration settings to the
    4.18 kernel (LP: #1818141)
    - [Config] linux-azure: CONFIG_INFINIBAND_{USER_MAD,IPOIB,IPOIB_DEBUG}=y

  * Packaging resync (LP: #1786013)
    - [Packaging] resync getabis
    - [Packaging] update helper scripts

  * [Hyper-V][SAUCE] pci-hyperv: Use only 16 bit integer for PCI domain
    (LP: #1684971)
    - SAUCE: pci-hyperv: Use only 16 bit integer for PCI domain

 -- Marcelo Henrique Cerri <email address hidden> Thu, 28 Feb 2019 19:09:26 -0300

Changed in linux-azure (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc