Xen MSI setup code incorrectly re-uses cached pirq

Bug #1656381 reported by Dan Streetman
Affects          Status        Importance  Assigned to    Milestone
linux (Ubuntu)   Fix Released  High        Dan Streetman
  Trusty         Fix Released  Undecided   Unassigned
  Xenial         Fix Released  Undecided   Unassigned
  Yakkety        Fix Released  Undecided   Unassigned
  Zesty          Fix Released  High        Dan Streetman

Bug Description

[Impact]

This bug fixes the root problem reported in bug 1648449, so its description can be mostly reused here:

On an Amazon AWS instance that has NVMe drives, the NVMe drives fail to initialize, and so aren't usable by the system. If one of the NVMe drives contains the root filesystem, the instance won't boot.

[Test Case]

Boot an AWS instance with multiple NVMe drives. All except the first will fail to initialize, and errors will appear in the system log (if the system boots at all). With a patched kernel, all NVMe drives are initialized and enumerated and work properly.
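
A minimal sketch of how the result of this test case could be checked from inside the guest, assuming the NVMe namespaces show up as `nvmeXnY` entries under `/sys/block`. The function names and the `expected` count are hypothetical; the expected number is whatever the instance type is supposed to provide.

```python
import re
from pathlib import Path

# Matches namespace block devices such as nvme0n1, but not partitions (nvme0n1p1).
_NVME_NAMESPACE = re.compile(r"^nvme\d+n\d+$")


def list_nvme_namespaces(sys_block="/sys/block"):
    """Return the sorted names of NVMe namespace block devices found in sys_block."""
    root = Path(sys_block)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if _NVME_NAMESPACE.match(p.name))


def check_nvme_count(expected, sys_block="/sys/block"):
    """Compare the number of NVMe namespaces found with the expected number.

    'expected' is an assumption supplied by the tester, not something the
    kernel reports.
    """
    found = list_nvme_namespaces(sys_block)
    return len(found) == expected, found
```

On an affected kernel, `check_nvme_count(n)` for an instance with `n` drives would be expected to report fewer devices than `n`; on a patched kernel the counts should match.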

[Regression Potential]

Patching the Xen MSI setup function may cause problems with other PCI devices using MSI/MSIX interrupts on a Xen guest.

Note that this patch restores correct behavior for guests running under Xen 4.5 or later hypervisors, specifically Xen hypervisors with qemu 2.1.0 or later. For Xen hypervisors with qemu 2.0.0 or earlier, this patch causes a regression. With an Ubuntu hypervisor, qemu is patched in Vivid or later and in UCA Kilo or later. Trusty qemu and UCA Icehouse qemu are not patched; see bug 1657489.

[Other Info]

The patch from bug 1648449 was only a workaround that changed the NVMe driver so that it did not trigger this Xen bug. However, there have been reports in bug 1626894 of that patch causing non-Xen systems with NVMe drives to stop working. So the best course is to revert the workaround patch (and its regression fix patch from bug 1651602), restoring the original NVMe driver code, and apply the real Xen patch to fix the problem. That should restore functionality for non-Xen systems and allow Xen systems with multiple NVMe controllers to work.
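
The failure mode named in the bug title can be illustrated with a toy model. This is only a sketch under assumptions, not the kernel or qemu code: `ToyHypervisor`, `ToyDevice`, `setup_msi` and `disable_msi` are hypothetical names, and the model assumes that the hypervisor side frees a device's pirq when its MSI/MSI-X is disabled (the behavior the related qemu change describes) while the number stays cached on the device.

```python
class ToyHypervisor:
    """Hands out pirq numbers and remembers which device owns each one."""

    def __init__(self):
        self.owner = {}  # pirq number -> device name

    def allocate(self, dev_name):
        pirq = 0
        while pirq in self.owner:  # pick the lowest free number
            pirq += 1
        self.owner[pirq] = dev_name
        return pirq

    def free(self, pirq):
        self.owner.pop(pirq, None)


class ToyDevice:
    def __init__(self, name):
        self.name = name
        self.cached_pirq = None  # stands in for the pirq cached in MSI msg data


def setup_msi(hv, dev, reuse_cached):
    """Return the pirq the guest will use for dev."""
    if reuse_cached and dev.cached_pirq is not None and dev.cached_pirq in hv.owner:
        # Old behavior in this model: trust the cached number because it is
        # still a live pirq, without checking who owns it now.
        return dev.cached_pirq
    pirq = hv.allocate(dev.name)
    dev.cached_pirq = pirq
    return pirq


def disable_msi(hv, dev):
    """The pirq is freed on the hypervisor side; the cached number is left behind."""
    hv.free(dev.cached_pirq)


def scenario(reuse_cached):
    hv = ToyHypervisor()
    a, b = ToyDevice("nvme0"), ToyDevice("nvme1")
    setup_msi(hv, a, reuse_cached)           # nvme0 sets up one vector
    disable_msi(hv, a)                       # ...and tears it down again
    pirq_b = setup_msi(hv, b, reuse_cached)  # nvme1 is handed the freed pirq
    pirq_a = setup_msi(hv, a, reuse_cached)  # nvme0 sets up its vectors again
    return pirq_a, pirq_b, hv.owner.get(pirq_a)
```

In this model, `scenario(True)` leaves both devices on the same pirq, which the hypervisor records as belonging to `nvme1`, while `scenario(False)` gives each device its own pirq. The real code paths are more involved; the point is only that a cached number can outlive the allocation it referred to.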

Upstream discussion:
https://lists.xen.org/archives/html/xen-devel/2017-01/msg00447.html

Related: bug 1657489 ("qemu-xen: free all the pirqs for msi/msix when driver unload")

Dan Streetman (ddstreet)
Changed in linux (Ubuntu):
assignee: nobody → Dan Streetman (ddstreet)
status: New → In Progress
importance: Undecided → High
Changed in linux (Ubuntu Xenial):
status: New → Fix Committed
Luis Henriques (henrix)
Changed in linux (Ubuntu Yakkety):
status: New → Fix Committed
Dan Streetman (ddstreet)
description: updated
John Donnelly (jpdonnelly) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-yakkety
John Donnelly (jpdonnelly) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial
Dan Streetman (ddstreet) wrote :

Verified with the 4.4.0-62-generic kernel on an AWS i3 instance; all the NVMe drives initialized successfully.

tags: added: verification-done-xenial
removed: verification-needed-xenial
Dan Streetman (ddstreet) wrote :

Verified with the 4.8.0-36-generic kernel on an AWS i3 instance; all the NVMe drives initialized successfully.

tags: added: verification-done-yakkety
removed: verification-needed-yakkety
Luis Henriques (henrix)
Changed in linux (Ubuntu Trusty):
status: New → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-62.83

---------------
linux (4.4.0-62.83) xenial; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1657430

  * Backport DP MST fixes to i915 (LP: #1657353)
    - SAUCE: i915_bpo: Fix DP link rate math
    - SAUCE: i915_bpo: Validate mode against max. link data rate for DP MST

  * Ubuntu xenial - 4.4.0-59-generic i3 I/O performance issue (LP: #1657281)
    - blk-mq: really fix plug list flushing for nomerge queues

linux (4.4.0-61.82) xenial; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1656810

  * Xen MSI setup code incorrectly re-uses cached pirq (LP: #1656381)
    - SAUCE: xen: do not re-use pirq number cached in pci device msi msg data

  * nvme drive probe failure (LP: #1626894)
    - nvme: revert NVMe: only setup MSIX once

linux (4.4.0-60.81) xenial; urgency=low

  [ John Donnelly ]

  * Release Tracking Bug
    - LP: #1656084

  * Couldn't emulate instruction 0x7813427c (LP: #1634129)
    - KVM: PPC: Book3S PR: Fix illegal opcode emulation

  * perf: 24x7: Eliminate domain name suffix in event names (LP: #1560482)
    - powerpc/perf/hv-24x7: Fix usage with chip events.
    - powerpc/perf/hv-24x7: Display change in counter values
    - powerpc/perf/hv-24x7: Display domain indices in sysfs
    - powerpc/perf/24x7: Eliminate domain suffix in event names

  * i386 ftrace tests hang on ADT testing (LP: #1655040)
    - ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps
      to it

  * VMX module autoloading if available (LP: #1651322)
    - powerpc: Add module autoloading based on CPU features
    - crypto: vmx - Convert to CPU feature based module autoloading

  * ACPI probe support for AD5592/3 configurable multi-channel converter
    (LP: #1654497)
    - SAUCE: iio: dac: ad5592r: Add ACPI support
    - SAUCE: iio: dac: ad5593r: Add ACPI support

  * Xenial update to v4.4.40 stable release (LP: #1654602)
    - btrfs: limit async_work allocation and worker func duration
    - Btrfs: fix tree search logic when replaying directory entry deletes
    - btrfs: store and load values of stripes_min/stripes_max in balance status
      item
    - Btrfs: fix qgroup rescan worker initialization
    - USB: serial: option: add support for Telit LE922A PIDs 0x1040, 0x1041
    - USB: serial: option: add dlink dwm-158
    - USB: serial: kl5kusb105: fix open error path
    - USB: cdc-acm: add device id for GW Instek AFG-125
    - usb: hub: Fix auto-remount of safely removed or ejected USB-3 devices
    - usb: gadget: f_uac2: fix error handling at afunc_bind
    - usb: gadget: composite: correctly initialize ep->maxpacket
    - USB: UHCI: report non-PME wakeup signalling for Intel hardware
    - ALSA: usb-audio: Add QuickCam Communicate Deluxe/S7500 to
      volume_control_quirks
    - ALSA: hiface: Fix M2Tech hiFace driver sampling rate change
    - ALSA: hda/ca0132 - Add quirk for Alienware 15 R2 2016
    - ALSA: hda - ignore the assoc and seq when comparing pin configurations
    - ALSA: hda - fix headset-mic problem on a Dell laptop
    - ALSA: hda - Gate the mic jack on HP Z1 Gen3 AiO
    - ALSA: hd...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-37.39

---------------
linux (4.8.0-37.39) yakkety; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1659381

  * Mouse cursor invisible or does not move (LP: #1646574)
    - drm/nouveau/disp/nv50-: split chid into chid.ctrl and chid.user
    - drm/nouveau/disp/nv50-: specify ctrl/user separately when constructing
      classes
    - drm/nouveau/disp/gp102: fix cursor/overlay immediate channel indices

 -- Benjamin M Romer <email address hidden> Wed, 25 Jan 2017 16:12:02 -0200

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Thadeu Lima de Souza Cascardo (cascardo) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'. If the problem still exists, change the tag 'verification-needed-trusty' to 'verification-failed-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Dan Streetman (ddstreet) wrote :

Verified with the 3.13.0-109-generic kernel on an AWS instance; the NVMe drives are all initialized.

tags: added: verification-done-trusty
removed: verification-needed-trusty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-109.156

---------------
linux (3.13.0-109.156) trusty; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1662186

  [ Luis Henriques ]
  * Backport Dirty COW patch to prevent wineserver freeze (LP: #1658270)
    - ARM: 7985/1: mm: implement pte_accessible for faulting mappings
    - ARM: 8108/1: mm: Introduce {pte,pmd}_isset and {pte,pmd}_isclear
    - ARM: 8037/1: mm: support big-endian page tables
    - ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE
    - arm64: mm: Route pmd thp functions through pte equivalents
    - mm: fix huge zero page accounting in smaps report
    - SAUCE: mm: Respect FOLL_FORCE/FOLL_COW for thp

  * kernel BUG at skbuff.h:1486 Insufficient linear data in skb
    __skb_pull.part.7+0x4/0x6 [openvswitch] (LP: #1655683)
    - SAUCE: openvswitch: gre: filter gre packets

  * CVE-2016-7911
    - block: fix use-after-free in sys_ioprio_get()

  * CVE-2016-7910
    - block: fix use-after-free in seq file

  * Xen MSI setup code incorrectly re-uses cached pirq (LP: #1656381)
    - SAUCE: xen: do not re-use pirq number cached in pci device msi msg data

 -- Thadeu Lima de Souza Cascardo <email address hidden> Tue, 07 Feb 2017 09:26:42 -0200

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Dan Streetman (ddstreet) wrote :

This is fixed in Zesty by bug 1677589, and is already upstream in Artful and later.

Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Released
Dan Streetman (ddstreet)
Changed in linux (Ubuntu):
status: In Progress → Fix Released