[Hyper-V] Missing PCI patches breaking SR-IOV hot remove

Bug #1670518 reported by Joshua R. Poulson
This bug affects 1 person
Affects          Status        Importance  Assigned to       Milestone
linux (Ubuntu)   Fix Released  Medium      Unassigned
  Xenial         Fix Released  Medium      Joseph Salisbury
  Yakkety        Fix Released  Medium      Tim Gardner
  Zesty          Fix Released  Medium      Unassigned

Bug Description

Looks like the rebase work missed some prerequisite patches in drivers/pci/host/pci-hyperv.c

(Needed for SR-IOV in Azure on lts-xenial, HWE, and custom)

commit 0de8ce3ee8e38cc66683438f715c79a2cc69539e
Author: Long Li <email address hidden>
Date: Tue Nov 8 14:04:38 2016 -0800

    PCI: hv: Allocate physically contiguous hypercall params buffer

    hv_do_hypercall() assumes that we pass a segment from a physically
    contiguous buffer. A buffer allocated on the stack may not work if
    CONFIG_VMAP_STACK=y is set.

    Use kmalloc() to allocate this buffer.

commit 542ccf4551fa019a8ae9dfb7c8cd7e73a3d7e614
Author: Tobias Klauser <email address hidden>
Date: Mon Oct 31 12:04:09 2016 +0100

    PCI: hv: Make unnecessarily global IRQ masking functions static

    Make hv_irq_mask() and hv_irq_unmask() static as they are only used in
    pci-hyperv.c

    This fixes a sparse warning.

commit e74d2ebdda33b3bdd1826b5b92e9aa45bdf92bb3
Author: Dexuan Cui <email address hidden>
Date: Thu Nov 10 07:19:52 2016 +0000

    PCI: hv: Delete the device earlier from hbus->children for hot-remove

    After we send a PCI_EJECTION_COMPLETE message to the host, the host will
    immediately send us a PCI_BUS_RELATIONS message with
    relations->device_count == 0, so pci_devices_present_work(), running on
    another thread, can find the being-ejected device, mark the
    hpdev->reported_missing to true, and run list_move_tail()/list_del() for
    the device -- this races hv_eject_device_work() -> list_del().

    Move the list_del() in hv_eject_device_work() to an earlier place, i.e.,
    before we send PCI_EJECTION_COMPLETE, so later the
    pci_devices_present_work() can't see the device.
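
The ordering described in this commit can be illustrated with a small userspace model. This is only a sketch in Python, not the kernel code: `HostBus`, `eject_device`, the `host_reply` callback, and the string message names are hypothetical stand-ins, and the callback simulates the host answering PCI_EJECTION_COMPLETE immediately with an empty PCI_BUS_RELATIONS.

```python
import threading


class HostBus:
    """Toy stand-in for hbus: a lock-protected list of child devices."""

    def __init__(self, children):
        self.lock = threading.Lock()
        self.children = list(children)
        self.messages = []  # messages "sent" to the host, in order

    def send(self, msg):
        self.messages.append(msg)


def devices_present_work(hbus, reported):
    """Unlink and return every child the host no longer reports."""
    with hbus.lock:
        missing = [d for d in hbus.children if d not in reported]
        for d in missing:
            hbus.children.remove(d)
    return missing


def eject_device(hbus, dev, host_reply=None):
    """Fixed ordering: unlink the device first, then notify the host."""
    with hbus.lock:
        hbus.children.remove(dev)
    hbus.send("PCI_EJECTION_COMPLETE")
    # From this point the host may report an empty device list at any time.
    if host_reply is not None:
        host_reply()


hbus = HostBus(["vf0"])
raced = []  # devices that the relations handler also tried to unlink
eject_device(
    hbus,
    "vf0",
    host_reply=lambda: raced.extend(devices_present_work(hbus, reported=[])),
)
```

Because the device is already gone from `children` when the simulated host reply arrives, the relations handler finds nothing to unlink, which is the property the commit message is after.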

commit 17978524a636d007e6b929304ae3eb5ea0371019
Author: Dexuan Cui <email address hidden>
Date: Thu Nov 10 07:18:47 2016 +0000

    PCI: hv: Fix hv_pci_remove() for hot-remove

    1. We don't really need such a big on-stack buffer when sending the
    teardown_packet: vmbus_sendpacket() here only uses sizeof(struct
    pci_message).

    2. In the hot-remove case (PCI_EJECT), after we send PCI_EJECTION_COMPLETE
    to the host, the host will send a RESCIND_CHANNEL message to us and the
    host won't access the per-channel ringbuffer any longer, so we needn't send
    PCI_RESOURCES_RELEASED/PCI_BUS_D0EXIT to the host, and we shouldn't expect
    the host's completion message of PCI_BUS_D0EXIT, which will never come.

    3. We should send PCI_BUS_D0EXIT after hv_send_resources_released().

    Signed-off-by: Dexuan Cui <email address hidden>
    Signed-off-by: Bjorn Helgaas <email address hidden>
    Reviewed-by: Jake Oshins <email address hidden>
    Acked-by: K. Y. Srinivasan <email address hidden>
    CC: Haiyang Zhang <email address hidden>
    CC: Vitaly Kuznetsov <email address hidden>
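
Points 2 and 3 of this commit message amount to a small decision about which teardown messages the guest sends. The sketch below restates that decision in Python under the assumption that the message names can be treated as plain strings; `teardown_messages` is a hypothetical helper and does not exist in the driver.

```python
def teardown_messages(hot_remove):
    """Return the messages a guest would send on removal, in order.

    Models only what the commit text states; it is not driver code.
    """
    if hot_remove:
        # After PCI_EJECTION_COMPLETE the host rescinds the channel, so
        # nothing more is sent and no PCI_BUS_D0EXIT completion is awaited.
        return []
    # Normal removal: release resources first, then exit D0.
    return ["PCI_RESOURCES_RELEASED", "PCI_BUS_D0EXIT"]


normal = teardown_messages(hot_remove=False)
ejected = teardown_messages(hot_remove=True)
```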


Joshua R. Poulson (jrp)
Changed in linux (Ubuntu):
status: New → Confirmed
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Changed in linux (Ubuntu Zesty):
status: Confirmed → Fix Released
Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Xenial):
assignee: nobody → Tim Gardner (timg-tpi)
status: New → In Progress
Marcelo Cerri (mhcerri) wrote :

These patches are already included in the custom Azure kernel.

Simon Xiao (sixiao) wrote :

Could you please build a test kernel that:
1) is based on https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1667531, and
2) includes the missing patches from this bug?

Thanks,
Simon

tags: added: kernel-da-key kernel-hyper-v
Changed in linux (Ubuntu Xenial):
importance: Undecided → Medium
assignee: Tim Gardner (timg-tpi) → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel per comment #4. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1670518/xenial/

Simon Xiao (sixiao) wrote :

Thanks. This test kernel has been verified and the result is PASS.

Changed in linux (Ubuntu Yakkety):
importance: Undecided → Medium
Changed in linux (Ubuntu Zesty):
importance: Undecided → Medium
Joseph Salisbury (jsalisbury) wrote :

I built a Xenial test kernel with all the patches from the following bugs:

bug 1670518
  PCI: hv: Allocate physically contiguous hypercall params buffer
  PCI: hv: Make unnecessarily global IRQ masking functions static
  PCI: hv: Delete the device earlier from hbus->children for hot-remove
  PCI: hv: Fix hv_pci_remove() for hot-remove

bug 1672785
  net/mlx4_core: Avoid delays during VF driver device shutdown

bug 1667531
  tools: hv: Enable network manager for bonding scripts on RH
  [net-next] tools: hv: Add clean up function for Ubuntu config
  bcc5a76 tools: hv: Add a script to help bonding synthetic and VF NICs

bug 1667527
 4a9b0933bdfc PCI: hv: Use device serial number as PCI domain

bug 1667007
 d3de209 net/mlx4_core: Use cq quota in SRIOV when creating completion EQs

bug 1650058
 14c84da90b0d net/mlx4_en: Fix bad WQE issue
 c46100f413ca net/mlx4_core: Fix racy CQ (Completion Queue) free
 f4f73e2e6308 net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
 3c05ac20fe6e net/mlx4_core: Avoid command timeouts during VF driver device shutdown

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/HyperVCombined/
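
To confirm that a build like this carries the four subjects listed for bug 1670518, one option is to compare them against commit subjects exported from the tree, for example the output of `git log --format=%s`. The following is a minimal sketch that assumes the backported commits keep their upstream subject lines unchanged; `missing_subjects` and `REQUIRED` are hypothetical names.

```python
REQUIRED = [
    "PCI: hv: Allocate physically contiguous hypercall params buffer",
    "PCI: hv: Make unnecessarily global IRQ masking functions static",
    "PCI: hv: Delete the device earlier from hbus->children for hot-remove",
    "PCI: hv: Fix hv_pci_remove() for hot-remove",
]


def missing_subjects(required, log_subjects):
    """Return the required subjects absent from the log, in original order."""
    present = {s.strip() for s in log_subjects}
    return [s for s in required if s not in present]


# Example input: a log that only contains the first two patches.
sample_log = REQUIRED[:2] + ["net/mlx4_en: Fix bad WQE issue"]
still_missing = missing_subjects(REQUIRED, sample_log)
```

An exact subject match is deliberately strict; a tree that rewords subjects during backporting would need a looser comparison.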

Brad Figg (brad-figg)
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
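
The tag change requested above is a simple swap of one tag for another. As a sketch of that rule only (it does not call any Launchpad API, and `update_verification_tags` is a hypothetical helper), it could be written as:

```python
def update_verification_tags(tags, series, passed):
    """Swap verification-needed-<series> for the done or failed tag."""
    needed = f"verification-needed-{series}"
    current = set(tags)
    if needed not in current:
        # Nothing to verify for this series; leave the tags alone.
        return current
    outcome = "done" if passed else "failed"
    return (current - {needed}) | {f"verification-{outcome}-{series}"}


before = {"kernel-hyper-v", "verification-needed-xenial"}
after = update_verification_tags(before, "xenial", passed=True)
```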

tags: added: verification-needed-xenial
tags: added: verification-needed-yakkety
Kleber Sacilotto de Souza (kleber-souza) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Simon Xiao (sixiao) wrote :

The issue has been fixed with those patches.

Po-Hsu Lin (cypressyew) wrote :

Changing tag as per Simon's comment in #10

tags: added: verification-done-xenial verification-done-yakkety
removed: verification-needed-xenial verification-needed-yakkety
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-75.96

---------------
linux (4.4.0-75.96) xenial; urgency=low

  * linux: 4.4.0-75.96 -proposed tracker (LP: #1684441)

  * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
    (LP: #1682561)
    - Drivers: hv: util: move waiting for release to hv_utils_transport itself

linux (4.4.0-74.95) xenial; urgency=low

  * linux: 4.4.0-74.95 -proposed tracker (LP: #1682041)

  * [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
    (LP: #1681893)
    - Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()

linux (4.4.0-73.94) xenial; urgency=low

  * linux: 4.4.0-73.94 -proposed tracker (LP: #1680416)

  * CVE-2017-6353
    - sctp: deny peeloff operation on asocs with threads sleeping on it

  * vfat: missing iso8859-1 charset (LP: #1677230)
    - [Config] NLS_ISO8859_1=y

  * Regression: KVM modules should be on main kernel package (LP: #1678099)
    - [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in complain mode

  * Permission denied and inconsistent behavior in complain mode with 'ip netns
    list' command (LP: #1648903)
    - SAUCE: fix regression with domain change in complain mode

  * unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
    from a unshared mount namespace (LP: #1656121)
    - SAUCE: apparmor: null profiles should inherit parent control flags

  * apparmor refcount leak of profile namespace when removing profiles
    (LP: #1660849)
    - SAUCE: apparmor: fix ns ref count link when removing profiles from policy

  * tor in lxd: apparmor="DENIED" operation="change_onexec"
    namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
    name="system_tor" (LP: #1648143)
    - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
      namespaces

  * apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
    - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails

  * apparmor auditing denied access of special apparmor .null fi\ le
    (LP: #1660836)
    - SAUCE: apparmor: Don't audit denied access of special apparmor .null file

  * apparmor label leak when new label is unused (LP: #1660834)
    - SAUCE: apparmor: fix label leak when new label is unused

  * apparmor reference count bug in label_merge_insert() (LP: #1660833)
    - SAUCE: apparmor: fix reference count bug in label_merge_insert()

  * apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
    - SAUCE: apparmor: fix replacement race in reading rawdata

  * unix domain socket cross permission check failing with nested namespaces
    (LP: #1660832)
    - SAUCE: apparmor: fix cross ns perm of unix domain sockets

  * Xenial update to v4.4.59 stable release (LP: #1678960)
    - xfrm: policy: init locks early
    - virtio_balloon: init ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-49.52

---------------
linux (4.8.0-49.52) yakkety; urgency=low

  * linux: 4.8.0-49.52 -proposed tracker (LP: #1684427)

  * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
    (LP: #1682561)
    - Drivers: hv: util: move waiting for release to hv_utils_transport itself

linux (4.8.0-48.51) yakkety; urgency=low

  * linux: 4.8.0-48.51 -proposed tracker (LP: #1682034)

  * [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
    (LP: #1681893)
    - Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()

linux (4.8.0-47.50) yakkety; urgency=low

  * linux: 4.8.0-47.50 -proposed tracker (LP: #1679678)

  * CVE-2017-6353
    - sctp: deny peeloff operation on asocs with threads sleeping on it

  * CVE-2017-5986
    - sctp: avoid BUG_ON on sctp_wait_for_sndbuf

  * vfat: missing iso8859-1 charset (LP: #1677230)
    - [Config] NLS_ISO8859_1=y

  * [Hyper-V] pci-hyperv: Use device serial number as PCI domain (LP: #1667527)
    - net/mlx4_core: Use cq quota in SRIOV when creating completion EQs

  * Regression: KVM modules should be on main kernel package (LP: #1678099)
    - [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in complain mode

  * Permission denied and inconsistent behavior in complain mode with 'ip netns
    list' command (LP: #1648903)
    - SAUCE: fix regression with domain change in complain mode

  * unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
    from a unshared mount namespace (LP: #1656121)
    - SAUCE: apparmor: null profiles should inherit parent control flags

  * apparmor refcount leak of profile namespace when removing profiles
    (LP: #1660849)
    - SAUCE: apparmor: fix ns ref count link when removing profiles from policy

  * tor in lxd: apparmor="DENIED" operation="change_onexec"
    namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
    name="system_tor" (LP: #1648143)
    - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
      namespaces

  * apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
    - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails

  * apparmor auditing denied access of special apparmor .null fi\ le
    (LP: #1660836)
    - SAUCE: apparmor: Don't audit denied access of special apparmor .null file

  * apparmor label leak when new label is unused (LP: #1660834)
    - SAUCE: apparmor: fix label leak when new label is unused

  * apparmor reference count bug in label_merge_insert() (LP: #1660833)
    - SAUCE: apparmor: fix reference count bug in label_merge_insert()

  * apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
    - SAUCE: apparmor: fix replacement race in reading rawdata

  * unix domain socket cross permission check failing with n...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released