[Hyper-V] pci-hyperv: Use device serial number as PCI domain

Bug #1667527 reported by Joshua R. Poulson on 2017-02-24
Affects          Importance   Assigned to
linux (Ubuntu)   Medium       Joseph Salisbury
  Xenial         Medium       Joseph Salisbury
  Yakkety        Medium       Joseph Salisbury
  Zesty          Medium       Joseph Salisbury

Bug Description

This allows PCI domain numbers to start at 1 and to be unique
within the same VM. Names that include the domain number, such
as VF NIC names, can therefore be shorter than the previous
names, which were based on part of the bus UUID. The new names
will also stay the same for VMs created from a copied VHD with
the same number of devices.

This is needed for SR-IOV in Azure.
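
To make the naming effect concrete, here is a minimal sketch in Python. It assumes an interface name of the form enP<domain>p<bus>s<slot> with every field printed in decimal, in the style of systemd predictable names. The helper `vf_nic_name`, the bus and slot values, and the UUID-derived domain 0x5D3A are hypothetical and chosen only for illustration; they are not taken from the patch.

```python
def vf_nic_name(domain: int, bus: int, slot: int) -> str:
    """Build an interface name that embeds the PCI domain number.

    Assumption: the scheme is "enP<domain>p<bus>s<slot>", all decimal.
    """
    return f"enP{domain}p{bus}s{slot}"


# Hypothetical domain derived from two bytes of a bus UUID (old behaviour).
uuid_based = vf_nic_name(0x5D3A, 0, 2)

# Hypothetical domain derived from a device serial number that starts at 1.
serial_based = vf_nic_name(1, 0, 2)

print(uuid_based, serial_based)
```

Under these assumptions the serial-based name is shorter, and it does not change when the VM is recreated from a copied VHD, because it no longer depends on a per-VM UUID.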

The patch is in Bjorn's tree for 4.11: https://git.kernel.org/cgit/linux/kernel/git/helgaas/pci.git/commit/?h=pci/host-hv&id=4a9b0933bdfcd85da840284bf5a0eb17b654b9c2

Joshua R. Poulson (jrp) on 2017-02-24
Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Is this only needed in Xenial or other releases as well?

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
tags: added: kernel-da-key kernel-hyper-v
Changed in linux (Ubuntu Xenial):
status: New → Triaged
importance: Undecided → Medium
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Joshua R. Poulson (jrp) wrote :

lts-xenial, HWE, and Azure custom.

Changed in linux (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu Xenial):
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 4a9b0933bdfcd85. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1667527/xenial/

Simon Xiao (sixiao) wrote :

The kernel panics when running this kernel on Ubuntu 16.04 on Hyper-V with Mellanox CX3 SR-IOV enabled.
Could you please build a test kernel based on this one (test kernel 4.4.0-65):

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1667007
http://kernel.ubuntu.com/~jsalisbury/lp1667007/xenial/

[ 7.976040] Modules linked in: mlx4_core(+) pci_hyperv i2c_piix4 8250_fintek hyperv_fb hv_balloon input_leds joydev serio_raw mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi parport_pc ppdev lp parport autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic hyperv_keyboard crct10dif_pclmul crc32_pclmul hid_hyperv hv_netvsc hv_storvsc hid scsi_transport_fc ghash_clmulni_intel hv_utils aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd psmouse tulip pata_acpi hv_vmbus floppy fjes
[ 7.976040] CPU: 0 PID: 668 Comm: systemd-udevd Tainted: G B D 4.4.0-64-generic #85~lp1667527
[ 7.976041] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 04/28/2016
[ 7.976042] 0000000000000286 0000000096836efa ffff8800f277f1e8 ffffffff813f8083
[ 7.976043] ffffea0003a87500 ffffffff81cd75a2 ffff8800f277f210 ffffffff811937ee
[ 7.976043] ffffea0003a87540 ffff880107ffbf80 ffff8800f277f320 ffff8800f277f2d8
[ 7.976044] Call Trace:
[ 7.976044] [<ffffffff813f8083>] dump_stack+0x63/0x90
[ 7.976045] [<ffffffff811937ee>] bad_page.part.68+0xae/0x100
[ 7.976046] [<ffffffff81197c86>] get_page_from_freelist+0x516/0xa50
[ 7.976048] [<ffffffff81198f19>] __alloc_pages_nodemask+0x159/0x2a0
[ 7.976049] [<ffffffff811e2a7c>] alloc_pages_current+0x8c/0x110
[ 7.976050] [<ffffffff8119354e>] __get_free_pages+0xe/0x40
[ 7.976051] [<ffffffff811beba4>] __tlb_remove_page+0x54/0xa0
[ 7.976053] [<ffffffff811bfa9e>] unmap_page_range+0x50e/0x7a0
[ 7.976054] [<ffffffff811bfdad>] unmap_single_vma+0x7d/0xe0
[ 7.976055] [<ffffffff811c0871>] unmap_vmas+0x51/0xa0
[ 7.976056] [<ffffffff811c9df7>] exit_mmap+0xa7/0x170
[ 7.976057] [<ffffffff8107e0a7>] mmput+0x57/0x130
[ 7.976058] [<ffffffff81083f2a>] do_exit+0x27a/0xb00
[ 7.976059] [<ffffffff81031c41>] oops_end+0xa1/0xd0
[ 7.976060] [<ffffffff810320fb>] die+0x4b/0x70
[ 7.976061] [<ffffffff8102f121>] do_trap+0xb1/0x140
[ 7.976062] [<ffffffff8102f4a9>] do_error_trap+0x89/0x110
[ 7.976063] [<ffffffff811edef7>] ? kfree+0x147/0x150
[ 7.976064] [<ffffffff81558c5e>] ? dev_printk_emit+0x4e/0x70
[ 7.976068] [<ffffffffc0443c0d>] ? mlx4_free_eq+0x11d/0x190 [mlx4_core]
[ 7.976069] [<ffffffff8102fa10>] do_invalid_op+0x20/0x30
[ 7.976070] [<ffffffff8183e10e>] invalid_op+0x1e/0x30
[ 7.976074] [<ffffffffc0443c0d>] ? mlx4_free_eq+0x11d/0x190 [mlx4_core]
[ 7.976075] [<ffffffff811edef7>] ? kfree+0x147/0x150
[ 7.976079] [<ffffffffc0443c0d>...
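
When triaging a trace like the one above, it can help to pull out the frames that belong to a loadable module. The following is a rough sketch in Python, not a general oops parser: the regular expression only assumes the frame layout visible in this log (`[<address>] [?] symbol+0xoff/0xsize [module]`), and the names `FRAME_RE`, `parse_frames`, and `SAMPLE` are hypothetical.

```python
import re

# Matches frames such as:
#   [ 7.976068] [<ffffffffc0443c0d>] ? mlx4_free_eq+0x11d/0x190 [mlx4_core]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log: str):
    """Return (symbol, module or None, is_unreliable) for each stack frame."""
    frames = []
    for line in log.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(
                (m.group("symbol"), m.group("module"), bool(m.group("unreliable")))
            )
    return frames


# Three frames copied from the trace in this comment.
SAMPLE = """\
[ 7.976063] [<ffffffff811edef7>] ? kfree+0x147/0x150
[ 7.976068] [<ffffffffc0443c0d>] ? mlx4_free_eq+0x11d/0x190 [mlx4_core]
[ 7.976069] [<ffffffff8102fa10>] do_invalid_op+0x20/0x30
"""

frames = parse_frames(SAMPLE)
module_frames = [f for f in frames if f[1] is not None]
print(module_frames)
```

Frames marked with `?` are ones the kernel's unwinder reports as unreliable, so a module frame found this way is a hint about where to look rather than proof of the faulting function.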

Joseph Salisbury (jsalisbury) wrote :

I built another test kernel with commit 4a9b0933bdfcd85, but based on -65. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1667527/xenial/

Simon Xiao (sixiao) wrote :

Same kernel crash issue found on the v2 kernel (Comment #5).

This is the known good kernel:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1667007
Could you please apply the patch from this bug to that kernel?

Joseph Salisbury (jsalisbury) wrote :

I built a test kernel using the source tree from bug 1667007, with commit 4a9b0933bdfcd85 applied on top of that tree. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1667527/Commit4a9b0933bdfOnBug1667007Tree/

Simon Xiao (sixiao) wrote :

Thanks. This test kernel works.

Joseph Salisbury (jsalisbury) wrote :

I'm going to mark 1667007 as a duplicate of this bug, since its patches seem to be a dependency. That way all the patches will be SRU'd together.

Joseph Salisbury (jsalisbury) wrote :

Both patches were submitted for SRU in Xenial.

Simon Xiao (sixiao) wrote :

Do you mean the "two patches" from bug #1667531?
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1667531

Where can I get a test kernel to test right now? Thanks!

Joseph Salisbury (jsalisbury) wrote :

In comment #10, I meant the patches requested in this bug and in bug 1667007.

For bug 1667531, I saw your test kernel request in that bug's comments. That kernel is building now, and I'll post it shortly.

Simon Xiao (sixiao) wrote :

Please include this patch in the custom Azure kernel:
https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/azure/tree/

Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Joseph Salisbury (jsalisbury) wrote :

I built a Xenial test kernel with all the patches from the following bugs:

bug 1670518
  PCI: hv: Allocate physically contiguous hypercall params buffer
  PCI: hv: Make unnecessarily global IRQ masking functions static
  PCI: hv: Delete the device earlier from hbus->children for hot-remove
  PCI: hv: Fix hv_pci_remove() for hot-remove

bug 1672785
  net/mlx4_core: Avoid delays during VF driver device shutdown

bug 1667531
  tools: hv: Enable network manager for bonding scripts on RH
  [net-next] tools: hv: Add clean up function for Ubuntu config
  bcc5a76 tools: hv: Add a script to help bonding synthetic and VF NICs

bug 1667527
  4a9b0933bdfc PCI: hv: Use device serial number as PCI domain

bug 1667007
  d3de209 net/mlx4_core: Use cq quota in SRIOV when creating completion EQs

bug 1650058
  14c84da90b0d net/mlx4_en: Fix bad WQE issue
  c46100f413ca net/mlx4_core: Fix racy CQ (Completion Queue) free
  f4f73e2e6308 net/mlx4_core: Fix when to save some qp context flags for dynamic VST to VGT transitions
  3c05ac20fe6e net/mlx4_core: Avoid command timeouts during VF driver device shutdown

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/HyperVCombined/

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!
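
The tag change requested above amounts to swapping one tag for another. Here is a small sketch in Python of that rule on a plain list of tag strings. The function `mark_verified` is hypothetical and does not call Launchpad; in practice the tags are edited on the bug page.

```python
def mark_verified(tags, series, passed=True):
    """Replace 'verification-needed-<series>' with the done/failed tag.

    Hypothetical helper: it only models the tag swap described in the
    verification request and does not talk to Launchpad.
    """
    needed = f"verification-needed-{series}"
    if needed not in tags:
        raise ValueError(f"{needed!r} is not set on this bug")
    outcome = f"verification-{'done' if passed else 'failed'}-{series}"
    return sorted((set(tags) - {needed}) | {outcome})


tags = ["kernel-da-key", "kernel-hyper-v", "verification-needed-xenial"]
print(mark_verified(tags, "xenial"))
```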

tags: added: verification-needed-xenial
Simon Xiao (sixiao) wrote :

Tested and issue has been resolved. Thanks!

Joshua R. Poulson (jrp) on 2017-03-22
tags: added: verification-done-xenial
removed: verification-needed-xenial
Changed in linux (Ubuntu Yakkety):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Zesty):
status: Fix Committed → In Progress
Tim Gardner (timg-tpi) on 2017-03-30
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-19.21

---------------
linux (4.10.0-19.21) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1680535

  * ADT regressions caused by "audit: fix auditd/kernel connection state
    tracking" (LP: #1680532)
    - SAUCE: Revert "audit: fix auditd/kernel connection state tracking"

  * Miscellaneous Ubuntu changes
    - [Config] updateconfigs to update CONFIG_GENERIC_CSUM for ppc64el
      This cleans up behind a Kconfig change that went undetected.

linux (4.10.0-18.20) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1680168

  * smartpqi driver needed in initram disk and installer (LP: #1680156)
    - UBUNU: [Config] Add smartpqi to d-i

linux (4.10.0-17.19) zesty; urgency=low

  [ Tim Gardner ]

  * Release Tracking Bug
    - LP: #1679718

  * Fix CVE-2017-7308 (LP: #1678009)
    - net/packet: fix overflow in check for priv area size
    - net/packet: fix overflow in check for tp_frame_nr
    - net/packet: fix overflow in check for tp_reserve

  * apparmor: oops on boot if parameters set on grub command line (LP: #1678048)
    - SAUCE: apparmor: fix parameters so that the permission test is bypassed at boot

  * apparmor: does not provide a way to detect policy updataes (LP: #1678032)
    - SAUCE: apparmor: add policy revision file interface

  * apparmor does not make support of query data visible (LP: #1678023)
    - SAUCE: apparmor: add label data availability to the feature set

  * apparmor query interface does not make supported query info available
    (LP: #1678030)
    - SAUCE: apparmor: add information about the query inteface to the feature set

  * change_profile incorrect when using namespaces with a compound stack
    (LP: #1677959)
    - SAUCE: apparmor: fix label parse for stacked labels

  * Zesty update to v4.10.8 stable release (LP: #1678930)
    - xfrm: policy: init locks early
    - xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window
    - xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder
    - KVM: nVMX: Fix nested VPID vmx exec control
    - KVM: x86: cleanup the page tracking SRCU instance
    - virtio_balloon: init 1st buffer in stats vq
    - pinctrl: qcom: Don't clear status bit on irq_unmask
    - c6x/ptrace: Remove useless PTRACE_SETREGSET implementation
    - h8300/ptrace: Fix incorrect register transfer count
    - mips/ptrace: Preserve previous registers for short regset write
    - sparc/ptrace: Preserve previous registers for short regset write
    - metag/ptrace: Preserve previous registers for short regset write
    - metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS
    - metag/ptrace: Reject partial NT_METAG_RPIPE writes
    - qla2xxx: Allow vref count to timeout on vport delete.
    - sched/rt: Add a missing rescheduling point
    - usb: musb: fix possible spinlock deadlock
    - Linux 4.10.8

  * [Hyper-V] pci-hyperv: Use device serial number as PCI domain (LP: #1667527)
    - net/mlx4_core: Use cq quota in SRIOV when creating completion EQs
    - PCI: hv: Use device serial number as PCI domain

  * Miscellaneous Ubuntu changes
    - [Config] flash-kernel should be a...

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
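
To check whether a given upload references this bug, the `LP: #NNNNNNN` markers in a changelog can be collected with a regular expression. This is a minimal sketch in Python; the names `LP_RE`, `lp_bugs`, and `SAMPLE` are hypothetical, and the sample text is copied from the zesty changelog above.

```python
import re

# Matches both "(LP: #1667527)" and "- LP: #1680535" style references.
LP_RE = re.compile(r"LP:\s*#(\d+)")


def lp_bugs(changelog: str) -> set:
    """Return the set of Launchpad bug numbers mentioned in a changelog."""
    return {int(n) for n in LP_RE.findall(changelog)}


SAMPLE = """\
  * Zesty update to v4.10.8 stable release (LP: #1678930)
    - Linux 4.10.8

  * [Hyper-V] pci-hyperv: Use device serial number as PCI domain (LP: #1667527)
    - net/mlx4_core: Use cq quota in SRIOV when creating completion EQs
    - PCI: hv: Use device serial number as PCI domain
"""

bugs = lp_bugs(SAMPLE)
print(1667527 in bugs)
```

A bug number appearing in the changelog only shows that the upload claims to address it; the commit subjects listed under the entry still need to be compared against the patches requested in the bug.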

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-yakkety
Simon Xiao (sixiao) on 2017-04-13
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-75.96

---------------
linux (4.4.0-75.96) xenial; urgency=low

  * linux: 4.4.0-75.96 -proposed tracker (LP: #1684441)

  * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
    (LP: #1682561)
    - Drivers: hv: util: move waiting for release to hv_utils_transport itself

linux (4.4.0-74.95) xenial; urgency=low

  * linux: 4.4.0-74.95 -proposed tracker (LP: #1682041)

  * [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
    (LP: #1681893)
    - Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()

linux (4.4.0-73.94) xenial; urgency=low

  * linux: 4.4.0-73.94 -proposed tracker (LP: #1680416)

  * CVE-2017-6353
    - sctp: deny peeloff operation on asocs with threads sleeping on it

  * vfat: missing iso8859-1 charset (LP: #1677230)
    - [Config] NLS_ISO8859_1=y

  * Regression: KVM modules should be on main kernel package (LP: #1678099)
    - [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in complain mode

  * Permission denied and inconsistent behavior in complain mode with 'ip netns
    list' command (LP: #1648903)
    - SAUCE: fix regression with domain change in complain mode

  * unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
    from a unshared mount namespace (LP: #1656121)
    - SAUCE: apparmor: null profiles should inherit parent control flags

  * apparmor refcount leak of profile namespace when removing profiles
    (LP: #1660849)
    - SAUCE: apparmor: fix ns ref count link when removing profiles from policy

  * tor in lxd: apparmor="DENIED" operation="change_onexec"
    namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
    name="system_tor" (LP: #1648143)
    - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
      namespaces

  * apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
    - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails

  * apparmor auditing denied access of special apparmor .null fi\ le
    (LP: #1660836)
    - SAUCE: apparmor: Don't audit denied access of special apparmor .null file

  * apparmor label leak when new label is unused (LP: #1660834)
    - SAUCE: apparmor: fix label leak when new label is unused

  * apparmor reference count bug in label_merge_insert() (LP: #1660833)
    - SAUCE: apparmor: fix reference count bug in label_merge_insert()

  * apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
    - SAUCE: apparmor: fix replacement race in reading rawdata

  * unix domain socket cross permission check failing with nested namespaces
    (LP: #1660832)
    - SAUCE: apparmor: fix cross ns perm of unix domain sockets

  * Xenial update to v4.4.59 stable release (LP: #1678960)
    - xfrm: policy: init locks early
    - virtio_balloon: init ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-49.52

---------------
linux (4.8.0-49.52) yakkety; urgency=low

  * linux: 4.8.0-49.52 -proposed tracker (LP: #1684427)

  * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
    (LP: #1682561)
    - Drivers: hv: util: move waiting for release to hv_utils_transport itself

linux (4.8.0-48.51) yakkety; urgency=low

  * linux: 4.8.0-48.51 -proposed tracker (LP: #1682034)

  * [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
    (LP: #1681893)
    - Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()

linux (4.8.0-47.50) yakkety; urgency=low

  * linux: 4.8.0-47.50 -proposed tracker (LP: #1679678)

  * CVE-2017-6353
    - sctp: deny peeloff operation on asocs with threads sleeping on it

  * CVE-2017-5986
    - sctp: avoid BUG_ON on sctp_wait_for_sndbuf

  * vfat: missing iso8859-1 charset (LP: #1677230)
    - [Config] NLS_ISO8859_1=y

  * [Hyper-V] pci-hyperv: Use device serial number as PCI domain (LP: #1667527)
    - net/mlx4_core: Use cq quota in SRIOV when creating completion EQs

  * Regression: KVM modules should be on main kernel package (LP: #1678099)
    - [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in complain mode

  * Permission denied and inconsistent behavior in complain mode with 'ip netns
    list' command (LP: #1648903)
    - SAUCE: fix regression with domain change in complain mode

  * unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
    from a unshared mount namespace (LP: #1656121)
    - SAUCE: apparmor: null profiles should inherit parent control flags

  * apparmor refcount leak of profile namespace when removing profiles
    (LP: #1660849)
    - SAUCE: apparmor: fix ns ref count link when removing profiles from policy

  * tor in lxd: apparmor="DENIED" operation="change_onexec"
    namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
    name="system_tor" (LP: #1648143)
    - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
      namespaces

  * apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
    - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails

  * apparmor auditing denied access of special apparmor .null fi\ le
    (LP: #1660836)
    - SAUCE: apparmor: Don't audit denied access of special apparmor .null file

  * apparmor label leak when new label is unused (LP: #1660834)
    - SAUCE: apparmor: fix label leak when new label is unused

  * apparmor reference count bug in label_merge_insert() (LP: #1660833)
    - SAUCE: apparmor: fix reference count bug in label_merge_insert()

  * apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
    - SAUCE: apparmor: fix replacement race in reading rawdata

  * unix domain socket cross permission check failing with n...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released