[Hyper-V] hv: util: move waiting for release to hv_utils_transport itself

Bug #1682561 reported by Joshua R. Poulson on 2017-04-13
Affects          Importance  Assigned to
linux (Ubuntu)   Critical    Joseph Salisbury
Xenial           Critical    Joseph Salisbury
Yakkety          Critical    Joseph Salisbury
Zesty            Critical    Joseph Salisbury

Bug Description

We are observing call traces with the -73 and -74 proposed kernels that look like this:

[ 240.408061] INFO: task kworker/14:1:179 blocked for more than 120 seconds.
[ 240.412299] Not tainted 4.4.0-73-generic #94-Ubuntu
[ 240.415546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.420217] kworker/14:1 D ffff8804e6f23b68 0 179 2 0x00000000
[ 240.420229] Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
[ 240.420236] ffff8804e6f23b68 00000001810000f9 ffff8804ed944600 ffff8804e6fe3800
[ 240.420241] ffff8804e6f24000 ffffffffc0187a48 ffffffffc0187a40 ffff8804e6fe3800
[ 240.420243] ffff8804ea196800 ffff8804e6f23b80 ffffffff81837845 7fffffffffffffff
[ 240.420245] Call Trace:
[ 240.420254] [<ffffffff81837845>] schedule+0x35/0x80
[ 240.420256] [<ffffffff8183a995>] schedule_timeout+0x1b5/0x270
[ 240.420260] [<ffffffff811edf9a>] ? kfree+0x13a/0x150
[ 240.420264] [<ffffffff811ada52>] ? kfree_const+0x22/0x30
[ 240.420268] [<ffffffff813faa14>] ? kobject_release+0x94/0x190
[ 240.420270] [<ffffffff813fa8a7>] ? kobject_put+0x27/0x50
[ 240.420272] [<ffffffff818382a3>] wait_for_completion+0xb3/0x140
[ 240.420276] [<ffffffff810ac590>] ? wake_up_q+0x70/0x70
[ 240.420284] [<ffffffffc01840cf>] hv_kvp_deinit+0x4f/0x60 [hv_utils]
[ 240.420288] [<ffffffffc0183321>] util_remove+0x21/0x40 [hv_utils]
[ 240.420294] [<ffffffffc000c027>] vmbus_remove+0x27/0x30 [hv_vmbus]
[ 240.420301] [<ffffffff8155d2e1>] __device_release_driver+0xa1/0x150
[ 240.420303] [<ffffffff8155d3b3>] device_release_driver+0x23/0x30
[ 240.420306] [<ffffffff8155ca01>] bus_remove_device+0x101/0x170
[ 240.420308] [<ffffffff81558b69>] device_del+0x139/0x270
[ 240.420310] [<ffffffff81558cbe>] device_unregister+0x1e/0x60
[ 240.420313] [<ffffffffc000d82f>] vmbus_device_unregister+0x1f/0x50 [hv_vmbus]
[ 240.420316] [<ffffffffc00110c1>] vmbus_onoffer_rescind+0x91/0xb0 [hv_vmbus]
[ 240.420324] [<ffffffffc0011323>] vmbus_onmessage+0x33/0xa0 [hv_vmbus]
[ 240.420327] [<ffffffffc000d381>] vmbus_onmessage_work+0x21/0x30 [hv_vmbus]
[ 240.420331] [<ffffffff8109a555>] process_one_work+0x165/0x480
[ 240.420335] [<ffffffff8109a8bb>] worker_thread+0x4b/0x4c0
[ 240.420338] [<ffffffff8109a870>] ? process_one_work+0x480/0x480
[ 240.420341] [<ffffffff8109a870>] ? process_one_work+0x480/0x480
[ 240.420345] [<ffffffff810a0be8>] kthread+0xd8/0xf0
[ 240.420347] [<ffffffff810a0b10>] ? kthread_create_on_node+0x1e0/0x1e0
[ 240.420350] [<ffffffff8183bd0f>] ret_from_fork+0x3f/0x70
[ 240.420352] [<ffffffff810a0b10>] ? kthread_create_on_node+0x1e0/0x1e0
[ 317.360035] serial8250: too much work for irq3

The following commit should fix this issue:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9c18ae6eb2b312f16c63e34b43ea23926daa398

Waiting for release_event in all three drivers introduced issues on release
as on_reset() hook is not always called. E.g. if the device was never
opened we will never get the completion.

Move the waiting code to hvutil_transport_destroy() and make sure it is
only called when the device is open. hvt->lock serialization should
guarantee the absence of races.

Fixes: 5a66fecbf6aa ("Drivers: hv: util: kvp: Fix a rescind processing issue")
Fixes: 20951c7535b5 ("Drivers: hv: util: Fcopy: Fix a rescind processing issue")
Fixes: d77044d142e9 ("Drivers: hv: util: Backup: Fix a rescind processing issue")

Reported-by: Dexuan Cui <email address hidden>
Tested-by: Dexuan Cui <email address hidden>
Signed-off-by: Vitaly Kuznetsov <email address hidden>
Signed-off-by: K. Y. Srinivasan <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
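
The pattern described in the commit message above can be modelled in userspace as a rough sketch. This is not the kernel code: the `Transport` class, its `opened` flag, and the `threading.Event` standing in for a kernel completion are all hypothetical names chosen for illustration. The point it tries to show is that teardown samples the "was the device open?" state under the lock and waits for the release signal only in that case, so a device that was never opened cannot block teardown forever.

```python
import threading


class Transport:
    """Hypothetical userspace model of a transport with an optional opener."""

    def __init__(self):
        self.lock = threading.Lock()        # plays the role of hvt->lock
        self.opened = False                 # True while a daemon holds the device
        self.destroying = False
        self.release = threading.Event()    # stands in for a kernel completion

    def open(self):
        with self.lock:
            if self.destroying:
                raise RuntimeError("transport is being destroyed")
            self.opened = True

    def close(self):
        # Release path: mark the device closed, then signal any waiter.
        with self.lock:
            self.opened = False
        self.release.set()

    def destroy(self, timeout=None):
        # Sample the state under the lock so open() cannot race with teardown.
        with self.lock:
            was_open = self.opened
            self.destroying = True
        # Wait for the release signal only if someone actually had the device open.
        if was_open:
            return self.release.wait(timeout)
        return True
```

In this model, calling `destroy()` on a transport that was never opened returns immediately, whereas an unconditional wait on `release` would block indefinitely, which mirrors the hung `kworker` task in the trace. The `timeout` parameter exists only to make the sketch easy to exercise; it is an assumption of this example and not something the commit message describes.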

Joshua R. Poulson (jrp) wrote :

This appears to affect the 4.4, 4.8, and 4.10 kernels.

Changed in linux (Ubuntu):
status: New → Confirmed
Joshua R. Poulson (jrp) wrote :

Unfortunately this is another critical issue.

Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Yakkety):
status: New → In Progress
Changed in linux (Ubuntu Zesty):
status: Confirmed → In Progress
Changed in linux (Ubuntu Xenial):
importance: Undecided → Critical
Changed in linux (Ubuntu Yakkety):
importance: Undecided → Critical
Changed in linux (Ubuntu Zesty):
importance: Undecided → Critical
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Zesty):
assignee: nobody → Joseph Salisbury (jsalisbury)
tags: added: kernel-hyper-v xenial yakkety zesty
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with a pick of commit e9c18ae6eb2b. It can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1682561/

I'll submit an SRU request as soon as the test kernel is confirmed to fix the bug.

Simon Xiao (sixiao) wrote :

It is fixed in the test kernel http://kernel.ubuntu.com/~jsalisbury/lp1682561/.
Please submit the SRU request.

Joseph Salisbury (jsalisbury) wrote :

The three commits that introduced this bug have not landed in Yakkety (4.8). That's because 4.8 is EOL upstream and no longer gets upstream stable updates.

Will Yakkety still need commit e9c18ae6eb2b? The requested commit applied and built fine.

Yakkety will not get the following three updates unless they come in from another Ubuntu-specific bug, which I don't think exists:

  - Drivers: hv: util: kvp: Fix a rescind processing issue
  - Drivers: hv: util: Fcopy: Fix a rescind processing issue
  - Drivers: hv: util: Backup: Fix a rescind processing issue

Joshua R. Poulson (jrp) wrote :

Those commits are in the payload for https://bugs.launchpad.net/ubuntu/xenial/+source/linux/+bug/1670544 which is in proposed right now.

Stefan Bader (smb) on 2017-04-20
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Joshua R. Poulson (jrp) wrote :

Thanks! We'll verify the fix in -proposed.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.4.0-75.96

---------------
linux (4.4.0-75.96) xenial; urgency=low

  * linux: 4.4.0-75.96 -proposed tracker (LP: #1684441)

  * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
    (LP: #1682561)
    - Drivers: hv: util: move waiting for release to hv_utils_transport itself

linux (4.4.0-74.95) xenial; urgency=low

  * linux: 4.4.0-74.95 -proposed tracker (LP: #1682041)

  * [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
    (LP: #1681893)
    - Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()

linux (4.4.0-73.94) xenial; urgency=low

  * linux: 4.4.0-73.94 -proposed tracker (LP: #1680416)

  * CVE-2017-6353
    - sctp: deny peeloff operation on asocs with threads sleeping on it

  * vfat: missing iso8859-1 charset (LP: #1677230)
    - [Config] NLS_ISO8859_1=y

  * Regression: KVM modules should be on main kernel package (LP: #1678099)
    - [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in complain mode

  * Permission denied and inconsistent behavior in complain mode with 'ip netns
    list' command (LP: #1648903)
    - SAUCE: fix regression with domain change in complain mode

  * unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
    from a unshared mount namespace (LP: #1656121)
    - SAUCE: apparmor: null profiles should inherit parent control flags

  * apparmor refcount leak of profile namespace when removing profiles
    (LP: #1660849)
    - SAUCE: apparmor: fix ns ref count link when removing profiles from policy

  * tor in lxd: apparmor="DENIED" operation="change_onexec"
    namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
    name="system_tor" (LP: #1648143)
    - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
      namespaces

  * apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
    - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails

  * apparmor auditing denied access of special apparmor .null file
    (LP: #1660836)
    - SAUCE: apparmor: Don't audit denied access of special apparmor .null file

  * apparmor label leak when new label is unused (LP: #1660834)
    - SAUCE: apparmor: fix label leak when new label is unused

  * apparmor reference count bug in label_merge_insert() (LP: #1660833)
    - SAUCE: apparmor: fix reference count bug in label_merge_insert()

  * apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
    - SAUCE: apparmor: fix replacement race in reading rawdata

  * unix domain socket cross permission check failing with nested namespaces
    (LP: #1660832)
    - SAUCE: apparmor: fix cross ns perm of unix domain sockets

  * Xenial update to v4.4.59 stable release (LP: #1678960)
    - xfrm: policy: init locks early
    - virtio_balloon: init ...

Changed in linux (Ubuntu Xenial):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-49.52

---------------
linux (4.8.0-49.52) yakkety; urgency=low

  * linux: 4.8.0-49.52 -proposed tracker (LP: #1684427)

  * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
    (LP: #1682561)
    - Drivers: hv: util: move waiting for release to hv_utils_transport itself

linux (4.8.0-48.51) yakkety; urgency=low

  * linux: 4.8.0-48.51 -proposed tracker (LP: #1682034)

  * [Hyper-V] hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
    (LP: #1681893)
    - Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()

linux (4.8.0-47.50) yakkety; urgency=low

  * linux: 4.8.0-47.50 -proposed tracker (LP: #1679678)

  * CVE-2017-6353
    - sctp: deny peeloff operation on asocs with threads sleeping on it

  * CVE-2017-5986
    - sctp: avoid BUG_ON on sctp_wait_for_sndbuf

  * vfat: missing iso8859-1 charset (LP: #1677230)
    - [Config] NLS_ISO8859_1=y

  * [Hyper-V] pci-hyperv: Use device serial number as PCI domain (LP: #1667527)
    - net/mlx4_core: Use cq quota in SRIOV when creating completion EQs

  * Regression: KVM modules should be on main kernel package (LP: #1678099)
    - [Config] powerpc: Add kvm-hv and kvm-pr to the generic inclusion list

  * linux-lts-xenial 4.4.0-63.84~14.04.2 ADT test failure with linux-lts-xenial
    4.4.0-63.84~14.04.2 (LP: #1664912)
    - SAUCE: apparmor: fix link auditing failure due to, uninitialized var

  * regession tests failing after stackprofile test is run (LP: #1661030)
    - SAUCE: fix regression with domain change in complain mode

  * Permission denied and inconsistent behavior in complain mode with 'ip netns
    list' command (LP: #1648903)
    - SAUCE: fix regression with domain change in complain mode

  * unexpected errno=13 and disconnected path when trying to open /proc/1/ns/mnt
    from a unshared mount namespace (LP: #1656121)
    - SAUCE: apparmor: null profiles should inherit parent control flags

  * apparmor refcount leak of profile namespace when removing profiles
    (LP: #1660849)
    - SAUCE: apparmor: fix ns ref count link when removing profiles from policy

  * tor in lxd: apparmor="DENIED" operation="change_onexec"
    namespace="root//CONTAINERNAME_<var-lib-lxd>" profile="unconfined"
    name="system_tor" (LP: #1648143)
    - SAUCE: apparmor: Fix no_new_privs blocking change_onexec when using stacked
      namespaces

  * apparmor oops in bind_mnt when dev_path lookup fails (LP: #1660840)
    - SAUCE: apparmor: fix oops in bind_mnt when dev_path lookup fails

  * apparmor auditing denied access of special apparmor .null file
    (LP: #1660836)
    - SAUCE: apparmor: Don't audit denied access of special apparmor .null file

  * apparmor label leak when new label is unused (LP: #1660834)
    - SAUCE: apparmor: fix label leak when new label is unused

  * apparmor reference count bug in label_merge_insert() (LP: #1660833)
    - SAUCE: apparmor: fix reference count bug in label_merge_insert()

  * apparmor's raw_data file in securityfs is sometimes truncated (LP: #1638996)
    - SAUCE: apparmor: fix replacement race in reading rawdata

  * unix domain socket cross permission check failing with n...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.10.0-20.22

---------------
linux (4.10.0-20.22) zesty; urgency=low

  * linux: 4.10.0-20.22 -proposed tracker (LP: #1684491)

  * [Hyper-V] hv: util: move waiting for release to hv_utils_transport itself
    (LP: #1682561)
    - Drivers: hv: util: move waiting for release to hv_utils_transport itself

 -- Stefan Bader <email address hidden> Wed, 19 Apr 2017 16:13:16 +0200

Changed in linux (Ubuntu Zesty):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released