[Hyper-V] hv: util: move waiting for release to hv_utils_transport itself

Bug #1682561 reported by Joshua R. Poulson on 2017-04-13
Affects              Importance  Assigned to
linux (Ubuntu)       Critical    Joseph Salisbury
  Xenial             Critical    Joseph Salisbury
  Yakkety            Critical    Joseph Salisbury
  Zesty              Critical    Joseph Salisbury

Bug Description

We are observing call traces with the -73 and -74 proposed kernels that look like this:

[ 240.408061] INFO: task kworker/14:1:179 blocked for more than 120 seconds.
[ 240.412299] Not tainted 4.4.0-73-generic #94-Ubuntu
[ 240.415546] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 240.420217] kworker/14:1 D ffff8804e6f23b68 0 179 2 0x00000000
[ 240.420229] Workqueue: hv_vmbus_con vmbus_onmessage_work [hv_vmbus]
[ 240.420236] ffff8804e6f23b68 00000001810000f9 ffff8804ed944600 ffff8804e6fe3800
[ 240.420241] ffff8804e6f24000 ffffffffc0187a48 ffffffffc0187a40 ffff8804e6fe3800
[ 240.420243] ffff8804ea196800 ffff8804e6f23b80 ffffffff81837845 7fffffffffffffff
[ 240.420245] Call Trace:
[ 240.420254] [<ffffffff81837845>] schedule+0x35/0x80
[ 240.420256] [<ffffffff8183a995>] schedule_timeout+0x1b5/0x270
[ 240.420260] [<ffffffff811edf9a>] ? kfree+0x13a/0x150
[ 240.420264] [<ffffffff811ada52>] ? kfree_const+0x22/0x30
[ 240.420268] [<ffffffff813faa14>] ? kobject_release+0x94/0x190
[ 240.420270] [<ffffffff813fa8a7>] ? kobject_put+0x27/0x50
[ 240.420272] [<ffffffff818382a3>] wait_for_completion+0xb3/0x140
[ 240.420276] [<ffffffff810ac590>] ? wake_up_q+0x70/0x70
[ 240.420284] [<ffffffffc01840cf>] hv_kvp_deinit+0x4f/0x60 [hv_utils]
[ 240.420288] [<ffffffffc0183321>] util_remove+0x21/0x40 [hv_utils]
[ 240.420294] [<ffffffffc000c027>] vmbus_remove+0x27/0x30 [hv_vmbus]
[ 240.420301] [<ffffffff8155d2e1>] __device_release_driver+0xa1/0x150
[ 240.420303] [<ffffffff8155d3b3>] device_release_driver+0x23/0x30
[ 240.420306] [<ffffffff8155ca01>] bus_remove_device+0x101/0x170
[ 240.420308] [<ffffffff81558b69>] device_del+0x139/0x270
[ 240.420310] [<ffffffff81558cbe>] device_unregister+0x1e/0x60
[ 240.420313] [<ffffffffc000d82f>] vmbus_device_unregister+0x1f/0x50 [hv_vmbus]
[ 240.420316] [<ffffffffc00110c1>] vmbus_onoffer_rescind+0x91/0xb0 [hv_vmbus]
[ 240.420324] [<ffffffffc0011323>] vmbus_onmessage+0x33/0xa0 [hv_vmbus]
[ 240.420327] [<ffffffffc000d381>] vmbus_onmessage_work+0x21/0x30 [hv_vmbus]
[ 240.420331] [<ffffffff8109a555>] process_one_work+0x165/0x480
[ 240.420335] [<ffffffff8109a8bb>] worker_thread+0x4b/0x4c0
[ 240.420338] [<ffffffff8109a870>] ? process_one_work+0x480/0x480
[ 240.420341] [<ffffffff8109a870>] ? process_one_work+0x480/0x480
[ 240.420345] [<ffffffff810a0be8>] kthread+0xd8/0xf0
[ 240.420347] [<ffffffff810a0b10>] ? kthread_create_on_node+0x1e0/0x1e0
[ 240.420350] [<ffffffff8183bd0f>] ret_from_fork+0x3f/0x70
[ 240.420352] [<ffffffff810a0b10>] ? kthread_create_on_node+0x1e0/0x1e0
[ 317.360035] serial8250: too much work for irq3

The following commit should fix this issue:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e9c18ae6eb2b312f16c63e34b43ea23926daa398

Waiting for release_event in all three drivers introduced issues on release
as on_reset() hook is not always called. E.g. if the device was never
opened we will never get the completion.

Move the waiting code to hvutil_transport_destroy() and make sure it is
only called when the device is open. hvt->lock serialization should
guarantee the absence of races.

Fixes: 5a66fecbf6aa ("Drivers: hv: util: kvp: Fix a rescind processing issue")
Fixes: 20951c7535b5 ("Drivers: hv: util: Fcopy: Fix a rescind processing issue")
Fixes: d77044d142e9 ("Drivers: hv: util: Backup: Fix a rescind processing issue")

Reported-by: Dexuan Cui <email address hidden>
Tested-by: Dexuan Cui <email address hidden>
Signed-off-by: Vitaly Kuznetsov <email address hidden>
Signed-off-by: K. Y. Srinivasan <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
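The failure mode and the fix described in the commit message above can be illustrated with a small userspace analogue. This is not the kernel code from commit e9c18ae6eb2b. It is a Python sketch that assumes a threading.Condition can stand in for the kernel completion (release_event) and that a hypothetical Transport class can stand in for the transport object that hvt->lock protects. The method names open, release, wait_unconditionally and destroy are illustrative only. The sketch shows why an unconditional wait never finishes when the device was never opened (nothing ever runs the release path), and why checking the open state under the same lock before waiting avoids the hang.

```python
import threading


class Transport:
    """Hypothetical userspace stand-in for the hv_utils transport object.

    `lock` plays the role of hvt->lock; `released` plays the role of the
    release completion, which only the release (close) path signals.
    """

    def __init__(self):
        self.lock = threading.Lock()
        self.released = threading.Condition(self.lock)
        self.is_open = False        # True while a daemon holds the device open
        self.release_seen = False   # completion flag, set only on release

    def open(self):
        # A userspace daemon opens the device.
        with self.lock:
            self.is_open = True
            self.release_seen = False

    def release(self):
        # The only code path that signals the completion.
        with self.lock:
            self.is_open = False
            self.release_seen = True
            self.released.notify_all()

    def wait_unconditionally(self, timeout):
        """Old pattern: always wait for a release.

        The timeout exists only so this demo terminates; the kernel's
        wait_for_completion() has none, which is why the task hung.
        Returns True if a release arrived, False on timeout.
        """
        with self.lock:
            return self.released.wait_for(lambda: self.release_seen, timeout)

    def destroy(self):
        """Fixed pattern: decide under the lock whether the device is open,
        and only wait for the release path if it is."""
        with self.lock:
            self.released.wait_for(lambda: not self.is_open)


# Device never opened: the old wait can only end by timeout,
# while destroy() returns immediately.
dev = Transport()
print(dev.wait_unconditionally(timeout=0.1))  # False
dev.destroy()

# Device opened by a daemon that exits shortly afterwards:
# destroy() blocks until the release path has run.
dev2 = Transport()
dev2.open()
timer = threading.Timer(0.05, dev2.release)
timer.start()
dev2.destroy()
timer.join()
print(dev2.is_open)  # False
```

Because both the open-state check and the wait happen under the same lock as the release path, a release that races with destroy() is either seen before the check (so no wait happens) or wakes the waiter; it cannot be missed.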

Joshua R. Poulson (jrp) wrote:

This appears to affect the 4.4, 4.8, and 4.10 kernels.

Changed in linux (Ubuntu):
status: New → Confirmed
Joshua R. Poulson (jrp) wrote:

Unfortunately this is another critical issue.

Changed in linux (Ubuntu Xenial):
status: New → In Progress
Changed in linux (Ubuntu Yakkety):
status: New → In Progress
Changed in linux (Ubuntu Zesty):
status: Confirmed → In Progress
Changed in linux (Ubuntu Xenial):
importance: Undecided → Critical
Changed in linux (Ubuntu Yakkety):
importance: Undecided → Critical
Changed in linux (Ubuntu Zesty):
importance: Undecided → Critical
Changed in linux (Ubuntu Xenial):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Zesty):
assignee: nobody → Joseph Salisbury (jsalisbury)
tags: added: kernel-hyper-v xenial yakkety zesty
Joseph Salisbury (jsalisbury) wrote:

I built a test kernel with a cherry-pick of commit e9c18ae6eb2b. It can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1682561/

I'll submit an SRU request as soon as it's confirmed that it fixes the bug.

Simon Xiao (sixiao) wrote:

It is fixed in the test kernel http://kernel.ubuntu.com/~jsalisbury/lp1682561/.
Please submit the SRU request.

Joseph Salisbury (jsalisbury) wrote:

The three commits that introduced this bug have not landed in Yakkety (4.8). That's because 4.8 is EOL upstream and no longer gets upstream stable updates.

Does Yakkety still need commit e9c18ae6eb2b? The requested commit applied and built fine.

Yakkety will not get the following three updates unless they come in through another Ubuntu-specific bug, which I don't think exists:

  - Drivers: hv: util: kvp: Fix a rescind processing issue
  - Drivers: hv: util: Fcopy: Fix a rescind processing issue
  - Drivers: hv: util: Backup: Fix a rescind processing issue

Joshua R. Poulson (jrp) wrote:

Those commits are in the payload for https://bugs.launchpad.net/ubuntu/xenial/+source/linux/+bug/1670544, which is currently in -proposed.

Stefan Bader (smb) on 2017-04-20
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Joshua R. Poulson (jrp) wrote:

Thanks! We'll verify the fix in -proposed.
