QEMU 5.0: Guest VM hangs/freeze when unplugging USB device

Bug #1884684 reported by TheCatFelix
18
This bug affects 3 people
Affects Status Importance Assigned to Milestone
QEMU
Fix Released
Undecided
Unassigned

Bug Description

Setup:

Host: Debian/SID, Kernel 5.6, QEMU 5.0
Guest: Windows 10 VM with PCI and USB device passthrough.

Problem: Guest VM suddenly hangs when pulling USB device out from the Host.

Observations:
 - Issue appears to be related to QEMU 5.0
   - It started after an upgrade to QEMU 5.0.
   - Downgrading only QEMU on multiple systems fixes the issue.

 - Issue is very reproducible.
   - Most of the time within a few attempts of pulling/reconnecting the device.
   - Issue happens with multiple devices (I did try standard HID devices, a webcam and an x-ray sensor).

 - Guest just hangs.
   - Display output remains on last frame shown.
   - Ping to Guest immediately stops working.
   - Logs in the Guest stop logging immediately.

 - Host is fine and thinks the Guest is fine.
   - Guest continues to show as running in "virsh list".
   - No suspicious entries in the QEMU logs.
   - No suspicious entries in Host syslogs/messages.
   - Host can can kill guest "virsh destroy" and respawn fine.

 - Issue seems widespread.
   - Multiple similar reports from ProxMox users after upgrade to ProxMox 6.2 for both Windows and Linux guests (First version that uses QEMU 5.0)

https://forum.proxmox.com/threads/vm-freezes-when-disconnecting-usb-keyboard-and-mouse.70287/
https://forum.proxmox.com/threads/usb-drive-crashes-vm.70214/
https://forum.proxmox.com/threads/latest-proxmox-usb-disconnects-freeze-kvm.70398/
https://forum.proxmox.com/threads/vm-with-gpu-passthrough-freezes-when-turning-off-monitor-after-proxmox-6-2-upgrade.69821/
https://forum.proxmox.com/threads/vm-with-gpu-passthrough-freezes-when-turning-off-monitor-after-proxmox-6-2-upgrade.69824/

I'd be more than happy any debugs that might be helpful.

Tags: usb
Revision history for this message
TheCatFelix (thecatfelix) wrote :
Download full text (4.5 KiB)

Following reports on Proxmox forums, this is still very much seen by multiple users with no known workaround.

I was able to run QEMU 5.0.13 (Debian) with all traces turned on and capture the following:

- Behavior is reproducible by unbinding usb device on the host (ex. "echo '1-8' > /sys/bus/usb/drivers/usb/unbind")
- qemu trace logs stops at exactly the same time when VM freezes
- Last few lines of the qemu trace:

1592303@1596123157.254134:virtio_notify vdev 0x56193d04d820 vq 0x7fcd5c48a010
1592320@1596123158.822309:usb_xhci_oper_read off 0x0004, ret 0x00000008
1592320@1596123158.822397:usb_xhci_port_read port 1, off 0x0000, ret 0x0e0002a0
1592320@1596123158.822459:usb_xhci_port_read port 2, off 0x0000, ret 0x0e0002a0
1592320@1596123158.822513:usb_xhci_port_read port 3, off 0x0000, ret 0x0e0002a0
1592320@1596123158.822565:usb_xhci_port_read port 4, off 0x0000, ret 0x0e0002a0
1592303@1596123159.858372:virtqueue_alloc_element elem 0x56193c8c8990 size 56 in_num 2 out_num 0
1592303@1596123159.858435:virtqueue_pop vq 0x7fcd5c48a010 elem 0x56193c8c8990 in_num 2 out_num 0
1592303@1596123159.858482:virtqueue_fill vq 0x7fcd5c48a010 elem 0x56193c8c8990 len 72 idx 0
1592303@1596123159.858533:virtqueue_flush vq 0x7fcd5c48a010 count 1
1592303@1596123159.858565:virtio_notify vdev 0x56193d04d820 vq 0x7fcd5c48a010
1592303@1596123160.272641:virtqueue_alloc_element elem 0x56193c8c8990 size 56 in_num 2 out_num 0
1592303@1596123160.272702:virtqueue_pop vq 0x7fcd5c48a010 elem 0x56193c8c8990 in_num 2 out_num 0
1592303@1596123160.272751:virtqueue_fill vq 0x7fcd5c48a010 elem 0x56193c8c8990 len 104 idx 0
1592303@1596123160.272802:virtqueue_flush vq 0x7fcd5c48a010 count 1
1592303@1596123160.272833:virtio_notify vdev 0x56193d04d820 vq 0x7fcd5c48a010
1592303@1596123160.845694:lockcnt_unlock_attempt lockcnt 0x56193bea6514 unlock 5->4
1592303@1596123160.846821:lockcnt_unlock_success lockcnt 0x56193bea6514 unlock 5->4 succeeded
1592303@1596123160.847923:usb_host_req_complete dev 1:4, packet 0x7fcb84000ea8, status 0, length 0
1592303@1596123160.849369:usb_packet_state_change bus 0, port 1, ep 2, packet 0x7fcb84000ea8, state async -> complete
1592303@1596123160.851157:usb_xhci_xfer_success 0x7fcb84000ea0: len 0
1592303@1596123160.851214:usb_xhci_queue_event v 3, idx 5, ER_TRANSFER, CC_SHORT_PACKET, p 0xffffac0c62444ae3, s 0x0d000000, c 0x02058005
1592303@1596123160.851285:usb_xhci_irq_msix nr 3
1592303@1596123160.851331:usb_xhci_ep_kick slotid 2, epid 5, streamid 0
1592303@1596123160.851374:usb_host_req_data dev 1:4, packet 0x56193cce8da8, in 1, ep 2, size 4
1592303@1596123160.851434:usb_host_req_complete dev 1:4, packet 0x56193cce8da8, status -1, length 0
1592303@1596123160.851485:usb_packet_state_change bus 0, port 1, ep 2, packet 0x56193cce8da8, state queued -> complete
1592303@1596123160.851541:usb_xhci_xfer_error 0x56193cce8da0: ret -1
1592303@1596123160.851577:usb_xhci_queue_event v 3, idx 6, ER_TRANSFER, CC_USB_TRANSACTION_ERROR, p 0x00000001c18a4e20, s 0x04000004, c 0x02058001
1592303@1596123160.851647:usb_xhci_ep_state slotid 2, epid 5, running -> halted
1592303@1596123160.852700:usb_xhci_ep_kick slotid 2, epid 5, streamid 0
1592303@1596123160.852744:usb_host_re...

Read more...

Revision history for this message
TheCatFelix (thecatfelix) wrote :

Link to bug on the proxmox side:

https://bugzilla.proxmox.com/show_bug.cgi?id=2781

Revision history for this message
TheCatFelix (thecatfelix) wrote :
Download full text (5.5 KiB)

I do get get the same backtrace in gdb every time every time when we reproduce the hang:

(gdb) thread apply all bt

Thread 9 (Thread 0x7fd1415ff700 (LWP 3202)):
#0 0x00007fd323d154bf in __GI___poll (fds=0x7fd1415fe6c0, nfds=2, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1 0x00007fd324978bb2 in ?? () from target:/lib/x86_64-linux-gnu/libusb-1.0.so.0
#2 0x00007fd323defea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#3 0x00007fd323d1feaf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 8 (Thread 0x7fd1437fe700 (LWP 3171)):
#0 0x00007fd323d16d87 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055a5daef74f7 in kvm_vcpu_ioctl ()
#2 0x000055a5daef7631 in kvm_cpu_exec ()
#3 0x000055a5daedaede in ?? ()
#4 0x000055a5db32194b in ?? ()
#5 0x00007fd323defea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007fd323d1feaf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 7 (Thread 0x7fd143fff700 (LWP 3170)):
#0 0x00007fd323d16d87 in ioctl () at ../sysdeps/unix/syscall-template.S:120
#1 0x000055a5daef74f7 in kvm_vcpu_ioctl ()
#2 0x000055a5daef7631 in kvm_cpu_exec ()
#3 0x000055a5daedaede in ?? ()
#4 0x000055a5db32194b in ?? ()
#5 0x00007fd323defea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#6 0x00007fd323d1feaf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 6 (Thread 0x7fd150dfd700 (LWP 3169)):
#0 __lll_lock_wait (futex=futex@entry=0x55a5db80a540, private=0) at lowlevellock.c:52
#1 0x00007fd323df2843 in __GI___pthread_mutex_lock (mutex=0x55a5db80a540) at ../nptl/pthread_mutex_lock.c:80
#2 0x000055a5db321b43 in qemu_mutex_lock_impl ()
#3 0x000055a5daedac8e in qemu_mutex_lock_iothread_impl ()
#4 0x000055a5dae92ac9 in ?? ()
#5 0x000055a5dae97de7 in flatview_read_continue ()
#6 0x000055a5dae98023 in ?? ()
#7 0x000055a5dae9813b in address_space_read_full ()
#8 0x000055a5daef78cf in kvm_cpu_exec ()
#9 0x000055a5daedaede in ?? ()
#10 0x000055a5db32194b in ?? ()
#11 0x00007fd323defea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007fd323d1feaf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 5 (Thread 0x7fd1515fe700 (LWP 3168)):
#0 __lll_lock_wait (futex=futex@entry=0x55a5db80a540, private=0) at lowlevellock.c:52
#1 0x00007fd323df2843 in __GI___pthread_mutex_lock (mutex=0x55a5db80a540) at ../nptl/pthread_mutex_lock.c:80
#2 0x000055a5db321b43 in qemu_mutex_lock_impl ()
#3 0x000055a5daedac8e in qemu_mutex_lock_iothread_impl ()
#4 0x000055a5dae92ac9 in ?? ()
#5 0x000055a5dae97de7 in flatview_read_continue ()
#6 0x000055a5dae98023 in ?? ()
#7 0x000055a5dae9813b in address_space_read_full ()
#8 0x000055a5daef78cf in kvm_cpu_exec ()
#9 0x000055a5daedaede in ?? ()
#10 0x000055a5db32194b in ?? ()
#11 0x00007fd323defea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007fd323d1feaf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 4 (Thread 0x7fd151dff700 (LWP 3167)):
#0 __lll_lock_wait (futex=futex@entry=0x55a5db80a540, private=0) at lowlevellock.c:52
#1 0x00007fd323df2843 in __GI___pthread_mutex_lock (mutex=0x55a...

Read more...

Revision history for this message
Thomas Huth (th-huth) wrote :

The QEMU project is currently moving its bug tracking to another system.
For this we need to know which bugs are still valid and which could be
closed already. Thus we are setting older bugs to "Incomplete" now.

If the bug has already been fixed in the latest upstream version of QEMU,
then please close this ticket as "Fix released".

If it is not fixed yet and you think that this bug report here is still
valid, then you have two options:

1) If you already have an account on gitlab.com, please open a new ticket
for this problem in our new tracker here:

    https://gitlab.com/qemu-project/qemu/-/issues

and then close this ticket here on Launchpad (or let it expire auto-
matically after 60 days). Please mention the URL of this bug ticket on
Launchpad in the new ticket on GitLab.

2) If you don't have an account on gitlab.com and don't intend to get
one, but still would like to keep this ticket opened, then please switch
the state back to "New" within the next 60 days (otherwise it will get
closed as "Expired"). We will then eventually migrate the ticket auto-
matically to the new system (but you won't be the reporter of the bug
in the new system and thus won't get notified on changes anymore).

Thank you and sorry for the inconvenience.

Changed in qemu:
status: New → Incomplete
tags: added: usb
Revision history for this message
TheCatFelix (thecatfelix) wrote :

Issue does not occur in latest version of QEMU anymore.

Changed in qemu:
status: Incomplete → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.