Race condition during shutdown
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
QEMU | Fix Released | Undecided | Unassigned |
Bug Description
I ran into a bug when starting several VMs in parallel using
libvirt. The VMs use only a kernel and an initrd (which includes a
minimal OS). The guest OS itself does a 'poweroff -f' as soon as the
login prompt shows up. So the expectation is that the VMs will start,
the shutdown will be initiated, and the QEMU processes will then
end. But instead some of the QEMU processes get stuck in ppoll().
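For anyone trying to reproduce this, the parallel start can be sketched roughly as follows. This is only an illustration, not the exact procedure used for this report: it assumes the domains are started as transient domains with `virsh create`, that the only per-VM difference needed is the `<name>` element, and that the domain definition given later in this report is available as a string. The helper names (`rename_domain`, `start_transient_domain`, `start_in_parallel`) and the `test<N>` naming scheme are hypothetical.

```python
import re
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor


def rename_domain(domain_xml, name):
    """Return a copy of the domain XML with the first <name> element replaced."""
    # A callable replacement avoids any backslash interpretation in `name`.
    return re.sub(r"<name>.*?</name>",
                  lambda _m: "<name>%s</name>" % name,
                  domain_xml, count=1)


def start_transient_domain(domain_xml):
    """Start one transient domain via `virsh create`; return virsh's exit code.

    Assumption: virsh is installed and talks to the intended libvirt URI.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".xml") as f:
        f.write(domain_xml)
        f.flush()
        return subprocess.run(["virsh", "create", f.name]).returncode


def start_in_parallel(domain_xml, count):
    """Start `count` uniquely named copies of the domain concurrently."""
    variants = [rename_domain(domain_xml, "test%d" % i) for i in range(count)]
    with ThreadPoolExecutor(max_workers=count) as pool:
        return list(pool.map(start_transient_domain, variants))


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python repro.py domain.xml 10
    with open(sys.argv[1]) as fh:
        print(start_in_parallel(fh.read(), int(sys.argv[2])))
```

After the guests have powered off, any QEMU process that is still listed by `ps` is a candidate for the hang described here.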
A bisect showed that the first bad commit was
0f12264e7a41458
bdrv_drain_
I've already tried the current master (13b7b188501d41
since the problem might be related
to the commit a1405acddeb0af6
aio_notify_accept only during blocking aio_poll"). But the bug is still
there. I’ve reproduced the bug on x86_64 and on s390x.
The backtrace of a hanging QEMU process:
(gdb) bt
#0 0x00007f5d0e251b36 in ppoll () from target:
#1 0x0000560191052014 in qemu_poll_ns (fds=0x560193b2
#2 0x00005601910531fa in os_host_
#3 0x0000560191053119 in main_loop_wait (nonblocking=0) at /home/user/
#4 0x0000560190baf454 in main_loop () at /home/user/
#5 0x0000560190baa552 in main (argc=71, argv=0x7ffde10e
The domain definition used is:
<domain type='kvm'>
<name>test</name>
<memory unit='KiB'
<vcpu placement=
<iothreads>
<os>
<type arch='x86_64' machine=
<kernel>
<initrd>
<cmdline>
<boot dev='hd'/>
</os>
<features>
<acpi/>
</features>
<clock offset='utc'/>
<on_poweroff>
<on_reboot>
<on_crash>
<devices>
<emulator>
<controller type='usb' index='0' model='piix3-uhci'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
</controller>
<controller type='pci' index='0' model='pci-root'/>
<controller type='virtio-
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</controller>
<console type='pty'>
<target type='virtio' port='0'/>
</console>
<input type='mouse' bus='ps2'/>
<input type='keyboard' bus='ps2'/>
<memballoon model='none'/>
</devices>
</domain>
Can you find the cause of the bug and fix it?