New qemu instances often cannot be started if host system is under load
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
QEMU |
Expired
|
Undecided
|
Unassigned |
Bug Description
I've got a problem where I cannot start any new VMs with qemu-kvm if the host machine is under high CPU load. The problem is not 100% reproducible (it works sometimes), but under load conditions, it happens most of the time - roughly 95%.
I'm usually using libvirt to start and stop KVM VMs. When using virsh to start a new VM under those conditions, the output looks like this:
virsh # start testserver-a
error: Failed to start domain testserver-a
error: monitor socket did not show up.: Connection refused
(There is a very long wait after the command has been sent until the error message shows up.)
This is (an example of) the command line that libvirtd uses to start up qemu:
----- snip -----
LC_ALL=C PATH=/sbin:
----- snip -----
Copy-pasting this to a commandline on the host to start qemu manually leads to a non-functional qemu process that "just sits there" with nothing happening. The monitor socket /usr/local/
I've tried starting qemu with the same commandline but without the parameters for redirecting the monitor to a socket, without the fd parameter for the network interface and without the vnc parameter. This resulted in a black window with the title "QEMU (testserver-a) [Stopped]". I could not access the monitor console in graphical mode either. When I press Ctrl-Alt-2 in graphical mode to access the monitor console, qemu will sometimes (but not always) crash with a segfault about 2 seconds after.
Some experimentation I've done suggests that this problem only happens if the high cpu load is caused by another qemu process, not if it is caused by something else running on the machine.
The bug appears much less often if I leave off the -nodefaults parameter.
The bug will still appear if I start qemu as qemu-system-x86_64 instead of qemu-kvm and replace the -enable-kvm parameter with -no-kvm.
The host machine I'm running this on has got 16 cores in total. It looks like it is sufficient for this bug to surface if at least one of these cores is brought to near 100% use by a qemu process.
The version of qemu I'm using is qemu-kvm 0.12.4, built from source. Libvirt is version 0.8.1, built from source as well. The host OS is Fedora 12. The Kernel version is 2.6.32.
Attached is an strace of attempting to start qemu which I hope will help someone with a better understanding of qemu's internals see what's actually going on there.
Forgot to mention:
The bug is still present in the latest git-version as of this writing.