New qemu instances often cannot be started if host system is under load

Bug #607204 reported by Guido Winkelmann
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
QEMU
Expired
Undecided
Unassigned

Bug Description

I've got a problem where I cannot start any new VMs with qemu-kvm if the host machine is under high CPU load. The problem is not 100% reproducible (it works sometimes), but under load conditions, it happens most of the time - roughly 95%.

I'm usually using libvirt to start and stop KVM VMs. When using virsh to start a new VM under those conditions, the output looks like this:

virsh # start testserver-a
error: Failed to start domain testserver-a
error: monitor socket did not show up.: Connection refused

(There is a very long wait after the command has been sent until the error message shows up.)

This is (an example of) the command line that libvirtd uses to start up qemu:

----- snip -----
LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin HOME=/root USER=root LOGNAME=root QEMU_AUDIO_DRV=none /usr/bin/qemu-kvm -S -M pc-0.12 -enable-kvm -m 256 -smp 1,sockets=1,cores=1,threads=1 -name testserver-a -uuid 7cbb3665-4d58-86b8-ce8f-20541995a99c -nodefaults -chardev socket,id=monitor,path=/usr/local/var/lib/libvirt/qemu/testserver-a.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -no-acpi -boot c -device lsi,id=scsi0,bus=pci.0,addr=0x7 -drive file=/data/testserver-a-system.img,if=none,id=drive-scsi0-0-1,boot=on -device scsi-disk,bus=scsi0.0,scsi-id=1,drive=drive-scsi0-0-1,id=scsi0-0-1 -drive file=/data/testserver-a-data1.img,if=none,id=drive-virtio-disk1 -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/data/testserver-a-data2.img,if=none,id=drive-virtio-disk2 -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk2,id=virtio-disk2 -drive file=/data/gentoo-install-amd64-minimal-20100408.iso,if=none,media=cdrom,id=drive-ide0-0-0,readonly=on -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/data/testserver-a_configfloppy.img,if=none,id=drive-fdc0-0-0 -global isa-fdc.driveA=drive-fdc0-0-0 -device e1000,vlan=0,id=net0,mac=52:54:00:84:6d:69,bus=pci.0,addr=0x6 -net tap,fd=24,vlan=0,name=hostnet0 -usb -vnc 127.0.0.1:1,password -k de -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3
----- snip -----

Copy-pasting this to a commandline on the host to start qemu manually leads to a non-functional qemu process that "just sits there" with nothing happening. The monitor socket /usr/local/var/lib/libvirt/qemu/testserver-a.monitor will, indeed, not show up.

I've tried starting qemu with the same commandline but without the parameters for redirecting the monitor to a socket, without the fd parameter for the network interface and without the vnc parameter. This resulted in a black window with the title "QEMU (testserver-a) [Stopped]". I could not access the monitor console in graphical mode either. When I press Ctrl-Alt-2 in graphical mode to access the monitor console, qemu will sometimes (but not always) crash with a segfault about 2 seconds after.

Some experimentation I've done suggests that this problem only happens if the high cpu load is caused by another qemu process, not if it is caused by something else running on the machine.

The bug appears much less often if I leave off the -nodefaults parameter.

The bug will still appear if I start qemu as qemu-system-x86_64 instead of qemu-kvm and replace the -enable-kvm parameter with -no-kvm.

The host machine I'm running this on has got 16 cores in total. It looks like it is sufficient for this bug to surface if at least one of these cores is brought to near 100% use by a qemu process.

The version of qemu I'm using is qemu-kvm 0.12.4, built from source. Libvirt is version 0.8.1, built from source as well. The host OS is Fedora 12. The Kernel version is 2.6.32.12-115.fc12.x86_64.

Attached is an strace of attempting to start qemu which I hope will help someone with a better understanding of qemu's internals see what's actually going on there.

Revision history for this message
Guido Winkelmann (guido-qemubugs) wrote :
Revision history for this message
Guido Winkelmann (guido-qemubugs) wrote :

Forgot to mention:

The bug is still present in the latest git-version as of this writing.

Revision history for this message
Thomas Huth (th-huth) wrote :

Can you still reproduce this problem with the latest version of QEMU?

Changed in qemu:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.