On Fri, Aug 02, 2013 at 09:58:29AM -0000, Oliver Francke wrote:
> after some testing I tried to narrow down a problem that was initially reported by some users.
> Seen on different distros so far: debian 7.1, ubuntu 12.04 LTS, IPFire-2.3.
>
> All of them use some flavour of the linux-3.2.x kernel.
>
> Under Ubuntu, for example, an upgrade to "Linux 3.8.0-27-generic x86_64" solves the problem.
Is that a guest kernel upgrade?
> The problem can be triggered with a workload such as:
>
> spew -v --raw -P -t -i 3 -b 4k -p random -B 4k 1G /tmp/doof.dat
> while running some apt-get install/remove/whatever in parallel.
>
> That results in a stuck QEMU session with the dreaded
> "kernel_hung_task..." messages.
>
> A typical command-line is as follows:
>
> /usr/local/qemu-1.6.0/bin/qemu-system-x86_64 -usbdevice tablet -enable-kvm \
>   -daemonize -pidfile /var/run/qemu-server/760.pid \
>   -monitor unix:/var/run/qemu-server/760.mon,server,nowait \
>   -vnc unix:/var/run/qemu-server/760.vnc,password \
>   -qmp unix:/var/run/qemu-server/760.qmp,server,nowait \
>   -nodefaults -serial none -parallel none \
>   -device virtio-net-pci,mac=00:F1:70:00:2F:80,netdev=vlan0d0 \
>   -netdev type=tap,id=vlan0d0,ifname=tap760i0d0,script=/etc/fcms/add_if.sh,downscript=/etc/fcms/downscript.sh \
>   -name 1155823384-4 -m 512 -vga cirrus -k de -smp sockets=1,cores=1 \
>   -device virtio-blk-pci,drive=virtio0 \
>   -drive format=raw,file=rbd:1155823384/vm-760-disk-1.rbd:rbd_cache=false,cache=writeback,if=none,id=virtio0,media=disk,index=0,aio=native \
>   -drive format=raw,file=rbd:1155823384/vm-760-swap-1.rbd:rbd_cache=false,cache=writeback,if=virtio,media=disk,index=1,aio=native \
>   -drive if=ide,media=cdrom,id=ide1-cd0,readonly=on \
>   -drive if=ide,media=cdrom,id=ide1-cd1,readonly=on \
>   -boot order=dc
>
> None of "system_reset", "sendkey ctrl-alt-delete" or "q" is accepted in the
> monitor session; the process has to be hard-killed.
Yesterday I saw a possibly related report on IRC. It was a Windows
guest running under OpenStack with images on Ceph.
They reported that the QEMU process would lock up: ping would not work
and their management tools showed 0 CPU activity for the guest.
However, they were able to "kick" the guest by taking a VNC screenshot
(I think). Then it would come back to life.
If you have a Linux guest that is reporting kernel_hung_task, then it
could be a similar scenario.
Please confirm that the hung task message is from inside the guest.
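One way to check this from inside the guest is to search the guest kernel log for the hung-task detector's reports. This is a minimal sketch, assuming a standard Linux guest where khungtaskd logs lines of the form "INFO: task <name>:<pid> blocked for more than <N> seconds."; the function name count_hung_tasks is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper, meant to be run inside the guest.
# Reads kernel log text on stdin and prints how many hung-task
# reports it contains (0 if none).
count_hung_tasks() {
    # khungtaskd reports look like:
    #   INFO: task <comm>:<pid> blocked for more than <N> seconds.
    # grep -c prints the match count; it exits 1 on zero matches,
    # so "|| true" keeps a "0" result from looking like an error.
    grep -c 'blocked for more than [0-9]* seconds' || true
}
```

Run as `dmesg | count_hung_tasks` in the guest; a non-zero count means the reports come from the guest kernel. The detector's timeout can be read from /proc/sys/kernel/hung_task_timeout_secs.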
If you are able to reproduce this and have an alternative non-Ceph
storage pool, please try it there, since Ceph is common to both of these
bug reports.
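For that comparison, ideally only the storage backend changes and every other -drive option stays the same. A minimal sketch, assuming the RBD image has already been copied to a local raw file (for example with `qemu-img convert -O raw`, if QEMU was built with RBD support); the helper name to_local_drive and any local path used with it are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: rewrite a -drive option string so that its
# rbd: source points at a local image file, keeping all other options.
#   $1: original -drive option string
#   $2: path of the local raw image
to_local_drive() {
    # The rbd: value (including its colon-separated rbd options such as
    # rbd_cache=false) runs up to the next comma, so [^,]* replaces
    # exactly that value.
    printf '%s\n' "$1" | sed "s|file=rbd:[^,]*|file=$2|"
}
```

For example, passing the disk's -drive string from the command line above together with a local path prints the same spec with file= pointing at the local image, while cache, if, id, index and aio are left unchanged.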
Stefan