kvm guests stuck (not in io) and cannot be killed using -9

Bug #1130007 reported by Corin Langosch
This bug affects 2 people
Affects             Status      Importance  Assigned to  Milestone
linux (Ubuntu)      Incomplete  High        Unassigned
qemu-kvm (Ubuntu)   Confirmed   High        Unassigned

Bug Description

I have two stuck kvm guests that have been hanging in state R for several
hours on two different hosts. They are not consuming any CPU, cannot be killed
with kill -9, and do not respond to anything. Both systems are Ubuntu
12.10 Quantal running on an AMD II X4 965 processor with 16GB RAM.
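
The "state R but no CPU use" observation can be checked from /proc. The following is a minimal sketch, not taken from the report: the helper name parse_stat is hypothetical, and the field positions assume the /proc/<pid>/stat layout described in proc(5), where state is field 3 and utime/stime are fields 14 and 15.

```shell
#!/bin/sh
# Hypothetical helper: extract state and CPU tick counters from one
# line in /proc/<pid>/stat format (layout as documented in proc(5)).
parse_stat() {
    line=$1
    # Drop "pid (comm) " up to the LAST ") " so that a command name
    # containing spaces or parentheses cannot shift the fields.
    rest=${line##*) }
    # Split the remainder on whitespace into positional parameters.
    set -- $rest
    # After the strip, $1 is state, ${12} is utime, ${13} is stime.
    echo "state=$1 utime=${12} stime=${13}"
}
```

Calling parse_stat "$(cat /proc/2647/stat)" twice, a few seconds apart, should show whether utime and stime advance. A task that reports state R while both counters stay constant matches the behaviour described above. For a multi-threaded kvm process, the per-thread files under /proc/<pid>/task/<tid>/stat have the same format.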

Stack trace of the first stuck guest:

cat /proc/2647/stack
[<ffffffff81683e06>] retint_careful+0x14/0x32
[<ffffffffffffffff>] 0xffffffffffffffff

Stack trace of the second stuck guest:

cat /proc/11932/stack
[<ffffffff81084d2a>] __cond_resched+0x2a/0x40
[<ffffffff811544aa>] try_to_unmap_file+0x3a/0x6d0
[<ffffffff811553c1>] try_to_unmap+0x31/0x70
[<ffffffff811722a8>] migrate_pages+0x318/0x500
[<ffffffff811453b4>] compact_zone+0x1e4/0x350
[<ffffffff811458bf>] try_to_compact_pages+0x16f/0x1d0
[<ffffffff81677a94>] __alloc_pages_direct_compact+0xaa/0x191
[<ffffffff8112bee3>] __alloc_pages_nodemask+0x473/0x920
[<ffffffff81164b30>] alloc_pages_current+0xb0/0x120
[<ffffffff8116bf5d>] new_slab+0x22d/0x2c0
[<ffffffff816790ea>] __slab_alloc+0x314/0x46e
[<ffffffff811708a1>] __kmalloc_node_track_caller+0xa1/0x1b0
[<ffffffff81566e99>] __alloc_skb+0x79/0x230
[<ffffffff81560a31>] sock_alloc_send_pskb+0x1d1/0x320
[<ffffffff81499ec1>] tun_get_user+0x131/0x4a0
[<ffffffff8149a759>] tun_chr_aio_write+0x69/0x90
[<ffffffff8118297a>] do_sync_readv_writev+0xda/0x120
[<ffffffff81182c64>] do_readv_writev+0xd4/0x1e0
[<ffffffff81182da5>] vfs_writev+0x35/0x60
[<ffffffff81182f2a>] sys_writev+0x4a/0xb0
[<ffffffff8168bb69>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
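
/proc/<pid>/stack shows only one task of the process. Since a kvm process normally runs several threads, it may help to dump the kernel stack of each one. This is a sketch under assumptions: dump_thread_stacks is a hypothetical helper, the optional second argument exists only so the function can be pointed at a different root for testing, and reading the stack files usually requires root.

```shell
#!/bin/sh
# Hypothetical helper: print the kernel stack of every thread of a PID.
# $1 = pid, $2 = proc root (defaults to /proc).
dump_thread_stacks() {
    pid=$1
    root=${2:-/proc}
    for task in "$root/$pid"/task/*; do
        # Skip threads whose stack file is missing or unreadable.
        [ -r "$task/stack" ] || continue
        echo "== tid ${task##*/} =="
        cat "$task/stack"
    done
}
```

For example, dump_thread_stacks 11932 would print one labelled trace per thread, which makes it easier to see whether only one thread is stuck in the allocation path or all of them.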

Packages:

Kernel: 3.5.0-23-generic #35-Ubuntu SMP Thu Jan 24 13:15:40 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
KVM: 1.2.0+noroms-0ubuntu2.12.10.2

Command line:

kvm -name vm-16762 -machine pc-1.2 -enable-kvm -cpu kvm64 \
  -smp sockets=1,cores=2 -m 1024 -vga cirrus \
  -drive id=drive1245,if=none,cache=writeback,aio=native,format=raw,media=disk,file=rbd:kvm1/vm-16762-disk-1 \
  -device virtio-blk-pci,id=hdrive1245,addr=0x12,drive=drive1245 \
  -netdev tap,id=netdev1243,ifname=tap1243,script=,downscript= \
  -device virtio-net-pci,id=nic1243,addr=0x03,mac=02:48:6e:2b:4e:55,netdev=netdev1243 \
  -chardev socket,id=qmp,path=/var/run/kvm/vm-16762/qmp,server,nowait \
  -mon chardev=qmp,mode=control \
  -monitor unix:/var/run/kvm/vm-16762/mon,server,nowait \
  -vnc 10.0.0.60:21,password -usbdevice tablet -nodefaults \
  -pidfile /var/run/kvm/vm-16762/pid -boot menu=on -k de \
  -chroot /var/run/kvm/vm-16762 -runas kvm -daemonize

Original bug report is here: https://bugzilla.kernel.org/show_bug.cgi?id=54071

Changed in qemu-kvm (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in qemu-kvm (Ubuntu):
importance: Medium → High
Changed in linux (Ubuntu):
importance: Medium → High
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1130007

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: quantal
Corin Langosch (ipfo) wrote :

No logs available. No entries in syslog, kern, dmesg, ...

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Corin Langosch (ipfo) wrote :

Today another guest got stuck:

cat /proc/499/stack
[<ffffffff81084d2a>] __cond_resched+0x2a/0x40
[<ffffffff81144bf2>] isolate_migratepages_range+0xb2/0x5d0
[<ffffffff81145386>] compact_zone+0x1b6/0x350
[<ffffffff811458bf>] try_to_compact_pages+0x16f/0x1d0
[<ffffffff81677a94>] __alloc_pages_direct_compact+0xaa/0x191
[<ffffffff8112bee3>] __alloc_pages_nodemask+0x473/0x920
[<ffffffff81164b30>] alloc_pages_current+0xb0/0x120
[<ffffffff8116bf5d>] new_slab+0x22d/0x2c0
[<ffffffff816790ea>] __slab_alloc+0x314/0x46e
[<ffffffff811708a1>] __kmalloc_node_track_caller+0xa1/0x1b0
[<ffffffff81566e99>] __alloc_skb+0x79/0x230
[<ffffffff81560a31>] sock_alloc_send_pskb+0x1d1/0x320
[<ffffffff81499ec1>] tun_get_user+0x131/0x4a0
[<ffffffff8149a759>] tun_chr_aio_write+0x69/0x90
[<ffffffff8118297a>] do_sync_readv_writev+0xda/0x120
[<ffffffff81182c64>] do_readv_writev+0xd4/0x1e0
[<ffffffff81182da5>] vfs_writev+0x35/0x60
[<ffffffff81182f2a>] sys_writev+0x4a/0xb0
[<ffffffff8168bb69>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
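
Both this trace and the second one above pass through try_to_compact_pages and compact_zone. One way to see whether the host is compacting memory while a guest is stuck is to watch the compaction counters in /proc/vmstat. This is only a sketch: compact_counters is a hypothetical helper, and it assumes a kernel built with compaction support that exposes counters whose names start with compact_. It does not by itself show that compaction is the cause.

```shell
#!/bin/sh
# Hypothetical helper: print the memory compaction counters.
# $1 = vmstat file to read (defaults to /proc/vmstat).
compact_counters() {
    # Keep only the lines whose counter name starts with "compact_".
    grep '^compact_' "${1:-/proc/vmstat}"
}
```

Running compact_counters twice, some seconds apart, and comparing the output shows whether the counters are still increasing on the affected host.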

penalvch (penalvch) wrote :

Corin Langosch, could you please confirm this issue exists for your guest environment with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ . If the issue remains, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

tags: added: needs-kernel-logs needs-upstream-testing regression-potential
Changed in linux (Ubuntu):
status: Confirmed → Incomplete