Restoring KVM guest from saved state results in hung guest with non-virtio devices - in lucid

Bug #555981 reported by Kindjal
This bug affects 4 people
Affects           Status     Importance  Assigned to  Milestone
libvirt (Ubuntu)  Confirmed  Low         Unassigned

Bug Description

The dom0 host is a fresh install of Ubuntu Lucid

ii qemu-kvm 0.12.3+noroms-0ubuntu4
ii libvirt-bin 0.7.5-5ubuntu17
ii libvirt0 0.7.5-5ubuntu17

kernel 2.6.32-18-generic-pae

The domU is an Ubuntu Lucid guest created with ubuntu-vm-builder:

ubuntu-vm-builder kvm lucid \
  --domain vm2 --dest vm2 --hostname vm2 \
  --mem 256 --user user --pass password \
  --ip 10.0.24.200 --mask 255.255.255.0 --net 10.0.24.0 \
  --bcast 10.0.24.255 --gw 10.0.24.253 --dns 10.0.5.220 --bridge=br0 \
  --libvirt qemu:///system \
  --addpkg openssh-server \
  --addpkg acpid \
  --addpkg acpi-support \
  --addpkg screen

virsh suspend/resume works OK.
virsh save domain /tmp/domain.state works OK.
virsh restore /tmp/domain.state claims to work, but connecting via VNC or the serial console then reveals a "hung" guest, unresponsive to any input.
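
The sequence above can be sketched as a small script. This is only an illustration of the reported steps: the function names and the overridable VIRSH variable are assumptions added here (VIRSH=echo gives a dry run), and the connection URI is taken from the ubuntu-vm-builder command.

```shell
#!/bin/sh
# Reproduction sketch for the suspend/resume and save/restore steps above.
# VIRSH is overridable (an assumption for dry runs, e.g. VIRSH=echo).
VIRSH=${VIRSH:-virsh -c qemu:///system}

# Pause and resume the guest in memory; reported to work.
# Argument: domain name.
suspend_resume() {
    $VIRSH suspend "$1" && $VIRSH resume "$1"
}

# Save the guest to a state file, then restore from that file.
# The restore step is where the guest is reported to hang.
# Arguments: domain name, state file path.
save_restore() {
    $VIRSH save "$1" "$2" && $VIRSH restore "$2"
}
```

For example, save_restore vm2 /tmp/vm2.state would run the failing sequence against the guest built above.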

This is different from a previously reported kernel panic in the guest.

The kvm process on dom0 is consuming 100% of cpu.

Stracing the process reveals...

8949 ioctl(13, 0xae80, 0) = 0
8938 <... select resumed> ) = 1 (in [18], left {0, 970198})
8949 ioctl(13, 0xae80 <unfinished ...>
8938 read(18, <unfinished ...>
8949 <... ioctl resumed> , 0) = 0
8938 <... read resumed> "\16\0\0\0\0\0\0\0\376\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
8949 futex(0x825f804, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
8938 rt_sigaction(SIGALRM, NULL, {0x80524e0, ~[KILL STOP RTMIN RT_1], 0}, 8) = 0
8938 write(8, "\0", 1) = 1
8938 write(17, "\1\0\0\0\0\0\0\0", 8) = 8
8938 read(18, 0xbfc53adc, 128) = -1 EAGAIN (Resource temporarily unavailable)
8938 gettimeofday({1270498070, 605367}, NULL) = 0
8938 clock_gettime(CLOCK_MONOTONIC, {351575, 67115276}) = 0
8938 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
8938 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 250000}}, NULL) = 0
8938 clock_gettime(CLOCK_MONOTONIC, {351575, 67249930}) = 0
8938 clock_gettime(CLOCK_MONOTONIC, {351575, 67289879}) = 0
8938 clock_gettime(CLOCK_MONOTONIC, {351575, 67329828}) = 0
8938 gettimeofday({1270498070, 605665}, NULL) = 0
8938 clock_gettime(CLOCK_MONOTONIC, {351575, 67409796}) = 0
8938 timer_gettime(0, {it_interval={0, 0}, it_value={0, 13378}}) = 0
8938 gettimeofday({1270498070, 605791}, NULL) = 0
8938 futex(0x825f804, FUTEX_WAKE_PRIVATE, 1) = 1
8949 <... futex resumed> ) = 0
8938 select(67, [7 10 14 15 16 18 19 66], [], [], {1, 0} <unfinished ...>
8949 futex(0x825f804, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
8938 <... select resumed> ) = 3 (in [7 16 18], left {0, 999996})
8949 <... futex resumed> ) = 0
8938 read(18, <unfinished ...>
8949 ioctl(13, 0xae80 <unfinished ...>
8938 <... read resumed> "\16\0\0\0\0\0\0\0\376\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 128) = 128
8949 <... ioctl resumed> , 0) = 0
8938 rt_sigaction(SIGALRM, NULL, <unfinished ...>
8949 futex(0x825f804, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
8938 <... rt_sigaction resumed> {0x80524e0, ~[KILL STOP RTMIN RT_1], 0}, 8) = 0
8938 write(8, "\0", 1) = 1
8938 write(17, "\1\0\0\0\0\0\0\0", 8) = 8
8938 read(18, 0xbfc53adc, 128) = -1 EAGAIN (Resource temporarily unavailable)
8938 read(16, "\2\0\0\0\0\0\0\0", 4096) = 8
8938 read(16, 0xbfc52b6c, 4096) = -1 EAGAIN (Resource temporarily unavailable)
8938 read(7, "\0\0", 512) = 2
8938 read(7, 0xbfc5396c, 512) = -1 EAGAIN (Resource temporarily unavailable)
8938 gettimeofday({1270498070, 606443}, NULL) = 0
8938 clock_gettime(CLOCK_MONOTONIC, {351575, 68189924}) = 0
8938 timer_gettime(0, {it_interval={0, 0}, it_value={0, 0}}) = 0
8938 timer_settime(0, 0, {it_interval={0, 0}, it_value={0, 29000000}}, NULL) = 0
8938 clock_gettime(CLOCK_MONOTONIC, {351575, 68325486}) = 0
8938 clock_gettime(CLOCK_MONOTONIC, {351575, 68367321}) = 0
8938 gettimeofday({1270498070, 606704}, NULL) = 0
8938 futex(0x825f804, FUTEX_WAKE_PRIVATE, 1) = 1
8949 <... futex resumed> ) = 0
8938 select(67, [7 10 14 15 16 18 19 66], [], [], {1, 0} <unfinished ...>
8949 futex(0x825f804, FUTEX_WAKE_PRIVATE, 1) = 0
8949 ioctl(13, 0xae80, 0) = 0
8949 ioctl(13, 0xae80, 0) = 0
8949 ioctl(13, 0xae80, 0) = 0
8949 ioctl(13, 0xae80, 0) = 0
8949 ioctl(13, 0xae80, 0) = 0

Those file descriptors are:

# ls -l /proc/8938/fd
total 0
lr-x------ 1 root root 64 2010-04-05 15:08 0 -> /tmp/52.state
l-wx------ 1 root root 64 2010-04-05 15:08 1 -> /var/log/libvirt/qemu/one-52.log
lrwx------ 1 root root 64 2010-04-05 15:08 10 -> anon_inode:[signalfd]
lrwx------ 1 root root 64 2010-04-05 15:08 11 -> /root/52/disk.1
lrwx------ 1 root root 64 2010-04-05 15:08 12 -> /root/52/disk.2
lrwx------ 1 root root 64 2010-04-05 15:08 13 -> anon_inode:kvm-vcpu
lrwx------ 1 root root 64 2010-04-05 15:08 14 -> socket:[3817853]
lrwx------ 1 root root 64 2010-04-05 15:08 15 -> socket:[3817861]
lrwx------ 1 root root 64 2010-04-05 15:08 16 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 2010-04-05 15:08 17 -> anon_inode:[eventfd]
lrwx------ 1 root root 64 2010-04-05 15:08 18 -> anon_inode:[signalfd]
lrwx------ 1 root root 64 2010-04-05 15:08 19 -> socket:[3818303]
l-wx------ 1 root root 64 2010-04-05 15:08 2 -> /var/log/libvirt/qemu/one-52.log
lrwx------ 1 root root 64 2010-04-05 15:08 3 -> socket:[3817846]
lrwx------ 1 root root 64 2010-04-05 15:08 4 -> /dev/ptmx
lrwx------ 1 root root 64 2010-04-05 15:08 5 -> /dev/kvm
lrwx------ 1 root root 64 2010-04-05 15:08 6 -> anon_inode:kvm-vm
lrwx------ 1 root root 64 2010-04-05 15:08 66 -> /dev/net/tun
lr-x------ 1 root root 64 2010-04-05 15:08 7 -> pipe:[3817851]
l-wx------ 1 root root 64 2010-04-05 15:08 8 -> pipe:[3817851]
lrwx------ 1 root root 64 2010-04-05 15:08 9 -> /root/52/disk.0
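
Read together with the trace, fd 13 is the kvm-vcpu descriptor, so thread 8949 appears to be the vCPU thread issuing the same ioctl over and over. The sketch below splits an ioctl request number into its fields, assuming the common Linux encoding used on x86 (2 direction bits, 14 size bits, 8 type bits, 8 number bits); decode_ioctl is a hypothetical helper name. Request 0xae80 decodes to type 0xae, number 0x80, which corresponds to KVM_RUN in the kernel's linux/kvm.h (that mapping comes from the kernel headers, not from this report).

```shell
#!/bin/sh
# Split an ioctl request number into direction, size, type and number.
# Assumes the generic Linux layout (as on x86); other architectures differ.
decode_ioctl() {
    req=$(( $1 ))
    printf 'dir=%d size=%d type=0x%02x nr=0x%02x\n' \
        $(( (req >> 30) & 3 )) \
        $(( (req >> 16) & 0x3fff )) \
        $(( (req >> 8) & 0xff )) \
        $(( req & 0xff ))
}
```

Calling decode_ioctl 0xae80 prints dir=0 size=0 type=0xae nr=0x80.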

Looks very similar to this:

http://<email address hidden>/msg21669.html

Same results with libvirt 0.7.7

Kindjal (kindjal) wrote :

Converting to "virtio" alleviates the problem!

This config produces the problem:

<domain type='kvm'>
        <name>one-52</name>
        <memory>393216</memory>
        <os>
                <type>hvm</type>
                <boot dev='hd'/>
        </os>
        <clock offset="utc"/>
        <devices>
                <emulator>/usr/bin/kvm</emulator>
                <disk type='file' device='disk'>
                        <source file='/root/52/disk.0'/>
                        <target dev='sda'/>
                </disk>
                <disk type='file' device='disk'>
                        <source file='/root/52/disk.1'/>
                        <target dev='sdb'/>
                </disk>
                <disk type='file' device='cdrom'>
                        <source file='/root/52/disk.2'/>
                        <target dev='sdc'/>
                        <readonly/>
                </disk>
                <interface type='bridge'>
                        <source bridge='br0'/>
                        <mac address='00:03:0a:00:18:c8'/>
                </interface>
                <graphics type='vnc' port='5904'/>
        </devices>
        <features>
                <acpi/>
                <pae/>
        </features>
        <devices>
        <serial type="pty">
          <target port="0"/>
        </serial>
        </devices>
</domain>

This config properly resumes from saved state:

<domain type='kvm'>
        <name>one-52</name>
        <memory>393216</memory>
        <os>
                <type>hvm</type>
                <boot dev='hd'/>
        </os>
        <clock offset="utc"/>
        <devices>
                <emulator>/usr/bin/kvm</emulator>
                <disk type='file' device='disk'>
                        <source file='/root/52/disk.0'/>
                        <target dev='vda' bus='virtio'/>
                </disk>
                <disk type='file' device='disk'>
                        <source file='/root/52/disk.1'/>
                        <target dev='vdb' bus='virtio'/>
                </disk>
                <disk type='file' device='cdrom'>
                        <source file='/root/52/disk.2'/>
                        <target dev='vdc' bus='virtio'/>
                        <readonly/>
                </disk>
                <interface type='bridge'>
                        <source bridge='br0'/>
                        <mac address='00:03:0a:00:18:c8'/>
                        <model type='virtio'/>
                </interface>
                <graphics type='vnc' port='5904'/>
        </devices>
        <features>
                <acpi/>
                <pae/>
        </features>
        <devices>
        <serial type="pty">
          <target port="0"/>
        </serial>
        </devices>
</domain>

Mathias Gug (mathiaz) wrote : Re: Restoring KVM guest from saved state results in hung guest with non-virtio devices

Could you try to identify whether it's the network or the block device that makes the guest fail to resume correctly?

summary: - Restoring KVM guest from saved state results in hung guest
+ Restoring KVM guest from saved state results in hung guest with non-
+ virtio devices
Changed in libvirt (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
importance: Medium → Low
Kindjal (kindjal) wrote :

virtio disk but non-virtio network fails to resume

non-virtio disk and virtio network resumes properly
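
These two results suggest the emulated (non-virtio) network device, rather than the disks, is what breaks the restore. To see which case a given guest falls into, its domain XML can be checked for a virtio NIC model. A rough sketch, assuming the XML has one element per line as in the configs above; has_virtio_nic is a hypothetical helper name, and sed/grep are not a real XML parser.

```shell
#!/bin/sh
# Reads libvirt domain XML on stdin and succeeds (exit 0) if any
# <interface> block contains <model type='virtio'/>.
# Line-oriented matching only; assumes one element per line.
has_virtio_nic() {
    sed -n '/<interface[ >]/,/<\/interface>/p' |
        grep -q "<model type=['\"]virtio['\"]"
}
```

For example: virsh dumpxml one-52 | has_virtio_nic && echo "NIC is virtio".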

Florian Kruse (florian-kruse) wrote :

This bug affects me as well, BUT:

My VMs (also built with vm-builder) occasionally cannot be restored, although I use virtio for disk and network interfaces. However, I cannot reproduce the effect reliably. Sometimes it happens in around 1 of 10 attempts to restore the VM; sometimes I can save/restore the machine a hundred times without any problems. Currently, I have no clue how to find out what causes the effect.

BenLake (me-benlake) wrote :

I had this exact same problem. How do you edit the configuration when it is overwritten by the (save) file? When I edit the config, then perform a restore <somefile>, the configuration seems to be replaced as my changes disappear.

Serge Hallyn (serge-hallyn) wrote :

@BenLake

Do you still have this problem? Which release are you on?

I don't know of a virsh option other than '--xml', which isn't quite what you want. But you can edit the configuration after the fact with 'virsh edit <domainname>'. So you could do 'virsh dumpxml domainname > domainname.saved.xml', do the restore, and then 'virsh edit domainname' and copy the portions you want back in from the file domainname.saved.xml.

Hope that made sense.
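
The steps described above can be sketched as a short script. The function name, argument order and the overridable VIRSH variable are assumptions for illustration (VIRSH=echo gives a dry run); only the virsh subcommands themselves come from the comment.

```shell
#!/bin/sh
# Sketch of the dumpxml / restore / edit workaround.
# VIRSH is overridable (an assumption for dry runs, e.g. VIRSH=echo).
VIRSH=${VIRSH:-virsh}

# Keep a copy of the current definition, then restore the saved state.
# Arguments: domain name, state file, path for the XML backup.
backup_and_restore() {
    $VIRSH dumpxml "$1" > "$3" || return 1
    $VIRSH restore "$2" || return 1
    # virsh edit is interactive, so it is only suggested here, not run.
    echo "next: $VIRSH edit $1 (copy the wanted parts back from $3)"
}
```

For example: backup_and_restore domainname /tmp/domainname.state domainname.saved.xml. Note that virsh edit changes the stored definition, so the change would presumably only apply the next time the guest is started, not to the guest that was just restored.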

Changed in libvirt (Ubuntu):
status: Incomplete → New
status: New → Incomplete
BenLake (me-benlake) wrote :

Ubuntu 10.04.3 LTS

Yes, I still have this problem, and I have a machine I can test on if you have anything for me to try. A few updates just came in (listed below); I tried saving all running VMs before applying them (and hit the 1MB/s save issue), then the restore failed. So needless to say, that didn't go well :/

lucid-updates/main x11-common 1:7.5+5ubuntu1.1
lucid-updates/main qemu-common 0.12.3+noroms-0ubuntu9.17
lucid-updates/main qemu-kvm 0.12.3+noroms-0ubuntu9.17
lucid-updates/main kvm 1:84+dfsg-0ubuntu16+0.12.3+noroms+0ubuntu9.17

I did dump the XML to a file and then edit the running config via virsh, but issuing a restore after that wipes the config. I understand what you are saying about editing after doing the restore, but then how do you get the VM reset to use that config while it is "running" and churning at 100% CPU? I assume a destroy ruins the point of the restore, no?

Changed in libvirt (Ubuntu):
status: Incomplete → Confirmed
summary: Restoring KVM guest from saved state results in hung guest with non-
- virtio devices
+ virtio devices - in lucid