KVM - reboot VM problem (with 3+ days uptime)(with virtio drivers?)

Bug #924247 reported by Markus Breitegger
This bug affects 5 people
Affects            Status        Importance  Assigned to  Milestone
qemu-kvm (Ubuntu)  Fix Released  Medium      Unassigned
Natty              Won't Fix     Medium      Unassigned
Oneiric            Won't Fix     Medium      Unassigned

Bug Description

Hi

I've seen the problem described by this forum user with Ubuntu 11.04 and 11.10 on the host side:
http://ubuntuforums.org/showthread.php?t=1870112

When rebooting a kvm guest with "reboot", the guest hangs somewhere.
The only way to get it up again is to kill the kvm process and start it with "virsh start guest".
The acpid package is installed on the guest.
"virsh shutdown guest" works without any errors.

On the guest I use:
Linux ns2 3.0.0-15-server #26-Ubuntu SMP Fri Jan 20 19:07:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
root@ns2:~# lsb_release -rd
Description: Ubuntu 11.10
Release: 11.10

On the host I use:
Linux fs 3.0.0-15-server #26-Ubuntu SMP Fri Jan 20 19:07:39 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
root@fs:~# lsb_release -rd
Description: Ubuntu 11.10
Release: 11.10

I've never seen the problem on an Ubuntu 10.10 host with any guest OS.

What is special about this problem is that it only happens once the guest has been running for some time.
So you won't see the problem if you start the guest and reboot it right away.
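
Because the hang only shows up after days of guest uptime, it may help to check how long a guest has been up before a planned reboot. The following is a minimal sketch for a POSIX shell inside the guest. The function names are made up for illustration, and the 3-day threshold is only an assumption taken from the uptimes reported in this bug, not a known trigger.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any package.

# Print whole days of uptime for a value given in seconds.
# Accepts the fractional form found in /proc/uptime (e.g. "683340.12").
uptime_days() {
    secs=${1%%.*}              # drop the fractional part
    echo $(( secs / 86400 ))   # 86400 seconds per day
}

# Succeed if the uptime has reached the (assumed) risky threshold.
# $1: uptime in seconds, $2: threshold in days (default 3, an assumption)
reboot_may_hang() {
    threshold_days=${2:-3}
    [ "$(uptime_days "$1")" -ge "$threshold_days" ]
}
```

On a Linux guest this could be used as:

    read up idle < /proc/uptime
    reboot_may_hang "$up" && echo "guest has been up long enough to hit the hang"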

I only use the packages of the current ubuntu version on the host.

root@fs:~# dpkg -l | grep libvirt
ii libvirt-bin 0.9.2-4ubuntu15.1 the programs for the libvirt library
ii libvirt0 0.9.2-4ubuntu15.1 library for interfacing with different virtualization systems
ii munin-libvirt-plugins 0.0.6-1 Munin plugins using libvirt
ii python-libvirt 0.9.2-4ubuntu15.1 libvirt Python bindings
root@fs:~# dpkg -l | grep kvm
ii qemu-kvm 0.14.1+noroms-0ubuntu6.2 Full virtualization on i386 and amd64 hardware
ii qemu-kvm-extras 0.15.50-2011.08-0ubuntu4 QEMU system and user mode emulation (transitional package)

Here is the XML config of the guest:

<domain type='kvm'>
  <name>ns2</name>
  <uuid>e0943e14-b124-3bc1-b4b2-97656b969c62</uuid>
  <memory>512000</memory>
  <currentMemory>512000</currentMemory>
  <vcpu>2</vcpu>
  <os>
    <type arch='x86_64' machine='pc-0.14'>hvm</type>
    <boot dev='hd'/>
    <boot dev='cdrom'/>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <pae/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <emulator>/usr/bin/kvm</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/var/vms/iso/ubuntu-11.10-server-amd64.iso'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='1' unit='0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/vms/ns2/hd.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:e9:a2:07'/>
      <source bridge='br1'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target port='0'/>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5902' autoport='no' keymap='de' passwd='password'/>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
</domain>

Kind regards,

Markus Breitegger (markus-paranoids) wrote :
Serge Hallyn (serge-hallyn) wrote :

Thanks for taking the time to report this bug. I'll try to reproduce when I can.

Is your guest a simple up-to-date oneiric server image?

Changed in libvirt (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Markus Breitegger (markus-paranoids) wrote :

Thank you for taking the time to process my request.

I've seen the problem with every Ubuntu guest from 10.04 onward. I don't run any other versions with Ubuntu 11.10 or 11.04 as host.
Yes, my test image is a simple Ubuntu oneiric server.
I installed the guest from the official ubuntu-11.10-server-amd64.iso.

Please let me know if you need any further information or if I can help by testing something.

Since I last rebooted my oneiric guest it has reached this uptime, and the reboot worked:

root@ns2:~# uptime
 19:22:08 up 7:29, 1 user, load average: 0.08, 0.03, 0.05
root@ns2:~# reboot ; exit

With my other oneiric guest the reboot did not work:
root@diaspora:~# uptime
 19:24:22 up 7 days, 21:49, 1 user, load average: 0.00, 0.01, 0.05
root@diaspora:~# reboot ; exit

Markus Breitegger (markus-paranoids) wrote :

When I'm connected via VNC while rebooting, the guest reboots normally up to the BIOS.

After the BIOS, neither GRUB nor Linux boots.

Reboot failed:
root@csrelay:~# uptime
 19:34:25 up 4 days, 23:44, 2 users, load average: 0.00, 0.00, 0.00
root@csrelay:~# reboot ; exit

"virt-top" is not working, and "virsh destroy guest" has no effect either
until I kill the guest process.
VNC goes blank.

It looks as if the virtual disk somehow gets lost in the BIOS.
There is nothing interesting in /var/log/libvirt/qemu/guest.log or /var/log/syslog, only the normal stuff.

I know this symptom because some time ago libvirt changed the virtual disk format,
and I kept forgetting to change it when rebooting the guest :-)
But I think that has nothing to do with this issue.

Changed in libvirt (Ubuntu):
status: Incomplete → New
Serge Hallyn (serge-hallyn) wrote :

Tried to reproduce this with an 11.10 server VM hosting another 11.10 server VM - the nested guest rebooted fine.

Serge Hallyn (serge-hallyn) wrote :

(Trying with a few more differences between our configurations: qcow2 backing store; virtio vda; cpu=2; network=bridge)

Serge Hallyn (serge-hallyn) wrote :

I simply cannot reproduce this. If the bug is in kvm itself, then we can try some introspection with the debugger.

To verify whether a simple kvm can reproduce this in your environment, could
you please shut down the VM in libvirt, and then run it with:

kvm -drive file=/var/vms/ns2/hd.qcow2,if=virtio -m 512 -smp 2 -vnc :1

Log into the guest through vnc, and try to reboot, make sure it fails the
same way.
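
For other guests, the command line above can be put together from the values in the libvirt XML. The sketch below only prints the command; the function name and argument order are illustrative and not part of kvm or libvirt. It assumes the XML <memory> value is in KiB (libvirt's default unit), so 512000 comes out as 500 MiB, slightly less than the -m 512 used above, which should not matter for this test.

```shell
#!/bin/sh
# Sketch only: print a plain kvm command line for a guest, bypassing libvirt.
# kvm_cmdline is a hypothetical helper name.
kvm_cmdline() {
    disk=$1      # path to the disk image, e.g. /var/vms/ns2/hd.qcow2
    mem_kib=$2   # <memory> value from the libvirt XML, assumed to be KiB
    vcpus=$3     # <vcpu> value from the libvirt XML
    display=$4   # VNC display number; display 1 listens on TCP port 5901

    mem_mib=$(( mem_kib / 1024 ))   # kvm's -m option takes MiB by default
    echo "kvm -drive file=$disk,if=virtio -m $mem_mib -smp $vcpus -vnc :$display"
}
```

For the ns2 guest from the bug description:

    kvm_cmdline /var/vms/ns2/hd.qcow2 512000 2 1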

Changed in libvirt (Ubuntu):
status: New → Incomplete
Markus Breitegger (markus-paranoids) wrote :

How much uptime did your test guests have?
My guest shows this problem only after some time: 4-7 days of uptime!
When I boot my guest and reboot it right afterwards, the problem doesn't show. It needs some time. This also sounds like a cron job on the host or on the guest doing something nasty.

I will try to verify the problem with strace on the host and will try to start the same guest with kvm only, without libvirt.

Serge Hallyn (serge-hallyn) wrote :

(Marking this Confirmed under the assumption that bug 921609 is a duplicate)

Changed in libvirt (Ubuntu):
status: Incomplete → Confirmed
Serge Hallyn (serge-hallyn) wrote :

Thanks Markus - you're right, I didn't keep the guests up long enough.

dyna (ubuntu-dyna) wrote :

Guess I was wrong about the virtio_net part. The VM I changed to e1000 just hung at reboot.

Now it hung right before:

[ 0.664434] virtio-pci 0000:00:04.0: irq 40 for MSI/MSI-X
[ 0.664467] virtio-pci 0000:00:04.0: irq 41 for MSI/MSI-X

I'll try some combinations of virtio drivers on different VMs to see if I can narrow it down to a certain driver.

Markus Breitegger (markus-paranoids) wrote :

I took a screenshot of the guest that hung at reboot (attached).
My tests with strace were not successful.
I will try to start the guest with Serge's command line:

kvm -drive file=/var/vms/ns2/hd.qcow2,if=virtio -m 512 -smp 2 -vnc :1

Markus Breitegger (markus-paranoids) wrote :

I tested with Serge's command line:

kvm -drive file=/var/vms/ns2/hd.qcow2,if=virtio -m 512 -smp 2 -vnc :1 -net tap

Without libvirt the guest reboots normally:

uptime
 17:21:04 up 3 days, 5:51, 1 user, load average: 0.19, 0.22, 0.39
root@ns2:~# reboot ; exit

Tomokazu Hirai (tomokazu-hirai) wrote :

I tried to reproduce this problem.

The guest hung at these lines:

  --
  input: AT Translated Set 2 Keyboard as /class/input/input0

  --
  ide0: BM-DMA at 0xc008-0xc00f, BIOS settings : hdc:pio, hdd:pio

  --
  uhci_hcd 0000:00:01.2: irq 11, io base 0x0000c020

Every time I try, it stops at a different line.

My environments are:

  * virtio CentOS VM on Natty
  * non-virtio CentOS VM on Natty
  * virtio CentOS VM on Lucid/Natty (Lucid based and Natty qemu-kvm/libvirt/linux-kernel)

Serge Hallyn (serge-hallyn) wrote :

If you have a chance to try to reproduce this with upstream qemu, we could get them involved in debugging this.

Upstream qemu means:

    git clone git://git.qemu.org/qemu.git
    apt-get build-dep qemu-kvm
    cd qemu
    ./configure --target-list="x86_64-softmmu i386-softmmu x86_64-linux-user i386-linux-user"
    make
    ./x86_64-softmmu/qemu-system-x86_64 -hda x.img <...>

Although, this reminds me, due to a known bug it may not build. Drat. If that
happens, please let me know and I'll raise that bug to critical priority. You
can work around it by applying the following patch:

diff --git a/hw/9pfs/virtio-9p-handle.c b/hw/9pfs/virtio-9p-handle.c
index f96d17a..38f45b1 100644
--- a/hw/9pfs/virtio-9p-handle.c
+++ b/hw/9pfs/virtio-9p-handle.c
@@ -22,6 +22,16 @@
 #include "qemu-xattr.h"
 #include <unistd.h>
 #include <linux/fs.h>
+#define AT_FDCWD -100 /* Special value used to indicate
+ openat should use the current
+ working directory. */
+#define AT_SYMLINK_NOFOLLOW 0x100 /* Do not follow symbolic links. */
+#define AT_REMOVEDIR 0x200 /* Remove directory instead of
+ unlinking file. */
+#define AT_SYMLINK_FOLLOW 0x400 /* Follow symbolic links. */
+#define AT_NO_AUTOMOUNT 0x800 /* Suppress terminal automount traversal */
+#define AT_EMPTY_PATH 0x1000 /* Allow empty relative pathname */
+#include <fcntl.h>
 #ifdef CONFIG_LINUX_MAGIC_H
 #include <linux/magic.h>
 #endif

Tomokazu Hirai (tomokazu-hirai) wrote :

I tried to reboot VM with this patched qemu.

  uptime : 3days
  reboot : 2012/02/20 09:42:50 - can not boot
  stop at : ACPI: core revision 20060707

Three days earlier, I ran these commands:

% virsh destroy ${VM}
% git clone git://git.qemu.org/qemu.git
% apt-get build-dep qemu-kvm
% cd qemu
% ./configure --target-list="x86_64-softmmu i386-softmmu x86_64-linux-user i386-linux-user" --prefix=/usr
% make
% sudo make install
% virsh start ${VM}

${VM} did not boot and stopped at this line:

  ACPI: core revision 20060707

Should we use unpatched qemu from git? Is this patch a workaround?
I used this patch for this test, but ${VM} did not boot.

I also could not boot ${VM} without libvirt.

Serge Hallyn (serge-hallyn) wrote :

@Tomokazu - the patch was only to make upstream qemu compile. It doesn't change its behavior.

If libvirt must be used then perhaps we need to more closely reproduce the configure options used in the package. In which case it might be easier to create a daily build.

Serge Hallyn (serge-hallyn) wrote :

@tomokazu: I've queued up a qemu package based on the latest qemu-kvm git tree. Once it's built, you can find it at ppa:serge-hallyn/virt. If you need more help please let me know. Thanks for testing!

(the version # will be 1.0+noroms-20120220-0ubuntu1)

Tomokazu Hirai (tomokazu-hirai) wrote :

@serge:

I have now started testing with your qemu-kvm (1.0+noroms-20120220-0ubuntu1).
In 3 days I will reboot the VM on this host.

I built the environment this way:

  * clean install 12.04 precise from alpha iso
  * add your repository (ppa:serge-hallyn/virt)
  * install qemu-kvm libvirt-bin virtinst bridge-utils ebtables iptables
  * define xml which I tested before
  * install centos5.6 on VM
  * start VM

But I have to use Natty (linux-kernel, qemu-kvm, libvirt-bin packages).
Do you have a workaround patch for the Natty-based qemu-kvm?

Serge Hallyn (serge-hallyn) wrote :

@Tomokazu

I assume you mean you must use natty in production, and you are asking whether the fix in upstream will be back-portable?

At this point I don't even know whether upstream will fix it. If it does, then I'll try to find the patch which fixed it. If not, then we can report the bug upstream and, when fixed there, backport the fix. I suspect the fix is hard to find but small and easy to backport.

Serge Hallyn (serge-hallyn) wrote :

@Tomokazu

Thanks very much for testing again.

Tomokazu Hirai (tomokazu-hirai) wrote :

@serge

I rebooted the test VM with your qemu-kvm (1.0+noroms-20120220-0ubuntu1).

  uptime : 3days + 21:26

This VM booted without any problem.

So is the problem fixed in the daily build? Can you find the patch for a Natty-based qemu-kvm .deb?

Thank you for helping us. :D

Tomokazu Hirai (tomokazu-hirai) wrote :

sorry, this vm booted at :

   reboot : 2012/02/25 09:14:00 - 2012/02/25 09:15:15

regards,

-- Tomokazu Bob Hirai

Serge Hallyn (serge-hallyn) wrote :

comment #22 confirms this is a qemu bug.

affects: libvirt (Ubuntu) → qemu-kvm (Ubuntu)
Serge Hallyn (serge-hallyn) wrote :

@Tomokazu:

To recap, the oneiric version of kvm on an oneiric host fails to reboot, and the upstream qemu on a *precise* host reboots fine? You have not tried the precise qemu on a precise host, right? Looking at qemu changes, I suspect that this should be fixed in precise.

At first I thought seabios 0f67397 could be involved, but that's applied in oneiric (but not in natty).

So my guesses are upstream qemu commits:

qemu upstream 47113ab6b8c5659ad94c69aacca572f731ebb0ac
qemu upstream cd19cfa23609dc1a35dd34f0b7554a8462337fde (plus 3c85e74fbf9e5a39d8d13ef91a5f3dd91f0bc8a8)

I will push a natty and oneiric package with those fixes to ppa:serge-hallyn/virt. Please give them a few hours to build.

Tomokazu Hirai (tomokazu-hirai) wrote :

@Serge:

Yes, I have not tried the precise qemu on a precise host.

> I will push a natty and oneiric package with those fixes to ppa:serge-hallyn/
> virt. Please give them a few hours to build.

thanks. :D I will test your qemu.

Tomokazu Hirai (tomokazu-hirai) wrote :

@Serge

I tested with your qemu packages at "https://launchpad.net/~serge-hallyn/+archive/virt?field.series_filter=natty"

But the problem still occurred:

  uptime : 2 days + 23:53
  reboot : 2012/03/19 11:12:10 - 11:17:18 ( 5 minutes )
  stop at: input: ImExPS/2 Generic Explorer Mouse as /class/input/input1

My environment is Lucid-based with Natty packages (qemu-kvm, libvirt, seabios, virtinst):

ii qemu-kvm 0.14.0+noroms-0ubuntu4.6~reboot1
ii seabios 0.6.1.2-0ubuntu1
ii libvirt-bin 0.8.8-1ubuntu6.6
ii virtinst 0.500.5-1ubuntu5

The seabios on your site was built on 2011-02-14. Is this OK?

Serge Hallyn (serge-hallyn) wrote :

Thanks for testing. The seabios version you cite (0.6.1.2-0ubuntu1) is the current natty one so that's good.

Tomokazu Hirai (tomokazu-hirai) wrote :

@Serge

Do you have any ideas about this problem?

We have not been able to work around it.

Also:

--- natty's current Package.gz (2012/03/22) ---
Package: seabios
Priority: optional
Section: misc
Installed-Size: 180
Maintainer: Dustin Kirkland <email address hidden>
Architecture: all
Version: 0.6.1.2-0ubuntu1
Filename: pool/main/s/seabios/seabios_0.6.1.2-0ubuntu1_all.deb
Size: 64898

I tested with seabios_0.6.1.2-0ubuntu1_all.deb, which our service has used since last fall.
But maybe this is the current one on natty. Should we upgrade it from your repository or
from natty's repository? If I need to upgrade seabios, please tell me which is better.

Natty's current seabios looks the same as the one in your repository.
The seabios version is 0.6.1.2-0ubuntu1.

Serge Hallyn (serge-hallyn) wrote :

At this point, to help find the change which fixed this, it might help to reproduce this with debugging symbols installed, then attach to the hung kvm task with gdb and get a backtrace:

1. install the stock qemu-kvm for your release
2. install gdb (sudo apt-get install gdb)
3. add qemu-kvm-dbgsym by following the instructions at https://wiki.ubuntu.com/DebuggingProgramCrash
4. start qemu-kvm, let it run, and reboot to make it hang
5. get the pid for the hung kvm, call it $pid
6. connect gdb to the process (gdb /usr/bin/qemu-system-x86_64 -p $pid), and get the backtrace by typing 'where'.
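
Steps 5 and 6 could be scripted roughly as follows. This is a sketch under assumptions, not a tested procedure: the function names are made up, the pid lookup assumes libvirt started the guest so that its name appears on the command line as "-name <guest>", and "thread apply all bt" is used in place of an interactive 'where' so that every thread is covered.

```shell
#!/bin/sh
# Sketch of steps 5 and 6. All function names are illustrative.

# Run a command, or just print it when DRY_RUN is set (handy for review).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 5: print the pid of the kvm process for a guest.
# Assumption: the guest name is passed to kvm as "-name <guest>".
kvm_pid() {
    pgrep -f -- "-name $1" | head -n 1
}

# Step 6: attach gdb non-interactively and dump a backtrace of all threads.
# $1: pid, $2: path to the binary (defaults to the one named in step 6)
dump_backtrace() {
    pid=$1
    binary=${2:-/usr/bin/qemu-system-x86_64}
    run gdb -batch -ex "thread apply all bt" "$binary" -p "$pid"
}
```

A possible invocation once the guest is hung (attaching to another user's process normally needs root):

    pid=$(kvm_pid ns2)
    dump_backtrace "$pid" > backtrace.txt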

Tomokazu Hirai (tomokazu-hirai) wrote :

@Serge

I captured a gdb log.

The VM stopped at this line:
  uhci_hcd 0000:00:01.2: new USB bus registered, assigned bus number 1 0.00, 0.00, 0.00

and the VM started again 7 minutes later.

regards

---
GNU gdb (GDB) 7.1-ubuntu
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/bin/qemu-system-x86_64...(no debugging symbols found)...done.
Attaching to program: /usr/bin/qemu-system-x86_64, process 2373
Error while mapping shared library sections:
/lib/librt.so.1: No such file or directory.
Error while mapping shared library sections:
/lib/libpthread.so.0: No such file or directory.
Error while mapping shared library sections:
/lib/libutil.so.1: No such file or directory.
Error while mapping shared library sections:
/lib/libm.so.6: No such file or directory.
Error while mapping shared library sections:
/lib/libc.so.6: No such file or directory.

warning: .dynamic section for "/lib64/ld-linux-x86-64.so.2" is not at the expected address (wrong library or version mismatch?)
Error while mapping shared library sections:
/lib/libdl.so.2: No such file or directory.
Error while mapping shared library sections:
/lib/libresolv.so.2: No such file or directory.
Error while mapping shared library sections:
/lib/libnsl.so.1: No such file or directory.
Error while mapping shared library sections:
/lib/libnss_compat.so.2: No such file or directory.
Error while mapping shared library sections:
/lib/libnss_nis.so.2: No such file or directory.
Error while mapping shared library sections:
/lib/libnss_files.so.2: No such file or directory.
Error while mapping shared library sections:
/lib/libcrypt.so.1: No such file or directory.
Symbol file not found for /lib/librt.so.1
Symbol file not found for /lib/libpthread.so.0
Symbol file not found for /lib/libutil.so.1
Reading symbols from /usr/lib/libcurl-gnutls.so.4...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libcurl-gnutls.so.4
Reading symbols from /lib/libncurses.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /usr/lib/libasound.so.2...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libasound.so.2
Reading symbols from /usr/lib/libpulse.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libpulse.so.0
Reading symbols from /usr/lib/libpulse-simple.so.0...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libpulse-simple.so.0
Reading symbols from /lib/libuuid.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libuuid.so.1
Reading symbols from /lib/libpng12.so.0...(no debugging symbols found)...done.
Loaded symbols for /lib/libpng12.so.0
Reading symbols from /usr/lib/libjpeg.so.62...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/libjpeg.so.62
Reading symb...

Markus Breitegger (markus-paranoids) wrote :

I've upgraded my server to 12.04 LTS.
The problem doesn't occur in Ubuntu 12.04 LTS.

Have fun!

Serge Hallyn (serge-hallyn) wrote :

Thanks, Markus. Marked fix released for 12.04, and confirmed for 11.04 and 11.10.

Changed in qemu-kvm (Ubuntu):
status: Confirmed → Fix Released
Changed in qemu-kvm (Ubuntu Natty):
status: New → Confirmed
Changed in qemu-kvm (Ubuntu Oneiric):
status: New → Confirmed
summary: - KVM - reboot VM problem
+ KVM - reboot VM problem (with virtio drivers?)
tags: added: needs-cherrypick
Changed in qemu-kvm (Ubuntu Natty):
importance: Undecided → Medium
Changed in qemu-kvm (Ubuntu Oneiric):
importance: Undecided → Medium
summary: - KVM - reboot VM problem (with virtio drivers?)
+ KVM - reboot VM problem (with 3+ days uptime)(with virtio drivers?)
Rolf Leggewie (r0lf) wrote :

natty has seen the end of its life and is no longer receiving any updates. Marking the natty task for this ticket as "Won't Fix".

Changed in qemu-kvm (Ubuntu Natty):
status: Confirmed → Won't Fix
Rolf Leggewie (r0lf) wrote :

oneiric has seen the end of its life and is no longer receiving any updates. Marking the oneiric task for this ticket as "Won't Fix".

Changed in qemu-kvm (Ubuntu Oneiric):
status: Confirmed → Won't Fix