qemu-kvm : Ubuntu 12.04 (host) / Centos 6.3 (guest) rebooting from guest gets stuck in a seabios/grub loop - cannot initialise kernel (10.04 is fine)

Bug #1025188 reported by yossarian_uk
This bug affects 2 people
Affects    Status  Importance  Assigned to  Milestone
libvirt    New     Undecided   Unassigned
qemu-kvm   New     Undecided   Unassigned
CentOS     New     Undecided   Unassigned

Bug Description

I have a really annoying bug that I can reproduce often (although it is a bit random).

I have an Ubuntu 12.04 KVM server, using CentOS 6 guests. When I install the latest kernel for CentOS 6.3 (2.6.32-279.1.1.el6) and reboot from inside a CentOS 6 VM, it gets stuck in a loop between SeaBIOS and GRUB. This doesn't happen 100% of the time, but the chance is high: usually it gets stuck within 3 reboots, and it will NEVER recover without manual intervention (virsh destroy..)

It seems to fail at the kernel initialisation stage - if I use vga=normal I can see the words 'Probing EEID...' for a second (then it reboots).

If I use virsh/virt-manager to reboot it's fine; this only occurs when rebooting from inside a CentOS 6 VM (with the latest CentOS kernel).

As a test I installed CentOS 6.2 - this was 100% fine *until* I did a yum update; then I got the same issue.

An Ubuntu 10.04 KVM host with a CentOS 6.3 guest is fine, so I'm unsure where the fault is. Likewise, a CentOS 6.3 KVM host with a CentOS 6.3 guest is also fine...

I have installed a 2nd Ubuntu 12.04 KVM server and the exact same thing occurs (i.e. two different servers, same issue).

How can I troubleshoot this? I have already enabled the boot option 'console=ttyS0' (so I can access VMs using virsh console id), however this gives no output (as it crashes while initialising the kernel).
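
For reference, my serial console setup is roughly the following (the guest name here is illustrative):

---------------------
# guest: the kernel line in /boot/grub/menu.lst carries the console option, e.g.
#   kernel /vmlinuz-2.6.32-279.1.1.el6.x86_64 ro root=... console=ttyS0
# host: attach to the guest's serial console
virsh console centos6-guest
---------------------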

I have also tried installing the latest qemu-kvm (1.1) from source on the Ubuntu 12.04 KVM server; the same thing occurs.

At present we are replacing Ubuntu with CentOS as the KVM host while this bug remains.

Tags: kvm libvirt
Serge Hallyn (serge-hallyn) wrote:

Thanks for submitting this bug. If you have another host on which you can still reproduce this, then it would be great if you could:

1. sudo mv /usr/share/apport/package-hooks/source_libvirt-bin.py /usr/share/apport/package-hooks/source_libvirt.py
2. apport-collect 1025188

However, since you've switched to a CentOS host for now, I assume I'll need to try to reproduce it myself. I'll grab a CentOS ISO to try.

Could you however give the contents of /proc/cpuinfo ?

affects: kvm → qemu-kvm
no longer affects: ubuntu
yossarian_uk (morgancoxuk) wrote:

Hi

I still have the servers running.

I ran apport-collect 1025188 on the server and logged into my account.

See the attachment for cpuinfo.

Shouldn't this potentially still count as an Ubuntu bug? As far as I have seen, this only happens on Ubuntu 12.04.

Serge Hallyn (serge-hallyn) wrote:

The bug is marked as affecting qemu-kvm in ubuntu. A bug against 'Ubuntu' means something different ('a bug against the Ubuntu project itself').

Thanks for the cpu info. As you're on Intel I should be able to easily reproduce. Unfortunately I don't see the apport info. Did it get saved to a file which you could upload? If you don't see a file, you could try 'sudo apport-cli --save /tmp/libvirt.apport 1025188', then upload /tmp/libvirt.apport to this bug.

yossarian_uk (morgancoxuk) wrote: Re: [Bug 1025188] Re: qemu-kvm : Ubuntu 12.04 (host) / Centos 6.3 (guest) rebooting from guest gets stuck in a seabios/grub loop - cannot initialise kernel (10.04 is fine)

Hi

I get

---------------------
root@ubuntu:~# apport-cli --save /tmp/libvirt.apport 1025188

*** Error: Invalid PID

The specified process ID does not belong to a program.

Press any key to continue...
---------------------

Regards

Serge Hallyn (serge-hallyn) wrote:

Sorry, could you try

apport-cli --save /tmp/libvirt.apport libvirt-bin

I've tried to reproduce this with CentOS 6.3 x86-64, but no luck so far.

yossarian_uk (morgancoxuk) wrote:

apport attached!

I should mention that both Ubuntu 12.04 servers I have tested (and found the bug on) have the same type of CPU.

Also, I usually notice it after rebooting from inside the guest via virt-manager and then rebooting again from, say, an SSH session (or virsh console, etc.)

Serge Hallyn (serge-hallyn) wrote:

Your guests are also x86-64, right?

Sorry, I'm not sure I understood right - are you saying that you first reboot the guest through the menus in virt-manager, then the next time by running 'reboot' while logged into the guest through ssh? As I can't reproduce this yet, it's possible I just need to follow your precise steps more closely.

yossarian_uk (morgancoxuk) wrote:

There is no set pattern - just that essentially rebooting from a guest often gets caught in that loop.

Sometimes it happens on the first reboot, sometimes after 5, but it will always happen eventually.

It happens regardless of how I reboot from the guest...

I have attached a video showing the issue -> out.ogv

In this case I start the VM up, log in (via virt-manager) and reboot (once), and it happens (lucky in terms of catching it on video).

Could this be connected with the way I clone a template?

On Ubuntu 10.04 I used to use virt-clone; however, in Ubuntu 12.04 I can no longer virt-clone into an existing LVM partition, so I simply use virt-resize to clone the template (rather than virt-clone). I should point out that CentOS 5 VMs are done in the same way and do not have the issue.
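
Roughly, the cloning step looks like this (volume and path names are illustrative, not my exact setup):

---------------------
# create the target LVM volume, then copy/resize the template into it
sudo lvcreate -L 20G -n newvm vg0
sudo virt-resize --expand /dev/sda2 /var/lib/libvirt/images/centos6-template.img /dev/vg0/newvm
---------------------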

yossarian_uk (morgancoxuk) wrote:

btw - you can't see the boot-up messages in the video due to console=ttyS0 ...

P.S. I have removed the console=ttyS0 line and the same thing happens.

yossarian_uk (morgancoxuk) wrote:

I have a 'solution' - credit to Marcelo Tosatti on the KVM mailing list:

----------------------------------------
Can you disable kvmclock? (by appending "no-kvmclock" to the end of the
"kernel" line of 2.6.32-279.1.1.el6 entry in /boot/grub/menu.lst).
----------------------------------------

i.e. add no-kvmclock to the kernel boot line in grub.
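
The edited entry in /boot/grub/menu.lst ends up looking something like this (the title and root= path are illustrative and will differ per guest):

---------------------
title CentOS (2.6.32-279.1.1.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-279.1.1.el6.x86_64 ro root=/dev/mapper/vg_centos-lv_root no-kvmclock
        initrd /initramfs-2.6.32-279.1.1.el6.x86_64.img
---------------------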

I have rebooted over 20 times now.

Does this indicate the bug is centos kernel related?

Serge Hallyn (serge-hallyn) wrote:

Quoting yossarian_uk (<email address hidden>):
> I have a 'solution' - credit to Marcelo Tosatti on the KVM mailing list:
>
> ----------------------------------------
> Can you disable kvmclock? (by appending "no-kvmclock" to the end of the
> "kernel" line of 2.6.32-279.1.1.el6 entry in /boot/grub/menu.lst).
> ----------------------------------------
>
> i.e. add no-kvmclock to the kernel boot line in grub.
>
> I have rebooted over 20 times now.
>
> Does this indicate the bug is centos kernel related?

Sounds like it. If that is the problem, you may be able to turn off CPU frequency scaling on the host to avoid having to make changes in the guest.
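
Something like this on the host should do it (a sketch; assumes the usual cpufreq sysfs interface is present):

---------------------
# switch every host CPU to the 'performance' governor, disabling dynamic scaling
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance | sudo tee "$g"
done
---------------------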

Nathaniel W. Turner (nturner) wrote:

I'm seeing the same symptoms described by the OP on an up-to-date Ubuntu 12.04 server install, but with a 2.6.32-358.6.1-based CentOS kernel. Adding "no-kvmclock" to the kernel command line does *not* work around it.

Should I infer from the lack of activity on this bug that Ubuntu server is not intended as a virtualization platform for operating systems other than Ubuntu, and no one else is even trying it, or are lots of people using Ubuntu to host CentOS guests with no problems? Serious question; I need to decide what platform I'm going to use for an upcoming project.

Serge Hallyn (serge-hallyn) wrote:

Please list an install ISO URL that I can fetch to test this with (and, if I need to upgrade the kernel to make this happen, detailed instructions for how you are doing so). Have you tested this under raring or saucy's qemu?

Nathaniel W. Turner (nturner) wrote:

Hi Serge,

Thanks for following up. Sorry for not replying sooner. Here's a test case that reproduces this reliably on precise:

Create a new VM using these settings:
ISO: http://centos.corenetworks.net/6.4/isos/x86_64/CentOS-6.4-x86_64-minimal.iso
Virtual CPUs: 2 (or more)

After install, edit /etc/rc.local and add the following line to set up a reboot loop:
grep STOP /proc/cmdline || shutdown -r now test
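
(For clarity: grep exits non-zero unless "STOP" appears on the kernel command line, so every normal boot immediately reboots; booting once with STOP added to the grub kernel line breaks the loop. A minimal sketch of the resulting file, assuming the stock CentOS 6 rc.local:)

---------------------
#!/bin/sh
# stock CentOS 6 rc.local line
touch /var/lock/subsys/local
# reboot unless the kernel was booted with "STOP" on its command line
grep STOP /proc/cmdline || shutdown -r now test
---------------------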

Reboot the VM.

In my testing, it almost immediately gets into a loop where, right after the GRUB countdown, it jumps back to the beginning of the boot sequence. After a few boots, it stops on either a kernel panic (see attached) or a hang (sometimes causing virt-manager to lock up) with messages like these (similar to bug 957957) in the syslog:

Nov 7 20:30:55 selma kernel: [855283.206924] kvm [27518]: vcpu0 unhandled rdmsr: 0xc0010001

A colleague pointed out today that reducing the VM guest's virtual CPU allocation to 1 does appear to either make this far less frequent or make it go away entirely (I haven't done enough testing to tell). Obviously that's not a long-term solution, but I mention it in case it's a helpful workaround for others who find this bug report.
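
If you want to try that workaround, something like this should do it (a sketch; the domain name is illustrative):

---------------------
# drop the guest to a single vCPU in the persistent config (applies on next boot)
virsh setvcpus centos6-guest 1 --config
---------------------

(Or edit the <vcpu> element directly via 'virsh edit'.)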

The good news is that on saucy, the same test case has not reproduced after 80+ reboots. (Unfortunately, my saucy VM host and precise VM host are not identical hardware. Hopefully that's not important.)

Nathaniel W. Turner (nturner) wrote:

Oops, ignore that specific bit about the "vcpu0 unhandled rdmsr" messages. Those were on the saucy host (i.e. the one where this bug was not observed). The rest is correct.

In case it's a clue, attached is a screenshot of a different panic seen during the same test run on the precise guest.
