Intrepid kernel 2.6.26-2-generic won't boot as a kvm guest

Bug #243677 reported by Roland Dreier on 2008-06-28
This bug affects 2 people
Affects           Importance   Assigned to
kvm (Ubuntu)      High         Soren Hansen
  Hardy           Undecided    Unassigned
  Intrepid        High         Soren Hansen
linux (Ubuntu)    Medium       Unassigned
  Hardy           Undecided    Unassigned
  Intrepid        Medium       Unassigned

Bug Description

Binary package hint: linux-source-2.6.26

I have a Hardy kvm virtual machine (amd64) created in virt-manager. I just updated it to Intrepid Alpha1 (using update-manager), and the 2.6.26-2-generic kernel that gets installed won't boot. On some boots it hangs at the "waiting for root filesystem" line, and on others it drops into the busybox "(initramfs)" prompt; when it does drop into busybox, the VM seems to hang soon after.

The Hardy 2.6.24-16 kernel that is left in the image does boot fine.

I tried the Intrepid i386 Alpha1 as a kvm guest.

It does not boot either!

It hangs right after starting "install" from the graphical boot menu.

Choosing "check cdrom ..." does not work either.

The window goes black; for one second there is
an underscore cursor in the top left. After this the
screen goes completely black and kvm goes to ~100% CPU (on the host).

I am also running Intrepid Ibex (64-bit) within a virtual machine in Hardy Heron (64-bit).

I am able to use kvm to boot the 2.6.24 kernel. However, the system hangs when attempting to boot 2.6.26-1 and 2.6.26-2. Both are unmodified generic kernels, with lrm and lum installed. I launch QEMU/KVM with the following command:

kvm -m 512 -soundhw all ~/Virtual\ Machines/intrepid.img &

I then downloaded the Alpha-1 alternate install CD for 64-bit. It also fails to boot, with the symptoms mentioned above.

However, both the CD image and the existing virtual machine WILL boot with the following command:

qemu-system-x86_64 -m 512 -soundhw all -cdrom ~/path/to/iso.iso -boot d ~/Virtual\ Machines/intrepid.img

Therefore, this could be a kvm hiccup between running a 2.6.24 kernel on the host and a 2.6.26 kernel in the VM. Again, using qemu instead of kvm allows the 2.6.26 kernel to boot, but without any hardware acceleration.

Thierry Carrez (ttx) wrote :

Confirmed.
2.6.26-3.9 doesn't solve the problem, which apparently is related to KVM in the Hardy kernel.

Changed in linux:
importance: Undecided → Medium
status: New → Confirmed

I don't think this is only a Hardy kernel kvm bug.

A French blogger tested Alpha1 in VirtualBox and had to disable ACPI to get it even partly working.

See this article (in French, sorry): http://www.cedynamix.fr/2008/06/30/ubuntu-intrepid-ibex-alpha-1-essai/

Bordiga Giacomo (gbordiga) wrote :

I tried booting the Intrepid alternate CD many times, and the virtual machine just hangs after a while. Usually, after removing "quiet" from the kernel command line, I can see the first kernel messages, but the VM hangs just after that, during the screen resolution change. Once I managed to see the first installation dialog (language selection) and even navigate it for a second before the freeze.

Roland Dreier (roland.dreier) wrote :

Actually for me the Intrepid 2.6.26-3.9 kernel does boot inside a Hardy kvm VM (with a Hardy 2.6.24-19/kvm 1:62+dfsg-0ubuntu7 host), while 2.6.26-2 still hangs.

However, there is now another problem: the guest is pretty much unusable because time runs much too fast. I see several minutes tick by on the guest's clock (e.g. on the gdm login screen) for every second in the real world (i.e. timekeeping is off by a factor on the order of 100). This severely breaks autorepeat etc. and makes it hard to do anything in the guest.

Roland Dreier (roland.dreier) wrote :

For the record -- my time problems booting with the new kernel seem to be coming from the paravirtual clocksource (KVM_CLOCK) introduced in 2.6.26. Booting with "no-kvmclock" on the kernel command line makes things work OK. The next question to answer is whether this is an issue with the Intrepid kernel or the Hardy kvm...
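To check whether a guest actually booted with this workaround, one can look for the option in the command line the running kernel received (on Linux, exposed as /proc/cmdline). A minimal Python sketch; the helper names are hypothetical, and it assumes the option is matched as a whole whitespace-separated token:

```python
from pathlib import Path


def has_no_kvmclock(cmdline: str) -> bool:
    """Return True if a kernel command line contains the no-kvmclock option."""
    # Kernel parameters are whitespace-separated tokens, so compare whole
    # tokens rather than substrings.
    return "no-kvmclock" in cmdline.split()


def running_kernel_has_no_kvmclock(path: str = "/proc/cmdline") -> bool:
    """Check the command line of the currently running Linux kernel."""
    return has_no_kvmclock(Path(path).read_text())
```

For example, `has_no_kvmclock("root=/dev/sda1 ro quiet no-kvmclock")` is True, while a command line without the option gives False.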

Soren Hansen (soren) wrote :

This is not a bug in our 2.6.26 kernels, but rather in 2.6.24 on the host side. For now, we'll just disable the codepaths in kvm.

Changed in linux:
status: Confirmed → Invalid
Changed in kvm:
importance: Undecided → High
status: New → In Progress
Roland Dreier (roland.dreier) wrote :

Out of curiosity, is the bug really in Hardy's 2.6.24 kernel, or in the version of kvm 62 shipped with Hardy? As I understand things, the host side of paravirt clock support is entirely in the userspace kvm code.

mrq1 (kubuntu-bugreporter) wrote :

> is the bug really in Hardy's 2.6.24 kernel
I don't think so. I tried again with 2.6.26-rc8 as the host kernel,
and it does not work.

But if I use "no-kvmclock", Intrepid boots fine :-)

Bordiga Giacomo (gbordiga) wrote :

Same here: with no-kvmclock everything works fine, but without it the guest locks up, even with kvm 1:69+dfsg-1ubuntu1.

Roland Dreier (roland.dreier) wrote :

I just tried booting various recent Intrepid amd64 CD images under kvm 70+dfsg-1 (the latest Debian package) on a host running a very recent self-built 2.6.26-rc9 kernel, and I had the same problem: the guest locked up early in boot unless I passed the "no-kvmclock" kernel option.

So it's still not clear to me where the bug is. Surely someone has tested a working kvmclock paravirt setup with some combination of kvm host and Linux guest?

Roland Dreier (roland.dreier) wrote :

I wonder if this problem comes from the fact that the intrepid kernel doesn't seem to have upstream commit ca373932 ("x86: KVM guest: Add memory clobber to hypercalls"), which went in just after 2.6.26-rc9. The patch description says:

    Hypercalls can modify arbitrary regions of memory. Make sure to indicate this
    in the clobber list. This fixes a hang when using KVM_GUEST kernel built with
    GCC 4.3.0.

Roland Dreier (roland.dreier) wrote :

Just as a test, I built the latest upstream kernel (post-2.6.26-rc9 git, including the commit I mentioned above) in a Fedora 9 image I happen to have, and it booted fine on the same kvm 70/2.6.26-rc9 host system that the Intrepid CD image failed on. And I verified:

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock

so I do believe this actually *is* a bug in the current Intrepid kernel. I hope when the next kernel image is spun, with the latest 2.6.26 upstream fixes rolled in, that things will work again.

Roland Dreier (roland.dreier) wrote :

So I just built a kernel from the ubuntu-intrepid git tree (as of commit e5cb6d2d) with just upstream commit ca373932 (the hypercall fix) patched in by hand, and tried that kernel in my intrepid VM running on hardy kvm 62/2.6.24. Unfortunately that kernel fails in the same way without no-kvmclock, so my guess about the issue was not correct.

Roland Dreier (roland.dreier) wrote :

Just built an upstream git kernel (same tree that worked in my Fedora 9 VM with kvm 70/2.6.26-rc9 host) in my Intrepid VM on Hardy kvm/kernel, and had the same problems without no-kvmclock. I wonder if the current Intrepid gcc is miscompiling some part of the kvmclock code?

I uploaded a fix for this bug to hardy-proposed a few days ago. It
should get accepted real soon now. The changelog entry explains
the issue:

kvm (1:62+dfsg-0ubuntu8) hardy-proposed; urgency=low

  * Disable CAP_CLOCKSOURCE. This works around the ABI incompatibility of the
    paravirt clock between Hardy's kernels on the host side and the rest of
    the known universe on the guest side. (LP: #243677) This allows guest
    2.6.26 and onwards kernels with KVM_GUEST enabled to boot.

For SRU purposes:

Impact: kernels with CONFIG_KVM_GUEST enabled will fail in undefined
ways. Mostly failure to boot, but if you're unlucky enough, they'll work
for a while, and then break, fall apart, and possibly cause data
corruption.

In intrepid, the bug is gone, since we have a newer host kernel. In
hardy, the fix in proposed returns false for KVM_CAP_CLOCKSOURCE, so no
guest will ever try to use those capabilities.
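The mechanism described here can be pictured with a simplified model: the host advertises the paravirtual clocksource to the guest, and the guest uses kvm-clock only if it is advertised and "no-kvmclock" was not given. The Python sketch below is purely illustrative and is not kvm or kernel code; the function names and the `mask_clocksource` flag are hypothetical, and the bit index for the clocksource feature is an assumption (modeled on `KVM_FEATURE_CLOCKSOURCE`).

```python
# Illustrative model only; names are hypothetical, not real kvm/kernel APIs.
KVM_FEATURE_CLOCKSOURCE = 0  # assumed bit index in the paravirt feature mask


def host_feature_bits(host_caps: set[str], mask_clocksource: bool) -> int:
    """Model userspace kvm turning host capabilities into guest feature bits."""
    features = 0
    # The Hardy workaround behaves as if the host lacked KVM_CAP_CLOCKSOURCE.
    if "KVM_CAP_CLOCKSOURCE" in host_caps and not mask_clocksource:
        features |= 1 << KVM_FEATURE_CLOCKSOURCE
    return features


def guest_uses_kvmclock(features: int, cmdline: str) -> bool:
    """Model the guest's choice: use kvm-clock only if advertised and not disabled."""
    if "no-kvmclock" in cmdline.split():
        return False
    return bool(features & (1 << KVM_FEATURE_CLOCKSOURCE))
```

With `caps = {"KVM_CAP_CLOCKSOURCE"}`, `guest_uses_kvmclock(host_feature_bits(caps, mask_clocksource=False), "ro quiet")` is True (the combination that breaks on an unpatched Hardy host), while masking the capability or booting with `no-kvmclock` both yield False, matching the two workarounds discussed in this report.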

Test case: Using kvm, try to boot an intrepid kernel. It will fail
spectacularly. With new kvm (1:62+dfsg-0ubuntu8) this should not be the
case.

 subscribe ubuntu-sru

Oh, for reference here's the patch:

mrq1 (kubuntu-bugreporter) wrote :

> In intrepid, the bug is gone, since we have a newer host kernel.

Are you sure? I get this bug too with a (then) up-to-date 2.6.26-rc8 host kernel.

This would imply that the Intrepid (host) kernel has a fix which the kernel.org
kernel does not have.

Martin Pitt (pitti) wrote :

Accepted into -proposed, please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

Changed in kvm:
status: New → Fix Committed
Changed in linux:
status: New → Fix Committed
status: Fix Committed → Invalid
Roland Dreier (roland.dreier) wrote :

Yes, with Soren's patch to disable the kvm clock in Hardy's kvm 62, the intrepid kernel seems to boot OK even without "no-kvmclock". However some intrepid updates seem to have broken X with the new kernel: now my intrepid VM hangs when starting gdm with 2.6.26 (using the Hardy 2.6.24 kernel allows it to boot to X).

I had the same symptoms with 2.6.26-2 and -3. I installed kvm version 1:62+dfsg-0ubuntu8 from hardy-proposed and successfully booted my Intrepid VM with kernel 2.6.26-3.

At first sight, unlike Roland Dreier above, I have no problems with broken X in my VM when starting gdm.

Jamie Strandboge (jdstrand) wrote :

2.6.26-2 and -3 both boot with 1:62+dfsg-0ubuntu8. I too see broken X with these kernels, but it is a different bug, because kvm 1:62+dfsg-0ubuntu8 with a 2.6.24 guest kernel works fine.

Martin Pitt (pitti) wrote :

Copied to hardy-updates.

Changed in kvm:
status: Fix Committed → Fix Released
Martin Pitt (pitti) wrote :

Please apply fix to Intrepid ASAP!

Changed in kvm:
assignee: nobody → soren
Soren Hansen (soren) wrote :

> this would imply, that the intrepid (host) kernel has an fix, which the kernel.org
> kernel does not have.

No, because the problem I fixed only existed in the ubuntu kernels.

Soren Hansen (soren) wrote :

> Please apply fix to Intrepid ASAP!

It doesn't make sense on Intrepid. It tells kvm to trick the guest into thinking that the host's kernel doesn't support the paravirt clocksource, since the ABI is broken in the Hardy kernels (and *only* the Hardy kernels).

Soren Hansen (soren) wrote :

OK, could everyone who had this problem please tell me how things are working out for you now? I know this upload has fixed some things, but there seems to be more going on.

I'd like to know what you're running as guests and hosts, which kvm version and how far the boot process gets (namely: does it get all the way to gdm and then fail, does it actually go into graphics mode, or does it not really get anywhere at all?).

Martin Pitt (pitti) wrote :

Thanks, Soren. Setting intrepid task as invalid then.

Changed in kvm:
status: In Progress → Invalid
mrq1 (kubuntu-bugreporter) wrote :

> It tells kvm to trick the guest into thinking that the host's kernel doesn't support the paravirt clocksource, since the ABI is broken in the hardy kernels (and *only* the hardy kernels) .

I think this is untrue. I wrote above that I ALSO had this problem with a kernel.org kernel as the host kernel.

> Ok, could everyone who had this problem please tell me how things are working out for you now.

I can install now (using the Kubuntu Alpha2 ISO) without any special kernel parameter for the guest. The install worked well.

There is an easy-to-spot bug in the virtual mouse handling.
After booting, the mouse cursor is exactly in the center of the screen (as expected).
After moving the mouse only a little, the cursor jumps to the far right of the screen and stays there. You can change the
position of the (virtual) mouse pointer, but you can only move it along the bottom or right border. With patience, you can get the cursor to the bottom-left corner, click the K button, and open a shell (Konsole).

But I think this is unrelated and a completely new bug.
