Ubuntu 12.04 (Precise) guests can't boot on Ubuntu 10.04 (Lucid) QEMU-KVM host

Bug #1047531 reported by Konstantin L. Metlov
This bug affects 2 people
Affects             Status     Importance  Assigned to  Milestone
linux (Ubuntu)      Confirmed  Medium      Unassigned
qemu-kvm (Ubuntu)   Confirmed  Medium      Unassigned

Bug Description

For a couple of days I have been trying (on and off) to launch an Ubuntu 12.04 guest (32-bit) on my 10.04 KVM host (AMD CPU in 32-bit mode with PAE enabled, stock server kernel). I have tried different images (built with VMBuilder, downloaded from cloud-images.ubuntu.com, or even the installation ISO images of different flavors). They all hang on boot, either after saying "Starting up ..." or with just a black screen. Kernel options have no effect; it seems the hang happens in GRUB, before the kernel is launched. I have tried enabling debug output in GRUB, and the last lines I see are about loading the initrd, malloc, and several reallocator.c traces, then the hang.

Simple steps to reproduce:

# wget http://uec-images.ubuntu.com/releases/precise/release-20120822/ubuntu-12.04-server-cloudimg-i386-disk1.img
# qemu-img convert -O qcow2 ubuntu-12.04-server-cloudimg-i386-disk1.img moa.qcow2
# kvm -m 512 -smp 1 -drive file=./moa.qcow2 -vnc 127.0.0.1:10

In an attached VNC client this briefly displays the GRUB prompt and then hangs with a black screen.

The same happens without the image conversion step, that is:
# kvm -m 512 -smp 1 -drive file=./ubuntu-12.04-server-cloudimg-i386-disk1.img -vnc 127.0.0.1:10

At the same time, the host perfectly runs about 10 Lucid guests, so this should not be a host configuration problem.

Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1047531/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → qemu-kvm (Ubuntu)
Serge Hallyn (serge-hallyn) wrote :

Thanks for taking the time to report this bug.

I can actually reproduce this in quantal. However (at least in my case) kvm is not actually hanging. The default boot prompt is sending output to 'console=ttyS0', and cloud-init waits a long time for an ec2 server to be available.

When I catch the grub prompt, type 'e' to edit the boot command, and delete the 'console=ttyS0', then output appears on my vnc session. Does that work for you in lucid as well?

Changed in qemu-kvm (Ubuntu):
importance: Undecided → High
status: New → Incomplete
Serge Hallyn (serge-hallyn) wrote :

(Also, after several minutes I do then get a login prompt, with 'console=ttyS0' removed from the boot command.)

Konstantin L. Metlov (metlov) wrote :

Nope. In my case this does not help. I just get "Booting a command list" and then a hang. I had tried removing "console=ttyS0" before, and did it again just now as a test. The hang happens not only with cloud images, but with VMBuilder images and install ISOs too.

What bothers me, actually, is that I can't find similar cases reported by other people. Running new images on old VM servers should be quite a common use case. Maybe there is something specific in my configuration that triggers this problem. On the other hand, I do not remember changing any KVM or libvirt global configuration files, and I use the stock kernel and stock packages. The machine has 8 GB of memory (with about 2 GB left free and for disk cache). There are exactly 10 Lucid VMs running at all times. This one is the eleventh. Maybe there is some limit on the number of running VMs?

Konstantin L. Metlov (metlov) wrote :

Also, the VM in the hung state uses exactly 36 MB of memory out of the allocated 512 MB and spins indefinitely using 100% CPU (sometimes slightly more: 102% or 104%). Probably, this memory usage corresponds to the kernel+initrd.

I have enabled GRUB debug output, but have never seen it say anything about launching the kernel. So it looks like the hang happens in GRUB (which is consistent with the observation that all other Precise images hang in the same way, irrespective of kernel options).

Serge Hallyn (serge-hallyn) wrote :

Thanks, Konstantin.

I'll set up a new lucid host (as soon as hardware is freed from other tests) and try to reproduce.

Changed in qemu-kvm (Ubuntu):
status: Incomplete → New
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1047531] Re: Ubuntu 12.04 (Precise) guests can't boot on Ubuntu 10.04 (Lucid) QEMU-KVM host

Quoting Konstantin L. Metlov (<email address hidden>):
> Also, the VM in the hung state uses exactly 36 MB of memory out of
> the allocated 512 MB and spins indefinitely using 100% CPU (sometimes
> slightly more: 102% or 104%). Probably, this memory usage corresponds to
> the kernel+initrd.

Interesting.

Your kvm command line shows 4 qcow2 files and no cdrom file. Is
windows already installed on c.img?

Konstantin L. Metlov (metlov) wrote :

4 ? Why ? There is only one "ubuntu-12.04-server-cloudimg-i386-disk1.img". There is no "c.img" and no Windows involved.

Serge Hallyn (serge-hallyn) wrote :

Ah, sorry, that's another bug, but I suspect a duplicate of this.

I'll wait until I manage (or fail) to reproduce to ask for more - sorry.

Konstantin L. Metlov (metlov) wrote :

Just to revive the topic...

I have tried booting the Precise cloudimg on one of my Lucid workstations, with vanilla Lucid and KVM but the 2.6.38-15-generic-pae (backported Natty) kernel. After removing 'console=ttyS0' the image boots properly. The other difference is that the workstation I used has an Intel CPU and no other VMs running.

I have still made no progress booting the Precise cloudimg on the vanilla Lucid server kernel on AMD.

Serge Hallyn (serge-hallyn) wrote :

Quoting Konstantin L. Metlov (<email address hidden>):
> Just to revive the topic...
>
> I have tried booting Precise cloudimg on one of my Lucid workstations
> with vanilla Lucid and KVM but 2.6.38-15-generic-pae (backported Natty)
> kernel. After removing 'console=ttyS0' the image boots properly. The
> other difference is that the workstation I used has Intel CPU and no
> other VMs running.
>
> I still had no progress with booting Precise cloudimg on vanilla Lucid
> server kernel on AMD.

Could you test the vanilla lucid kernel on intel?

Serge Hallyn (serge-hallyn) wrote :

I couldn't reproduce this with lucid on intel. Trying amd.

Serge Hallyn (serge-hallyn) wrote :

Failed to reproduce on amd as well.

Konstantin L. Metlov (metlov) wrote :

Maybe it is the number of VMs then... Can you run 11 of them?

Serge Hallyn (serge-hallyn) wrote :

Quoting Konstantin L. Metlov (<email address hidden>):
> Maybe it is the number of VMs then... Can you run 11 of them?

I couldn't do 11 - not enough memory. I could do a bunch. However
I just noticed that you're running amd in 32-bit mode with PAE. I
was running 64-bit lucid! I'll re-install and try again, unless you
confirm that you are able to reproduce this with 64-bit lucid as well.

Konstantin L. Metlov (metlov) wrote :

Yes, in my case both host and guest are 32-bit. PAE is enabled by default in the stock 32-bit Lucid server kernel.

All my computers are running in 32-bit mode and, unfortunately, I can't easily reinstall OS on them.

Serge Hallyn (serge-hallyn) wrote :

With 4 simultaneous instances I still could not reproduce this. Again, I removed 'console=ttyS0' from /boot/grub/grub.cfg (with the qcow file mounted over qemu-nbd), and had to wait for cloud-init to time out on its attempts to contact a provider.
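
The offline edit described above could be sketched roughly as follows. This is only a sketch under stated assumptions: the NBD device /dev/nbd0, the partition /dev/nbd0p1, the mount point /mnt and both function names are illustrative choices, not details given in this report.

```shell
#!/bin/sh
# Remove every "console=ttyS0" kernel argument (including a suffix such as
# ",115200") from the given grub.cfg. Uses GNU sed for in-place editing.
strip_serial_console() {
    sed -i 's/ console=ttyS0[^ ]*//g' "$1"
}

# Hypothetical wrapper, to be run as root: attach the image with qemu-nbd,
# mount its first partition (an assumption), edit grub.cfg, then clean up.
# The partition device may take a moment to appear after the connect step.
edit_guest_grub() {
    image=$1
    modprobe nbd max_part=8 &&
    qemu-nbd --connect=/dev/nbd0 "$image" &&
    mount /dev/nbd0p1 /mnt &&
    strip_serial_console /mnt/boot/grub/grub.cfg
    status=$?
    umount /mnt
    qemu-nbd --disconnect /dev/nbd0
    return $status
}
```

It would be called as, for example, edit_guest_grub ./moa.qcow2 while the guest is shut down.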

Serge Hallyn (serge-hallyn) wrote :

And 6 work too (which hits my memory limit).

Can you install the qemu-kvm-dbgsym package (see https://wiki.ubuntu.com/DebuggingProgramCrash) and, when the guest is frozen, get a stack trace to see where qemu is?
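
Once the debug symbols are installed, one way to capture such a trace is to attach gdb to the running qemu process in batch mode. A minimal sketch: the function name and the GDB override variable are illustrative, not part of any package.

```shell
#!/bin/sh
# Print a backtrace of every thread in the process with the given PID.
# Set GDB to use a different debugger binary (defaults to "gdb").
qemu_backtrace() {
    pid=$1
    "${GDB:-gdb}" -p "$pid" -batch -ex 'thread apply all bt'
}
```

With several guests running on the same host, the PID of the frozen guest has to be picked by hand (for example from ps output) and passed as qemu_backtrace PID. Attaching usually needs root.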

Changed in qemu-kvm (Ubuntu):
status: New → Incomplete
Serge Hallyn (serge-hallyn) wrote :

Actually I believe I've reproduced this on an amd laptop. It does look to be an amd-only bug. I've not reproduced it on intel.

Changed in qemu-kvm (Ubuntu):
status: Incomplete → Confirmed
Serge Hallyn (serge-hallyn) wrote :

Peeking through gdb, it seems the kernel is hung inside setup_arch:

#0 0xc0135e81 in ?? ()
#1 0xc097a0a3 in ?? ()
#2 0xc09744ed in ?? ()
#3 0xc09740ba in ?? ()
#4 0x00000000 in ?? ()

where System.map (in the precise image) shows

c097a024 T reserve_standard_io_resources
c097a048 T setup_arch
c097a756 T i386_reserve_resources

and

c0135e50 t native_flush_tlb
c0135e60 t native_flush_tlb_global
c0135e90 t native_flush_tlb_single
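
The manual lookup above (finding the last symbol whose address is not above a frame address) can be sketched as a small shell function. This assumes a 32-bit System.map with 8-digit lowercase hex addresses, as in the excerpts above; the function name is illustrative.

```shell
#!/bin/sh
# lookup_symbol ADDRESS SYSTEM_MAP
# Print the symbol with the highest address that is not above ADDRESS.
lookup_symbol() {
    # Normalise e.g. 0xc097a0a3 to the fixed-width form used in System.map.
    target=$(printf '%08x' "$1") || return 1
    awk -v t="$target" '
        # Prefixing "x" forces string comparison; fixed-width lowercase
        # hex sorts the same way as text and as numbers.
        ("x" $1) <= ("x" t) && ("x" $1) >= ("x" best) { best = $1; name = $3 }
        END { if (name != "") print name }
    ' "$2"
}
```

For example, lookup_symbol 0xc097a0a3 System.map resolves frame #1 of the trace, and lookup_symbol 0xc0135e81 System.map resolves frame #0.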

Serge Hallyn (serge-hallyn) wrote :

To my surprise, this is happening even with the very latest qemu-kvm upstream git HEAD.

Host is an amd laptop with kernel: Linux ubuntu 2.6.32-43-generic-pae #97-Ubuntu SMP Wed Sep 5 16:59:17 UTC 2012 i686 GNU/Linux

Guest kernel is vmlinuz-3.2.0-29-virtual
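
To check whether another host resembles the configurations on which the hang has been reported in this thread (AMD CPU, 32-bit host kernel from the 2.6.32 series), something like the following could be used. It is a rough heuristic built only from the reports above, not a definitive test for the bug, and the function name is illustrative.

```shell
#!/bin/sh
# is_reported_config VENDOR ARCH KERNEL
# Succeed if the values match the host setups reported as affected here:
# AMD CPU, 32-bit (i?86) host, 2.6.32-series kernel.
is_reported_config() {
    vendor=$1 arch=$2 kernel=$3
    [ "$vendor" = "AuthenticAMD" ] || return 1
    case $arch in i?86) ;; *) return 1 ;; esac
    case $kernel in 2.6.32-*) return 0 ;; *) return 1 ;; esac
}

# Usage on a live host:
#   is_reported_config \
#       "$(awk '/^vendor_id/ { print $3; exit }' /proc/cpuinfo)" \
#       "$(uname -m)" "$(uname -r)" && echo "matches the reported setup"
```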

Serge Hallyn (serge-hallyn) wrote :

When I installed the oneiric backports kernel, the VM booted just fine.

So a workaround here is:

sudo apt-get install linux-image-server-lts-backport-oneiric

Serge Hallyn (serge-hallyn) wrote :

Since the backports kernel, which is officially supported, works, I suspect the kernel team may set this bug to low priority.

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Konstantin L. Metlov (metlov) wrote :

Thanks for confirming the bug!

As for me, the workaround was to run the VM on another machine with an Intel processor, so that I can start upgrading the main server to Precise (which, as far as I can tell, does not have this problem).

Konstantin L. Metlov (metlov) wrote :

Just in case, I'd like to confirm that upgrading the kernel to linux-image-server-lts-backport-oneiric does solve this problem for me too.

Konstantin L. Metlov (metlov) wrote :

Also, when booting the Precise VM on the Lucid host with the oneiric kernel, I see the following message in dmesg:

kvm: <kvm pid>: cpu0 unhandled rdmsr: 0xc0010001

There are no similar messages for the other 10 Lucid VMs running on the same host.
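
To see which MSRs show up in such messages, the kernel log can be filtered with a short pipeline. This is a sketch that assumes the message format shown above; the function name is illustrative.

```shell
#!/bin/sh
# Read kernel log text on stdin and print each distinct MSR number that
# appears in an "unhandled rdmsr" message, one per line.
unhandled_msrs() {
    grep -o 'unhandled rdmsr: 0x[0-9a-f]*' | awk '{ print $3 }' | sort -u
}
```

Typical use would be dmesg | unhandled_msrs on the host after starting the guest.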

penalvch (penalvch)
tags: added: regression-release
Serge Hallyn (serge-hallyn) wrote :

(Lowered priority since there is a workaround: using the backports kernel. I don't believe this is a bug in qemu-kvm, since a newer kernel fixes it, but I'm not yet marking it invalid for qemu-kvm, since it's possible qemu-kvm could work around it somehow if it did the right thing.)

Changed in qemu-kvm (Ubuntu):
importance: High → Medium
Mario (q-mario) wrote :

I am experiencing this randomly with a host on ubuntu 12.04 and upgrading guests from 12.04 to 14.04... randomly machines never boot after the upgrade... others do..

Serge Hallyn (serge-hallyn) wrote :

@mario - does upgrading to the backports kernel fix this for you?
