cirros 0.3.1 fails to boot

Bug #1312199 reported by Attila Fazekas
This bug affects 2 people
Affects                     Status         Importance   Assigned to      Milestone
CirrOS                      Won't Fix      Low          Unassigned
OpenStack Compute (nova)    Fix Released   Low          Attila Fazekas
  Icehouse                  Fix Released   High         Attila Fazekas
devstack                    Fix Released   Undecided    Attila Fazekas

Bug Description

Logstash query: message: "MP-BIOS bug" AND tags:"console"

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOiBcIk1QLUJJT1MgYnVnXCIgQU5EIHRhZ3M6XCJjb25zb2xlXCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6ImFsbCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjEzOTgzNDg0NzIzNzcsIm1vZGUiOiIiLCJhbmFseXplX2ZpZWxkIjoiIn0=

cirros-0.3.1-x86_64-uec sometimes fails to boot with libvirt/soft qemu in the OpenStack gate jobs.

The VM's serial console log ends with:

[ 1.096067] ftrace: allocating 27027 entries in 106 pages
[ 1.140070] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 1.148071] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 1.148071] ...trying to set up timer (IRQ0) through the 8259A ...
[ 1.148071] ..... (found apic 0 pin 2) ...
[ 1.152071] ....... failed.
[ 1.152071] ...trying to set up timer as Virtual Wire IRQ...
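
The failure signature above can be checked mechanically. The following is a minimal sketch, not something used by the gate: it assumes the console log is available as a string, and the function name is made up for illustration.

```python
def looks_like_timer_check_hang(console_log):
    """Return True if the log shows the MP-BIOS message and stops at the
    'Virtual Wire IRQ' attempt, as in the console output quoted above."""
    lines = [line.strip() for line in console_log.splitlines() if line.strip()]
    if not lines:
        return False
    saw_mp_bios = any("MP-BIOS bug" in line for line in lines)
    # The hang is recognised by the log *ending* at this message.
    stuck = "trying to set up timer as Virtual Wire IRQ" in lines[-1]
    return saw_mp_bios and stuck


SAMPLE = """\
[ 1.148071] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[ 1.148071] ...trying to set up timer (IRQ0) through the 8259A ...
[ 1.152071] ....... failed.
[ 1.152071] ...trying to set up timer as Virtual Wire IRQ...
"""
print(looks_like_timer_check_hang(SAMPLE))
```

A log that continues past the Virtual Wire message is treated as a normal boot, since the kernel got beyond the timer check.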

description: updated
Revision history for this message
Kashyap Chamarthy (kashyapc) wrote :

Couple of comments:

(1) From a related old bug, https://bugzilla.redhat.com/show_bug.cgi?id=502058#c15: qemu -no-kvm guest hangs at during timer setup; works with noapic

    "After some examination of the code, this turns out to
    be a known problem with the code that tests for buggy
    timers. This code is not necessary when running in
    qemu, and it gets confused because it tries to do accurate
    timing checks which sometimes fail in virt. For more
    information, see:

    https://bugzilla.redhat.com/show_bug.cgi?id=698842#c8

    Adding kernel no_timer_check option appears to fix the
    problem for me, but I am still doing testing."

(2) Talking to one of the upstream KVM developers (Amos Kong): he says the above check is too strict for virtualization environments, and recommends trying the "no_timer_check" option on the kernel command line to alleviate this.

Revision history for this message
Richard Jones (rjones-redhat) wrote :

Just a note:

no_timer_check is added implicitly when the kernel detects it is booting under KVM.

However when booting under TCG (software emulation) you have to add this to the guest kernel command line explicitly.

TBH I believe this is a bug: the kernel should add this flag automatically whenever it detects any kind of virtualization being used, but it doesn't right now so you have to add it.

Revision history for this message
Attila Fazekas (afazekas) wrote :

AFAIK, detecting a non-accelerated qemu as a hypervisor is not an easy task, even on a booted system [1].

When the image is UEC (the kernel image is separate), nova would be able to pass no_timer_check as a kernel parameter.
This is only required when CONF.libvirt.virt_type=qemu.
Linux automatically turns off the timer check when the hypervisor is mshyperv or kvm.
AFAIK xen also uses a paravirtualized clock.
This seems to be the only way to provide a stable boot with existing UEC images in soft qemu.
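
The rule described here (add the flag only for virt_type=qemu with a separate kernel image) could look roughly like the sketch below. This is an illustration only and not the nova code; the function name, its arguments and the example base arguments are assumptions.

```python
def kernel_cmdline(base_args, virt_type, has_separate_kernel):
    """Append no_timer_check only for soft qemu with a separate (UEC) kernel."""
    needs_flag = virt_type == "qemu" and has_separate_kernel
    if needs_flag and "no_timer_check" not in base_args.split():
        return (base_args + " no_timer_check").strip()
    return base_args


# Hypothetical base arguments, for illustration only.
print(kernel_cmdline("root=/dev/vda console=ttyS0", "qemu", True))
print(kernel_cmdline("root=/dev/vda console=ttyS0", "kvm", True))
```

With kvm, or with a whole-disk image where the boot loader inside the image owns the command line, the arguments are returned unchanged.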

Adding nova to this bug for the above change.

Devstack automatically decides whether to use kvm or qemu.
kvm is selected when the system is able to use hardware acceleration with qemu/kvm.
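
As a rough illustration of that kind of decision (this is not devstack's actual logic, and the function names are made up), one could look for the x86 virtualization CPU flags and for the /dev/kvm device:

```python
import os


def cpu_has_virt_extensions(cpuinfo_text):
    """Check /proc/cpuinfo-style text for the vmx (Intel) or svm (AMD) flag."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags or "svm" in flags:
                return True
    return False


def choose_virt_type(cpuinfo_text, kvm_device_present):
    """Pick 'kvm' only when hardware acceleration looks usable, else 'qemu'."""
    if kvm_device_present and cpu_has_virt_extensions(cpuinfo_text):
        return "kvm"
    return "qemu"


if os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        print(choose_virt_type(f.read(), os.path.exists("/dev/kvm")))
```

When the result is "qemu", the guest runs under TCG and the no_timer_check discussion in this bug applies.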

When qemu is the selected type and the cloud image is not UEC, the cloud image itself needs to be altered in most cases in order to use the no_timer_check parameter.
This includes the f20 cloud image and all cloud images I have seen so far.
It affects the heat-slow jobs.

Adding devstack as affected component for this change.

A bug for the Linux kernel and the F20 cloud image will be created as well.

[1] http://fedorapeople.org/cgit/rjones/public_git/virt-what.git/tree/virt-what.in?id=8aa72773cebbc742d9378fed6b6ac13cb57b0eb3#n228

Revision history for this message
Attila Fazekas (afazekas) wrote :

A good loops_per_jiffy (lpj) parameter is probably also required. The value used on the host is probably good if the architecture is the same.
Using the BogoMIPS value from /proc/cpuinfo,
I got a good lpj with this formula: BogoMIPS/2*1000.

The notsc kernel parameter is also recommended if HW acceleration (kvm) is not enabled.
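
The formula can be tried out with a few lines of Python. This is only a sketch of the arithmetic in this comment; the function names are made up, and the sample cpuinfo text is invented for the example.

```python
import re


def bogomips_from_cpuinfo(cpuinfo_text):
    """Return the first bogomips value found in /proc/cpuinfo-style text."""
    match = re.search(r"^bogomips\s*:\s*([0-9.]+)", cpuinfo_text,
                      re.MULTILINE | re.IGNORECASE)
    return float(match.group(1)) if match else None


def lpj_from_bogomips(bogomips):
    """Apply the formula from this comment: BogoMIPS / 2 * 1000."""
    return int(round(bogomips / 2 * 1000))


# Invented sample input, for illustration only.
sample = "processor\t: 0\nbogomips\t: 4800.00\n"
print("lpj=%d" % lpj_from_bogomips(bogomips_from_cpuinfo(sample)))
```

The printed string has the form of the kernel's lpj= boot parameter, which could then be placed on the guest command line next to no_timer_check.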

Revision history for this message
Richard Jones (rjones-redhat) wrote :

Yes you're correct. I thought that qemu (TCG) exported a CPUID leaf so we could detect it, but it seems that it does not. As you say the kernel parameter would have to be added, either using -append (separate kernel) or by modifying the disk image.

Here's a short Python script that can modify the disk image (not suggesting we use it, this is just as a demonstration):

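#!/usr/bin/python
# Demonstration only: add no_timer_check to the kernel line in a disk image.
import re

import guestfs

g = guestfs.GuestFS(python_return_dict=True)
g.add_drive("cirros-0.3.1-x86_64-disk.img")
g.launch()
g.mount("/dev/sda1", "/")

# Insert no_timer_check before a console= argument on each "kernel" line
# of the GRUB legacy configuration (leading whitespace is tolerated).
lines = g.read_lines("/boot/grub/menu.lst")
lines = [re.sub(r'^(\s*kernel.*) console=',
                r'\1 no_timer_check console=', line)
         for line in lines]

# Write the file back, keeping the trailing newline.
g.write("/boot/grub/menu.lst", "\n".join(lines) + "\n")

g.shutdown()
g.close()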
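
The substitution itself can be tried without libguestfs. The sketch below applies the same kind of rewrite to a single line and is safe to run twice; the helper name and the sample menu.lst line are made up for illustration and are not taken from a real cirros image.

```python
import re


def add_no_timer_check(line):
    """Insert no_timer_check before the first console= argument of a
    GRUB legacy 'kernel' line; leave every other line untouched."""
    if not re.match(r"\s*kernel\b", line):
        return line
    if "no_timer_check" in line.split():
        return line  # already present, nothing to do
    return re.sub(r"^(\s*kernel.*?) console=",
                  r"\1 no_timer_check console=", line, count=1)


# Invented sample line, for illustration only.
sample = "kernel /vmlinuz root=/dev/sda1 console=tty1 console=ttyS0"
print(add_no_timer_check(sample))
```

A kernel line without any console= argument is returned unchanged, which matches the behaviour of the regular expression in the script above.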

Revision history for this message
Richard Jones (rjones-redhat) wrote :

FWIW it would be better if the host kernel exposed its lpj setting. At the moment it only exposes it through dmesg, which is useless as the setting disappears after some time. I proposed this patch some time ago:

https://lkml.org/lkml/2013/3/1/308

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (master)

Fix proposed to branch: master
Review: https://review.openstack.org/96180

Changed in devstack:
assignee: nobody → Attila Fazekas (afazekas)
status: New → In Progress
Revision history for this message
Attila Fazekas (afazekas) wrote :
Tracy Jones (tjones-i)
tags: added: libvirt
Solly Ross (sross-7)
Changed in nova:
status: New → In Progress
importance: Undecided → Low
Revision history for this message
Attila Fazekas (afazekas) wrote :

On x86 the bogomips value is calculated from the lpj by a simple formula (loops_per_jiffy/(500000/HZ)), and the bogomips value is expected to be the same regardless of the host kernel's HZ setting. The challenging part is that HZ might have a non-default value on the guest kernel, and it "cannot be known" unless we add an HZ-related image property to the aki image.
It would be better if the kernel had a 'loops per second' or bogomips argument that is HZ independent. (Most systems nowadays use the NO_HZ options, which means HZ=1000)
Changing the meaning of the kernel parameter to lpj@1000HZ would also be helpful.
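
To make the HZ dependence concrete, here is a small sketch of the conversion in both directions, based only on the formula quoted above (bogomips = loops_per_jiffy/(500000/HZ)). The function names are made up for illustration.

```python
def bogomips_from_lpj(lpj, hz):
    """bogomips = loops_per_jiffy / (500000 / HZ), as quoted above."""
    return lpj / (500000 / hz)


def lpj_for_hz(bogomips, hz):
    """Invert the formula: the lpj a guest kernel built with this HZ expects."""
    return round(bogomips * 500000 / hz)


# The same bogomips value maps to different lpj values per guest HZ.
for hz in (100, 250, 1000):
    print(hz, lpj_for_hz(4800.0, hz))
```

This is why a host-derived lpj cannot simply be passed through: the correct value depends on the guest kernel's HZ, which the host does not know.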

The kernel uses a relatively big sample for the lpj measurement, and usually even a relatively big 15% error is not fatal.
A higher than required lpj usually does not cause a fatal issue, but it makes the system slower; the opposite can cause the system to read invalid data from a hardware register. Is that a real threat with emulated hardware?

I tried to alter the kernel parameters on the f20 image with guestfish. It works on f20, but even with the instructions in [1], it has an issue on Ubuntu 12.04 [2] (gate).

The guestfish approach without HW acceleration is not fast.
I will try to use a loop mount.

[1] http://libguestfs.org/guestfs-faq.1.html
[2] http://paste.openstack.org/show/81983/

Changed in nova:
assignee: nobody → Attila Fazekas (afazekas)
Revision history for this message
Richard Jones (rjones-redhat) wrote :

[lpj]
Right, this is the reason why exporting it in /proc/cpuinfo is necessary. Normalizing the lpj= parameter is another approach, although I guess it would break existing users.

[ubuntu]
It seems to be using the wrong kernel. Did you do:
chmod 0644 /boot/vmlinuz*
Also which exact versions of:
linux-image (kernel)
libguestfs
seabios
febootstrap
are installed? You should open another bug in launchpad to track this issue separately.

[loop]
Loop mounts are insecure (as well as needing root). If you're seeing performance problems, take a look at:
http://libguestfs.org/guestfs-performance.1.html

Revision history for this message
Richard Jones (rjones-redhat) wrote :

The Ubuntu bug is because the virtio-serial module isn't available in the -virtual kernel. It looks as if you need to install 'linux-image-generic' alongside it. Basically you need to make sure 'virtio_console.ko' is present in /lib/modules/<whichever-version>. This is a bug in Ubuntu.
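
A quick way to check for the module file is sketched below. It is only an illustration: the function name is made up, and it looks for the module as a file only, so it will not notice a kernel that has the driver built in.

```python
import os
import platform


def has_virtio_console_module(modules_root, kernel_version):
    """Return True if a virtio_console.ko file (possibly compressed) exists
    anywhere under <modules_root>/<kernel_version>."""
    base = os.path.join(modules_root, kernel_version)
    for _dirpath, _dirnames, filenames in os.walk(base):
        for name in filenames:
            if name.startswith("virtio_console.ko"):
                return True
    return False


# Check the running kernel; prints False if the directory does not exist.
print(has_virtio_console_module("/lib/modules", platform.release()))
```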

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/96090
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6b86a61fee15ce1237303fab2f7896f8c3bcad47
Submitter: Jenkins
Branch: master

commit 6b86a61fee15ce1237303fab2f7896f8c3bcad47
Author: Attila Fazekas <email address hidden>
Date: Wed May 28 09:19:29 2014 +0200

    Use no_timer_check with soft-qemu

    The Linux kernel timer check not working properly
    when the hypervisor's thread preempted by the host CPU scheduler.

    The timer check is automatically disabled with other types
    of hypervisors including the hardware accelerated kvm,
    but timer_check is not disabled when qemu used without hardware acceleration.

    This issue is frequently mischaracterized as an SSH connectivity issue and
    causes rechecks and occasional boot failures.

    This change adds no_timer_check kernel parameter when we are using
    uec images with qemu.

    Closes-Bug: #1312199
    Change-Id: I3cfdfe9048fe219fc12cdac8a399b496f237e55e

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/100065

Revision history for this message
Richard Jones (rjones-redhat) wrote :

I'm surprised this patch would work, as normally using -append would only work when using an external kernel & initrd.

Alan Pevec (apevec)
tags: added: havana-backport-potential
tags: added: gate
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/100065
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=485f25df181dedf2ba475f5e550af4f9f41089a3
Submitter: Jenkins
Branch: stable/icehouse

commit 485f25df181dedf2ba475f5e550af4f9f41089a3
Author: Attila Fazekas <email address hidden>
Date: Wed May 28 09:19:29 2014 +0200

    Use no_timer_check with soft-qemu

    The Linux kernel timer check not working properly
    when the hypervisor's thread preempted by the host CPU scheduler.

    The timer check is automatically disabled with other types
    of hypervisors including the hardware accelerated kvm,
    but timer_check is not disabled when qemu used without hardware acceleration.

    This issue is frequently mischaracterized as an SSH connectivity issue and
    causes rechecks and occasional boot failures.

    This change adds no_timer_check kernel parameter when we are using
    uec images with qemu.

    Closes-Bug: #1312199
    Change-Id: I3cfdfe9048fe219fc12cdac8a399b496f237e55e
    (cherry picked from commit 6b86a61fee15ce1237303fab2f7896f8c3bcad47)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (master)

Fix proposed to branch: master
Review: https://review.openstack.org/102793

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to devstack (master)

Reviewed: https://review.openstack.org/96180
Committed: https://git.openstack.org/cgit/openstack-dev/devstack/commit/?id=b3e722df60bef0aae1381c962b252ab26b52b589
Submitter: Jenkins
Branch: master

commit b3e722df60bef0aae1381c962b252ab26b52b589
Author: Attila Fazekas <email address hidden>
Date: Wed May 28 16:15:53 2014 +0200

    soft-qemu handling with F20 could image

    When the qume used with Linux image without
    a para-virtualized timer, various timing issues can happen,
    if the qemu process preempted in the wrong time, for example
    at timer test.

    The issues less like happens on low load, high core number
    host system, but it can happen.

    For soft qemu (TCG) generally recommended to explicitly disable the
    timer check.

    Pre-caching the Fedora `gate edition` image, which contains the
    the no_timers_check option.

    Related-Bug: #1297560
    Partial-Bug: #1312199

    Change-Id: Id5cd01a92a047b7859914e5bb017c15ee443b4d5

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/102793
Committed: https://git.openstack.org/cgit/openstack-dev/devstack/commit/?id=bfcb2ff732ffdf2ed50be6a677f1b6182a0213a4
Submitter: Jenkins
Branch: master

commit bfcb2ff732ffdf2ed50be6a677f1b6182a0213a4
Author: Attila Fazekas <email address hidden>
Date: Thu Jun 26 12:38:20 2014 +0200

    Switch to Fedora-x86_64-20-20140618-sda

    The heat-slow job using the Fedora-20 image as L2 guest, the
    currently used version is affected by the heartbleed issue and
    the Mb-Bios bug issue observed with emulation.

    This version of the Fedora cloud image uses the no_timer_check
    kernel parameter, which solves the MP-Bios Bug issue.

    The Image tested with ~3000 heat-slow job, without any issue.

    Change-Id: I9ce9b7769c2d91e630b0362f2c2c6ac9066defbc
    Closes-Bug: #1312199
    Closes-Bug: #1297560

Changed in devstack:
status: In Progress → Fix Released
Changed in nova:
milestone: none → juno-2
status: Fix Committed → Fix Released
Revision history for this message
Scott Moser (smoser) wrote :

I don't really see much we could change in cirros here, other than building in the kernel parameter. I'm not entirely opposed to doing that, but

I don't really agree with the nova fix of adding parameters "by default" to the kernel command line. Nova should be pretty dumb. The more it knows and reacts to the specific behavior of operating systems it runs under kvm, the more that knowledge can become brittle or outdated. I.e., in future Linux kernels adding no_timer_check might have negative side effects, and openstack will then be in the position of having to guess whether or not to add the parameter based on the perceived version of the kernel that it is going to boot. Worse, the image author will then be in the position of having to "fight" this guess.

Anyway, with regard to the Ubuntu bug, 14.04 is fix-released in that it has VIRTIO_CONSOLE=y. 12.04 has VIRTIO_CONSOLE=m, and that is available in linux-generic as Robert stated. Clearly it should have been in the -virtual kernel or built in.

Changed in cirros:
status: New → Triaged
importance: Undecided → Low
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-2 → 2014.2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to devstack (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/130375

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on devstack (stable/icehouse)

Change abandoned by Joe Gordon (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/130375
Reason: dropping in favor of https://review.openstack.org/#/c/130611/.

Revision history for this message
Scott Moser (smoser) wrote :

Marking this Won't Fix. If it is still a problem in a future cirros with a newer kernel, we can re-open.

Scott Moser (smoser)
Changed in cirros:
status: Triaged → Won't Fix
Revision history for this message
Adam Young (ayoung) wrote :

Just saw this again with both CentOS and Cirros on TripleO (virtual machine based install, so nested virt)

[ 0.148008] ...trying to set up timer as Virtual Wire IRQ...

http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img

CentOS-7-x86_64-GenericCloud

Revision history for this message
Phani Pawan (ppawan) wrote :

Saw this on Packstack Installation on CentOS-7-x86_64 with
Cirros http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-disk.img

It is also a virtual machine based install.
