qemu-kvm and guest kernel < 2.6.24 sporadic boot fail: Kernel panic - not syncing: IO-APIC + timer doesn't work!

Bug #559088 reported by Chris Bainbridge
This bug affects 2 people
Affects: qemu-kvm (Ubuntu)
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

Binary package hint: qemu-kvm

Booting CentOS 4 with its default 2.6.9 kernel, I get a sporadic error roughly 5% of the time:

ENABLING IO-APIC IRQs
..TIMER: vector=0x31 pin1=2 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ... failed
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ... failed :(.
Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug and send a report. Then try booting with the 'noapic' option

This issue has been reported in the past on the linux-kvm mailing list, but no resolution was found.[1]
It appears that this may in fact be a bug in how older kernels handle interrupts during boot.[2] Interrupts were meant to be disabled during a 30ms window, but were not, and an interrupt arriving in that window causes the kernel boot to fail. Virtualization may make the bug more likely to appear than it was on real hardware, since the way interrupts are delivered to the kernel from the "hardware" is different. If this diagnosis is correct, the issue is very difficult to fix, since it would require detecting that a pre-2.6.24 kernel is in use and managing the boot process so that no interrupt is delivered in the problematic window. The alternative is to accept that pre-2.6.24 kernels may have sporadic boot failures, and to make this information available so that users who run into the issue can either patch or update the kernel they are using.

[1] http://kerneltrap.org/mailarchive/linux-kvm/2008/11/12/4077354/thread
[2] http://<email address hidden>/msg30813.html

Changed in qemu-kvm (Ubuntu):
importance: Undecided → Medium
importance: Medium → Low
status: New → Triaged
Dustin Kirkland  (kirkland) wrote :

Thanks for the research Chris.

Marking Triaged, since you've tracked this down to the upstream mailing list posts. Marking Low, since kernels <2.6.24 are (hopefully) becoming increasingly rare.

To your two suggestions... I know that upstream qemu in the past has really, really tried to avoid special-case code for buggy behavior in guests. The latter is far more doable. How do you propose that we do this? If your guest is an Ubuntu kernel <2.6.24 (i.e., Dapper), we can open a task and try to get it fixed. If it's another distribution, I suggest you try to contact that distribution's kernel developers.

Gerben (gerbgeus) wrote :

Since I started testing 10.04 beta1/beta2, I've seen this happen quite a few times on my existing guests (8.04/8.10). I cannot recall it happening on my 8.04 host.

Is this based purely on the guest kernel, or can it also depend on the host?

Dustin Kirkland  (kirkland) wrote : Re: [Bug 559088] Re: qemu-kvm and guest kernel < 2.6.24 sporadic boot fail: Kernel panic - not syncing: IO-APIC + timer doesn't work!

10.04 has a new kernel that contains ~11 KVM-related patches. Can you confirm this issue on the latest kernel, 2.6.32-21-server?

Chris Bainbridge (chris-bainbridge) wrote :

If the diagnosis I posted above is correct, then the frequency with which this error appears will depend on both the virtualisation platform (in this case the KVM host) and the guest kernel. We virtualise CentOS 4.4 systems (2.6.9 kernel), and my colleague reports that in two years of using VMware he has not seen this error occur, yet I can reproduce it fairly readily under KVM using Ubuntu Karmic.

If you have seen the same problem using kernels >=2.6.24, then maybe the above diagnosis is incorrect. It would be useful to verify the diagnosis by backporting the patch to the 2.6.9 kernel to find out whether it fixes the problem, but I don't really have time to do that.

It is quite easy to set up a test system: install a guest OS with "poweroff" in /etc/rc.local, and then write a script that loops calling "virsh start". Eventually the guest kernel will hang on boot, and you can connect to the console.
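
A minimal sketch of such a loop in Python, assuming the guest already runs "poweroff" from /etc/rc.local and that a boot which has not shut off again within a timeout has hung. The function names, the timeout and polling values, and the injectable run/sleep/clock parameters are illustrative assumptions, not part of any existing tool; only the "virsh start" and "virsh domstate" commands are real.

```python
import subprocess
import time


def domain_state(name, run=subprocess.run):
    """Return the state printed by `virsh domstate`, e.g. 'running' or 'shut off'."""
    result = run(["virsh", "domstate", name],
                 capture_output=True, text=True, check=False)
    return result.stdout.strip()


def boot_until_hang(name, max_boots=100, boot_timeout=120.0, poll=2.0,
                    run=subprocess.run, sleep=time.sleep, clock=time.monotonic):
    """Start the guest repeatedly; return the boot number that hung, or None.

    Assumption: a healthy boot ends with the guest powering itself off,
    so a domain that is still not 'shut off' after boot_timeout seconds
    is treated as a hung boot and left running for inspection.
    """
    for attempt in range(1, max_boots + 1):
        run(["virsh", "start", name], check=True)
        deadline = clock() + boot_timeout
        while domain_state(name, run) != "shut off":
            if clock() > deadline:
                return attempt  # guest is probably stuck in the panic
            sleep(poll)
    return None
```

Calling boot_until_hang("centos4-test") (a hypothetical domain name) returns the number of the boot that hung, at which point "virsh console" can be used to look at the guest. If boots fail independently at the roughly 5% rate reported above, 100 boots would hit at least one failure with probability 1 - 0.95^100, or about 99%.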
