Guest kernel hang during boot when KVM is active on i386 host

Bug #688085 reported by Коренберг Марк
This bug affects 4 people
Affects            Status        Importance  Assigned to  Milestone
QEMU               Fix Released  Undecided   Unassigned
meego              Fix Released  Critical
qemu-kvm           Fix Released  Undecided   Unassigned
kvm (Ubuntu)       Invalid       Undecided   Unassigned
linux (Ubuntu)     Fix Released  Undecided   Unassigned
qemu (Ubuntu)      Invalid       Undecided   Unassigned
qemu-kvm (Ubuntu)  Invalid       Medium      Unassigned

Bug Description

Binary package hint: qemu

Guest kernel hang during boot when KVM is active on i386 host

See the patch.
http://www.spinics.net/lists/kvm/msg40800.html

How to reproduce:
1. Install Maverick x86 (not amd64).
2. Ensure the processor has KVM support.
3. kvm -kernel /boot/initrd.img-2.6.35-24-generic-pae (the guest hangs).
4. kvm -no-kvm -kernel /boot/initrd.img-2.6.35-24-generic-pae works OK.

SRU Justification:
1. Impact: Users cannot boot KVM guests on i386 hosts.
2. How bug addressed: The upstream commit at http://www.spinics.net/lists/kvm/msg40800.html fixed it.
3. Patch: A kernel patch is attached to this bug.
4. Reproduce: Boot an i386 kernel on a KVM-capable host, then try to boot a KVM guest.
5. Regression potential: Since this cherry-picks a commit from a newer upstream tree that had already diverged, a regression is possible. However, any regression should only affect users of KVM on i386 hosts, where guests currently fail to boot anyway.

Revision history for this message
In , Fathi-boudra (fathi-boudra) wrote :

Please specify the target build, and set the status to "accepted" if you are working on the issue.

Revision history for this message
Коренберг Марк (socketpair) wrote :

When booting another kernel (such as RHEL 6.0) in the guest, the kernel hangs at the line:
Probing EDD (edd=off to disable)... ok
In fact, it hangs in set_64bit(), called from native_set_pmd().

Changed in qemu:
status: New → Fix Released
Changed in qemu-kvm:
status: New → Fix Released
Scott Moser (smoser)
Changed in kvm (Ubuntu):
status: New → Invalid
Changed in qemu (Ubuntu):
status: New → Invalid
Changed in qemu-kvm (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
In , Zhiyuan-lv (zhiyuan-lv) wrote :

*** Bug 11378 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Zhiyuan-lv (zhiyuan-lv) wrote :

Copied from #11271 and #11378

Some findings on the QEMU-KVM issue on Ubuntu 10.10.

QEMU-KVM hangs while booting the meego-netbook kernel. The netbook kernel is
2.6.35 with some Intel patches. I ran the following tests:

QEMU:
1. qemugl from the MeeGo 1.1 SDK, version 0.12.4
2. qemu installed from the Ubuntu 10.10 repo, version 0.12.5

Kernel:
a. handset netbook kernel from the MeeGo 1.1 release, major version 2.6.35
b. Ubuntu 10.10 kernel, major version 2.6.35

QEMU  Kernel  Result
1     a       FAIL
1     b       SUCCESS
2     a       FAIL
2     b       SUCCESS

So the problem appears to be that the MeeGo kernel does something special that
does not work well with KVM in the Ubuntu 10.10 host kernel.

Revision history for this message
In , Zhiyuan-lv (zhiyuan-lv) wrote :

Debugging the kernel running inside QEMU with gdb shows that the guest hangs at the inline asm code below:

kernel-netbook-2.6.35.3/linux-2.6.35/arch/x86/include/asm/cmpxchg_32.h

static inline void set_64bit(volatile u64 *ptr, u64 value)
{
        u32 low = value;
        u32 high = value >> 32;
        u64 prev = *ptr;

        asm volatile("\n1:\t"
                     LOCK_PREFIX "cmpxchg8b %0\n\t"
                     "jnz 1b"
                     : "=m" (*ptr), "+A" (prev)
                     : "b" (low), "c" (high)
                     : "memory");
......
}

The backtrace is as follows:
#0 0xc1742bd9 in set_64bit () at /home/abuild/rpmbuild/BUILD/kernel-netbook-2.6.35.3/linux-2.6.35/arch/x86/include/asm/cmpxchg_32.h:74
#1 native_set_pmd () at /home/abuild/rpmbuild/BUILD/kernel-netbook-2.6.35.3/linux-2.6.35/arch/x86/include/asm/pgtable-3level.h:41
#2 pmd_populate_kernel () at /home/abuild/rpmbuild/BUILD/kernel-netbook-2.6.35.3/linux-2.6.35/arch/x86/include/asm/pgalloc.h:66
#3 early_ioremap_init () at arch/x86/mm/ioremap.c:382
#4 0xc173518c in ?? ()
#5 0xc1733545 in start_kernel () at init/main.c:573
#6 0xc17330ca in i386_start_kernel () at arch/x86/kernel/head32.c:72
#7 0x00000000 in ?? ()
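
For readers less familiar with cmpxchg8b, the inline asm above is a compare-and-swap retry loop: it keeps trying to replace the 64-bit value at *ptr until the swap succeeds. Below is a minimal user-space sketch of the same idea using C11 atomics. The name set_64bit_sketch is hypothetical and this is not the kernel's code; it only illustrates the loop structure, assuming a compiler with <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * User-space illustration only (hypothetical name, not kernel code).
 * Atomically stores `value` into *ptr with a compare-and-swap retry
 * loop, which is the role cmpxchg8b + "jnz 1b" play in the asm above.
 */
static void set_64bit_sketch(_Atomic uint64_t *ptr, uint64_t value)
{
    /* Initial guess at the current contents, like "u64 prev = *ptr". */
    uint64_t prev = atomic_load(ptr);

    /*
     * If *ptr still equals prev, value is written and the loop ends.
     * Otherwise prev is refreshed with the current contents and the
     * swap is retried.
     */
    while (!atomic_compare_exchange_weak(ptr, &prev, value)) {
        /* retry */
    }
}
```

The loop only terminates once the compare-and-swap reports success, so if the instruction is never completed correctly (for example by an emulator that mishandles its 64-bit operand), the guest can spin or fault here indefinitely.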

Revision history for this message
In , Zhiyuan-lv (zhiyuan-lv) wrote :

Some more investigation results.

KVM (in the 2.6.35.23 Linux kernel on Ubuntu 10.10) seems to hang in __vcpu_run() in arch/x86/kvm/x86.c. When the guest OS executes the instruction above, KVM enters an infinite loop in __vcpu_run(), because vcpu_enter_guest() always returns 1.

The scenario seems to be as follows: the instruction triggers a page fault that exits the KVM guest, and kvm_mmu_page_fault() is called to handle the exception. The function returns a non-zero value so that the instruction is run again, on the assumption that the page fault has been resolved. But for some reason the page-fault exception is triggered again, and the code runs in the loop indefinitely.
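
The loop described above can be illustrated with a toy model. Everything in this sketch is hypothetical (the toy_* names, the struct, the iteration cap) and heavily simplified; it is not KVM code. It only shows why a fault handler that always reports "retry the instruction" without actually resolving the fault leaves the run loop with no way out.

```c
#include <stdbool.h>

/* Hypothetical toy model, not KVM code. */
struct toy_vcpu {
    bool fault_pending;  /* the guest instruction still faults */
    bool handler_fixes;  /* whether the handler really resolves the fault */
};

/*
 * Models the fault handler: it always returns 1 ("run the instruction
 * again"), but clears the fault only when handler_fixes is set.
 */
static int toy_handle_fault(struct toy_vcpu *v)
{
    if (v->handler_fixes)
        v->fault_pending = false;
    return 1;
}

/*
 * Models one guest entry. Simplification: returns 0 once the
 * instruction completes, 1 when it faulted and should be retried.
 */
static int toy_enter_guest(struct toy_vcpu *v)
{
    if (v->fault_pending)
        return toy_handle_fault(v);
    return 0;
}

/*
 * Models the run loop with an iteration cap so the sketch terminates.
 * Returns the number of entries used, or -1 if the cap was reached.
 */
static int toy_run(struct toy_vcpu *v, int max_iters)
{
    for (int i = 1; i <= max_iters; i++) {
        if (toy_enter_guest(v) == 0)
            return i;
    }
    return -1;
}
```

With handler_fixes set, the model finishes on the second entry; with it cleared, every entry faults again and only the artificial cap stops the loop, which mirrors the hang reported here.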

Revision history for this message
In , Zhiyuan-lv (zhiyuan-lv) wrote :

Some findings from Google:

1. Similar KVM hangs have been reported, for example:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/688085

According to the fix from Avi Kivity, it looks like a bug in how KVM handles the 64-bit operand of "cmpxchg8b". Interestingly, KVM on Ubuntu 10.04 and Ubuntu 9.10 did not hit the hang when running the 2.6.35 guest kernel.

http://www.spinics.net/lists/kvm/msg40800.html

I will try a kernel built from the latest git tree to see whether that resolves the problem.

2. The thread below seems to discuss a similar problem with a different root cause. Just a note here. The patch mentioned in the link below has not yet been found in the kernel gitorious tree.

http://kerneltrap.org/mailarchive/linux-kernel/2010/8/3/4601781

Revision history for this message
Froggy (thrabalek) wrote :

This must be fixed in arch/x86/include/asm/kvm_emulate.h and arch/x86/kvm/emulate.c.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Froggy (thrabalek) wrote :

Change proposal for Maverick (2.6.35), based on the kernel patch in the description:
   * kvm_emulate.h:
      * Change struct operand so that val and orig_val are changed from unsigned long to a union like this:
            union {
                  unsigned long val;
                  u64 val64;
            };

   * emulate.c:
      * In emulate_grp9(), change all occurrences of c->dst.val to c->dst.val64.
      * In x86_emulate_insn(), change the line "c->src.orig_val = c->src.val" to "c->src.orig_val64 = c->src.val64".

The MeeGo qemu will then start working.
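
To see why the union matters, here is a small user-space sketch of the idea. It is not the kernel's struct operand: operand_sketch, ulong32, through_val and through_val64 are hypothetical names, and uint32_t stands in for unsigned long as it is on an i386 host (an assumption made so the sketch behaves the same on a 64-bit machine). It requires a compiler with anonymous union support (C11 or GNU C).

```c
#include <stdint.h>

/* Stand-in for "unsigned long" on an i386 host, where it is 32 bits. */
typedef uint32_t ulong32;

/* Hypothetical, trimmed-down operand holding only the fields discussed. */
struct operand_sketch {
    union {
        ulong32  val;         /* too narrow for a cmpxchg8b operand */
        uint64_t val64;       /* wide enough for all 64 bits */
    };
    union {
        ulong32  orig_val;
        uint64_t orig_val64;
    };
};

/* Passes a 64-bit value through the narrow member: the high half is lost. */
static uint64_t through_val(uint64_t v)
{
    struct operand_sketch op;

    op.val = (ulong32)v;
    return op.val;
}

/* Passes the same value through the 64-bit member: nothing is lost. */
static uint64_t through_val64(uint64_t v)
{
    struct operand_sketch op;

    op.val64 = v;
    return op.val64;
}
```

For example, through_val(0x1122334455667788) yields 0x55667788, while through_val64() returns the value unchanged. That is the motivation for switching the cmpxchg8b paths to the 64-bit members.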

Changed in qemu-kvm (Ubuntu):
assignee: nobody → Serge Hallyn (serge-hallyn)
Changed in linux (Ubuntu):
status: Confirmed → Invalid
Changed in qemu-kvm (Ubuntu):
status: Triaged → In Progress
Changed in linux (Ubuntu):
status: Invalid → Triaged
Changed in qemu-kvm (Ubuntu):
assignee: Serge Hallyn (serge-hallyn) → nobody
status: In Progress → Invalid
Changed in linux (Ubuntu):
assignee: nobody → Serge Hallyn (serge-hallyn)
status: Triaged → In Progress
description: updated
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Please test the package at

https://launchpad.net/~serge-hallyn/+archive/testkernel

If that kernel works, I'll complete the SRU process to request the maverick kernel get this patch.

Revision history for this message
Froggy (thrabalek) wrote :

Fix verified, meego qemu works well with linux 2.6.35-24.42qemui386v3.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks very much for confirming. Forwarded the patch.

Revision history for this message
kred (krzysztof-dziuba) wrote :

The error is still reproducible on the patched 2.6.35-24-generic kernel, amd64 architecture, on an Athlon 64 X2.

Changed in meego:
importance: Unknown → Critical
status: Unknown → In Progress
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Patch is said to be in maverick-next.

I'd like to hear confirmation (or denial) from original bug reporter as to whether the patch fixed the problem. (I'm unconvinced that this is actually a dup of the Meego bug)

Changed in linux (Ubuntu):
assignee: Serge Hallyn (serge-hallyn) → nobody
status: In Progress → Fix Committed
Revision history for this message
Aaz (aaz2009) wrote :

The error can still be reproduced on the patched 2.6.35-24-generic-pae kernel: amd64, Athlon II X2 220, 4 GB, Maverick.

Revision history for this message
In , Zhiyuan-lv (zhiyuan-lv) wrote :

Ubuntu 10.10 will update its kernel to include a backported KVM fix, probably in the next version after 2.6.35.24. Meanwhile, it is possible to manually build a newer version of KVM to get the problem fixed.

The tarball of the KVM source code can be downloaded from:

http://sourceforge.net/projects/kvm/files/kvm-kmod/2.6.37/

I tried the package on my Ubuntu 10.10 (kernel 2.6.35.23) T61 laptop with an Intel Core 2 Duo 32-bit CPU. Using the new KVM, I could boot the MeeGo kernel successfully. I hope that helps. Thanks!

Revision history for this message
Zhiyuan-lv (zhiyuan-lv) wrote :

I verified the fix on kernel 2.6.35. After rebuilding the kernel with the patch, I could boot a MeeGo image with qemu-kvm successfully. May I ask when the kernel update will be available for Ubuntu 10.10? Thanks!

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Hi Zhiyuan-lv,

the fix is currently in maverick-proposed. You can use that kernel by adding something like

deb http://archive.ubuntu.com/ubuntu/ maverick-proposed main

to your /etc/apt/sources.list.

I don't know how long it will take for this fix to move from maverick-proposed into maverick. Perhaps someone on the kernel team has an idea?

Revision history for this message
Zhiyuan-lv (zhiyuan-lv) wrote :

I just checked the Ubuntu update repo. The latest kernel version there, 2.6.35-25.44, includes the fix. Thanks!

Steve Conklin (sconklin)
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released
Changed in meego:
status: In Progress → Fix Released
Revision history for this message
Julian Wiedmann (jwiedmann) wrote :

This release has reached end-of-life [0].

[0] https://wiki.ubuntu.com/Releases

Changed in linux (Ubuntu Maverick):
status: New → Invalid
Revision history for this message
Adolfo Jayme Barrientos (fitojb) wrote :

(Untargeting end-of-life release)

no longer affects: kvm (Ubuntu Maverick)
no longer affects: linux (Ubuntu Maverick)
no longer affects: qemu (Ubuntu Maverick)
no longer affects: qemu-kvm (Ubuntu Maverick)