Guest Kernel Panic if using "-cpu host" in qemu-kvm 1.1.1

Bug #1037675 reported by Till Schäfer on 2012-08-16
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
QEMU
Undecided
Unassigned

Bug Description

After Upgrading to qemu-kvm-1.1.1-r1 from version 1.0.1-r1 my virtual machines (running gentoo linux) panic at intel_pmu_init. (detailed information including stacktrace are in the uploaded screenshot). When i remove the "-cpu host" option, the system starts normally.

the command line from whicht the system is bootet:

qemu-kvm -vnc :7 -usbdevice tablet -daemonize -m 256 -drive file=/data/virtual_machines/wgs-l08.img,if=virtio -boot c -k de -net nic,model=virtio,macaddr=12:12:00:12:34:63,vlan=0 -net tap,ifname=qtap6,script=no,downscript=no,vlan=0 -smp 2 -enable-kvm -cpu host -monitor unix:/var/run/qemu-kvm/wgs-l08.monitor,server,nowait

also reported on gentoo bug tracker (with some more details of the host): https://bugs.gentoo.org/show_bug.cgi?id=431640

Till Schäfer (till2-schaefer) wrote :

First of all, your kernel panic screenshot is incomplete: it lacks the most important information which were scrolled off the (virtual) screen. Please enable serial console and capture whole OOPs in a text form.

Second, it isn't clear whenever this is HOST kernel panic or GUEST kernel panic. I assume it is guest.

Third, you nevrer said which guest (kernel) it is.

And forth, gentoo is very well known for breaking qemu-kvm by their "hardened" patches. Disable the hardening and retry.

/mjt

Till Schäfer (till2-schaefer) wrote :

sorry for the not very usefull information i provides above.

i will try to reproduce the the failure with a vailla kernel (host) in a few days. currently the server is in use and cannot be restarted.

the kernel panic was in the guest (thought this is clear by "my vm panic"). If it is reproducable with a vannilla kernel (host) i will provide more detailed information here. the vm is using a 3.3.8 gentoo kernel (not hardened).

Stefan Kuhn (wuodan-gentoo) wrote :

I have the same issue, but not on hardened.
Tried for 1-2 hours to send the output to serial console but failed.
The text below is what I posted at https://bugs.gentoo.org/show_bug.cgi?id=431640#c8:
###############

Same issue here (same screenshot with qemu-kvm-1.1.1-r1), but not on hardened.
Happens with 1.1.1-r3 with a different panic message (not the one
mentioned in bug #431444)

I fail miserably at "sending output to serial console or file", if anyone
would help me there I could provide the entire output of the kernel panic.

Below is what I've found out to far:
(Tests with qemu-kvm-1.1.1-r1)

Analysis:
=========
Happens with the following LiveCDs:
- Fedora-17-x86_64-netinst.iso: uname -r = 3.3.4-5.fc17.x86_64; image: /LiveOS/squashfs.img
- pentoo-x86_64-2012.0_beta1.7.iso: uname -r = 3.4.2-pentoo; image: image.squashfs

Does not happen with:
- pentoo-x86_64-2009.0.iso: uname -r = 2.6.31-pentoo-r3; image: image.squashfs
- (Gentoo) install-amd64-minimal-20120621.iso: uname -r = 3.2.12-gentoo; image.squashfs
- systemrescuecd-x86-2.8.1.iso (64bit kernel): uname -r = 3.2.23-std281-amd64: sysrcd.dat

Tried installing qemu-kvm-1.1.1-r1 without the 2 patches, no difference.

Guess:
======
This seems to happen only with newer guest-kernels (on newer CPUs).

Hardware-Info:
==============
see attachments

Correcting myself (comment #2):

> And forth, gentoo is very well known for breaking qemu-kvm by their "hardened" patches. Disable the hardening and retry.

I mean the (host) KERNEL hardering, not qemu-kvm userspace hardering there. Sorry for any potential confusion.

Till Schäfer (till2-schaefer) wrote :

thx for the patience, i am currently very busy, therefore this took a bit longer than it was planed:

- using a non hardened kernel (gentoo-sources-3.3.8) does not resolve the issue

therefore i need to use the serial console, which is somewhat new to me. i will do this as soon as i find some time. i am still on it ;)

Till Schäfer (till2-schaefer) wrote :
Download full text (4.0 KiB)

ok getting the serial console to work was not that hard. here is the relevant serial output of the failing guest (full output is attached as file):

[ 0.010706] mce: CPU supports 10 MCE banks
[ 0.011279] ACPI: Core revision 20110623
[ 0.014769] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 0.025876] CPU0: Intel(R) Core(TM)2 Quad CPU Q9300 @ 2.50GHz stepping 07
[ 0.027998] Performance Events:
[ 0.027998] general protection fault: 0000 [#1] SMP
[ 0.027998] CPU 0
[ 0.027998] Modules linked in:
[ 0.027998]
[ 0.027998] Pid: 1, comm: swapper/0 Not tainted 3.2.12-gentoo #1 Bochs Bochs
[ 0.027998] RIP: 0010:[<ffffffff81aa553f>] [<ffffffff81aa553f>] intel_pmu_init+0x283/0x85e
[ 0.027998] RSP: 0018:ffff88000f8b9ea0 EFLAGS: 00000202
[ 0.027998] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000345
[ 0.027998] RDX: 0000000000000003 RSI: 0000000007280202 RDI: ffffffff81a7efa8
[ 0.027998] RBP: ffff88000f8b9eb0 R08: 0000000000000000 R09: ffffffff81a7ee70
[ 0.027998] R10: 0000000000000001 R11: 0000000000000000 R12: ffffffff81aa4955
[ 0.027998] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 0.027998] FS: 0000000000000000(0000) GS:ffff88000fc00000(0000) knlGS:0000000000000000
[ 0.027998] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 0.027998] CR2: 0000000000000000 CR3: 0000000001a05000 CR4: 00000000000006f0
[ 0.027998] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 0.027998] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 0.027998] Process swapper/0 (pid: 1, threadinfo ffff88000f8b8000, task ffff88000f8b0000)
[ 0.027998] Stack:
[ 0.027998] 0000000000000000 ffffffff81b1b550 ffff88000f8b9ef0 ffffffff81aa4989
[ 0.027998] 0000000000000040 ffffffff81a800d8 ffff88000f8b9f20 ffffffff81b1b550
[ 0.027998] ffffffff81aa4955 0000000000000000 ffff88000f8b9f20 ffffffff810002ea
[ 0.027998] Call Trace:
[ 0.027998] [<ffffffff81aa4989>] init_hw_perf_events+0x34/0x3ef
[ 0.027998] [<ffffffff81aa4955>] ? check_bugs+0x2d/0x2d
[ 0.027998] [<ffffffff810002ea>] do_one_initcall+0x7a/0x12c
[ 0.027998] [<ffffffff81a9eb45>] kernel_init+0x7a/0x141
[ 0.027998] [<ffffffff814963b4>] kernel_thread_helper+0x4/0x10
[ 0.027998] [<ffffffff81a9eacb>] ? start_kernel+0x339/0x339
[ 0.027998] [<ffffffff814963b0>] ? gs_change+0xb/0xb
[ 0.027998] Code: 48 d3 e0 48 ff c8 41 ff ca 48 89 05 b4 99 fd ff 7e 2b 83 e2 1f b8 03 00 00 00 83 fa 02 b9 45 03 00 00 0f 4f c2 89 05 91 99 fd ff <0f> 32 48 c1 e2 20 89 c0 48 09 c2 48 89 15 ef 99 fd ff e8 cf be
[ 0.027998] RIP [<ffffffff81aa553f>] intel_pmu_init+0x283/0x85e
[ 0.027998] RSP <ffff88000f8b9ea0>
[ 0.029015] ---[ end trace 4eaa2a86a8e2da22 ]---
[ 0.030006] swapper/0 used greatest stack depth: 5576 bytes left
[ 0.031005] Kernel panic - not syncing: Attempted to kill init!
[ 0.032006] Pid: 1, comm: swapper/0 Tainted: G D 3.2.12-gentoo #1
[ 0.033000] Call Trace:
[ 0.034003] [<ffffffff8148d64e>] panic+0x8c/0x198
[ 0.035005] [<ffffffff8103e1f0>] do_exit+0x98/0x7d8
[ 0.036005] [<ffffffff8103c564>] ? kmsg_dum...

Read more...

Till Schäfer (till2-schaefer) wrote :

i started trace-cmd as suggested on http://www.linux-kvm.org/page/Tracing and started the vm. after the panic i aborted trace-cmd and here is the trace file

Till Schäfer (till2-schaefer) wrote :

i forgot to mention that the PID was 27374

Laurent COUSTET (ed-zehome) wrote :

I had exactly the same problem using custom kernel on Debian GNU/Linux / qemu-kvm 1.1.1 or 1.2.0.

CPU: Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz stepping 7 microcode 0x70b

I have two solutions: (VERY UGGLY) patch the kernel to remove intel perf events (hardware events):

--- ../linux-3.4.10-fai-server/arch/x86/kernel/cpu/perf_event.c 2012-08-27 00:02:10.000000000 +0200
+++ arch/x86/kernel/cpu/perf_event.c 2012-09-10 18:56:11.870774243 +0200
@@ -1324,7 +1328,8 @@
  struct event_constraint *c;
  int err;

- pr_info("Performance Events: ");
+ pr_info("Performance Events: (DISABLED BY UGGLY PATCH)");
+ return 0;

  switch (boot_cpu_data.x86_vendor) {
  case X86_VENDOR_INTEL:

*OR*

Add support for KVM Guest in the *GUEST* kernel config:

+CONFIG_KVM_CLOCK=y
+CONFIG_KVM_GUEST=y
+CONFIG_PARAVIRT=y
+# CONFIG_PARAVIRT_SPINLOCKS is not set
+CONFIG_PARAVIRT_CLOCK=y

Till Schäfer (till2-schaefer) wrote :

changing the guest config to enable paravirt seems to do the trick!

James Le Cuirot (chewi) wrote :

I am seeing more or less the same thing. I am trying to boot Puppy Linux from Gentoo with qemu-kvm-1.1.1-r3 and gentoo-sources-3.6.0. Interestingly, I also found that AROS crashed when using "-cpu host" but I initially chalked that up to AROS being flakey - perhaps not. Obviously I can't enable those kernel options in AROS and I don't want to have to build my own Puppy kernel.

Anton Arapov (arapov) wrote :

You can pass the level=9 param to disable the PMU emulation, this should fix your issue as well. ( e.g. -cpu host,level=9 )

Thomas Huth (th-huth) wrote :

Is there still something left to do here or can we close this bug nowadays?

Changed in qemu:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for QEMU because there has been no activity for 60 days.]

Changed in qemu:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers