[lucid] All linux guests oops in kvm_leave_lazy_mmu during boot (9.04, 9.10, 10.04)

Reported by Roman Yepishev on 2010-03-04
This bug affects 5 people
Affects              Status        Importance  Assigned to   Milestone
linux-2.6 (Debian)   Fix Released  Unknown
linux (Ubuntu)                     High        Stefan Bader
  Lucid                            High        Stefan Bader
qemu-kvm (Ubuntu)                  High        Unassigned
  Lucid                            High        Unassigned

Bug Description

Binary package hint: qemu-kvm

As of 2010-03-04 I am unable to boot any linux guests using KVM.

This is the part of the log where the Oops happens; the full guest boot log is attached:

[ 1.781721] Freeing unused kernel memory: 660k freed
[ 1.809907] Write protecting the kernel text: 4780k
[ 1.832211] Write protecting the kernel read-only data: 1908k
[ 1.844338] BUG: unable to handle kernel paging request at c01292e3
[ 1.844338] IP: [<c01292e3>] kvm_leave_lazy_mmu+0x43/0x70
[ 1.844338] *pde = 0090e067 *pte = 00129161
[ 1.844338] Oops: 0003 [#1] SMP
[ 1.844338] last sysfs file:
[ 1.844338] Modules linked in:
[ 1.844338]
[ 1.844338] Pid: 1, comm: init Not tainted (2.6.32-14-generic #20-Ubuntu) Bochs
[ 1.844338] EIP: 0060:[<c01292e3>] EFLAGS: 00010246 CPU: 0
[ 1.844338] EIP is at kvm_leave_lazy_mmu+0x43/0x70
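
For readers unfamiliar with the Oops fields, here is a minimal sketch that decodes the error code (0003) and the low flag bits of the *pte value from the log above. The bit meanings are the standard x86 ones; the function and table names are made up for illustration and are not part of any kernel or tool.

```python
# Decode the x86 page-fault error code and the low PTE flag bits
# taken from the Oops above. Helper names are illustrative only.

# bit -> (meaning when 0, meaning when 1)
ERROR_CODE_BITS = {
    0: ("page not present", "protection violation on a present page"),
    1: ("read access", "write access"),
    2: ("kernel mode", "user mode"),
}

# bit -> flag name (32-bit non-PAE page table entry)
PTE_FLAG_BITS = {
    0: "present",
    1: "writable",
    2: "user",
    5: "accessed",
    6: "dirty",
    8: "global",
}


def decode_error_code(code):
    """Return one description per error-code bit, lowest bit first."""
    return [names[(code >> bit) & 1]
            for bit, names in sorted(ERROR_CODE_BITS.items())]


def decode_pte_flags(pte):
    """Return the names of the flag bits that are set in a PTE."""
    return [name for bit, name in sorted(PTE_FLAG_BITS.items())
            if (pte >> bit) & 1]


if __name__ == "__main__":
    print(decode_error_code(0x0003))
    print(decode_pte_flags(0x00129161))
```

Read this way, the Oops describes a kernel-mode write to a page that is present but not writable. The faulting address is also the same as the IP (c01292e3), which is consistent with something trying to modify the instruction at that location.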

I was unable to make it work with qemu i686 cpu as well.

qemu-kvm 0.12.3-0ubuntu4
Linux buzz 2.6.32-15-generic #22-Ubuntu SMP Tue Mar 2 02:24:17 UTC 2010 i686 GNU/Linux

ProblemType: Bug
Architecture: i386
CheckboxSubmission: b16b943d4712f4613c50f12b0ffe0cc5
CheckboxSystem: 1fd1d69a420d7665c5bbb30cf0881c53
Date: Thu Mar 4 12:23:28 2010
DistroRelease: Ubuntu 10.04
EcryptfsInUse: Yes
KvmCmdLine:
 UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
 root 5009 1 24 278092 76864 0 12:14 ? 00:02:21 /usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 1024 -smp 1 -name lemon -uuid 0b2b9b75-3141-f051-6eb1-f3279b63013e -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/lemon.monitor,server,nowait -monitor chardev:monitor -boot c -drive if=ide,media=cdrom,index=2 -drive file=/home/rtg/Virtual Machines/lemonium.img,if=virtio,index=0,boot=on -net nic,macaddr=54:52:00:76:1d:5c,vlan=0,model=virtio,name=virtio.0 -net tap,fd=46,vlan=0,name=tap.0 -chardev file,id=serial0,path=/tmp/serial-output.txt -serial chardev:serial0 -parallel none -usb -vnc 127.0.0.1:0 -k en-us -vga cirrus
MachineType: Acer Aspire 5520
NonfreeKernelModules: nvidia
Package: qemu-kvm 0.12.3-0ubuntu4
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.32-15-generic root=/dev/mapper/vg00-root ro quiet splash
ProcEnviron:
 SHELL=/bin/bash
 PATH=(custom, user)
 LANG=en_US.UTF-8
ProcVersionSignature: Ubuntu 2.6.32-15.22-generic
SourcePackage: qemu-kvm
Uname: Linux 2.6.32-15-generic i686
dmi.bios.date: 05/06/2008
dmi.bios.vendor: Acer
dmi.bios.version: V1.33
dmi.board.name: Fuquene
dmi.board.vendor: Acer
dmi.board.version: N/A
dmi.chassis.type: 10
dmi.chassis.vendor: Acer
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnAcer:bvrV1.33:bd05/06/2008:svnAcer:pnAspire5520:pvrV1.33:rvnAcer:rnFuquene:rvrN/A:cvnAcer:ct10:cvrN/A:
dmi.product.name: Aspire 5520
dmi.product.version: V1.33
dmi.sys.vendor: Acer

Roman Yepishev (rye) wrote :
Roman Yepishev (rye) on 2010-03-04
summary: - [lucid] All linux guests oops during boot (9.04, 9.10, 10.04)
+ [lucid] All linux guests oops in kvm_leave_lazy_mmu during boot (9.04,
+ 9.10, 10.04)
Chuck Short (zulcss) wrote :

Which version of kvm are you using?

Regards
chuck

Changed in qemu-kvm (Ubuntu):
importance: Undecided → Medium
status: New → Incomplete
Roman Yepishev (rye) wrote :

Hello,

I am experiencing this issue with:
qemu-kvm 0.12.3-0ubuntu4
Linux buzz 2.6.32-15-generic #22-Ubuntu SMP Tue Mar 2 02:24:17 UTC 2010 i686 GNU/Linux

I was able to start the vms by rebooting into
Linux buzz 2.6.32-14-generic #20-Ubuntu SMP Sat Feb 20 05:38:50 UTC 2010 i686 GNU/Linux

So there is something being done in -15 release that is causing this issue.

Changed in qemu-kvm (Ubuntu):
status: Incomplete → New
Changed in linux (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Changed in qemu-kvm (Ubuntu):
status: New → Confirmed
importance: Medium → High
tags: added: regression-potential
Stefan Bader (smb) on 2010-03-04
Changed in linux (Ubuntu):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
Dustin Kirkland  (kirkland) wrote :

Roman, out of curiosity, do you still see the oops if you boot without virtio disks (and use scsi or ide instead)?

Roman Yepishev (rye) wrote :

Dustin,

That happens even when I'm just trying to start ubuntu cd off an ide emulation. The guest kernel finds out that it runs in kvm and commits suicide.

Okay thanks. We're working on getting that host kernel some therapy ;-)

Stefan Bader (smb) wrote :

Ok, so it seems that there is one change which seems responsible for this:

   KVM: fix memory access during x86 emulation.

Though I cannot be completely sure this is the only thing causing problems (my system here is too erratic with gfx) I have placed some kernels without this at http://people.canonical.com/~/smb/bug531823. Maybe someone else can confirm whether this solves the problem. Thanks.

Roman Yepishev (rye) wrote :

@Stefan,

I can confirm that guests no longer die with the host kernel version you placed.
Linux buzz 2.6.32-15-generic #22+kvmfix1 SMP Thu Mar 4 22:13:32 UTC 2010 i686 GNU/Linux

P.S. The correct link is http://people.canonical.com/~smb/bug531823.

Michael Vogt (mvo) wrote :

I ran into the same problem today and the kernel from http://people.canonical.com/~smb/bug531823 fixes the bug for me. Good work :)

Dustin Kirkland  (kirkland) wrote :

Marking invalid against the qemu-kvm userspace, and in-progress against the kernel. Looks like Stefan has this in hand.

Stefan, can we get this into the next kernel build (if it's not already?)?

Changed in linux (Ubuntu):
status: Confirmed → In Progress
Changed in qemu-kvm (Ubuntu):
status: Confirmed → Invalid

Dustin Kirkland wrote:
> Stefan, can we get this into the next kernel build (if it's not
> already?)?

The next Lucid kernel has reverted all the KVM patches until we find a
patchset that works. I am currently trying to get upstream involved, as it looks
to me as though the same thing would happen on stable.

Stefan Bader (smb) wrote :

I am adding another dmesg dump plus a matching kvm_leave_lazy_mmu disassembly.

Thierry Carrez (ttx) on 2010-03-11
Changed in linux (Ubuntu):
milestone: none → ubuntu-10.04-beta-1
Stefan Bader (smb) wrote :

While I want to wait for this to go further through the process upstream, there is a fix for this issue now, along with some explanation of why it was observed only by some people. It seems the bug was only observable on AMD-based systems, which seem to need certain hypercall instructions patched where Intel CPUs do not. And that code happened to be in a write-protected section, which is protected by the patch that introduced the problem.
So the fix is to allow access without checking for protections when kvm itself wants to modify an instruction. Many thanks to Marcelo Tosatti for helping on this.
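
The explanation above can be illustrated with a toy model. This is only a sketch of the idea, not the kernel code or the actual patch: every class, function and parameter name is hypothetical. The only facts taken from the x86 instruction set are the two opcodes (VMCALL for Intel, VMMCALL for AMD), and the model assumes the guest text initially contains VMCALL.

```python
# Toy model only: not kernel code. All names here are hypothetical.

VMCALL = bytes([0x0F, 0x01, 0xC1])   # Intel hypercall opcode
VMMCALL = bytes([0x0F, 0x01, 0xD9])  # AMD hypercall opcode


class ProtectionFault(Exception):
    """Raised when an emulated write hits a read-only guest page."""


class GuestPage:
    def __init__(self, data, writable):
        self.data = bytearray(data)
        self.writable = writable


def emulated_write(page, offset, payload, check_protection=True):
    """Write into guest memory on behalf of the emulator."""
    if check_protection and not page.writable:
        raise ProtectionFault("write to read-only guest page")
    page.data[offset:offset + len(payload)] = payload


def patch_hypercall(page, offset, host_is_amd, bypass_protection):
    """Replace the hypercall instruction with the host CPU's native one."""
    native = VMMCALL if host_is_amd else VMCALL
    if bytes(page.data[offset:offset + 3]) == native:
        return False  # already native: nothing to patch (Intel case here)
    emulated_write(page, offset, native,
                   check_protection=not bypass_protection)
    return True


if __name__ == "__main__":
    text = GuestPage(VMCALL, writable=False)  # write-protected kernel text
    try:
        patch_hypercall(text, 0, host_is_amd=True, bypass_protection=False)
    except ProtectionFault as exc:
        print("regression behaviour:", exc)
    patch_hypercall(text, 0, host_is_amd=True, bypass_protection=True)
    print("patched bytes:", text.data.hex())
```

In this model an Intel host never reaches the write, which would match the observation that only AMD hosts were affected. An AMD host faults as soon as the protection check is applied to the patching write, and succeeds once that one write is allowed to skip the check.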

tags: added: patch
Andy Whitcroft (apw) wrote :

The patches which triggered this issue have been backed out for beta-1. The additional patch to fix the issue is now known. The combination will be applied and tested together to avoid a recurrence. This will happen after beta-1.

Changed in linux (Ubuntu Lucid):
status: In Progress → Fix Released
lavinog (lavinog) wrote :

I am experiencing some poor performance with beta-1 in qemu-kvm (didn't have this issue last month.)
Would the removal of the patches (mentioned in comment #14) be the cause of this?
I am using an AMD host, and just moving the mouse will cause the host cpu to spike. I tested an image of karmic, and everything works.

Stefan Bader (smb) wrote :

lavinog wrote:
> I am experiencing some poor performance with beta-1 in qemu-kvm (didn't have this issue last month.)
> Would the removal of the patches (mentioned in comment #14) be the cause of this?
> I am using an AMD host, and just moving the mouse will cause the host cpu to spike. I tested an image of karmic, and everything works.
>
The chances of that should be small. The patches removed were only present for
one version, and the whole batch caused guests to crash on boot. So it either
worked and you did not have the batch, or it didn't.

Changed in linux-2.6 (Debian):
status: Unknown → Confirmed
Gionn (giovanni.toraldo) wrote :

Any updates on this? I am testing it with the lucid release candidate, the oops still remains.

Stefan Bader (smb) wrote :

I cannot see this happen here. I just tested with 2.6.32-21.32 as host kernel for i386 and amd64. What host/guest kernel combination are you using and what type of CPU do you have?

Gionn (giovanni.toraldo) wrote :

I am using Debian Lenny as host (linux 2.6.32-bpo.3-amd64, qemu 0.12.3+dfsg-4~bpo50+2), but I thought the bug was related to the Ubuntu guest, isn't it? Should I file a bug on the Debian side?

Gionn (giovanni.toraldo) wrote :

Sorry, I didn't mention that I have AMD cpu (AMD Athlon(tm) 64 X2 Dual Core Processor 4600+).

Stefan Bader (smb) wrote :

It is a bug related to the host side. There are patches in the upstream stable
tree that will fix it, but they are still in review. Depending on how Debian picks
up stable, it would get fixed when 2.6.32.12 is pulled in. So it is not related
to your Ubuntu guest.

Ben Hutchings (benh-debian) wrote :

Stefan; Giovanni: We will include the fix in Debian kernel version 2.6.32-12. The Debian bug report now includes a reference to the fix.

Changed in linux-2.6 (Debian):
status: Confirmed → Fix Released