'BUG: soft lockup' after lvm or lvm+encrypt install when using 'kvm -drive'

Bug #221032 reported by Jamie Strandboge on 2008-04-23
Affects: kvm (Ubuntu)
Importance: High
Assigned to: Tim Gardner

Bug Description

Using Ubuntu Server iso for amd64 [20080423.2] from http://iso.qa.ubuntu.com/qatracker/info/1612, I get the attached screenshot after installing lvm+encrypt. After entering the LUKS passphrase, I get:

[ 0.000000] BUG: soft lockup - CPU#0 stuck for 11s [lvm:2228]
[ 0.000000] BUG: soft lockup - CPU#1 stuck for 11s [udevd:2230]

over and over again. I used a kvm guest created with virt-manager with 128M RAM, a 1G non-preallocated disk, Ubuntu/Hardy, and 2 vcpus.

This also affects a regular 'lvm' install when using the above configuration.
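The soft lockup messages quoted above follow a fixed pattern, so they can be picked out of a captured console log mechanically. The following is a minimal Python sketch, assuming the message format is exactly as quoted in this report; the names `SoftLockup` and `parse_soft_lockup` are illustrative and do not come from any existing tool.

```python
import re
from typing import NamedTuple, Optional


class SoftLockup(NamedTuple):
    cpu: int        # CPU number reported as stuck
    seconds: int    # how long the CPU was reported stuck
    process: str    # name of the task that was running
    pid: int        # PID of that task


# Assumed format: "BUG: soft lockup - CPU#<n> stuck for <s>s [<name>:<pid>]"
_PATTERN = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s \[(.+):(\d+)\]")


def parse_soft_lockup(line: str) -> Optional[SoftLockup]:
    """Return the parsed fields, or None if the line is not a soft lockup report."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    cpu, seconds, process, pid = match.groups()
    return SoftLockup(int(cpu), int(seconds), process, int(pid))


if __name__ == "__main__":
    sample = "[ 0.000000] BUG: soft lockup - CPU#1 stuck for 11s [udevd:2230]"
    print(parse_soft_lockup(sample))
```

Any line for which `parse_soft_lockup` returns a value could be counted as evidence of a failed boot when scanning a saved guest console log.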

Jamie Strandboge (jdstrand) wrote :

Reproduced again with the above configuration. Also tried with 2 vcpus, 256MB, 4G disk: same error. For completeness, my LUKS password was 'foo' and I answered 'yes' to using weak encryption.

I use virt-manager, and the resulting kvm invocations end up being:
/usr/bin/kvm -M pc -m 256 -smp 2 -monitor pty -no-reboot -drive file=/srv/vms/isos/hardy/hardy-server-amd64.iso,if=ide,media=cdrom,boot=on -drive file=/home/jamie/hardy-crypt1.img,if=ide -net nic,macaddr=00:16:3e:4f:d9:81,vlan=0,model=virtio -net tap,fd=12,script=,vlan=0 -usb -vnc 127.0.0.1:1

/usr/bin/kvm -M pc -m 128 -smp 2 -monitor pty -no-reboot -drive file=/srv/vms/isos/hardy/hardy-server-amd64.iso,if=ide,media=cdrom,boot=on -drive file=/home/jamie/hardy-crypt2.img,if=ide -net nic,macaddr=00:16:3e:3b:54:6f,vlan=0,model=virtio -net tap,fd=17,script=,vlan=0 -usb -vnc 127.0.0.1:2

Jamie Strandboge (jdstrand) wrote :

Not sure if this is helpful, but tried again and found this in kern.log when launching the vm:

Apr 23 12:16:43 severus kernel: [16801.095974] device vnet2 entered promiscuous mode
Apr 23 12:16:43 severus kernel: [16801.095987] audit(1208967403.109:56): dev=vnet2 prom=256 old_prom=0 auid=4294967295
Apr 23 12:16:43 severus kernel: [16801.100399] vnet0: port 2(vnet2) entering listening state
Apr 23 12:16:43 severus kernel: [16801.694888] SIPI to vcpu 1 vector 0x10
Apr 23 12:16:49 severus kernel: [16807.109080] Ignoring de-assert INIT to vcpu 1
Apr 23 12:16:49 severus kernel: [16807.109905] SIPI to vcpu 1 vector 0x06
Apr 23 12:16:49 severus kernel: [16807.155992] SIPI to vcpu 1 vector 0x06
Apr 23 12:16:53 severus kernel: [16811.880659] vnet2: no IPv6 routers present
Apr 23 12:16:58 severus kernel: [16816.072131] vnet0: port 2(vnet2) entering learning state
Apr 23 12:17:13 severus kernel: [16831.041663] vnet0: topology change detected, propagating
Apr 23 12:17:13 severus kernel: [16831.041671] vnet0: port 2(vnet2) entering forwarding state
Apr 23 12:17:23 severus kernel: [16841.045164] heci: schedule work the heci_bh_handler failed error=0

The 'heci' line keeps repeating.

Jamie Strandboge (jdstrand) wrote :

In my testing, I did have a 2 vcpu vm a) reboot successfully once, b) get hung at the LUKS password prompt. Hang might be bug #221059.

My host system is a dual core 64bit 'Genuine Intel(R) CPU 3.00GHz' processor.

Chuck Short (zulcss) wrote :

I was not able to reproduce this on a core 2 duo i386 with the same iso on real hardware.

Thanks
chuck

Jamie Strandboge (jdstrand) wrote :

Works fine if I install the machine with:
$ qemu-img create ./hardy-crypt.img 1G
$ kvm -hda ./hardy-crypt.img -cdrom /srv/vms/isos/hardy/hardy-server-amd64.iso -m 128 -vnc localhost:10 -smp 2 -boot d
$ kvm -hda ./hardy-crypt.img -m 128 -vnc localhost:10 -smp 2

Note that reboots may hit bug #221059. This bug may be related to bug #220463.

description: updated

Jamie Strandboge (jdstrand) wrote :

For kicks I removed 'acpi' from the libvirt guest, and got the same thing. Possibly the difference between specifying '-hda' vs '-drive'?

Jamie Strandboge (jdstrand) wrote :

All testing was initially done with a '-generic' host kernel and a '-server' guest kernel. This configuration resulted in the soft lockups. If I use the '-server' host kernel and the same '-server' guest kernel, then the machine boots with no soft lockups.

Jamie Strandboge (jdstrand) wrote :

UPDATE:

The -server kernel gives far fewer lockups with kvm62 on hardy, but it locked up twice for me. Here is my testing so far (all tests have the -server guest kernel):

kvm66 with -generic host kernel: 11 FAIL 1 PASS
kvm66 with -server host kernel: 5 FAIL 0 PASS
kvm62 (hardy) -generic host kernel: 13 FAIL 0 PASS
kvm62 (hardy) -server host kernel: 2 FAIL 7 PASS

'FAIL' means soft lockup encountered and 'PASS' means successful boot to login prompt. Note that for each test I start with a blank slate by stopping all vms, killing all kvm processes, then reloading the kvm driver.
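To make the four configurations easier to compare, the tallies above can be turned into failure rates. This is only an illustrative Python sketch: the numbers are copied from the table in this comment, and the helper name `failure_rate` is made up for the example.

```python
from typing import Dict, Tuple

# (FAIL, PASS) counts copied from the table above; all guests ran the -server kernel.
results: Dict[str, Tuple[int, int]] = {
    "kvm66, -generic host kernel": (11, 1),
    "kvm66, -server host kernel": (5, 0),
    "kvm62 (hardy), -generic host kernel": (13, 0),
    "kvm62 (hardy), -server host kernel": (2, 7),
}


def failure_rate(fails: int, passes: int) -> float:
    """Fraction of boots that ended in a soft lockup."""
    total = fails + passes
    if total == 0:
        raise ValueError("no trials recorded")
    return fails / total


for config, (fails, passes) in results.items():
    rate = failure_rate(fails, passes)
    print(f"{config}: {fails}/{fails + passes} boots locked up ({rate:.0%})")
```

The sample sizes are small (5 to 13 boots per configuration), so the rates indicate a trend rather than a precise measurement.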

Soren Hansen (soren) wrote :

I figured I'd put my data in here as well in case anyone else feels like helping out. For some odd reason, I need to have quite a few vcpus before I run into problems:

kvm62, generic kernel, -smp 2: 17 PASS 0 FAIL
kvm62, generic kernel, -smp 12: 6 PASS 1 FAIL
kvm62, generic kernel, -smp 16: 9 PASS 1 FAIL

Tim Gardner (timg-tpi) wrote :
Changed in kvm:
assignee: nobody → timg-tpi
importance: Undecided → High
milestone: none → ubuntu-8.04.1
status: New → Fix Committed

danyj028 (danyj028-gmail) wrote :

Hello

I have had issues with KVM (AMD) on Hardy with the smp option; with or without acpi makes no difference.
Recompiling, installing and using kvm from the latest source did not make any difference.

It freezes on bootup or in X. Sometimes the keyboard still works but the mouse does not. The faults are not reliably reproducible. Also, the tap networking option sometimes fails. When X freezes, the only option is to kill the host X (ctrl-alt-backspace; I can't even get back to the host).

Note that the same VM and kvm call worked OK with Feisty. (Don't know about Gutsy, I usually only use every 2nd release)

However, everything seems to work perfectly once the smp option is removed; I am using it now to type this email.

Tim Gardner (timg-tpi) wrote :

SRU Justification:

Impact: Host kernel hangs when starting KVM guests.
Fix Description: Don't pre-fetch guest pages until after they have been mapped into physical memory.
Patch: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=commit;h=a7c28c405ae69a9493b04d3b6209c0bd06472afb

TEST CASE: Install Hardy server ISO into a kvm, select lvm+encrypt. Host kernel randomly hangs.

Soren Hansen (soren) wrote :

Correction:

SRU justification:

Impact: Guests hang during boot if they're using lvm and smp.
Fix description: Mark pages as write protected *before* they're prefetched, so that the host can handle page faults in the guest.
Patch: http://kernel.ubuntu.com/git?p=ubuntu/ubuntu-hardy.git;a=commit;h=a7c28c405ae69a9493b04d3b6209c0bd06472afb

TEST CASE: Boot a hardy guest that uses root on lvm (create it with the guided lvm partitioning in the installer, if you don't already have one) like so:

    kvm -smp 8 -drive file=/path/to/your/disk.img,boot=on -m 256

Depending on your host hardware, this will hang anywhere between 10% and 99.9% of the time without the patch applied. With the patch applied, the problem should go away completely.
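Because the hang is intermittent, the test case is most useful when repeated many times and tallied. The following Python sketch shows one way to structure that, under stated assumptions: `build_kvm_argv` simply reproduces the command line quoted above, and `boot_once` is a hypothetical callable supplied by the tester that boots the guest once and returns True if it reached the login prompt. How that is decided (watching the console, scanning a saved log) is deliberately left open, since this report does not prescribe a method.

```python
from typing import Callable, List, Tuple


def build_kvm_argv(disk_image: str, vcpus: int = 8, mem_mb: int = 256) -> List[str]:
    """Return the argument list for the test-case command quoted above."""
    return [
        "kvm",
        "-smp", str(vcpus),
        "-drive", f"file={disk_image},boot=on",
        "-m", str(mem_mb),
    ]


def run_trials(
    argv: List[str],
    boot_once: Callable[[List[str]], bool],  # hypothetical: True means a clean boot
    trials: int,
) -> Tuple[int, int]:
    """Call boot_once(argv) `trials` times and return (passes, fails)."""
    passes = 0
    for _ in range(trials):
        if boot_once(argv):
            passes += 1
    return passes, trials - passes


if __name__ == "__main__":
    argv = build_kvm_argv("/path/to/your/disk.img")
    print(" ".join(argv))
```

With a real `boot_once` in place, running the same number of trials before and after installing the patched kernel gives a direct comparison of the hang rate.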

Colin Watson (cjwatson) wrote :

Accepted into hardy-proposed.

Jamie Strandboge (jdstrand) wrote :

I am in the 99.9% hang group Soren mentioned, and updating to linux 2.6.24-17.31 fixes the problem for me.

Martin Pitt (pitti) wrote :

linux 2.6.24-17.31 copied to hardy-updates.

Changed in kvm:
status: Fix Committed → Fix Released