Bug #1308341 “Multiple CPUs causes blue screen on Windows guest ...” : Bugs : qemu package : Ubuntu

Hein Gustavsen (hein-gustavsen) wrote on 2014-04-16:

#1

Blue screen (20.9 KiB, image/png)

Hein Gustavsen (hein-gustavsen) wrote on 2014-04-16:

#2

Guest configuration XML from libvirt (2.2 KiB, application/xml)

Hein Gustavsen (hein-gustavsen) wrote on 2014-04-16:

#3

The command line used to start the guest (from log file):
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name win7-test -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -uuid bc6a3c93-2221-4b61-ed29-07edda0a2043 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7-test.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/mnt/sw-test-nas/win7-test.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=23,id=hostnet0,vhost=on,vhostfd=34 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d6:60:55,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:2 -device VGA,id=video0,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6

Hein Gustavsen (hein-gustavsen) on 2014-04-16

description:	updated

Launchpad Janitor (janitor) wrote on 2014-05-07:

#4

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu-kvm (Ubuntu):
status:	New → Confirmed

Krzysztof Cybulski (krzysiek-cybulski) wrote on 2014-05-07:

#5

BSOD (12.8 KiB, image/png)

I have Windows 7 32-bit and Windows 2008 R2 guests, and both experience this problem. The attached information is from the Windows 7 BSOD.
The host system for this VM is a Dell R510 running qemu-kvm_2.0.0~rc1+dfsg-0ubuntu3_amd64.deb.

VM command line:

qemu-system-x86_64 -enable-kvm -name win7_kc -S -machine pc-1.0,accel=kvm,usb=off -cpu kvm64,+rdtscp,+pdpe1gb,+dca,+xtpr,+tm2,+est,+vmx,+ds_cpl,+monitor,+pbe,+tm,+ht,+ss,+acpi,+ds,+vme -m 2048 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 6358fe75-bef9-3b4a-da4e-d0842e880d4f -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/win7_kc.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive file=/home/VM/win7_kc.img,if=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/mnt/a/virtio-win-0.1-74.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -netdev tap,fd=31,id=hostnet0,vhost=on,vhostfd=34 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:09:18:1c,bus=pci.0,addr=0x3 -chardev spicevmc,id=charchannel0,name=vdagent -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 -device usb-tablet,id=input0 -spice port=5901,addr=10.50.0.11,disable-ticketing,plaintext-channel=main,image-compression=auto_glz,seamless-migration=on -k pl -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,bus=pci.0,addr=0x2 -device intel-hda,id=sound0,bus=pci.0,addr=0x6 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -chardev spicevmc,id=charredir0,name=usbredir -device usb-redir,chardev=charredir0,id=redir0 -chardev spicevmc,id=charredir1,name=usbredir -device usb-redir,chardev=charredir1,id=redir1 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
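The two reporters so far use different vCPU topologies (`sockets=1,cores=4` in comment #3 versus `sockets=4,cores=1` here). A minimal Python sketch for pulling the `-smp` topology out of a logged command line so that reports can be compared; the helper names `parse_smp` and `is_consistent` are hypothetical and not part of QEMU or libvirt:

```python
import shlex


def parse_smp(cmdline):
    """Return the -smp topology of a QEMU command line as a dict of ints."""
    args = shlex.split(cmdline)
    value = args[args.index("-smp") + 1]  # e.g. "4,sockets=4,cores=1,threads=1"
    topo = {}
    for part in value.split(","):
        key, sep, val = part.partition("=")
        if sep:
            topo[key] = int(val)
        else:
            # A bare leading number is the total vCPU count.
            topo["cpus"] = int(key)
    return topo


def is_consistent(topo):
    """Check that cpus == sockets * cores * threads (missing keys count as 1)."""
    product = topo.get("sockets", 1) * topo.get("cores", 1) * topo.get("threads", 1)
    return topo.get("cpus", product) == product


if __name__ == "__main__":
    line = "qemu-system-x86_64 -enable-kvm -smp 4,sockets=4,cores=1,threads=1 -m 2048"
    topo = parse_smp(line)
    print(topo, is_consistent(topo))
```

The sample string is a shortened version of the command line above; the full logged line can be passed in the same way.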

Krzysztof Cybulski (krzysiek-cybulski) wrote on 2014-05-07:

#6

win7_kc.xml (3.8 KiB, application/xml)

libvirt xml

Hein Gustavsen (hein-gustavsen) wrote on 2014-05-09:

#7

BTW, my installation was an upgrade from Ubuntu 10.04 to 12.04. The motherboard is a dual-socket Xeon board from ASUS with two E5-2630 v2 CPUs.

Hein Gustavsen (hein-gustavsen) wrote on 2014-05-09:

#8

Sorry, I meant from 12.04 to 14.04. 12.04 was a fresh installation. Hyper-threading is enabled.

Krzysztof Cybulski (krzysiek-cybulski) wrote on 2014-05-12:

#9

My installation was upgraded from 12.04 to 14.04 as well. My machine has 2 CPUs, so I set the Windows 7 VM to be the only guest using CPU2 (cores 1,3,5,7,9,11,13,15), but the error still persists.

Krzysztof Cybulski (krzysiek-cybulski) wrote on 2014-05-13:

#10

It looks like adding "hyperv" to the "features" section of the guest definition helps: my Win7 VM has now been running for ~12 h, whereas without "hyperv" it lasted about 3-4 hours. I will test it for a few days and post here again.
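For anyone who wants to try the same change, here is a minimal Python sketch that adds a `<hyperv>` block to the `<features>` section of a libvirt domain XML string. The comment above does not say which Hyper-V enlightenments were enabled, so the single `<relaxed state='on'/>` child is an assumption, and the function name `add_hyperv` is hypothetical:

```python
import xml.etree.ElementTree as ET


def add_hyperv(domain_xml):
    """Return domain_xml with <features><hyperv><relaxed state='on'/> present."""
    root = ET.fromstring(domain_xml)
    features = root.find("features")
    if features is None:
        features = ET.SubElement(root, "features")
    hyperv = features.find("hyperv")
    if hyperv is None:
        hyperv = ET.SubElement(features, "hyperv")
    # Assumption: only the "relaxed" enlightenment is switched on here.
    if hyperv.find("relaxed") is None:
        ET.SubElement(hyperv, "relaxed", {"state": "on"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    before = (
        "<domain type='kvm'><name>win7_kc</name>"
        "<features><acpi/><apic/></features></domain>"
    )
    print(add_hyperv(before))
```

The resulting XML would still have to be loaded back into libvirt (for example by pasting it in `virsh edit`) and the guest restarted before the change takes effect.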

Hein Gustavsen (hein-gustavsen) wrote on 2014-05-14:

#11

Adding "hyperv" seemed to work for me too.

Serge Hallyn (serge-hallyn) wrote on 2014-05-15:

#12

Thanks, it sounds like at least we should have that enabled by default when, in virt-manager, a windows guest is selected.

Changed in qemu (Ubuntu):
importance:	Undecided → High
status:	New → Confirmed
no longer affects:	qemu-kvm (Ubuntu)
summary:	- Multiple CPUs causes blue screen on Windows guest
	+ Multiple CPUs causes blue screen on Windows guest (14.04 regression)

Launchpad Janitor (janitor) wrote on 2014-05-15:

#13

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in virt-manager (Ubuntu):
status:	New → Confirmed

Hein Gustavsen (hein-gustavsen) wrote on 2014-05-20:

#14

After adding the "hyperv" feature, the guest freezes regularly. This happens on both Windows 7 64-bit and Windows 2012 R2 guests. When I remove the "hyperv" feature, the guest acts normally but fails with a blue screen as before. This may be a completely different issue, but it renders the workaround unusable, for me at least.

Gordon Kaltofen (kaltofen) wrote on 2014-05-26:

#15

Hello all, this is my first post here.

I have exactly the same problem, which appeared after a distribution upgrade of Ubuntu Server x64 from 12.04.4 to 14.04.

1. I have Windows 7 32/64-bit and Windows 2008 Server 64-bit VMs; all show the same error with two dedicated cores (no pinning). Combined with the other reports, I would say it is a general Windows guest problem, not specific to one Windows version.

2. I have an AMD Opteron 6272 (fam: 15, model: 01, stepping: 02, 16 cores) system. Therefore, this problem does not seem to be Intel/AMD architecture-specific.

3. I configured a couple of VMs with ONE core and let them run over the weekend. They didn't crash, but they responded only very slowly and choppily. It seems that there is a fundamental error that is also responsible for the multi-core errors. After restarting the VM, the error is initially gone, even though the VM is still slow due to having only one core.

4. I have the latest virtio drivers installed in the Windows guest systems and use the Red Hat VirtIO SCSI and Ethernet drivers (vers. 61.65.104.7400). Are these drivers installed in your VMs, or do you use the standard IDE/SATA and RTL/Intel NIC drivers?

5. The VM images (qcow2) are located on an mdadm RAID1 volume of two SSDs. Since Linux kernel 3.7, ATA TRIM is possible with Linux software RAID, so I use the mount option 'discard'. I do not want to completely rule out the possibility that the error is related to it.

Is there any indication yet of the cause of the failure, and possibly even a workaround?

Krzysztof Cybulski (krzysiek-cybulski) wrote on 2014-06-16:

#16

I have done a clean install of the server and yes, the Windows VM freezes with hyperv after the reinstall just as it did before. I have reverted my servers to 12.04.4 until this is solved.
Krzysiek

Serge Hallyn (serge-hallyn) wrote on 2014-06-17:

#17

Tried to reproduce this overnight with a windows 8 instance run by hand with 4 cores, but no hang. I'll keep trying with some more options added from your command line.

Serge Hallyn (serge-hallyn) wrote on 2014-06-18:

#18

-smp 4 -realtime mlock=off -rtc base=localtime does not seem to help me reproduce this.

Does the system have to be under stress?

Can you reproduce this without virtio?

Steve (lp-z) wrote on 2014-06-18:

#19

I was able to work around this by downgrading the kernel on a Ubuntu 14 box to 3.12.20-031220-generic #201405160935 (and of course wasn't seeing this with Ubuntu 12).

I've periodically tried booting back to the standard Ubuntu 14 3.13 kernel to see if it's been fixed (and also tried 3.13-lowlatency) but I get a W2k8R2 server hang with KVM within the first ~24 hours of boot each time.

This is a dual-processor machine. Also, with 3.13, I was getting these messages on a semi-periodic basis (may be related):

May 30 20:23:53 kernel: [ 0.000000] Linux version 3.13.0-27-lowlatency (buildd@akateko) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #50-Ubuntu SMP PREEMPT Thu May 15 18:36:04 UTC 2014 (Ubuntu 3.13.0-27.50-lowlatency 3.13.11

May 31 14:15:40 kernel: [64348.760175] INFO: task qemu-system-x86:4151 blocked for more than 120 seconds.
May 31 14:15:40 kernel: [64348.767491] Not tainted 3.13.0-27-lowlatency #50-Ubuntu
May 31 14:15:40 kernel: [64348.773291] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 31 14:15:40 kernel: [64348.781205] qemu-system-x86 D ffff881fffc34600 0 4151 1 0x00000000
May 31 14:15:40 kernel: [64348.781210] ffff881fcf5e3de8 0000000000000002 ffff881fbf140000 ffff881fcf5e3fd8
May 31 14:15:40 kernel: [64348.781215] 0000000000014600 0000000000014600 ffff881fbf140000 ffff881fbf140000
May 31 14:15:40 kernel: [64348.781218] ffff883fcfac7060 ffff883fcfac7068 00007f3809e00000 ffff881fbf140000
May 31 14:15:40 kernel: [64348.781221] Call Trace:
May 31 14:15:40 kernel: [64348.781230] [<ffffffff81722b89>] schedule+0x29/0x70
May 31 14:15:40 kernel: [64348.781237] [<ffffffff8172552d>] rwsem_down_read_failed+0xcd/0x130
May 31 14:15:40 kernel: [64348.781243] [<ffffffff81374b04>] call_rwsem_down_read_failed+0x14/0x30
May 31 14:15:40 kernel: [64348.781247] [<ffffffff81725007>] ? down_read+0x17/0x20
May 31 14:15:40 kernel: [64348.781252] [<ffffffff810a0db2>] task_numa_work+0xd2/0x300
May 31 14:15:40 kernel: [64348.781254] [<ffffffff8109f87b>] ? account_user_time+0x8b/0xa0
May 31 14:15:40 kernel: [64348.781259] [<ffffffff81089e87>] task_work_run+0xa7/0xe0
May 31 14:15:40 kernel: [64348.781264] [<ffffffff81014e57>] do_notify_resume+0x97/0xb0
May 31 14:15:40 kernel: [64348.781268] [<ffffffff8172e52a>] int_signal+0x12/0x17

I'm not seeing any kernel errors with the 3.12 kernel.
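To check whether other hosts show the same hung-task warnings, a short Python sketch along these lines could scan a saved kernel log. It only matches the "blocked for more than N seconds" line format shown above; the function name `find_hung_tasks` is hypothetical:

```python
import re

# Matches lines such as:
#   INFO: task qemu-system-x86:4151 blocked for more than 120 seconds.
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(log_text):
    """Return (command, pid, seconds) for every hung-task warning in log_text."""
    hits = []
    for line in log_text.splitlines():
        match = HUNG_TASK.search(line)
        if match:
            hits.append(
                (match.group("comm"), int(match.group("pid")), int(match.group("secs")))
            )
    return hits


if __name__ == "__main__":
    sample = (
        "May 31 14:15:40 kernel: [64348.760175] INFO: task "
        "qemu-system-x86:4151 blocked for more than 120 seconds.\n"
        "May 31 14:15:40 kernel: [64348.781221] Call Trace:\n"
    )
    print(find_hung_tasks(sample))
```

Filtering the result for commands starting with `qemu` would narrow it to guest processes.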

Serge Hallyn (serge-hallyn) wrote on 2014-06-18:

#20

Thanks, given that info it seems clear that this is a kernel bug and not a qemu bug.

no longer affects:	virt-manager (Ubuntu)

Serge Hallyn (serge-hallyn) wrote on 2014-06-18:

#21

(Removed the task against virt-manager since hyperv is apparently *not* a safe workaround in all cases)

Brad Figg (brad-figg) wrote on 2014-06-18: Missing required logs.

#22

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1308341

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete
tags:	added: trusty

Steve (lp-z) wrote on 2014-06-20:

#23

marking as confirmed, see bug 1332409 with the apport-collect information.

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Hein Gustavsen (hein-gustavsen) wrote on 2014-06-24:

#24

Re-installing 14.04 fixed my problem. Running with the same virtual machine configurations on the same hardware without any problems. No hyperv feature needed.

urusha (urusha) wrote on 2014-06-24:

#25

Could this be a duplicate of https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1307473 ?

Hein Gustavsen (hein-gustavsen) wrote on 2014-06-24:

#26

I agree. This seems to me like a duplicate of bug 1307473.

Fred Thoma (drulenberg) wrote on 2015-01-31:

#27

Just wanted to add that upgrading my kernel to a newer version fixed the problem for me, too.

Host: 2x E5-2620V2, Ubuntu 14.04 LTS
Guest: 24 virtual cores, Windows Server 2008 R2

Before fix:
sudo uname -a
Linux x.contabo.net 3.13.0-44-generic #73-Ubuntu SMP Tue Dec 16 00:22:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Bluescreen stop 0x0000005c every few hours

After fix:
sudo uname -a
Linux x.contabo.net 3.16.0-23-generic #31-Ubuntu SMP Tue Oct 21 17:56:17 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
No bluescreens or other crashes for 7 days under full load

Upgraded following this tutorial: http://askubuntu.com/questions/541775/how-can-i-install-ubuntu-14-10s-kernel-in-ubuntu-14-04-lts

Fred Thoma (drulenberg) wrote on 2015-02-02:

#28

Same bluescreen again on day 9 after the kernel upgrade.

So upgrading the kernel from 3.13 to 3.16 did not help.

Still looking for a fix.

Peter Mráz (etki) wrote on 2015-05-18:

#29

I have the same problem. After a crash, restarting the virtual PC does not help: on the next boot the BSOD with code c5 persists. I must force the machine off and power it on again.

Procion (klebed) wrote on 2015-07-18:

#30

Same issue here. Two VMs, 2008 SP2 x86 and 2008 R2 SP1 x64, hang simultaneously with BSOD stop 0x0000005c (0x0000010b 0x00000003 0x00000000).
The issue arose after upgrading the kernel from 3.12 to 3.13.
Nothing has helped to work around this issue so far.
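Since several reports in this thread quote the same stop code, a small Python sketch like the following could extract the code and its parameters from a comment so that reports can be grouped. It assumes the "stop 0x... (0x... 0x... 0x...)" notation used above; `parse_stop` is a hypothetical helper name:

```python
import re

# "stop", a hex code, then an optional parenthesised list of hex parameters.
STOP = re.compile(r"stop\s+(0x[0-9a-fA-F]+)(?:\s*\(([^)]*)\))?", re.IGNORECASE)


def parse_stop(text):
    """Return (code, [parameters]) as ints, or None if no stop code is found."""
    match = STOP.search(text)
    if match is None:
        return None
    code = int(match.group(1), 16)
    params = [int(p, 16) for p in (match.group(2) or "").split()]
    return code, params


if __name__ == "__main__":
    report = "BSOD stop 0x0000005c (0x0000010b 0x00000003 0x00000000)"
    code, params = parse_stop(report)
    print(hex(code), [hex(p) for p in params])
```

Reports that give only the code, without the parameter list, return an empty parameter list.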

Cristian Aires (caires-droid) wrote on 2015-12-08:

#31

Same problem.
I am using kernel 3.16.0-55-generic on Ubuntu 14.04.

Serge Hallyn (serge-hallyn) wrote on 2015-12-08:

#32

Hi,

could you please file a new bug with debugging information as per https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1346917/comments/11 ?

Affects	Status	Importance	Assigned to
QEMU	New	Undecided	Unassigned
linux (Ubuntu)	Confirmed	Undecided	Unassigned
qemu (Ubuntu)	Confirmed	High	Unassigned
