Ubuntu 12.04 + QEMU 2.0 + KSM=1 + OVS makes Windows 2008 R2 guests crash

Bug #1534049 reported by wangwenjian
This bug affects 1 person
Affects          Status    Importance   Assigned to   Milestone
linux (Ubuntu)   Expired   Medium       Unassigned

Bug Description

Hi,

Recently I ran into a problem on our platform that has troubled me for a long time. Has anyone else encountered it?

The environment is as follows:

OpenStack environment built with Fuel.
Controller nodes: 3
Compute nodes: 30
Ceph nodes: 9
Windows virtio driver version: 61.71.104.10000
Ubuntu 12.04.4 LTS
QEMU emulator version 2.0.0 (Debian 2.0.0+dfsg-2ubuntu1.9), Copyright (c) 2003-2008 Fabrice Bellard

root@node-96:~# ovs-vsctl --version
ovs-vsctl (Open vSwitch) 2.0.2
Compiled Nov 28 2014 21:37:07

Symptoms:
Windows guests on one host occasionally crash and restart automatically. After the restart, the guest's network NIC is disabled and the guest cannot obtain an IP address via DHCP. A soft reboot does not help; only a hard reboot brings the NIC back.

Note:
1. The crashing Windows guests are all on a single physical node (HW RH2285). Other nodes use the same hardware model, but none of them show similar problems.
   Maybe it is an OVS bug: the Windows VM receives irregular packets, which crash the Windows NIC driver and then the whole Windows system.
2. When one Windows VM crashes, several Windows VMs crash at the same time (about 3 or 4, not all of them).

At first I thought the Windows virtio drivers were the problem, but upgrading them did not help. It feels like a QEMU problem, but I am not sure about that.

Also, I'm not sure whether this bug is related to the following ones. Following those reports, I have turned off KSM on the host and am currently testing. If anyone has run into the same case, please reply. Thanks.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1346917
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1338277
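To confirm KSM's state on a compute node before and after such a test, one can read the kernel's sysfs control files under /sys/kernel/mm/ksm/. Below is a minimal Python sketch, assuming that standard sysfs interface: writing 0 to `run` stops ksmd but keeps merged pages, 1 runs it, and 2 stops it and unmerges all currently merged pages. The helper names are hypothetical, and the sketch only checks or changes the setting; it does not show whether KSM is actually involved in these crashes.

```python
from pathlib import Path

# Standard KSM control directory exposed by the Linux kernel.
KSM_DIR = Path("/sys/kernel/mm/ksm")

# Values accepted by /sys/kernel/mm/ksm/run.
RUN_STATES = {
    0: "stopped, merged pages kept",
    1: "running",
    2: "stopped, all merged pages unmerged",
}


def describe_run_value(value: int) -> str:
    """Translate a KSM 'run' value into a readable state."""
    return RUN_STATES.get(value, f"unknown ({value})")


def read_ksm_status(ksm_dir: Path = KSM_DIR) -> dict:
    """Return whichever of the KSM run state and page-sharing counters can be read."""
    status = {}
    for name in ("run", "pages_shared", "pages_sharing"):
        try:
            status[name] = int((ksm_dir / name).read_text().strip())
        except (OSError, ValueError):
            continue  # missing or unreadable file: skip it
    return status


def disable_ksm(ksm_dir: Path = KSM_DIR, unmerge: bool = True) -> None:
    """Stop ksmd; with unmerge=True (writes 2) also split already-merged pages.
    Needs root on a real host."""
    (ksm_dir / "run").write_text("2\n" if unmerge else "0\n")


if __name__ == "__main__":
    status = read_ksm_status()
    if "run" in status:
        print("KSM is", describe_run_value(status["run"]), status)
    else:
        print("KSM sysfs interface not found or not readable")
```

Running `disable_ksm()` as root corresponds to `echo 2 > /sys/kernel/mm/ksm/run`. A sysfs write does not persist across a host reboot.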

dmesg log:
[13766077.712750] init: libvirt-bin main process (35678) killed by KILL signal
[13766077.712822] init: libvirt-bin main process ended, respawning
[13766081.675377] ip_set: protocol 6
[13770171.259174] qbre991247b-d0: port 2(tape991247b-d0) entered disabled state
[13770171.266161] device tape991247b-d0 left promiscuous mode
[13770171.266200] qbre991247b-d0: port 2(tape991247b-d0) entered disabled state
[13770203.296136] device tape991247b-d0 entered promiscuous mode
[13770203.329022] qbre991247b-d0: port 2(tape991247b-d0) entered forwarding state
[13770203.329040] qbre991247b-d0: port 2(tape991247b-d0) entered forwarding state
[13770204.527595] kvm: zapping shadow pages for mmio generation wraparound
[13771734.263654] qbre991247b-d0: port 2(tape991247b-d0) entered disabled state
[13771734.263704] qbre991247b-d0: port 1(qvbe991247b-d0) entered disabled state
[13771847.638690] qbre991247b-d0: port 2(tape991247b-d0) entered forwarding state
[13771847.638742] qbre991247b-d0: port 2(tape991247b-d0) entered forwarding state
[13771847.638758] qbre991247b-d0: port 1(qvbe991247b-d0) entered forwarding state
[13771847.638770] qbre991247b-d0: port 1(qvbe991247b-d0) entered forwarding state
[13784647.176340] qbr03992610-e3: port 1(qvb03992610-e3) entered disabled state
[13784668.538526] qbrc9002954-09: port 1(qvbc9002954-09) entered disabled state
[13792069.237135] qbre991247b-d0: port 2(tape991247b-d0) entered disabled state
[13792069.246187] device tape991247b-d0 left promiscuous mode
[13792069.246215] qbre991247b-d0: port 2(tape991247b-d0) entered disabled state
[13792070.174570] device tape991247b-d0 entered promiscuous mode
[13792070.207159] qbre991247b-d0: port 2(tape991247b-d0) entered forwarding state
[13792070.207181] qbre991247b-d0: port 2(tape991247b-d0) entered forwarding state
[13792071.041157] kvm: zapping shadow pages for mmio generation wraparound
[13794383.653582] qbre991247b-d0: port 2(tape991247b-d0) entered disabled state
[13794383.666387] device tape991247b-d0 left promiscuous mode
[13794383.666413] qbre991247b-d0: port 2(tape991247b-d0) entered disabled state
[13794384.468924] device tape991247b-d0 entered promiscuous mode
[13794384.501689] qbre991247b-d0: port 2(tape991247b-d0) entered forwarding state
[13794384.501710] qbre991247b-d0: port 2(tape991247b-d0) entered forwarding state
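When correlating guest crashes with host events, it can help to pull the Linux bridge port transitions out of dmesg. Below is a hypothetical Python sketch that parses lines of the form shown above ("[seconds] bridge: port N(device) entered STATE state") and groups the transitions per bridge and device. The regular expression is an assumption fitted to this log excerpt only. Note that these dmesg timestamps are seconds since host boot, not wall-clock time.

```python
import re
from collections import defaultdict

# Matches bridge port state lines such as:
# [13770171.259174] qbre991247b-d0: port 2(tape991247b-d0) entered disabled state
PORT_STATE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<bridge>\S+): port (?P<port>\d+)"
    r"\((?P<dev>[^)]+)\) entered (?P<state>\w+) state$"
)


def port_transitions(lines):
    """Group (timestamp, state) transitions by (bridge, device); other lines are ignored."""
    events = defaultdict(list)
    for line in lines:
        m = PORT_STATE_RE.match(line.strip())
        if m:
            key = (m.group("bridge"), m.group("dev"))
            events[key].append((float(m.group("ts")), m.group("state")))
    return dict(events)


def disabled_times(events):
    """Return, per (bridge, device), the timestamps at which the port became disabled."""
    return {
        key: [ts for ts, state in transitions if state == "disabled"]
        for key, transitions in events.items()
    }


if __name__ == "__main__":
    sample = [
        "[13770171.259174] qbre991247b-d0: port 2(tape991247b-d0) entered disabled state",
        "[13770171.266161] device tape991247b-d0 left promiscuous mode",
        "[13770203.329022] qbre991247b-d0: port 2(tape991247b-d0) entered forwarding state",
        "[13784647.176340] qbr03992610-e3: port 1(qvb03992610-e3) entered disabled state",
    ]
    for key, times in disabled_times(port_transitions(sample)).items():
        print(key, times)
```

The disabled timestamps for a tap device can then be compared with the restart times recorded in the instance's libvirt log.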

/var/log/libvirt/qemu/instance-0000304b.log
qemu: terminating on signal 15 from pid 138887
2016-01-11 05:16:04.937+0000: shutting down
2016-01-11 05:16:05.709+0000: starting up
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/kvm-spice -name instance-0000304b -S -machine pc-i440fx-trusty,accel=kvm,usb=off -cpu Westmere -m 8192 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid 4a6391d2-a26f-448b-a693-ba4b10d6ee6d -smbios type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=2014.2,serial=fba59e94-bf97-4eda-9e43-d866b9eb1598,uuid=4a6391d2-a26f-448b-a693-ba4b10d6ee6d -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/instance-0000304b.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -no-hpet -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=rbd:volumes/volume-3c6afa29-7cef-4881-823c-fe523e069bea:id=compute:key=AQA1s/RUaAoTLxAAQuBPsckd8J/j6RZ2AciIJA==:auth_supported=cephx\;none:mon_host=10.14.52.4\:6789\;10.14.52.5\:6789\;10.14.52.6\:6789,if=none,id=drive-virtio-disk0,format=raw,serial=3c6afa29-7cef-4881-823c-fe523e069bea,cache=none,bps=52428800,iops=3000 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=38 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:19:30:d5,bus=pci.0,addr=0x3 -chardev file,id=charserial0,path=/var/lib/nova/instances/4a6391d2-a26f-448b-a693-ba4b10d6ee6d/console.log -device isa-serial,chardev=charserial0,id=serial0 -chardev pty,id=charserial1 -device isa-serial,chardev=charserial1,id=serial1 -device usb-tablet,id=input0 -vnc 0.0.0.0:13 -k en-us -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
Domain id=171 is tainted: high-privileges
char device redirected to /dev/pts/19 (label charserial1)
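To check which paravirtual devices the guest actually gets (here virtio-blk, virtio-net backed by a tap netdev with vhost=on, and virtio-balloon), one can extract the -device and -netdev arguments from the command line logged above. Below is a minimal Python sketch using `shlex`, assuming the logged line is a single shell-quoted command. The helper names are hypothetical, and the property parser does not handle QEMU's doubled-comma escaping (",,").

```python
import shlex


def qemu_options(cmdline, wanted=("-device", "-netdev")):
    """Return (option, value) pairs for the wanted QEMU options, in order."""
    tokens = shlex.split(cmdline)
    return [(opt, value) for opt, value in zip(tokens, tokens[1:]) if opt in wanted]


def parse_props(value):
    """Split 'driver,key=val,...' into (driver, {key: val}); bare flags map to ''."""
    driver, *rest = value.split(",")
    props = {}
    for item in rest:
        key, _, val = item.partition("=")
        props[key] = val
    return driver, props


if __name__ == "__main__":
    # Shortened excerpt of the logged command line.
    logged = (
        "/usr/bin/kvm-spice -name instance-0000304b "
        "-netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=38 "
        "-device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:19:30:d5,bus=pci.0,addr=0x3 "
        "-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5"
    )
    for opt, value in qemu_options(logged):
        driver, props = parse_props(value)
        print(opt, driver, props.get("id"), "vhost=" + props.get("vhost", "n/a"))
```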

Tags: precise
Brad Figg (brad-figg) wrote: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1534049

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: precise
Joseph Salisbury (jsalisbury) wrote:

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.4 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-wily

Changed in linux (Ubuntu):
importance: Undecided → Medium
Launchpad Janitor (janitor) wrote:

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired