Bug #1307473 “guest hang due to missing clock interrupt” : Bugs : linux package : Ubuntu

Serge Hallyn (serge-hallyn) on 2014-04-14

Changed in qemu (Ubuntu):
importance:	Undecided → High

Damjan Marion (dmarion) wrote on 2014-04-15:

#1

I left the following simple app running overnight inside a Linux VM (pinned to CPU 1). It displays how many ticks elapse during a 1-second sleep. I found several occasions where the sleep took much longer.

code:

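#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Host TSC rate in Hz (~2.53 GHz here); adjust for the machine under test. */
#define CPUSPEED 2533422000ULL

static inline uint64_t getticks(void)
{
    uint32_t a, b, c, d;
    /* CPUID serializes execution so RDTSC is not reordered; it overwrites EAX-EDX. */
    __asm__ volatile("cpuid" : "=a" (a), "=b" (b), "=c" (c), "=d" (d) : "a" (0));
    __asm__ volatile("rdtsc" : "=a" (a), "=d" (d));
    return ((uint64_t)a) | (((uint64_t)d) << 32);
}

int main(void)
{
    uint64_t t0, t1;
    while (1) {
        t0 = getticks();
        sleep(1);
        t1 = getticks();
        /* Ticks spent in sleep(1) and the excess over one second's worth. */
        printf("Ticks: %" PRIu64 " delta:%" PRId64 "\n",
               t1 - t0, (int64_t)(t1 - t0 - CPUSPEED));
    }
    return 0;
}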

Sample1:
Ticks: 2533748354 delta:326354
Ticks: 2533785458 delta:363458
Ticks: 2533889852 delta:467852
Ticks: 13309910165 delta:10776488165
Ticks: 2533823762 delta:401762
Ticks: 2533817164 delta:395164
Ticks: 2533894302 delta:472302

Sample2:
Ticks: 2533896753 delta:474753
Ticks: 2533876689 delta:454689
Ticks: 2533783931 delta:361931
Ticks: 20528401242 delta:17994979242
Ticks: 2533904102 delta:482102
Ticks: 2533740733 delta:318733
Ticks: 2533856266 delta:434266

Sample3:
Ticks: 2533761095 delta:339095
Ticks: 2533652242 delta:230242
Ticks: 2533855141 delta:433141
Ticks: 18943955180 delta:16410533180
Ticks: 2533780954 delta:358954
Ticks: 2533923283 delta:501283
Ticks: 2533909033 delta:487033

Serge Hallyn (serge-hallyn) wrote on 2014-04-15: Re: [Bug 1307473] Re: guest hang due to missing clock interrupt

#2

Great, thanks for the test case!

I tried this with the current git HEAD from git.qemu.org on a trusty
kernel and was not able to reproduce it. Trying on another host.

Serge Hallyn (serge-hallyn) wrote on 2014-04-15:

#3

I tried 2.0.0~rc1+dfsg-0ubuntu3 with a trusty live CD ISO, using the command

kvm -hda x.img -cdrom ubuntu-13.10-desktop-amd64.iso -m 1024 -realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -rtc base=localtime

but I still have not seen this.

Serge Hallyn (serge-hallyn) wrote on 2014-04-15:

#4

However, you mention that you have your VM pinned to CPU 1, while the command line uses '-smp 4'. When I run a VM with -smp 4 locked to a single physical CPU, it definitely does not do well. I'm not sure whether to call that a bug or misuse.

Example:

cgm create cpuset qemu
cgm setvalue cpuset qemu cpuset.cpus 0
cgm movepid cpuset qemu $$
kvm -hda x.img -cdrom ubuntu-13.10-desktop-amd64.iso -m 1024 -realtime mlock=off -smp 4,sockets=1,cores=4,threads=1 -rtc base=localtime

(The resulting VM hangs; without '-smp 4,sockets=1,cores=4,threads=1' it runs fine.)

Serge Hallyn (serge-hallyn) wrote on 2014-04-15:

#5

Reproduced just as easily with qemu.org git HEAD.

Again, this appears to happen only when using -smp 4 while restricting the VM to one CPU with cpuset.

Damjan Marion (dmarion) wrote on 2014-04-15:

#6

Just to clarify, I was pinning my test code inside the guest with "taskset -c 1". There was no pinning on the host side.

Also, I see the same issue with -smp 2.

Serge Hallyn (serge-hallyn) wrote on 2014-04-17:

#7

So the only thing you ran under taskset was the program in comment #1?

And if you do not run that under taskset, then it doesn't skip?

Damjan Marion (dmarion) wrote on 2014-05-05:

#8

Both systems I mentioned above were upgraded from precise to trusty. After reinstalling them from scratch, the issue disappeared and the VMs no longer crash.

Launchpad Janitor (janitor) wrote on 2014-05-09:

#9

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qemu (Ubuntu):
status:	New → Confirmed

Krzysztof Cybulski (krzysiek-cybulski) wrote on 2014-05-16:

#10

It seems to be related to https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1291321; there is a solution for Windows VMs there.

urusha (urusha) wrote on 2014-06-24:

#11

I have the same symptoms with two trusty-amd64 virtual hosts:
 * win2003 and linux guests hang for periods of ~5 seconds, half a minute, or longer
 * win2008 guests blue-screen with the same message

This happens with kernels (host):
Linux vsrv7 3.13.0-27-generic #50-Ubuntu SMP Thu May 15 18:06:16 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Linux vsrv9 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Qemu version: 2.0.0+dfsg-2ubuntu1.1

Here are the qemu parameters of guests that definitely hang:
* precise with 3.11:
qemu-system-x86_64 -enable-kvm -name m -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid ab7f1e0b-e82e-ddb7-b743-903b8732e333 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/m.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot order=c,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4 -drive file=/dev/vg00/kvm_m_1,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:3a:76:ad,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:3 -device VGA,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5
* win 2008 r2:
qemu-system-x86_64 -enable-kvm -name ts2 -S -machine pc-1.0,accel=kvm,usb=off -m 10000 -realtime mlock=off -smp 16,sockets=16,cores=1,threads=1 -uuid 4df29f97-7e47-8af3-0009-a5395c28e3c5 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ts2.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot order=c,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x8 -drive file=/dev/vg00/kvm_ts2_1,if=none,id=drive-scsi0-0-0-0,format=raw,cache=none,aio=native -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 -drive file=/dev/vg00/kvm_ts2_2,if=none,id=drive-scsi0-0-0-1,format=raw,cache=none,aio=native -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 -drive file=/dev/vg00/kvm_ts2_3,if=none,id=drive-scsi0-0-0-2,format=raw,cache=none,aio=native -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-0-2,id=scsi0-0-0-2 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=30 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ac:28:3a,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:0 -device VGA,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4
* win 2003:
qemu-system-x86_64 -enable-kvm -name ts4 -S -machine pc-i440fx-trusty,accel=kvm,usb=off -m 8192 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -uuid d9f9b238-5cb6-59a8-f3aa-ccfc6656040a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ts4.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -boot order=c,menu=on,strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/vg00/kvm_ts4_1,if=none,id=drive-virtio-disk0,format=raw,cache=none,aio=native -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:37:6b:8d,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:2 -device VGA,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

urusha (urusha) wrote on 2014-06-24:

#12

dmesg of precise guest while hanging (4.2 KiB, text/plain)

urusha (urusha) wrote on 2014-06-24:

#13

Also, it seems that these bugs are duplicates:
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1308341
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1332409

Serge Hallyn (serge-hallyn) wrote on 2014-06-25:

#14

Thanks, the soft lockup message in that dmesg may be helpful. Marking as affecting the kernel.

Brad Figg (brad-figg) wrote on 2014-06-25: Missing required logs.

#15

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1307473

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: apport information

#16

AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116,  1 Jun 30 18:31 seq
 crw-rw---- 1 root audio 116, 33 Jun 30 18:31 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=UUID=ae5e2d0f-021c-46c2-8bad-0cecbdfaff95
InstallationDate: Installed on 2012-11-14 (593 days ago)
InstallationMedia: Ubuntu-Server 12.10 "Quantal Quetzal" - Release amd64 (20121017.2)
MachineType: Intel Corporation S5500BC
Package: qemu 2.0.0+dfsg-2ubuntu1.1
PackageArchitecture: amd64
PciMultimedia:

ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-30-generic root=UUID=33d72c51-8774-4af2-9549-29b9c3bd2b62 ro nomdmonddf nomdmonisw nomdmonddf nomdmonisw
ProcVersionSignature: Ubuntu 3.13.0-30.54-generic 3.13.11.2
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-30-generic N/A
 linux-backports-modules-3.13.0-30-generic  N/A
 linux-firmware                             1.127.4
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty trusty
Uname: Linux 3.13.0-30-generic x86_64
UpgradeStatus: Upgraded to trusty on 2014-06-26 (4 days ago)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 09/09/2011
dmi.bios.vendor: Intel Corp.
dmi.bios.version: S5500.86B.01.00.0060.090920111354
dmi.board.asset.tag: ....................
dmi.board.name: S5500BC
dmi.board.vendor: Intel Corporation
dmi.board.version: E25124-456
dmi.chassis.asset.tag: ....................
dmi.chassis.type: 17
dmi.chassis.vendor: ..............................
dmi.chassis.version: ..................
dmi.modalias: dmi:bvnIntelCorp.:bvrS5500.86B.01.00.0060.090920111354:bd09/09/2011:svnIntelCorporation:pnS5500BC:pvr....................:rvnIntelCorporation:rnS5500BC:rvrE25124-456:cvn..............................:ct17:cvr..................:
dmi.product.name: S5500BC
dmi.product.version: ....................
dmi.sys.vendor: Intel Corporation

tags:

added: apport-collected trusty

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: BootDmesg.txt

#17

BootDmesg.txt (78.8 KiB, text/plain)

apport information

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: CurrentDmesg.txt

#18

CurrentDmesg.txt (8.0 KiB, text/plain)

apport information

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: Dependencies.txt

#19

Dependencies.txt (6.2 KiB, text/plain)

apport information

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: IwConfig.txt

#20

IwConfig.txt (348 bytes, text/plain)

apport information

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: Lspci.txt

#21

Lspci.txt (70.1 KiB, text/plain)

apport information

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: Lsusb.txt

#22

Lsusb.txt (652 bytes, text/plain)

apport information

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: ProcCpuinfo.txt

#23

ProcCpuinfo.txt (13.8 KiB, text/plain)

apport information

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: ProcEnviron.txt

#24

ProcEnviron.txt (297 bytes, text/plain)

apport information

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: ProcInterrupts.txt

#25

ProcInterrupts.txt (9.2 KiB, text/plain)

apport information

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: ProcModules.txt

#26

ProcModules.txt (4.2 KiB, text/plain)

apport information

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: UdevDb.txt

#27

UdevDb.txt (201.9 KiB, text/plain)

apport information

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: UdevLog.txt

#28

UdevLog.txt (448.2 KiB, text/plain)

apport information

Ilya Almametov (ilya-almametov) wrote on 2014-07-01: WifiSyslog.txt

#29

WifiSyslog.txt (5.5 KiB, text/plain)

apport information

Ilya Almametov (ilya-almametov) on 2014-07-01

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

urusha (urusha) wrote on 2014-07-03:

#30

After installing kernel 3.15.1-031501-generic from kernel-ppa, both machines have worked without issues since 2014-06-25. It seems to be a kernel issue that has already been fixed upstream.

Ilya Almametov (ilya-almametov) wrote on 2014-07-03:

#31

I can confirm that it is more a kernel issue than a qemu one. I am running kernel 3.11.0-24-generic, which was left over after the upgrade from Saucy, and have had no issues for at least two days. Before that, with the current 3.13.0-30-generic kernel, my Windows guests crashed every 3-4 hours.

Serge Hallyn (serge-hallyn) wrote on 2014-07-03:

#32

Thanks, that's great to know!

Ondergetekende (kvdveer) wrote on 2014-07-08:

#33

I'm not yet confident that we're seeing the exact same problem, but it is pretty close. We're running a fairly wide range of hypervisor kernels; these are our observations so far.

Node       Kernel             VMs affected
node-1-1   3.13.0-24-generic  0%
node-1-3   3.13.0-24-generic  0%
node-1-5   3.13.0-24-generic  0%
node-1-6   3.13.0-27-generic  0%
node-1-7   3.13.0-29-generic  0%
node-2-3   3.13.0-30-generic  0%
node-2-4   3.13.0-27-generic  0%
node-2-5   3.13.0-24-generic  0%
node-1-8   3.13.0-27-generic  2%
node-1-10  3.13.0-30-generic  33%
node-1-2   3.13.0-29-generic  48%
node-1-9   3.13.0-30-generic  32%
node-2-1   3.13.0-30-generic  20%
node-2-2   3.13.0-30-generic  7%
node-1-4   3.13.0-29-generic  61%

Ondergetekende (kvdveer) wrote on 2014-07-08:

#34

Note that my list of affected nodes also include migrated VMs, so there are some false positives (VMs that came from an affected node). The affected VMs on node 1-8 all seem to be migrated from another node.

John Johansen (jjohansen) wrote on 2014-07-08:

#35

Ondergetekende, can you provide further details on why you believe Bug #1326367 is causing this? Would you be willing to test a 3.11.0-24-generic kernel (reported stable) plus the futex fix, or a chosen stable version of the 3.13 or 3.15 kernel with just the futex fix, to verify whether the futex fix is the problem?

Ondergetekende (kvdveer) wrote on 2014-07-09:

#36

We haven't been able to reproduce the issues under lab conditions, and I'm not willing to use our production setup as a guinea pig anymore. These issues have cost me too much credibility already.

We believe #1326367 is causing this, as we've bisected the issue to between 3.13.0-27.50 and 3.13.0-29.53 (see our results earlier). #1326367 is the only change that seemed relevant, but admittedly, this is just a hunch.

Dr. David Alan Gilbert (dgilbert-h) wrote on 2014-07-09:

#37

Ondergetekende: Physically is there *anything* different between the nodes in your #33 that exhibited no errors and those that exhibited a lot? CPU model/vendor, number of sockets, system vendor etc?
(I'm wondering about a synchronised/unsynchronised tsc type issue).

Mike Lowe (jomlowe) wrote on 2014-07-11:

#38

I believe I have the same problem: place a guest under any amount of load, say a 'yum upgrade', and the network stack stops responding for 1-5 seconds. Here is a sample of the ping statistics (host to guest) from doing such an operation on a 3.13.0-30.55 kernel:

213 packets transmitted, 213 received, 0% packet loss, time 211998ms
rtt min/avg/max/mdev = 0.136/106.283/2651.359/428.403 ms, pipe 3

And a 3.11.0-19.33 kernel:

62 packets transmitted, 62 received, 0% packet loss, time 61074ms
rtt min/avg/max/mdev = 0.189/0.434/1.987/0.228 ms

Mike Lowe (jomlowe) wrote on 2014-07-11:

#39

I can confirm that rolling back to 3.13.0-27 from 3.13.0-30 alleviated my symptoms.

Ondergetekende (kvdveer) wrote on 2014-07-14:

#40

We've resolved our issues by disabling KSM on the affected nodes. None of the non-affected nodes had KSM enabled (due to a packaging bug elsewhere). After we disabled KSM, our problems gradually went away over ~3 days.

This means we're no longer affected by this issue (and given the other reports, probably never were).

Serge Hallyn (serge-hallyn) wrote on 2014-07-14: Re: [Bug 1307473] Re: guest hang due to missing clock interrupt

#41

Quoting Ondergetekende (<email address hidden>):
> We've resolved our issues by disabling KSM on the affected nodes. All of
> the non-affected nodes didn't have KSM enabled (due to a packaging bug
> elsewhere). After disabling KSM, our problems went away gradually in ~3
> days.
>
> This means we're no longer affected by this issue (and given the other
> reports, probably never were).

And which specific kernel are you on?

Jeff Wilson (wilson-3) wrote on 2014-07-15:

#42

vm1.xml (3.2 KiB, application/xml)

I have a similar or the same problem with my Windows Server 2008 R2 virtual machines. The virtual machine stops with Blue Screen error 101: a clock interrupt was not received on a secondary processor. The error only occurs when the VM has 2 CPUs. It seems to occur when the VM is under some load, after running for a while (hours), or when I RDP to the VM after it has been running for a few hours.

The same VM ran perfectly under Ubuntu 12.04.

Host Server
Ubuntu 14.04 LTS, upgraded from 12.04 LTS
kernel: 3.13.0-30

Virtual Machine
Windows Server 2008 R2
2 cpus (when the error occurs)

Attached is the VM xml configuration file.

I did try adding the hyperv code and it seemed to help at first, but the error returned within hours.

I did boot to kernel 3.13.0-24 and the same error occurred within an hour under some load.

Do people expect this problem to be resolved soon?

Thank you for the help.

Mike Lowe (jomlowe) wrote on 2014-07-15:

#43

I need to amend comment #39: moving from 3.13.0-30 to 3.13.0-27 did not eliminate the problem. With 3.13.0-27 it seems to take a couple of hours after a reboot for the symptoms to appear.

Jan Müller (jm-3) wrote on 2014-07-17:

#44

dup of #1332409?

Seems to be a 3.13-only bug.

Paolo Bonzini (bonzini) on 2014-07-17

no longer affects:

qemu

Jeff Wilson (wilson-3) wrote on 2014-07-18:

#45

I have resolved my problem by running kernel 3.14.1-031401 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14.1-trusty/ on Ubuntu 14.04 LTS. The host has been running solidly for a good 24 hours with one Windows Server 2008 R2 VM (2 CPUs), plus two additional VMs that have been running for three hours.

The relevant XML entries that were changed from, or missing in, the original XML configuration file are:

  <os>
    <!-- changed from <type arch='x86_64' machine='pc-i440fx-1.7'>hvm</type> -->
    <type arch='x86_64' machine='pc-i440fx-trusty'>hvm</type>
  </os>

  <features>
    <hap/> <!-- added entry -->
  </features>

  <clock offset='localtime'>
    <timer name='hypervclock'/> <!-- added entry -->
  </clock>

I'm not sure what will happen when a kernel 3.14 is included in the main distribution. Will the future kernel 3.14 from the distribution replace the 3.14 kernel that was installed via dpkg?

Thank you for everyone's messages.

Chris J Arges (arges) on 2014-07-21

tags:

added: ksm-numa-guest-freeze

Chris J Arges (arges) wrote on 2014-07-22:

#46

I believe I've found the fix for this issue on 3.13.
If you can, please test the kernel posted on comment #1 on this bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1346917
Make sure KSM is enabled and any workarounds for this bug are disabled.

If this fixes the issue for you, you are welcome to mark this bug as a duplicate of 1346917.

Thanks!

no longer affects:

qemu (Ubuntu)

Daniele Viganò (daniele-vigano) wrote on 2014-07-31:

#47

Like Jeff Wilson, I can confirm that I had the same issue with 3.13 and that it was resolved with 3.14.1.

I cannot test the kernel suggested in #46 right now.

Fred Thoma (drulenberg) wrote on 2015-01-31:

#48

Same here: I had issues with 3.13.0-44-generic, upgraded to 3.16.0-23-generic and the problem was solved. I followed this tutorial: http://askubuntu.com/questions/541775/how-can-i-install-ubuntu-14-10s-kernel-in-ubuntu-14-04-lts

Fred Thoma (drulenberg) wrote on 2015-02-02:

#49

The same blue screen (STOP: 0x0000005c) appeared again on day 9, so it was not fixed by upgrading the kernel to 3.16.0-23-generic with the askubuntu.com method above.

Paolo Bonzini (bonzini) wrote on 2015-02-02:

#50

Fred, this bug is for STOP 0x101, not STOP 0x5c.

STOP 0x101 cannot be fixed by an upgrade. You have to disable the watchdog using the QEMU option hv_relaxed or the equivalent in libvirt.
