Activity log for bug #694029

Date Who What changed Old value New value Message
2010-12-24 06:02:01 Jamie Strandboge bug added bug
2010-12-24 06:03:07 Jamie Strandboge bug task added kvm (Ubuntu)
2010-12-24 06:03:21 Jamie Strandboge affects kvm (Ubuntu) qemu-kvm (Ubuntu)
2010-12-24 06:08:08 Jamie Strandboge description After upgrading to natty's kernel I noticed that my VMs would sometimes become highly unstable, with random guest applications segfaulting and crashing in weird ways. This seems to be more pronounced when running more than one VM at a time.
This does not seem to be a hardware issue-- the host is a 6 month old laptop and I ran memtest86 for 12 hours with 18 successful completions and no errors. There is no host instability or messages in dmesg that I could see that would indicate a host problem. Downgrading to the maverick kernel fixes this problem.

I have a script that will launch 10 VMs and run some commands:

#!/bin/sh
count=0
while /bin/true ; do
    count=$(( $count + 1 ))
    echo "RUN $count"
    vm-stop -f -p sec
    sleep 3
    vm-start -s -v -p sec
    sleep 15
    vm-cmd -c -r -p sec apt-get update
    vm-cmd -c -r -p sec apt-get -y --force-yes dist-upgrade
    vm-cmd -c -r -p sec apt-get -y --force-yes install chromium-browser
    vm-cmd -c -r -p sec apt-get -y --force-yes remove --purge chromium-browser*
    vm-cmd -c -r -p sec apt-get -y --force-yes install chromium-browser
    vm-cmd -c -r -p sec apt-get -y --force-yes remove --purge chromium-browser*
    vm-cmd -c -r -p sec apt-get -y --force-yes install chromium-browser
    vm-cmd -c -r -p sec apt-get -y --force-yes remove --purge chromium-browser*
    vm-cmd -c -r -p sec apt-get -y --force-yes install chromium-browser
    vm-cmd -c -r -p sec apt-get -y --force-yes remove --purge chromium-browser*
    vm-cmd -c -r -p sec apt-get -y --force-yes install chromium-browser
    vm-cmd -c -r -p sec apt-get -y --force-yes remove --purge chromium-browser*
done

'vm-start' starts 10 VMs via libvirt with snapshotted qcow2 disks, and 'vm-stop' kills them off, discarding the snapshot. 'vm-cmd' will ssh into each machine and run the command for each machine in sequence. The VMs themselves are all pristine and are resnapshotted on each loop iteration. The point of this explanation is to illustrate that while the VMs all start in the same state, they fail differently or sometimes not at all. I am able to reproduce guest instability within 4-5 iterations of this script on a natty kernel. With the maverick kernel it ran 18 times with no errors (around 8 hours).

For example, with the above, I saw a maverick/i386 guest fail once with:

dpkg: parse error, in file '/var/lib/dpkg/status' near line 5914 package 'libtelepathy-glib0':
 'Depends' field, reference to 'libglib2.0-0': error in version: version string is empty

Another time the maverick/i386 guest failed with:

Processing triggers for man-db ...
dpkg: error processing man-db (--unpack):
 subprocess installed post-installation script killed by signal (Segmentation fault)
Errors were encountered while processing:
 man-db

A lucid/i386 guest failed another time with:

Processing triggers for python-gmenu ...
Rebuilding /usr/share/applications/desktop.en_US.utf8.cache...
Segmentation fault
dpkg: error processing python-gmenu (--purge):
 subprocess installed post-installation script returned error exit status 139
Processing triggers for man-db ...
Errors were encountered while processing:
 python-gmenu

There are many other failures....

On my laptop I have an i7 with two cores and 4 hyperthreads per core (this is the default configuration for this machine from the factory and the configuration used to report this bug). I am able to 'disable' hyperthreads in the BIOS, and if I do, I end up with 2 cores and 2 threads per core. In this configuration, I noticed that I don't have to run as many VMs to see the problem. I've seen it with as few as 2 VMs at a time. I mention this as it seems that the issue is exacerbated when the ratio of VMs to CPUs is 1:1 or higher.

I can say for certain that the rc6 and rc7 kernels in natty exhibit the problem, and maverick's does not. I can also say that the natty kernel runs considerably hotter than the maverick kernel, with average temperatures being 10-15C higher under load according to /proc/acpi/ibm/thermal (I had to buy a 'chill mat' (a laptop mat with 2 fans) after upgrading to natty).

My gut feeling is that it has to do with KSM or virtio, but I don't know that for sure. This may be chipset specific, as a colleague was unable to reproduce this on the rc6 kernel (but also did not run my script-- only did the chromium-browser updates over and over again). I thought it could be the high temperatures causing problems, but then why wasn't the host having problems? I thought it could be the host RAM, but memtest86 was ok, and running the script for 8 hours on maverick fills the RAM and swap and there were still no problems in the guests and no problems on the host. Based on the above it seems clear to me that something in the natty kernel is causing the problem.

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: linux-image-2.6.37-11-generic 2.6.37-11.25
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.37-11.25-generic 2.6.37-rc7
Uname: Linux 2.6.37-11-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.23.
Architecture: amd64
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: Intel [HDA Intel], device 0: CONEXANT Analog [CONEXANT Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: jamie 2360 F.... pulseaudio
 /dev/snd/pcmC0D0p: jamie 2360 F...m pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf2520000 irq 43'
   Mixer name : 'Intel IbexPeak HDMI'
   Components : 'HDA:14f15069,17aa2156,00100302 HDA:80862804,17aa21b5,00100000'
   Controls : 16
   Simple ctrls : 7
Card29.Amixer.info:
 Card hw:29 'ThinkPadEC'/'ThinkPad Console Audio Control at EC reg 0x30, fw 6QHT28WW-1.09'
   Mixer name : 'ThinkPad EC 6QHT28WW-1.09'
   Components : ''
   Controls : 1
   Simple ctrls : 1
Card29.Amixer.values:
 Simple mixer control 'Console',0
   Capabilities: pswitch pswitch-joined penum
   Playback channels: Mono
   Mono: Playback [off]
Date: Thu Dec 23 23:22:11 2010
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=58280e6e-d161-43ea-8593-a89fb7b6851a
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release amd64 (20100427.1)
MachineType: LENOVO 5129CTO
ProcEnviron:
 LANGUAGE=en_US:en
 PATH=(custom, user)
 LANG=en_US.UTF-8
 LC_MESSAGES=en_US.utf8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-2.6.37-11-generic root=UUID=82571cfb-fdda-4d2f-b708-f8924aa0fe21 ro vt.handoff=7 quiet splash
RelatedPackageVersions: linux-firmware 1.44
SourcePackage: linux
dmi.bios.date: 04/20/2010
dmi.bios.vendor: LENOVO
dmi.bios.version: 6QET44WW (1.14 )
dmi.board.name: 5129CTO
dmi.board.vendor: LENOVO
dmi.board.version: Not Available
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvr6QET44WW(1.14):bd04/20/2010:svnLENOVO:pn5129CTO:pvrThinkPadX201s:rvnLENOVO:rn5129CTO:rvrNotAvailable:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 5129CTO
dmi.product.version: ThinkPad X201s
dmi.sys.vendor: LENOVO
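Because the guests fail in different ways on each run, it can help to scan captured output for the failure signatures quoted in the description rather than reading it by hand. The following is a minimal sketch, not part of the original report: the function name count_failures and the idea of saving the guest output to a log file are assumptions, and the patterns cover only the three failures quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): count lines in a captured
# log that match the guest failure signatures quoted in the description.
count_failures() {
    # $1: path to a file holding captured command output (assumed to exist)
    # grep -c prints the number of matching lines, 0 if there are none.
    grep -c -E 'Segmentation fault|dpkg: parse error|error exit status 139' "$1"
}
```

A run whose log yields a non-zero count would be a candidate for the instability described; a count of 0 only means none of these three signatures appeared, not that the guest was healthy.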
2011-01-04 14:21:40 Dave Walker qemu-kvm (Ubuntu): importance Undecided Medium
2011-01-04 14:21:40 Dave Walker qemu-kvm (Ubuntu): status New Incomplete
2011-01-04 23:52:45 Ubuntu Foundations Team Bug Bot tags amd64 apport-bug kernel-uncat natty needs-upstream-testing regression-update running-unity amd64 apport-bug kernel-uncat natty needs-upstream-testing regression-release running-unity
2011-01-05 13:05:50 Jamie Strandboge qemu-kvm (Ubuntu): status Incomplete New
2011-01-05 13:06:36 Jamie Strandboge qemu-kvm (Ubuntu): status New Confirmed
2011-02-03 03:36:35 Jamie Strandboge attachment added kern.oops https://bugs.launchpad.net/ubuntu/+source/linux/+bug/694029/+attachment/1826451/+files/kern.oops
2011-02-03 03:37:52 Jamie Strandboge attachment added 2.6.38-1.28-generic.dmesg https://bugs.launchpad.net/ubuntu/+source/linux/+bug/694029/+attachment/1826452/+files/2.6.38-1.28-generic.dmesg
2011-02-04 19:01:06 Jamie Strandboge attachment removed kern.oops https://bugs.launchpad.net/ubuntu/+source/linux/+bug/694029/+attachment/1826451/+files/kern.oops
2011-02-11 21:07:43 Serge Hallyn qemu-kvm (Ubuntu): status Confirmed Invalid
2011-02-11 21:10:41 Serge Hallyn linux (Ubuntu): assignee Serge Hallyn (serge-hallyn)
2011-02-16 18:46:34 Serge Hallyn linux (Ubuntu): assignee Serge Hallyn (serge-hallyn)
2011-03-01 23:57:23 Kees Cook bug added subscriber Kees Cook
2011-03-01 23:57:23 Kees Cook linux (Ubuntu): status New Confirmed
2011-04-10 22:53:30 C de-Avillez bug added subscriber C de-Avillez
2011-05-10 04:25:55 Sebastian Kuzminsky bug added subscriber Sebastian Kuzminsky
2011-05-10 04:33:12 Sebastian Kuzminsky tags amd64 apport-bug kernel-uncat natty needs-upstream-testing regression-release running-unity amd64 apport-bug apport-collected kernel-uncat natty needs-upstream-testing regression-release running-unity
2011-05-10 04:33:18 Sebastian Kuzminsky attachment added AcpiTables.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122091/+files/AcpiTables.txt
2011-05-10 04:33:20 Sebastian Kuzminsky attachment added AlsaDevices.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122092/+files/AlsaDevices.txt
2011-05-10 04:33:22 Sebastian Kuzminsky attachment added AplayDevices.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122093/+files/AplayDevices.txt
2011-05-10 04:33:24 Sebastian Kuzminsky attachment added BootDmesg.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122094/+files/BootDmesg.txt
2011-05-10 04:33:26 Sebastian Kuzminsky attachment added Card0.Amixer.values.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122095/+files/Card0.Amixer.values.txt
2011-05-10 04:33:28 Sebastian Kuzminsky attachment added Card0.Codecs.codec.0.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122096/+files/Card0.Codecs.codec.0.txt
2011-05-10 04:33:29 Sebastian Kuzminsky attachment added Card0.Codecs.codec.3.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122097/+files/Card0.Codecs.codec.3.txt
2011-05-10 04:33:31 Sebastian Kuzminsky attachment added CurrentDmesg.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122098/+files/CurrentDmesg.txt
2011-05-10 04:33:33 Sebastian Kuzminsky attachment added IwConfig.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122099/+files/IwConfig.txt
2011-05-10 04:33:34 Sebastian Kuzminsky attachment added Lspci.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122100/+files/Lspci.txt
2011-05-10 04:33:36 Sebastian Kuzminsky attachment added Lsusb.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122101/+files/Lsusb.txt
2011-05-10 04:33:38 Sebastian Kuzminsky attachment added PciMultimedia.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122102/+files/PciMultimedia.txt
2011-05-10 04:33:40 Sebastian Kuzminsky attachment added ProcCpuinfo.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122103/+files/ProcCpuinfo.txt
2011-05-10 04:33:42 Sebastian Kuzminsky attachment added ProcCpuinfo_.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122104/+files/ProcCpuinfo_.txt
2011-05-10 04:33:44 Sebastian Kuzminsky attachment added ProcInterrupts.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122105/+files/ProcInterrupts.txt
2011-05-10 04:33:46 Sebastian Kuzminsky attachment added ProcModules.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122106/+files/ProcModules.txt
2011-05-10 04:33:48 Sebastian Kuzminsky attachment added RfKill.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122107/+files/RfKill.txt
2011-05-10 04:33:50 Sebastian Kuzminsky attachment added UdevDb.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122108/+files/UdevDb.txt
2011-05-10 04:33:53 Sebastian Kuzminsky attachment added UdevLog.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122109/+files/UdevLog.txt
2011-05-10 04:33:55 Sebastian Kuzminsky attachment added WifiSyslog.txt https://bugs.launchpad.net/bugs/694029/+attachment/2122110/+files/WifiSyslog.txt
2011-07-25 22:04:12 Jamie Strandboge nominated for series Ubuntu Natty
2011-07-25 22:04:12 Jamie Strandboge bug task added linux (Ubuntu Natty)
2011-07-25 22:04:12 Jamie Strandboge bug task added qemu-kvm (Ubuntu Natty)
2011-07-25 22:04:12 Jamie Strandboge nominated for series Ubuntu Oneiric
2011-07-25 22:04:12 Jamie Strandboge bug task added linux (Ubuntu Oneiric)
2011-07-25 22:04:12 Jamie Strandboge bug task added qemu-kvm (Ubuntu Oneiric)
2011-07-25 22:04:28 Jamie Strandboge qemu-kvm (Ubuntu Natty): status New Invalid
2011-07-25 22:04:32 Jamie Strandboge linux (Ubuntu Natty): status New Confirmed
2011-08-19 14:42:49 Jamie Strandboge summary [natty] kvm guests become unstable after a while kvm guests become unstable after a while
2011-09-06 19:21:26 Jamie Strandboge linux (Ubuntu Oneiric): status Confirmed Fix Released
2012-01-02 10:56:36 GROS-PRUGNY linux (Ubuntu): assignee GROS-PRUGNY (bouliky)
2013-05-18 16:56:06 dino99 linux (Ubuntu Natty): status Confirmed Invalid