Activity log for bug #1325560

Date Who What changed Old value New value Message
2014-06-02 12:02:30 Izhar ul Hassan bug added bug
2014-06-02 17:09:17 Izhar ul Hassan bug task added qemu
2014-06-02 17:10:19 Izhar ul Hassan bug task added libvirt
2014-06-02 17:10:47 Izhar ul Hassan bug task deleted qemu
2014-06-02 17:11:50 Izhar ul Hassan bug task added linux
2014-06-02 17:57:20 Izhar ul Hassan tags apport-collected precise third-party-packages
2014-06-02 17:57:23 Izhar ul Hassan description updated — the new value appends the apport collection details listed at the end to the original description, which otherwise reads:

Networking breaks after a while in KVM guests using virtio networking. We run data-intensive jobs on our virtual cluster (OpenStack Grizzly installed on Ubuntu 12.04 Server). The job runs fine on a single worker VM (no data transfer involved). As soon as I add more nodes where the workers need to exchange some data, one of the worker VMs goes down. Ping responds with 'host unreachable'. Logging in via the serial console shows no problems: eth0 is up and can ping the local host, but there is no outside connectivity. Restarting the network (/etc/init.d/networking restart) does nothing; rebooting the machine brings it back.

14/06/01 18:30:06 INFO YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
14/06/01 18:30:06 INFO MemoryStore: ensureFreeSpace(190758) called with curMem=0, maxMem=308713881
14/06/01 18:30:06 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 186.3 KB, free 294.2 MB)
14/06/01 18:30:06 INFO FileInputFormat: Total input paths to process : 1
14/06/01 18:30:06 INFO NetworkTopology: Adding a new node: /default-rack/10.20.20.28:50010
14/06/01 18:30:06 INFO NetworkTopology: Adding a new node: /default-rack/10.20.20.23:50010
14/06/01 18:30:06 INFO SparkContext: Starting job: count at hello_spark.py:15
14/06/01 18:30:06 INFO DAGScheduler: Got job 0 (count at hello_spark.py:15) with 2 output partitions (allowLocal=false)
14/06/01 18:30:06 INFO DAGScheduler: Final stage: Stage 0 (count at hello_spark.py:15)
14/06/01 18:30:06 INFO DAGScheduler: Parents of final stage: List()
14/06/01 18:30:06 INFO DAGScheduler: Missing parents: List()
14/06/01 18:30:06 INFO DAGScheduler: Submitting Stage 0 (PythonRDD[2] at count at hello_spark.py:15), which has no missing parents
14/06/01 18:30:07 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (PythonRDD[2] at count at hello_spark.py:15)
14/06/01 18:30:07 INFO YarnClientClusterScheduler: Adding task set 0.0 with 2 tasks
14/06/01 18:30:08 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@host-10-20-20-28.novalocal:44417/user/Executor#-1352071582] with ID 1
14/06/01 18:30:08 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 1: host-10-20-20-28.novalocal (PROCESS_LOCAL)
14/06/01 18:30:08 INFO TaskSetManager: Serialized task 0.0:0 as 3123 bytes in 14 ms
14/06/01 18:30:09 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager host-10-20-20-28.novalocal:42960 with 588.8 MB RAM
14/06/01 18:30:16 INFO BlockManagerMasterActor$BlockManagerInfo: Added rdd_1_0 in memory on host-10-20-20-28.novalocal:42960 (size: 308.2 MB, free: 280.7 MB)
14/06/01 18:30:17 INFO YarnClientSchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@host-10-20-20-23.novalocal:58126/user/Executor#1079893974] with ID 2
14/06/01 18:30:17 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 2: host-10-20-20-23.novalocal (PROCESS_LOCAL)
14/06/01 18:30:17 INFO TaskSetManager: Serialized task 0.0:1 as 3123 bytes in 1 ms
14/06/01 18:30:17 INFO BlockManagerMasterActor$BlockManagerInfo: Registering block manager host-10-20-20-23.novalocal:56776 with 588.8 MB RAM
14/06/01 18:31:20 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, host-10-20-20-28.novalocal, 42960, 0) with no recent heart beats: 55828ms exceeds 45000ms
14/06/01 18:42:23 INFO YarnClientSchedulerBackend: Executor 2 disconnected, so removing it
14/06/01 18:42:23 ERROR YarnClientClusterScheduler: Lost executor 2 on host-10-20-20-23.novalocal: remote Akka client disassociated

The same job finishes flawlessly on a single worker.

System Information:
==================
Description: Ubuntu 12.04.4 LTS
Release: 12.04
Linux 3.8.0-35-generic #52~precise1-Ubuntu SMP Thu Jan 30 17:24:40 UTC 2014 x86_64

libvirt-bin:
--------------
  Installed: 1.1.1-0ubuntu8~cloud2
  Candidate: 1.1.1-0ubuntu8.7~cloud1
  Version table:
     1.1.1-0ubuntu8.7~cloud1 0
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/havana/main amd64 Packages
 *** 1.1.1-0ubuntu8~cloud2 0
        100 /var/lib/dpkg/status
     0.9.8-2ubuntu17.19 0
        500 http://se.archive.ubuntu.com/ubuntu/ precise-updates/main amd64 Packages
     0.9.8-2ubuntu17.17 0
        500 http://security.ubuntu.com/ubuntu/ precise-security/main amd64 Packages
     0.9.8-2ubuntu17 0
        500 http://se.archive.ubuntu.com/ubuntu/ precise/main amd64 Packages

qemu-kvm:
---------------
  Installed: 1.5.0+dfsg-3ubuntu5~cloud0
  Candidate: 1.5.0+dfsg-3ubuntu5.4~cloud0
  Version table:
     1.5.0+dfsg-3ubuntu5.4~cloud0 0
        500 http://ubuntu-cloud.archive.canonical.com/ubuntu/ precise-updates/havana/main amd64 Packages
 *** 1.5.0+dfsg-3ubuntu5~cloud0 0
        100 /var/lib/dpkg/status
     1.0+noroms-0ubuntu14.15 0
        500 http://se.archive.ubuntu.com/ubuntu/ precise-updates/main amd64 Packages
     1.0+noroms-0ubuntu14.14 0
        500 http://security.ubuntu.com/ubuntu/ precise-security/main amd64 Packages
     1.0+noroms-0ubuntu13 0
        500 http://se.archive.ubuntu.com/ubuntu/ precise/main amd64 Packages

XML DUMP for a VM
-----------------------------
<domain type='kvm' id='7'>
  <name>instance-000001b6</name>
  <uuid>731c2191-fa82-4a38-9f52-e48fb37e92c8</uuid>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <sysinfo type='smbios'>
    <system>
      <entry name='manufacturer'>OpenStack Foundation</entry>
      <entry name='product'>OpenStack Nova</entry>
      <entry name='version'>2013.2.3</entry>
      <entry name='serial'>01d3d524-32eb-e011-8574-441ea15e3971</entry>
      <entry name='uuid'>731c2191-fa82-4a38-9f52-e48fb37e92c8</entry>
    </system>
  </sysinfo>
  <os>
    <type arch='x86_64' machine='pc-i440fx-1.5'>hvm</type>
    <boot dev='hd'/>
    <smbios mode='sysinfo'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='host-model'>
    <model fallback='allow'/>
  </cpu>
  <clock offset='utc'>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='rtc' tickpolicy='catchup'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/nova/instances/731c2191-fa82-4a38-9f52-e48fb37e92c8/disk'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <controller type='usb' index='0'>
      <alias name='usb0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci0'/>
    </controller>
    <interface type='bridge'>
      <mac address='fa:16:3e:a7:de:97'/>
      <source bridge='qbr43f8d3a5-e4'/>
      <target dev='tap43f8d3a5-e4'/>
      <model type='virtio'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='file'>
      <source path='/var/lib/nova/instances/731c2191-fa82-4a38-9f52-e48fb37e92c8/console.log'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <serial type='pty'>
      <source path='/dev/pts/6'/>
      <target port='1'/>
      <alias name='serial1'/>
    </serial>
    <console type='file'>
      <source path='/var/lib/nova/instances/731c2191-fa82-4a38-9f52-e48fb37e92c8/console.log'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <graphics type='vnc' port='5904' autoport='yes' listen='0.0.0.0' keymap='en-us'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='9216' heads='1'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='apparmor' relabel='yes'>
    <label>libvirt-731c2191-fa82-4a38-9f52-e48fb37e92c8</label>
    <imagelabel>libvirt-731c2191-fa82-4a38-9f52-e48fb37e92c8</imagelabel>
  </seclabel>
</domain>

I am reporting this for Spark, but it should be valid for any application that involves fast data transfer between VMs. The bug has been reported in the CentOS forums as well: http://bugs.centos.org/view.php?id=5526 and in an older bug report on Launchpad: https://bugs.launchpad.net/ubuntu/+source/qemu-kvm/+bug/997978?comments=all

New value: the description above, unchanged, with the following appended:

---
ApportVersion: 2.0.1-0ubuntu17.6
Architecture: amd64
DistroRelease: Ubuntu 12.04
InstallationMedia: Ubuntu-Server 12.04.3 LTS "Precise Pangolin" - Release amd64 (20130820.2)
MarkForUpload: True
Package: qemu-kvm 1.5.0+dfsg-3ubuntu5~cloud0
PackageArchitecture: amd64
ProcVersionSignature: Ubuntu 3.8.0-29.42~precise1-generic 3.8.13.5
Tags: precise third-party-packages
Uname: Linux 3.8.0-29-generic x86_64
UnreportableReason: This is not an official Ubuntu package. Please remove any third party package and try again.
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip libvirtd lpadmin plugdev sambashare sudo
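The heartbeat-timeout warning and the "remote Akka client disassociated" error in the description are the driver-side symptoms of a guest dropping off the network. A quick way to pull the affected hosts out of a Spark driver log might look like the following sketch; the file name `driver.log` is hypothetical, and the script seeds it with two lines quoted from the report so the commands are self-contained:

```shell
# Seed a hypothetical driver log with two symptom lines quoted from the report.
cat > driver.log <<'EOF'
14/06/01 18:31:20 WARN BlockManagerMasterActor: Removing BlockManager BlockManagerId(1, host-10-20-20-28.novalocal, 42960, 0) with no recent heart beats: 55828ms exceeds 45000ms
14/06/01 18:42:23 ERROR YarnClientClusterScheduler: Lost executor 2 on host-10-20-20-23.novalocal: remote Akka client disassociated
EOF

# Hosts whose block managers stopped heartbeating (candidate stalled guests)
grep 'no recent heart beats' driver.log | grep -o 'host-[0-9.-]*\.novalocal' | sort -u

# Executors lost to Akka disassociation
grep 'ERROR.*Lost executor' driver.log
```

Cross-referencing the hosts printed here against ping/serial-console checks on the matching VMs is one way to confirm that the executor losses line up with the virtio network stalls rather than with application failures.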
2014-06-02 17:57:25 Izhar ul Hassan attachment added Dependencies.txt https://bugs.launchpad.net/bugs/1325560/+attachment/4124282/+files/Dependencies.txt
2014-06-02 17:57:29 Izhar ul Hassan attachment added ProcEnviron.txt https://bugs.launchpad.net/bugs/1325560/+attachment/4124283/+files/ProcEnviron.txt
2014-06-02 20:36:32 Serge Hallyn qemu-kvm (Ubuntu): importance Undecided High
2014-06-02 20:36:32 Serge Hallyn qemu-kvm (Ubuntu): status New Incomplete
2014-06-03 23:17:18 Serge Hallyn bug task added linux (Ubuntu)
2014-06-03 23:30:11 Brad Figg linux: status New Incomplete
2014-06-06 16:43:03 Serge Hallyn summary kvm vm loses network connectivity under "enough" load kvm virtio netdevs lose network connectivity under "enough" load
2014-06-06 16:43:32 Serge Hallyn linux (Ubuntu): importance Undecided High
2014-07-04 10:51:11 Launchpad Janitor linux (Ubuntu): status New Confirmed
2014-07-04 10:54:57 Ivan bug added subscriber Ivan
2014-07-04 10:56:30 Alexander bug added subscriber Alexander
2014-07-26 12:29:53 Lincoln Stoll bug added subscriber Lincoln Stoll
2014-07-28 15:01:16 Andreas Ntaflos bug added subscriber Andreas Ntaflos
2014-08-04 21:32:31 Chris J Arges bug added subscriber Chris J Arges
2014-08-04 21:53:47 Chris J Arges marked as duplicate 1346917
2016-09-30 10:34:56 Philipp Hahn removed duplicate marker 1346917
2016-10-03 15:40:35 Robie Basak bug added subscriber Ubuntu Server Team
2017-09-08 10:39:39 Tommi Aropalo bug added subscriber Tommi Aropalo
2018-06-20 06:30:58 Christian Ehrhardt removed subscriber Ubuntu Server