KVM images lose connectivity with bridged network
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Invalid | Undecided | Unassigned |
qemu-kvm (Ubuntu) | Fix Released | High | Unassigned |
Precise | Fix Released | High | Unassigned |
Bug Description
=======
SRU Justification:
1. Impact: Networking breaks after a while in KVM guests using virtio networking.
2. Development fix: The bug was fixed upstream, and the fix was picked up in a new
merge.
3. Stable fix: Three virtio patches are cherry-picked from upstream:
a821ce5 virtio: order index/descriptor reads
92045d8 virtio: add missing mb() on enable notification
a281ebc virtio: add missing mb() on notification
4. Test case: Create a bridge enslaving the real NIC, and use that as the bridge
for a KVM instance with virtio networking. See comment #44 for a specific test
case.
5. Regression potential: Should be low, as several people have tested the fixed
package under heavy load.
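For background on item 3: two of the patches add a full memory barrier (mb()) to the host/guest notification handshake, and the third orders the reads of the ring index and the descriptors. When the event-index feature is negotiated, the notification decision itself is the test below, transcribed into Python from vring_need_event() in the Linux virtio_ring.h header. Python cannot express the barriers, so this is only a sketch of what is being decided; if either side evaluates it against a stale index, a notification can be skipped, which is consistent with a guest NIC that silently stops.

```python
def vring_need_event(event_idx: int, new_idx: int, old_idx: int) -> bool:
    """Python transcription of vring_need_event() from linux/virtio_ring.h.

    The three values are 16-bit ring indices that wrap around, so each
    subtraction is reduced modulo 2**16, as uint16_t arithmetic would be.
    Returns True if event_idx lies in [old_idx, new_idx) modulo 2**16,
    i.e. the other side asked to be notified at an index this update passed.
    """
    mask = 0xFFFF
    return ((new_idx - event_idx - 1) & mask) < ((new_idx - old_idx) & mask)
```

The masking is the point of the formula: it keeps the comparison correct when the index wraps from 65535 back to 0.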
=======
System:
-----------
Dell R410, dual processor 2.4 GHz, 16 GB RAM
Distributor ID: Ubuntu
Description: Ubuntu 12.04 LTS
Release: 12.04
Codename: precise
Setup:
---------
We're running 3 KVM guests, all Ubuntu 12.04 LTS using bridged networking.
From the host:
# cat /etc/network/
auto br0
iface br0 inet static
address 212.XX.239.98
netmask 255.255.255.240
gateway 212.XX.239.97
bridge_fd 9
bridge_stp off
# ifconfig eth0
eth0 Link encap:Ethernet HWaddr d4:ae:52:84:2d:5a
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:11278363 errors:0 dropped:3128 overruns:0 frame:0
TX packets:14437384 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:4115980743 (4.1 GB) TX bytes:5451961979 (5.4 GB)
# ifconfig br0
br0 Link encap:Ethernet HWaddr d4:ae:52:84:2d:5a
inet addr:212.XX.239.98 Bcast:212.
inet6 addr: fe80::d6ae:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1720861 errors:0 dropped:0 overruns:0 frame:0
TX packets:1708622 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:210152198 (210.1 MB) TX bytes:300858508 (300.8 MB)
# brctl show
bridge name bridge id STP enabled interfaces
br0 8000.d4ae52842d5a no eth0
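To confirm the test-case setup (a bridge enslaving the real NIC), the `brctl show` output can be checked mechanically. This is a minimal sketch, assuming the usual bridge-utils layout: one row per bridge, with additional member interfaces on whitespace-indented continuation rows. The function name is illustrative.

```python
from typing import Dict, List, Optional


def parse_brctl_show(output: str) -> Dict[str, List[str]]:
    """Map each bridge name to its member interfaces.

    Assumes the bridge-utils layout: a header row, one row per bridge
    (name, bridge id, STP flag, first interface if any), and further
    interfaces on continuation rows that begin with whitespace.
    """
    bridges: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in output.splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        fields = line.split()
        if line[0].isspace():
            # Continuation row: only the interfaces column is populated.
            if current is not None:
                bridges[current].extend(fields)
        else:
            current = fields[0]
            bridges[current] = fields[3:]  # empty if no interface is attached
    return bridges


if __name__ == "__main__":
    # The br0 row from the report, with the tabs brctl normally emits.
    sample = ("bridge name\tbridge id\t\tSTP enabled\tinterfaces\n"
              "br0\t\t8000.d4ae52842d5a\tno\t\teth0\n")
    print("eth0 enslaved to br0:", "eth0" in parse_brctl_show(sample).get("br0", []))
```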
I have no default network configured to autostart in libvirt, as we're using bridged networking:
# virsh net-list --all
Name State Autostart
-------
default inactive no
# arp
Address HWtype HWaddress Flags Mask Iface
mailer03.xxxx.com ether 52:54:00:82:5f:0f C br0
mailer01.xxxx.com ether 52:54:00:d2:f7:31 C br0
mailer02.xxxx.com ether 52:54:00:d3:8f:91 C br0
dxi-gw2.xxxx.com ether 00:1a:30:2a:b1:c0 C br0
Domain definition of one of the guests:
<domain type='kvm' id='4'>
<name>
<uuid>
<memory>
<currentMemor
<vcpu>1</vcpu>
<os>
<type arch='x86_64' machine=
<boot dev='hd'/>
</os>
<features>
<acpi/>
</features>
<clock offset='utc'/>
<on_poweroff>
<on_reboot>
<on_crash>
<devices>
<emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/
<target dev='hda' bus='ide'/>
<alias name='ide0-0-0'/>
<address type='drive' controller='0' bus='0' unit='0'/>
</disk>
<disk type='file' device='disk'>
<driver name='qemu' type='raw'/>
<source file='/
<target dev='hdb' bus='ide'/>
<alias name='ide0-0-1'/>
<address type='drive' controller='0' bus='0' unit='1'/>
</disk>
<controller type='ide' index='0'>
<alias name='ide0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
</controller>
<interface type='bridge'>
<mac address=
<source bridge='br0'/>
<target dev='vnet0'/>
<model type='virtio'/>
<alias name='net0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
</interface>
<serial type='pty'>
<source path='/dev/pts/0'/>
<target port='0'/>
<alias name='serial0'/>
</serial>
<console type='pty' tty='/dev/pts/0'>
<source path='/dev/pts/0'/>
<target type='serial' port='0'/>
<alias name='serial0'/>
</console>
<input type='mouse' bus='ps2'/>
<graphics type='vnc' port='5900' autoport='yes' listen='127.0.0.1'>
<listen type='address' address=
</graphics>
<video>
<model type='cirrus' vram='9216' heads='1'/>
<alias name='video0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</video>
<memballoon model='virtio'>
<alias name='balloon0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
</memballoon>
</devices>
<seclabel type='dynamic' model='apparmor' relabel='yes'>
<label>
<imagelabel
</seclabel>
</domain>
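Since the fix is in the virtio device model, the guests at risk are those whose bridged interface uses `<model type='virtio'/>`, as this one does. Below is a sketch for checking a complete definition (for example the output of `virsh dumpxml` for a guest; the excerpt above is truncated and will not parse as-is) with Python's standard ElementTree. The function names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def bridged_nics(domain_xml: str) -> List[Tuple[str, str, str]]:
    """Return (source bridge, target dev, model) for each bridged <interface>."""
    root = ET.fromstring(domain_xml)
    nics = []
    for iface in root.findall("./devices/interface[@type='bridge']"):
        source = iface.find("source")
        target = iface.find("target")
        model = iface.find("model")
        nics.append((
            source.get("bridge", "") if source is not None else "",
            target.get("dev", "") if target is not None else "",
            # No <model> element means libvirt picks the hypervisor default.
            model.get("type", "") if model is not None else "(unspecified)",
        ))
    return nics


def uses_virtio(domain_xml: str) -> bool:
    """True if any bridged NIC in the domain uses the virtio model."""
    return any(model == "virtio" for _, _, model in bridged_nics(domain_xml))
```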
From within the guest:
# cat /etc/network/
# The primary network interface
auto eth0
iface eth0 inet static
address 212.XX.239.100
netmask 255.255.255.240
network 212.XX.239.96
broadcast 212.XX.239.111
gateway 212.XX.239.97
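The guest's addressing can be sanity-checked with Python's ipaddress module. Netmask 255.255.255.240 is a /28, so network 212.XX.239.96 spans .96 to .111, and the gateway (.97), the host's br0 (.98) and this guest (.100) all sit in the same segment, reachable directly over the bridge. The second octet is masked in the report; the sketch substitutes 0 purely as a hypothetical placeholder.

```python
import ipaddress
from typing import Dict


def subnet_summary(network: str, netmask: str, *addresses: str) -> Dict[str, object]:
    """Describe an IPv4 subnet and report whether each address lies inside it."""
    # ip_network raises ValueError if `network` has host bits set for this mask.
    net = ipaddress.ip_network(f"{network}/{netmask}")
    return {
        "prefixlen": net.prefixlen,
        "broadcast": str(net.broadcast_address),
        "usable_hosts": net.num_addresses - 2,
        "inside": {a: ipaddress.ip_address(a) in net for a in addresses},
    }


if __name__ == "__main__":
    # Hypothetical placeholder: the report masks the second octet as "XX".
    print(subnet_summary("212.0.239.96", "255.255.255.240",
                         "212.0.239.97",    # gateway
                         "212.0.239.98",    # host br0
                         "212.0.239.100"))  # this guest
```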
# ifconfig
eth0 Link encap:Ethernet HWaddr 52:54:00:d2:f7:31
inet addr:212.XX.239.100 Bcast:212.
inet6 addr: fe80::5054:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5631830 errors:0 dropped:0 overruns:0 frame:0
TX packets:6683416 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:2027322829 (2.0 GB) TX bytes:2076698690 (2.0 GB)
The command line that starts the KVM guest:
/usr/bin/kvm -S -M pc-1.0 -enable-kvm -m 2048 -smp 1,sockets=
Problem:
------------
Periodically (at least once a day), one or more of the guests lose network connectivity. Ping responds with 'host unreachable', even from the host itself. Logging in via the serial console shows no problems: eth0 is up and can ping the local host, but there is no outside connectivity. Restart the network (/etc/init.
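To pin down when connectivity drops, and whether there is a pattern, a small watchdog can probe the gateway from inside the guest and record the start and end of each outage. This is a minimal sketch, assuming Linux iputils `ping` (`-c` count, `-W` timeout in seconds). `ping_once` and `find_outages` are illustrative names, and the probe, clock and sleep are injectable so the logic can be exercised without a network.

```python
import subprocess
import time
from typing import Callable, List, Optional, Tuple


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo with iputils ping; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def find_outages(
    probe: Callable[[], bool],
    rounds: int,
    interval_s: float,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Tuple[float, float]]:
    """Run `probe` `rounds` times and return (start, end) of each failure run.

    `start` is the time of the first failed probe; `end` is the time of the
    next successful probe, or of the last probe if the outage never ended.
    """
    outages: List[Tuple[float, float]] = []
    start: Optional[float] = None
    now = 0.0
    for i in range(rounds):
        now = clock()
        if probe():
            if start is not None:
                outages.append((start, now))
                start = None
        elif start is None:
            start = now
        if i < rounds - 1:
            sleep(interval_s)
    if start is not None:
        outages.append((start, now))
    return outages
```

For example, `find_outages(lambda: ping_once(gw), rounds=1440, interval_s=60)`, with `gw` set to the real gateway address, would watch for roughly a day; passing each timestamp through `time.ctime` gives the time of day of every outage.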
I've verified that there are no ARP games on the primary host: the ARP tables are the same before (when the guest had connectivity) and after (when it doesn't).
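The before/after ARP comparison can be made mechanical: parse two snapshots of `arp` output and report entries that appeared, disappeared, or changed hardware address or interface. This is a minimal sketch, assuming the net-tools column layout shown in the host output (Address, HWtype, HWaddress, Flags, Mask, Iface, with Mask usually empty). The function names are illustrative.

```python
from typing import Dict, Tuple

Entry = Tuple[str, str]  # (hardware address, interface)


def parse_arp(output: str) -> Dict[str, Entry]:
    """Map each address to (HWaddress, Iface) from net-tools `arp` output."""
    table: Dict[str, Entry] = {}
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        # Unresolved entries print "(incomplete)" in place of HWtype/HWaddress.
        hwaddr = fields[1] if fields[1] == "(incomplete)" else fields[2]
        table[fields[0]] = (hwaddr, fields[-1])
    return table


def diff_arp(before: Dict[str, Entry], after: Dict[str, Entry]):
    """Return (added, removed, changed) entries between two snapshots."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return added, removed, changed
```

Three empty dictionaries from `diff_arp` is the "tables remain the same" result.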
This is a critical issue affecting production services on the latest LTS release of Ubuntu. It's similar to an issue that was 'resolved' in 10.04 but appears to have reared its ugly head again.
Changed in libvirt (Ubuntu): | |
importance: | Undecided → High |
Changed in libvirt (Ubuntu): | |
status: | Incomplete → New |
Changed in bridge-utils (Ubuntu): | |
status: | New → Incomplete |
Changed in libvirt (Ubuntu): | |
status: | New → Incomplete |
Changed in bridge-utils (Ubuntu): | |
importance: | Undecided → High |
Changed in libvirt (Ubuntu): | |
status: | Incomplete → Confirmed |
Changed in ifenslave (Ubuntu): | |
status: | New → Confirmed |
Changed in nova: | |
status: | New → Invalid |
no longer affects: | ifenslave (Ubuntu) |
no longer affects: | libvirt (Ubuntu) |
no longer affects: | bridge-utils (Ubuntu) |
no longer affects: | linux (Ubuntu) |
Changed in qemu-kvm (Ubuntu): | |
status: | Confirmed → Fix Released |
Changed in qemu-kvm (Ubuntu Precise): | |
status: | New → In Progress |
importance: | Undecided → High |
description: | updated |
Changed in qemu-kvm (Ubuntu Precise): | |
status: | Fix Committed → Fix Released |
Changed in qemu-kvm (Ubuntu): | |
status: | Fix Released → Fix Committed |
status: | Fix Committed → Fix Released |
Changed in qemu-kvm (Ubuntu Precise): | |
status: | Fix Released → Fix Committed |
tags: |
added: verification-done removed: verification-needed |
Thanks for reporting this bug. Does this also happen if only one of the VMs is up? Is there any pattern to the time of day or length of a VM's uptime before this happens? What does 'route -n' show before and after it happens?
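The 'route -n' comparison can be reduced to the part that matters most here: the default route. This is a minimal sketch, assuming the net-tools layout (two header lines, then Destination, Gateway, Genmask, Flags, Metric, Ref, Use, Iface). `default_routes` is an illustrative name.

```python
from typing import List, Tuple


def default_routes(route_n_output: str) -> List[Tuple[str, str]]:
    """Return (gateway, iface) for every default route in `route -n` output."""
    routes = []
    for line in route_n_output.splitlines()[2:]:  # skip the two header lines
        fields = line.split()
        # A default route has destination and genmask both 0.0.0.0.
        if len(fields) >= 8 and fields[0] == "0.0.0.0" and fields[2] == "0.0.0.0":
            routes.append((fields[1], fields[7]))
    return routes
```

If the result is identical in snapshots taken before and during an outage, and still points at the expected gateway on eth0, the routing table can be ruled out.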