VM stops receiving packets under heavy load on a virtio network interface bridged to a bonded interface on a KVM hypervisor
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Incomplete | High | Unassigned |
qemu-kvm (Ubuntu) | New | High | Unassigned |
Bug Description
Versions:
Ubuntu: 12.04.1 LTS
linux-server 3.2.0.30.32
bridge-utils 1.5-2ubuntu6
ifenslave-2.6 1.1.0-19ubuntu5
qemu-kvm 1.0+noroms-
libvirt0 0.9.8-2ubuntu17.3
iperf 2.0.5-2.1
Description:
I have a typical KVM virtualization setup with two network interfaces bonded together and bridged for VM use:
two Ethernet interfaces, eth2 and eth4, are bonded with IEEE 802.3ad dynamic link aggregation into bond2.
bond2 is attached to a bridge interface named brb2.
The KVM virtual machines then attach their interfaces to brb2 as well.
When I test network performance (generating heavy load), the bridge stops forwarding packets to the virtual machine's interface after a few seconds.
Overview:
workstation (172.16.2.231)
            |
manageable switch with IEEE 802.3ad capability
        |       |
      eth2    eth4
        |       |
          bond2
            |
          brb2        Hypervisor (172.16.2.225)
            |
          vnet0
            |
eth0 on virtual machine (172.16.2.222)
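The layering can be double-checked on the hypervisor with standard tools; a quick sketch using the interface names above:
brctl show                               # brb2 should list bond2 (and vnet0 once the VM is running) as ports
cat /sys/class/net/bond2/bonding/slaves  # should list eth2 and eth4
cat /proc/net/bonding/bond2              # bonding mode and per-slave LACP state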
/etc/network/ (on the hypervisor):
# Slave for bond2
allow-eths eth2
iface eth2 inet manual
bond-master bond2
pre-up ip link set eth2 up
post-up ifconfig eth2 -multicast
# Slave for bond2
allow-eths eth4
iface eth4 inet manual
bond-master bond2
pre-up ip link set eth4 up
post-up ifconfig eth4 -multicast
# Master for eth2+eth4, slave for brb2
allow-bonds bond2
iface bond2 inet manual
bond-slaves none
bond_mode 4
bond_miimon 100
post-up ip link set bond2 address $(ethtool -P eth2 | awk '{print $3}')
post-up ifconfig bond2 -multicast
# Master for bond2
allow-brbs brb2
iface brb2 inet static
address 172.16.2.225
netmask 255.255.255.0
network 172.16.2.0
broadcast 172.16.2.255
gateway 172.16.2.254
bridge_stp off
bridge_fd 0
post-up ifconfig brb2 -multicast
post-up echo 0 > /sys/devices/
post-up echo 0 > /sys/devices/
# "pre-up ip link set ethX up" is used as a workaround for buggy eth-drivers to come up
# "post-up ip link set bond2 address $(ethtool -P eth2 | awk '{print $3}')" is used to insure bond2 to get always the same MAC (the one from eth2)
# these interfaces are set up by a separate script using ifup
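For clarity, here is what that post-up line evaluates to on this host; a sketch, assuming eth2's permanent MAC is 00:e0:81:cd:fc:dc as shown in the bonding status below:
ethtool -P eth2                              # prints: Permanent address: 00:e0:81:cd:fc:dc
ethtool -P eth2 | awk '{print $3}'           # awk picks the third whitespace-separated field, i.e. the MAC itself
ip link set bond2 address 00:e0:81:cd:fc:dc  # bond2 keeps eth2's MAC regardless of slave enumeration order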
/etc/network/ (on the virtual machine):
# The primary network interface
auto eth0
iface eth0 inet static
address 172.16.2.222
netmask 255.255.255.0
network 172.16.2.0
broadcast 172.16.2.255
gateway 172.16.2.254
cat /proc/net/ (bonding status):
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
Aggregator ID: 2
Number of ports: 1
Actor Key: 17
Partner Key: 1
Partner Mac Address: 00:15:77:c1:76:92
Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:e0:81:cd:fc:dc
Aggregator ID: 1
Slave queue ID: 0
Slave Interface: eth4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 90:e2:ba:1b:ba:9c
Aggregator ID: 2
Slave queue ID: 0
brctl show:
bridge name bridge id STP enabled interfaces
brb2 8000.00e081cdfcdc no bond2
ifconfig:
bond2     Link encap:Ethernet  HWaddr 00:e0:81:cd:fc:dc
          UP BROADCAST RUNNING MASTER  MTU:1500  Metric:1
          RX packets:125371104 errors:0 dropped:2630 overruns:13000 frame:0
          TX packets:102064536 errors:0 dropped:0 overruns:0 carrier:0
brb2      Link encap:Ethernet  HWaddr 00:e0:81:cd:fc:dc
          inet addr:172.16.2.225  Bcast:172.16.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING  MTU:1500  Metric:1
          RX packets:347889 errors:0 dropped:19655 overruns:0 frame:0
          TX packets:145157 errors:0 dropped:0 overruns:0 carrier:0
eth2      Link encap:Ethernet  HWaddr 00:e0:81:cd:fc:dc
          UP BROADCAST RUNNING SLAVE  MTU:1500  Metric:1
          RX packets:410 errors:0 dropped:410 overruns:0 frame:0
          TX packets:2803 errors:0 dropped:0 overruns:0 carrier:0

eth4      Link encap:Ethernet  HWaddr 00:e0:81:cd:fc:dc
          UP BROADCAST RUNNING SLAVE  MTU:1500  Metric:1
          RX packets:125370694 errors:0 dropped:0 overruns:13000 frame:0
          TX packets:102061733 errors:0 dropped:0 overruns:0 carrier:0

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:204109 errors:0 dropped:0 overruns:0 frame:0
          TX packets:204109 errors:0 dropped:0 overruns:0 carrier:0

vnet0     Link encap:Ethernet  HWaddr fe:54:00:5b:5a:4d
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:451926 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1880757 errors:0 dropped:0 overruns:0 carrier:0
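The dropped/overruns counters above can be watched live while the iperf test runs, to narrow down which layer starts dropping; a sketch:
watch -n 1 'ip -s link show brb2; ip -s link show vnet0'  # RX/TX and drop counters, refreshed every second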
At first, ping works from the virtual machine:
ping -c 4 172.16.2.231
PING 172.16.2.231 (172.16.2.231) 56(84) bytes of data.
64 bytes from 172.16.2.231: icmp_req=1 ttl=64 time=0.503 ms
64 bytes from 172.16.2.231: icmp_req=2 ttl=64 time=0.601 ms
64 bytes from 172.16.2.231: icmp_req=3 ttl=64 time=0.608 ms
64 bytes from 172.16.2.231: icmp_req=4 ttl=64 time=0.538 ms
--- 172.16.2.231 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.503/0.
And from the workstation, too:
ping -c 4 172.16.2.222
PING 172.16.2.222 (172.16.2.222) 56(84) bytes of data.
64 bytes from 172.16.2.222: icmp_seq=1 ttl=64 time=0.704 ms
64 bytes from 172.16.2.222: icmp_seq=2 ttl=64 time=0.588 ms
64 bytes from 172.16.2.222: icmp_seq=3 ttl=64 time=0.531 ms
64 bytes from 172.16.2.222: icmp_seq=4 ttl=64 time=0.560 ms
--- 172.16.2.222 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.531/0.
Then I start an iperf server on the virtual machine:
iperf -s -i 1
and an iperf client on the workstation (-d runs a simultaneous bidirectional test, which is why two streams appear below):
iperf -c 172.16.2.222 -t 3600 -i 1 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.16.2.222, TCP port 5001
TCP window size: 108 KByte (default)
------------------------------------------------------------
[ 5] local 172.16.2.231 port 56734 connected with 172.16.2.222 port 5001
[ 4] local 172.16.2.231 port 5001 connected with 172.16.2.222 port 38241
[ 4] 0.0- 1.0 sec 101 MBytes 850 Mbits/sec
[ 5] 0.0- 1.0 sec 54.8 MBytes 460 Mbits/sec
[ 5] 1.0- 2.0 sec 50.5 MBytes 423 Mbits/sec
[ 4] 1.0- 2.0 sec 107 MBytes 898 Mbits/sec
[ 5] 2.0- 3.0 sec 52.5 MBytes 440 Mbits/sec
[ 4] 2.0- 3.0 sec 108 MBytes 904 Mbits/sec
[ 4] 3.0- 4.0 sec 107 MBytes 899 Mbits/sec
[ 5] 3.0- 4.0 sec 52.0 MBytes 436 Mbits/sec
[ 4] 4.0- 5.0 sec 108 MBytes 904 Mbits/sec
[ 5] 4.0- 5.0 sec 50.5 MBytes 423 Mbits/sec
[ 4] 5.0- 6.0 sec 107 MBytes 900 Mbits/sec
[ 5] 5.0- 6.0 sec 51.6 MBytes 433 Mbits/sec
[ 5] 6.0- 7.0 sec 51.3 MBytes 431 Mbits/sec
[ 4] 6.0- 7.0 sec 108 MBytes 906 Mbits/sec
[ 5] 7.0- 8.0 sec 50.8 MBytes 426 Mbits/sec
[ 4] 7.0- 8.0 sec 108 MBytes 906 Mbits/sec
[ 5] 8.0- 9.0 sec 52.7 MBytes 442 Mbits/sec
[ 4] 8.0- 9.0 sec 105 MBytes 877 Mbits/sec
After a few seconds, the virtual machine stops receiving and transmitting data.
Ping no longer works from the virtual machine:
ping -c 4 172.16.2.231
PING 172.16.2.231 (172.16.2.231) 56(84) bytes of data.
--- 172.16.2.231 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms
Nor from the workstation:
ping -c 4 172.16.2.222
PING 172.16.2.222 (172.16.2.222) 56(84) bytes of data.
From 172.16.2.231 icmp_seq=1 Destination Host Unreachable
From 172.16.2.231 icmp_seq=2 Destination Host Unreachable
From 172.16.2.231 icmp_seq=3 Destination Host Unreachable
From 172.16.2.231 icmp_seq=4 Destination Host Unreachable
--- 172.16.2.222 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3026ms, pipe 3
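A sketch of how the hang could be narrowed down while it is active, capturing at each layer on the hypervisor; the guest MAC 52:54:00:5b:5a:4d is an assumption derived from vnet0's fe:54:00:5b:5a:4d (libvirt sets the tap device's first octet to fe):
tcpdump -n -e -i bond2 icmp  # do the echo requests still arrive on the bond?
tcpdump -n -e -i brb2 icmp   # does the bridge still see them?
tcpdump -n -e -i vnet0 icmp  # are they still forwarded towards the guest?
brctl showmacs brb2          # has the guest's MAC (assumed 52:54:00:5b:5a:4d) aged out of the forwarding database?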
After a reboot of the virtual machine, ping works again.
Ping from the workstation:
ping -c 4 172.16.2.222
PING 172.16.2.222 (172.16.2.222) 56(84) bytes of data.
64 bytes from 172.16.2.222: icmp_seq=1 ttl=64 time=14.7 ms
64 bytes from 172.16.2.222: icmp_seq=2 ttl=64 time=0.774 ms
64 bytes from 172.16.2.222: icmp_seq=3 ttl=64 time=0.770 ms
64 bytes from 172.16.2.222: icmp_seq=4 ttl=64 time=0.777 ms
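A possibly lighter-weight recovery than rebooting the whole guest, untested here: bouncing the virtio interface inside the virtual machine, which should force the bridge to relearn its MAC:
sudo ip link set eth0 down
sudo ip link set eth0 up  # untested assumption; a full reboot is the only recovery verified above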
Thanks for reporting this bug.
Can you confirm that your switch is configured to enable 802.3ad mode?
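One way to check this from the host side; a sketch assuming the standard bonding status file /proc/net/bonding/bond2 (the truncated cat above): under "802.3ad info", the Partner Mac Address should be the switch's MAC rather than 00:00:00:00:00:00.
grep -A 6 '802.3ad info' /proc/net/bonding/bond2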
Would it be possible to confirm that this does not happen with a container? The recipe to create the container would be:
===============================
sudo apt-get -y install lxc
cat > lxc2.conf << EOF
lxc.network.type = veth
lxc.network.link = brb2
lxc.network.flags = up
EOF
sudo lxc-create -t ubuntu -n q1 -f lxc2.conf
===============================
Then start the container with
sudo lxc-start -n q1
log in on that console with username "ubuntu" and password "ubuntu", and run the test from there?
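To mirror the original test inside the container, something like the following; note iperf is an assumption here, since the ubuntu template does not install it:
sudo apt-get -y install iperf
iperf -s -i 1  # then run the same iperf client command from the workstation against the container's IP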