VM stops receiving packets under heavy load from virtio network interface bridged to a bonded interface on kvm hypervisor

Bug #1050934 reported by ITec
This bug affects 1 person
Affects            Status      Importance  Assigned to  Milestone
linux (Ubuntu)     Incomplete  High        Unassigned
qemu-kvm (Ubuntu)  New         High        Unassigned

Bug Description

Versions:
Ubuntu: 12.04.1 LTS
linux-server 3.2.0.30.32
bridge-utils 1.5-2ubuntu6
ifenslave-2.6 1.1.0-19ubuntu5
qemu-kvm 1.0+noroms-0ubuntu14.1
libvirt0 0.9.8-2ubuntu17.3
iperf 2.0.5-2.1

Description:
I have a typical setup for a KVM hypervisor: two network interfaces bonded together and bridged for use by VMs.
That is, two Ethernet interfaces, eth2 and eth4, are bonded into bond2 using IEEE 802.3ad dynamic link aggregation.
bond2 is attached to a bridge interface named brb2.
The KVM virtual machines have their interfaces attached to brb2 as well.

When I test network performance (generating heavy load), the bridge stops forwarding packets to the virtual machine interface after a few seconds.

Overview:

       workstation (172.16.2.231)
                |
manageable switch with IEEE 802.3ad cap.
           |         |
          eth2      eth4
           |         |
              bond2
                |
              brb2      Hypervisor (172.16.2.225)
                |
              vnet0
                |
   eth0 on virtual machine (172.16.2.222)

/etc/network/interfaces on Hypervisor:
# Slave for bond2
allow-eths eth2
iface eth2 inet manual
        bond-master bond2
        bond-primary eth2 eth4
        pre-up ip link set eth2 up
        post-up ifconfig eth2 -multicast

# Slave for bond2
allow-eths eth4
iface eth4 inet manual
        bond-master bond2
        bond-primary eth2 eth4
        pre-up ip link set eth4 up
        post-up ifconfig eth4 -multicast

# Master for eth2+eth4, slave for brb2
allow-bonds bond2
iface bond2 inet manual
        bond-slaves none
        bond_mode 4
        bond_miimon 100
        bond_updelay 200
        bond_downdelay 200
        post-up ip link set bond2 address $(ethtool -P eth2 | awk '{print $3}')
        post-up ifconfig bond2 -multicast

# Master for bond2
allow-brbs brb2
iface brb2 inet static
        address 172.16.2.225
        netmask 255.255.255.0
        network 172.16.2.0
        broadcast 172.16.2.255
        gateway 172.16.2.254
        bridge_ports bond2
        bridge_stp off
        bridge_fd 0
        bridge_maxwait 0
        post-up ifconfig brb2 -multicast
        post-up echo 0 > /sys/devices/virtual/net/brb2/bridge/multicast_router
        post-up echo 0 > /sys/devices/virtual/net/brb2/bridge/multicast_snooping

# "pre-up ip link set ethX up" is a workaround for buggy eth drivers that otherwise do not come up
# "post-up ip link set bond2 address $(ethtool -P eth2 | awk '{print $3}')" ensures that bond2 always gets the same MAC (the one from eth2)
# these interfaces are set up by a separate script using ifup

/etc/network/interfaces on virtual machine:
# The primary network interface
auto eth0
iface eth0 inet static
        address 172.16.2.222
        netmask 255.255.255.0
        network 172.16.2.0
        broadcast 172.16.2.255
        gateway 172.16.2.254

cat /proc/net/bonding/bond2:
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Active Aggregator Info:
        Aggregator ID: 2
        Number of ports: 1
        Actor Key: 17
        Partner Key: 1
        Partner Mac Address: 00:15:77:c1:76:92

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:e0:81:cd:fc:dc
Aggregator ID: 1
Slave queue ID: 0

Slave Interface: eth4
MII Status: up
Speed: 1000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 90:e2:ba:1b:ba:9c
Aggregator ID: 2
Slave queue ID: 0
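
A minimal sketch for pulling the LACP partner values out of the status output above, so they can be compared with what the switch reports. The function name bond_partner_info is hypothetical; the field labels are the ones shown in the output above, and other bonding driver versions may label them differently.

```shell
#!/bin/sh
# Print the LACP partner key and partner MAC from a bonding status file.
# Defaults to /proc/net/bonding/bond2 (the bond in this report); a different
# path can be passed as the first argument.
bond_partner_info() {
    status_file="${1:-/proc/net/bonding/bond2}"
    # Split each line on ": " so the MAC address (which contains colons
    # but no colon followed by a space) stays in one field.
    awk -F': ' '
        /Partner Key:/         { key = $2 }
        /Partner Mac Address:/ { mac = toupper($2) }
        END { printf "key=%s system_id=%s\n", key, mac }
    ' "$status_file"
}
```

With the status output shown above, 'bond_partner_info' should print key=1 system_id=00:15:77:C1:76:92. The MAC is upper-cased only to make it easier to compare with switch output that uses upper case.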

brctl show:
bridge name     bridge id               STP enabled     interfaces
brb2            8000.00e081cdfcdc       no              bond2
                                                        vnet0

ifconfig:
bond2 Link encap:Ethernet HWaddr 00:e0:81:cd:fc:dc
          UP BROADCAST RUNNING MASTER MTU:1500 Metric:1
          RX packets:125371104 errors:0 dropped:2630 overruns:13000 frame:0
          TX packets:102064536 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:161005246299 (161.0 GB) TX bytes:76855627386 (76.8 GB)

brb2 Link encap:Ethernet HWaddr 00:e0:81:cd:fc:dc
          inet addr:172.16.2.225 Bcast:172.16.2.255 Mask:255.255.255.0
          UP BROADCAST RUNNING MTU:1500 Metric:1
          RX packets:347889 errors:0 dropped:19655 overruns:0 frame:0
          TX packets:145157 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:97420273 (97.4 MB) TX bytes:259259608 (259.2 MB)

eth2 Link encap:Ethernet HWaddr 00:e0:81:cd:fc:dc
          UP BROADCAST RUNNING SLAVE MTU:1500 Metric:1
          RX packets:410 errors:0 dropped:410 overruns:0 frame:0
          TX packets:2803 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:243540 (243.5 KB) TX bytes:642578 (642.5 KB)
          Interrupt:47 Memory:fe7e0000-fe800000

eth4 Link encap:Ethernet HWaddr 00:e0:81:cd:fc:dc
          UP BROADCAST RUNNING SLAVE MTU:1500 Metric:1
          RX packets:125370694 errors:0 dropped:0 overruns:13000 frame:0
          TX packets:102061733 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:161005002759 (161.0 GB) TX bytes:76854984808 (76.8 GB)
          Memory:fcbe0000-fcc00000

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:204109 errors:0 dropped:0 overruns:0 frame:0
          TX packets:204109 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:263565238 (263.5 MB) TX bytes:263565238 (263.5 MB)

vnet0 Link encap:Ethernet HWaddr fe:54:00:5b:5a:4d
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:451926 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1880757 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500
          RX bytes:4926362916 (4.9 GB) TX bytes:2461725339 (2.4 GB)

First ping works on virtual machine:
ping -c 4 172.16.2.231
PING 172.16.2.231 (172.16.2.231) 56(84) bytes of data.
64 bytes from 172.16.2.231: icmp_req=1 ttl=64 time=0.503 ms
64 bytes from 172.16.2.231: icmp_req=2 ttl=64 time=0.601 ms
64 bytes from 172.16.2.231: icmp_req=3 ttl=64 time=0.608 ms
64 bytes from 172.16.2.231: icmp_req=4 ttl=64 time=0.538 ms

--- 172.16.2.231 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.503/0.562/0.608/0.049 ms

And on workstation, too:
ping -c 4 172.16.2.222
PING 172.16.2.222 (172.16.2.222) 56(84) bytes of data.
64 bytes from 172.16.2.222: icmp_seq=1 ttl=64 time=0.704 ms
64 bytes from 172.16.2.222: icmp_seq=2 ttl=64 time=0.588 ms
64 bytes from 172.16.2.222: icmp_seq=3 ttl=64 time=0.531 ms
64 bytes from 172.16.2.222: icmp_seq=4 ttl=64 time=0.560 ms

--- 172.16.2.222 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 2997ms
rtt min/avg/max/mdev = 0.531/0.595/0.704/0.072 ms

Then I start iperf server on virtual machine:
iperf -s -i 1

And iperf client on workstation:
iperf -c 172.16.2.222 -t 3600 -i 1 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.16.2.222, TCP port 5001
TCP window size: 108 KByte (default)
------------------------------------------------------------
[ 5] local 172.16.2.231 port 56734 connected with 172.16.2.222 port 5001
[ 4] local 172.16.2.231 port 5001 connected with 172.16.2.222 port 38241
[ 4] 0.0- 1.0 sec 101 MBytes 850 Mbits/sec
[ 5] 0.0- 1.0 sec 54.8 MBytes 460 Mbits/sec
[ 5] 1.0- 2.0 sec 50.5 MBytes 423 Mbits/sec
[ 4] 1.0- 2.0 sec 107 MBytes 898 Mbits/sec
[ 5] 2.0- 3.0 sec 52.5 MBytes 440 Mbits/sec
[ 4] 2.0- 3.0 sec 108 MBytes 904 Mbits/sec
[ 4] 3.0- 4.0 sec 107 MBytes 899 Mbits/sec
[ 5] 3.0- 4.0 sec 52.0 MBytes 436 Mbits/sec
[ 4] 4.0- 5.0 sec 108 MBytes 904 Mbits/sec
[ 5] 4.0- 5.0 sec 50.5 MBytes 423 Mbits/sec
[ 4] 5.0- 6.0 sec 107 MBytes 900 Mbits/sec
[ 5] 5.0- 6.0 sec 51.6 MBytes 433 Mbits/sec
[ 5] 6.0- 7.0 sec 51.3 MBytes 431 Mbits/sec
[ 4] 6.0- 7.0 sec 108 MBytes 906 Mbits/sec
[ 5] 7.0- 8.0 sec 50.8 MBytes 426 Mbits/sec
[ 4] 7.0- 8.0 sec 108 MBytes 906 Mbits/sec
[ 5] 8.0- 9.0 sec 52.7 MBytes 442 Mbits/sec
[ 4] 8.0- 9.0 sec 105 MBytes 877 Mbits/sec

After a few seconds it stops receiving and transmitting data.

Ping does not work any longer on virtual machine:
ping -c 4 172.16.2.231
PING 172.16.2.231 (172.16.2.231) 56(84) bytes of data.

--- 172.16.2.231 ping statistics ---
4 packets transmitted, 0 received, 100% packet loss, time 2999ms

Nor on workstation:
ping -c 4 172.16.2.222
PING 172.16.2.222 (172.16.2.222) 56(84) bytes of data.
From 172.16.2.231 icmp_seq=1 Destination Host Unreachable
From 172.16.2.231 icmp_seq=2 Destination Host Unreachable
From 172.16.2.231 icmp_seq=3 Destination Host Unreachable
From 172.16.2.231 icmp_seq=4 Destination Host Unreachable

--- 172.16.2.222 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3026ms, pipe 3

After a reboot of the virtual machine ping works again:

Ping on workstation:
ping -c 4 172.16.2.222
PING 172.16.2.222 (172.16.2.222) 56(84) bytes of data.
64 bytes from 172.16.2.222: icmp_seq=1 ttl=64 time=14.7 ms
64 bytes from 172.16.2.222: icmp_seq=2 ttl=64 time=0.774 ms
64 bytes from 172.16.2.222: icmp_seq=3 ttl=64 time=0.770 ms
64 bytes from 172.16.2.222: icmp_seq=4 ttl=64 time=0.777 ms

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Thanks for reporting this bug.

Can you confirm that your switch is configured to enable 802.3ad mode?

Would it be possible to confirm that this does not happen with a container? The recipe to create the container would be:

=======================
sudo apt-get -y install lxc
cat > lxc2.conf << EOF
lxc.network.type=veth
lxc.network.bridge=brb2
lxc.network.flags=up
EOF
sudo lxc-create -t ubuntu -n q1 -f lxc2.conf
=======================

Then start the container with

sudo lxc-start -n q1

log in on that console with username ubuntu password ubuntu, and run the test from there?

Changed in linux (Ubuntu):
status: New → Incomplete
importance: Undecided → High
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Also, please run 'apport-collect 1050934' to cause more debug information to be uploaded.

Revision history for this message
ITec (itec) wrote :

Yes, the switch is 802.3ad enabled.

LACP Group Status says:

System ID: 00:15:77:C1:76:92

Key: 1
Aggregator  Attached Port List
9           9,10
10

"Key" and "SystemID" correspond with "Partner Key" and "Partner Mac Address" shown in /proc/net/bonding/bond2

I tried the same tests with a container. I had to change "lxc.network.bridge=brb2" to "lxc.network.link=brb2" to get it to work.

The problem did not occur in the container.
Nor does it occur with emulated NICs like e1000 in the virtual machine. But these are much slower than NICs of the type virtio. Does lxc use virtio?
It only occurs with virtio.
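
Since the hang depends on the guest's NIC model, it can help to check which model a guest is actually configured with. The following is only a sketch: the function name nic_models is hypothetical, and it assumes the guest is defined through libvirt, whose domain XML lists the model inside each <interface> element.

```shell
#!/bin/sh
# Read libvirt domain XML on stdin and print the model of every NIC,
# one per line (for example "virtio" or "e1000").
nic_models() {
    # Only look inside <interface> ... </interface> blocks, because other
    # devices such as <video> also carry a <model type=...> element.
    sed -n '/<interface /,/<\/interface>/ s/.*<model type=.\([[:alnum:]_-]*\).*/\1/p'
}
```

A possible call is 'virsh dumpxml GUEST | nic_models', where GUEST stands for the libvirt domain name of the virtual machine. An interface without an explicit <model> element produces no line.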

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Is the vhost_net module loaded? If not, does it help to 'sudo modprobe vhost_net' ?
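
One way to answer the first question is to look the module up in /proc/modules. This is a sketch only; the function name module_loaded is hypothetical.

```shell
#!/bin/sh
# Succeed if the named kernel module appears in a /proc/modules-style listing.
# The listing defaults to /proc/modules; another file can be given as the
# second argument.
module_loaded() {
    module="$1"
    modules_file="${2:-/proc/modules}"
    # Each line starts with the module name followed by a space, so anchor
    # the match to avoid hits on names that merely share a prefix.
    grep -q "^${module} " "$modules_file"
}
```

For example: 'module_loaded vhost_net && echo loaded || echo "not loaded"'.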

Changed in linux (Ubuntu):
status: Incomplete → New
status: New → Incomplete
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Also, does 'ifdown eth0; ifup eth0' in the guest bring networking back up? I'm wondering whether you have the same problem as bug 997978. Note that that bug was fixed for reporters by using the qemu-kvm in ppa:ubuntu-virt/backports, and we're waiting on feedback on whether the qemu-kvm version in ppa:ubuntu-virt/kvm-network-hang also fixes it.

So it should definitely be worth testing at least ppa:ubuntu-virt/backports.

Changed in qemu-kvm (Ubuntu):
importance: Undecided → High
status: New → Incomplete
Revision history for this message
ITec (itec) wrote :

No, the vhost_net module is not loaded.
Loading it does not help.

No, 'ifdown eth0; ifup eth0' in the guest does not reliably bring networking back up. Sometimes it works, sometimes I have to reboot the VM. The last few times it did not work at all.

My problem might be the same as bug 997978.
But I only have it in conjunction with bridged bonding and I only have seen the effect after heavy load and not after time.
Maybe it could occur after a longer period of time on my system, too.

qemu-kvm in ppa:ubuntu-virt/backports and ppa:ubuntu-virt/kvm-network-hang both work well, but I have not done any long-term testing. The problem could still occur after some time.

I am now running my iperf test with the kvm-network-hang version for several hours, but I cannot test it indefinitely.
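
For a soak test like this, a small watchdog can record when connectivity is lost while iperf is running. The following is a sketch under assumptions: the function name watch_link is hypothetical, and the probe is whatever command is passed in (a single ping in the usage note).

```shell
#!/bin/sh
# Run a probe command up to max_checks times, sleeping interval seconds
# between runs. On the first failing probe, print a timestamp and return 1.
watch_link() {
    max_checks="$1"
    interval="$2"
    shift 2
    i=0
    while [ "$i" -lt "$max_checks" ]; do
        if ! "$@" >/dev/null 2>&1; then
            echo "probe failed at check $((i + 1)): $(date '+%Y-%m-%d %H:%M:%S')"
            return 1
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "all $max_checks checks passed"
}
```

Run from the workstation, 'watch_link 3600 1 ping -c 1 -W 1 172.16.2.222' would probe the virtual machine about once a second for roughly an hour and stop at the first lost ping.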

What are the differences between the official version of qemu-kvm and the one in kvm-network-hang?

I really need to know whether it is reliable in order to decide whether or not to use it in production systems!

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1050934] Re: VM stops receiving packets under heavy load from virtio network interface bridged to a bonded interface on kvm hypervisor

Quoting ITec (<email address hidden>):
> What are the differences between the official version of qemu-kvm and
> the one in kvm-network-hang?

The version in kvm-network-hang is the same as the official version, plus
three virtio patches from upstream:

 a821ce5 virtio: order index/descriptor reads
 92045d8 virtio: add missing mb() on enable notification
 a281ebc virtio: add missing mb() on notification

That version will likely become the new official version, once there
are confirmations that it is solving these virtio-network related
bugs.

Revision history for this message
ITec (itec) wrote :

Would you suggest marking this bug as a duplicate of bug 997978?

I could mark myself as affected there, add a few words about how the problem shows up on my system, and confirm that the kvm-network-hang version helped me.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

It sounds like your bug is fixed by the same commits, but your descriptions in comment #6 make me think the actual bug may be different.

Changed in linux (Ubuntu):
status: Incomplete → New
Changed in qemu-kvm (Ubuntu):
status: Incomplete → New
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1050934

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
ITec (itec) wrote :

@brad-figg: My system is behind a firewall. I do not have direct access to the internet and "apport-bug --save=/tmp/bug.apt" does not produce any output.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

@ITec

I've marked this as a duplicate of 997978 as you suggested above. If you find that the fix for that bug does NOT solve your issue, then please go ahead and un-mark this as a duplicate.
