network slow when 2 VMs using virtio net bridged to same phys. network device on kvm host

Bug #1079212 reported by ITec
This bug affects 3 people
Affects            Status       Importance   Assigned to   Milestone
iperf (Ubuntu)     Confirmed    Medium       Unassigned
linux (Ubuntu)     Incomplete   Medium       Unassigned
qemu-kvm (Ubuntu)  Confirmed    Medium       Unassigned

Bug Description

Versions:
----------

Ubuntu: 12.10
linux: 3.5.0-18-generic
bridge-utils: 1.5-4ubuntu2
qemu-kvm: 1.2.0+noroms-0ubuntu2
libvirt0: 0.9.13-0ubuntu12

Problem:
----------

When transmitting network data bidirectionally between the LAN and two VMs on a KVM host via a bridged network interface, I see a significant performance loss compared to transmitting between the LAN and only one VM on the KVM host.

When transmitting bidirectionally to one VM, I get about 1000 Mbit/s in one direction and 250 Mbit/s in the other.
When transmitting bidirectionally to two VMs concurrently, I get about 1000 Mbit/s in one direction and only 37 Mbit/s in the other.
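
In short, the pattern is an iperf server inside each VM and a bidirectional iperf client on each LAN host (the full commands and output are under "Performance checks" below), roughly:

vm2# iperf -s                              # server inside vm2 (172.16.2.122)
vm3# iperf -s                              # server inside vm3 (172.16.2.123)
ubuntu1# iperf -c 172.16.2.122 -i 10 -d    # bidirectional test LAN <-> vm2
ubuntu4# iperf -c 172.16.2.123 -i 10 -d    # bidirectional test LAN <-> vm3 (run at the same time for the two-VM case)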

Questions:
------------

So what is the bottleneck?
How can I reach full network bandwidth?

Machines:
-----------

ubuntu5 KVM host
ubuntu1 host on the same network switch as ubuntu5
ubuntu4 host on the same network switch as ubuntu5
vm2 KVM virtual machine on ubuntu5
vm3 KVM virtual machine on ubuntu5

Network interfaces:
-----------------------
eth0:ubuntu1 (172.16.2.131) physical network interface eth0 on ubuntu1 1Gbit/s
eth0:ubuntu4 (172.16.2.134) physical network interface eth0 on ubuntu4 1Gbit/s
eth2:ubuntu5 physical network interface eth2 on ubuntu5 1Gbit/s
br2 (172.16.2.125) network bridge on ubuntu5 that VMs connect to
vnet0 virtual network interface that vm2 connects with to br2 (equiv. eth0 on vm2 guest system)
vnet1 virtual network interface that vm3 connects with to br2 (equiv. eth0 on vm3 guest system)
eth0:vm2 (172.16.2.122) virtual network interface on vm2 using virtio
eth0:vm3 (172.16.2.123) virtual network interface on vm3 using virtio
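
For reference, br2 is a plain bridge-utils bridge enslaving eth2. The corresponding /etc/network/interfaces stanza on ubuntu5 looks roughly like this (a sketch; the netmask and the exact options are assumptions, the real config is not included in this report):

# /etc/network/interfaces on ubuntu5 (sketch)
auto br2
iface br2 inet static
        address 172.16.2.125
        netmask 255.255.255.0      # assumed /24
        bridge_ports eth2          # attach the physical 1 Gbit/s NIC
        bridge_stp off
        bridge_fd 0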

Network map:
----------------

eth0:ubuntu1
      |
   switch------eth0:ubuntu4
      |
eth2:ubuntu5
      |
     br2-------vnet0 (eth0:vm2)
      |
     vnet1 (eth0:vm3)

Performance checks:
------------------------

Performance_01: ubuntu1 <-> vm2:
 + 1191 Mbit/s (932 Mbit/s + 259 Mbit/s)

Performance_02: ubuntu4 <-> vm3:
 + 1209 Mbit/s (938 Mbit/s + 271 Mbit/s)

Performance_03: ubuntu1 <-> vm2 & ubuntu4 <-> vm3:
 + 497 Mbit/s (475 Mbit/s + 22 Mbit/s)
 + 481 Mbit/s (466 Mbit/s + 15 Mbit/s)
---------------------------------------
 = 978 Mbit/s (941 Mbit/s + 37 Mbit/s)

Why are Performance_01 and Performance_02 not closer to 2048 Mbit/s (1024 Mbit/s + 1024 Mbit/s)?
Why does it get even worse when I transfer data to/from two different VMs at the same time?
37 Mbit/s is very poor! :-(

Here is the output of the commands used:

Performance_01
-------------------

ubuntu5# bmon
  # Interface RX Rate RX # TX Rate TX #
-------------------------------------------------------------------
ubuntu5 (source: local)
  0 br2 143.00B 3 0.00B 0
  13 eth2 117.02MiB 95394 47.80MiB 57761
  19 vnet1 0.00B 0 211.00B 3
  20 vnet0 45.84MiB 26565 113.50MiB 39413

ubuntu1# iperf -c 172.16.2.122 -t 31536000 -i 10 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.16.2.122, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 5] local 172.16.2.131 port 58506 connected with 172.16.2.122 port 5001
[ 4] local 172.16.2.131 port 5001 connected with 172.16.2.122 port 47751
[ 5] 0.0-10.0 sec 1.02 GBytes 874 Mbits/sec
[ 4] 0.0-10.0 sec 500 MBytes 420 Mbits/sec
[ 5] 10.0-20.0 sec 1.08 GBytes 932 Mbits/sec
[ 4] 10.0-20.0 sec 419 MBytes 351 Mbits/sec
[ 4] 20.0-30.0 sec 450 MBytes 378 Mbits/sec
[ 5] 20.0-30.0 sec 1.08 GBytes 932 Mbits/sec
[ 4] 30.0-40.0 sec 453 MBytes 380 Mbits/sec
[ 5] 30.0-40.0 sec 1.08 GBytes 932 Mbits/sec
[ 5] 40.0-50.0 sec 1.08 GBytes 932 Mbits/sec
[ 4] 40.0-50.0 sec 309 MBytes 259 Mbits/sec

Performance_02
-------------------

ubuntu5# bmon
  # Interface RX Rate RX # TX Rate TX #
-------------------------------------------------------------------
ubuntu5 (source: local)
  0 br2 352.00B 5 0.00B 0
  13 eth2 27.59MiB 53660 117.32MiB 82784
  19 vnet1 112.38MiB 3611 26.59MiB 37871
  20 vnet0 169.00B 0 527.00B 7

ubuntu4# iperf -c 172.16.2.123 -t 31536000 -i 10 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.16.2.123, TCP port 5001
TCP window size: 92.4 KByte (default)
------------------------------------------------------------
[ 5] local 172.16.2.134 port 58403 connected with 172.16.2.123 port 5001
[ 4] local 172.16.2.134 port 5001 connected with 172.16.2.123 port 41502
[ 4] 0.0-10.0 sec 1.09 GBytes 934 Mbits/sec
[ 5] 0.0-10.0 sec 206 MBytes 173 Mbits/sec
[ 4] 10.0-20.0 sec 1.09 GBytes 940 Mbits/sec
[ 5] 10.0-20.0 sec 212 MBytes 178 Mbits/sec
[ 4] 20.0-30.0 sec 1.09 GBytes 940 Mbits/sec
[ 5] 20.0-30.0 sec 237 MBytes 199 Mbits/sec
[ 5] 30.0-40.0 sec 260 MBytes 218 Mbits/sec
[ 4] 30.0-40.0 sec 1.09 GBytes 939 Mbits/sec
[ 5] 40.0-50.0 sec 323 MBytes 271 Mbits/sec
[ 4] 40.0-50.0 sec 1.09 GBytes 938 Mbits/sec

Performance_03
-------------------

ubuntu5# bmon
  # Interface RX Rate RX # TX Rate TX #
-------------------------------------------------------------------
ubuntu5 (source: local)
  0 br2 149.00B 2 0.00B 0
  13 eth2 5.96MiB 42154 117.34MiB 81736
  19 vnet1 55.74MiB 1185 2.38MiB 20075
  20 vnet0 56.59MiB 1322 3.47MiB 20275

ubuntu1# iperf -c 172.16.2.122 -t 31536000 -i 10 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.16.2.122, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 5] local 172.16.2.131 port 54191 connected with 172.16.2.122 port 5001
[ 4] local 172.16.2.131 port 5001 connected with 172.16.2.122 port 47752
[ 5] 0.0-10.0 sec 17.8 MBytes 14.9 Mbits/sec
[ 4] 0.0-10.0 sec 686 MBytes 575 Mbits/sec
[ 5] 10.0-20.0 sec 20.5 MBytes 17.2 Mbits/sec
[ 4] 10.0-20.0 sec 666 MBytes 558 Mbits/sec
[ 4] 20.0-30.0 sec 565 MBytes 474 Mbits/sec
[ 5] 20.0-30.0 sec 18.1 MBytes 15.2 Mbits/sec
[ 5] 30.0-40.0 sec 17.4 MBytes 14.6 Mbits/sec
[ 4] 30.0-40.0 sec 563 MBytes 472 Mbits/sec
[ 4] 40.0-50.0 sec 566 MBytes 475 Mbits/sec
[ 5] 40.0-50.0 sec 26.2 MBytes 22.0 Mbits/sec

ubuntu4# iperf -c 172.16.2.123 -t 31536000 -i 10 -d
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Client connecting to 172.16.2.123, TCP port 5001
TCP window size: 150 KByte (default)
------------------------------------------------------------
[ 3] local 172.16.2.134 port 46644 connected with 172.16.2.123 port 5001
[ 5] local 172.16.2.134 port 5001 connected with 172.16.2.123 port 41503
[ 5] 0.0-10.0 sec 512 MBytes 430 Mbits/sec
[ 3] 0.0-10.0 sec 114 MBytes 96.0 Mbits/sec
[ 5] 10.0-20.0 sec 441 MBytes 370 Mbits/sec
[ 3] 10.0-20.0 sec 28.9 MBytes 24.2 Mbits/sec
[ 5] 20.0-30.0 sec 548 MBytes 460 Mbits/sec
[ 3] 20.0-30.0 sec 27.4 MBytes 23.0 Mbits/sec
[ 5] 30.0-40.0 sec 560 MBytes 469 Mbits/sec
[ 3] 30.0-40.0 sec 15.1 MBytes 12.7 Mbits/sec
[ 5] 40.0-50.0 sec 556 MBytes 466 Mbits/sec
[ 3] 40.0-50.0 sec 18.0 MBytes 15.1 Mbits/sec

Tags: kvm-linux
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1079212

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
ITec (itec) wrote :

Since I do not have direct access to the internet on that machine, I had to do:
  apport-cli -f -p linux; apport-cli -f -p qemu-kvm

Revision history for this message
ITec (itec) wrote :
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

I tried to reproduce this on a raring host I had handy and could not. The throughput definitely varied, but was not consistently lopsided.

This doesn't mean it is not present in quantal, or that I didn't do something different from you.

Is the vhost_net kernel module loaded? Could you please show the XML for the VMs?
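
Something like the following would show both (assuming the guests are libvirt domains named vm2 and vm3):

ubuntu5# lsmod | grep vhost_net            # is the module loaded?
ubuntu5# virsh dumpxml vm2 > vm2.xml       # domain definition, including the <interface> section
ubuntu5# virsh dumpxml vm3 > vm3.xml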

Revision history for this message
ITec (itec) wrote :

Thanks for the effort to try to reproduce this.

Yes, vhost_net is loaded.

ubuntu5# lsmod
Module Size Used by
vhost_net 31873 3
macvtap 18293 1 vhost_net
macvlan 19002 1 macvtap
ipmi_devintf 17521 0
ipmi_si 52889 0
ipmi_msghandler 45243 2 ipmi_devintf,ipmi_si
ip6table_filter 12815 0
ip6_tables 27207 1 ip6table_filter
ebtable_nat 12807 0
ebtables 30671 1 ebtable_nat
ipt_MASQUERADE 12759 3
iptable_nat 13182 1
nf_nat 25254 2 ipt_MASQUERADE,iptable_nat
nf_conntrack_ipv4 14480 4 iptable_nat,nf_nat
nf_defrag_ipv4 12729 1 nf_conntrack_ipv4
xt_state 12578 1
nf_conntrack 82633 5 ipt_MASQUERADE,iptable_nat,nf_nat,nf_conntrack_ipv4,xt_state
ipt_REJECT 12541 4
xt_CHECKSUM 12549 2
iptable_mangle 12695 1
xt_tcpudp 12603 10
iptable_filter 12810 1
ip_tables 26995 3 iptable_nat,iptable_mangle,iptable_filter
x_tables 29711 12 ip6table_filter,ip6_tables,ebtables,ipt_MASQUERADE,iptable_nat,xt_state,ipt_REJECT,xt_CHECKSUM,iptable_mangle,xt_tcpudp,iptable_filter,ip_tables
bridge 90446 0
stp 12931 1 bridge
llc 14552 2 bridge,stp
bonding 107986 0
vesafb 13797 1
kvm_amd 55604 7
kvm 414070 1 kvm_amd
dm_multipath 22828 0
scsi_dh 14554 1 dm_multipath
ghash_clmulni_intel 13180 0
aesni_intel 51037 0
cryptd 20403 2 ghash_clmulni_intel,aesni_intel
aes_x86_64 17208 1 aesni_intel
amd64_edac_mod 23904 0
microcode 22803 0
edac_core 52451 6 amd64_edac_mod
psmouse 95552 0
sp5100_tco 13697 0
edac_mce_amd 23303 1 amd64_edac_mod
i2c_piix4 13167 0
joydev 17457 0
serio_raw 13215 0
fam15h_power 13119 0
k10temp 13126 0
mac_hid 13205 0
w83627ehf 42929 0
hwmon_vid 12783 1 w83627ehf
lp 17759 0
parport 46345 1 lp
xfs 837979 6
hid_generic 12493 0
usbhid 46947 0
hid 100366 2 hid_generic,usbhid
r8169 61650 0
qla2xxx 471919 0
pata_atiixp 13204 0
scsi_transport_fc 58962 1 qla2xxx
igb 129237 0
scsi_tgt 20065 1 scsi_transport_fc
e1000e 199114 0
arcmsr 32514 6
dca 15130 1 igb

Revision history for this message
ITec (itec) wrote :

What could you have done differently?

- Did you run 2 VMs at the same time?
- Is networking on your VMs using virtio?
- Did you start "iperf -s" on both VMs?
- Did you use iperf with "-d" option on two remote hosts to connect to the VMs?
- Are all hosts (KVM hypervisor, and both remote hosts) connected to the same physical network switch by 1Gbit/s?
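
For what it's worth, the negotiated speed/duplex on each physical interface can be double-checked with ethtool, e.g.:

ubuntu5# ethtool eth2 | grep -E 'Speed|Duplex'    # expect "Speed: 1000Mb/s" and "Duplex: Full"
ubuntu1# ethtool eth0 | grep -E 'Speed|Duplex'
ubuntu4# ethtool eth0 | grep -E 'Speed|Duplex'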

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 1079212] Re: network slow when 2 VMs using virtio net bridged to same phys. network device on kvm host

Quoting ITec (<email address hidden>):

Thanks for the .xml. Nothing stands out there...

> What could you have done differently?

The main thing is that I used raring, not quantal. This could be
something that was fixed in either qemu or the kernel.

> - Did you run 2 VMs at the same time?

yup

> - Is networking on your VMs using virtio?

yup

> - Did you start "iperf -s" on both VMs?

yup

> - Did you use iperf with "-d" option on two remote hosts to connect to the VMs?

No, I did so from a single host. I used two laptops connected over an
Ethernet cable (with one being the PXE+DHCP server for the other, my
regular install setup).

> - Are all hosts (KVM hypervisor, and both remote hosts) connected to the same physical network switch by 1Gbit/s?

No switch was involved. I'll try using just quantal first on the VM host, but
wouldn't it be interesting if adding a switch caused this!

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

I was able to easily reproduce this now with just 'iperf -c <ip> -d'. When I ran your longer loop before, it never seemed to happen. I believe '-n 10' just made it transfer more data, which smoothed over the inequities here.

[ ID] Interval Transfer Bandwidth
[ 5] 0.0-10.0 sec 20.5 MBytes 17.1 Mbits/sec
[ 4] 0.0-10.0 sec 564 MBytes 471 Mbits/sec

I'm not sure there's really a bug here, as opposed to the host CPU doing the best it can to load-balance too much work. I'll re-try with raring now and look for any differences.
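
One way to see whether the host CPU is the limiting factor would be to watch the qemu and vhost threads on the host while the test runs, e.g.:

ubuntu5# ps -eLf | grep vhost-    # the per-device vhost kernel threads
ubuntu5# top -H                   # per-thread CPU usage during the iperf run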

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Actually, it seems like this might be an iperf bug. The man page specifically says:

"The threading implementation is rather heinous."

Changed in linux (Ubuntu):
importance: Undecided → Medium
Revision history for this message
ITec (itec) wrote :

Yes, it seems a lot like an iperf bug.
I did the same test with uperf and got full bandwidth. Even bonding works well:

 ubuntu5# bmon
  # Interface RX Rate RX # TX Rate TX #
-------------------------------------------------------------------
ubuntu5 (source: local)
  0 br2 39.00B 0 377.00B 0
  6 bond2 216.04MiB 213299 213.28MiB 179034
  12 vnet0 106.83MiB 35932 107.96MiB 51115
  15 eth1 103.92MiB 103566 102.63MiB 85928
  17 eth2 112.11MiB 109715 110.58MiB 93057
  22 vnet1 99.18MiB 32968 100.49MiB 49117

What I do not know is why this occurs only with iperf running on virtual machines and talking to another iperf instance on the network.
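
For reference, a uperf run of this kind looks roughly like the following (a sketch; it assumes one of the sample profiles shipped with uperf, where the $h environment variable names the remote host):

vm2# uperf -s                                  # uperf slave inside the VM
ubuntu1# h=172.16.2.122 uperf -m netperf.xml   # client on the LAN host, driving the profile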

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

That's a good question :)

Changed in qemu-kvm (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Changed in iperf (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

It would be worth testing this with a container in place of a kvm VM.
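
A rough sketch of that, using the LXC tools of the time and attaching the container to the same br2 bridge (the container name and config path are only examples):

ubuntu5# lxc-create -t ubuntu -n net1        # create an Ubuntu container
ubuntu5# sed -i 's/^lxc.network.link.*/lxc.network.link = br2/' /var/lib/lxc/net1/config   # bridge to br2 instead of lxcbr0
ubuntu5# lxc-start -n net1 -d                # start it detached
net1# iperf -s                               # inside the container; then rerun the -d tests from the LAN hosts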
