VM-to-VM 4 Gbps, expected 9-10

Bug #1445684 reported by Gregory Elkinbard
This bug affects 1 person
Affects | Status | Importance | Assigned to
Fuel for OpenStack | Fix Released | High | Sergey Vasilenko
6.0.x | Invalid | High | Fuel Library (Deprecated)

Bug Description

Scale lab has done a series of performance tests on the network.

Here is the link to the data from these runs:
http://mos-scale.vm.mirantis.net/test_results/build_6.1-192/jenkins-11_env_run_shaker-8-MSK-2015-03-26-13:50:21-2015-03-26-14:46:38/l2.html

Since users run VMs for services such as LBaaS and other things that pass traffic to the rest of the VMs, there is a requirement to ensure that in VLAN mode (no GRE) we can get up to 9-10 Gbps in one direction, and around 19 Gbps in total if iperf is run in both directions at once.
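
A minimal sketch of the measurement being asked for, assuming the classic iperf (iperf2) client/server and two VMs on different compute nodes (the server address 10.2.0.9 is only an example):

# on the server VM
iperf -s

# on the client VM: one direction, expected to approach 9-10 Gbps on a 10GbE path
iperf -c 10.2.0.9 -t 60

# bidirectional: -d runs both directions simultaneously, so the two streams
# together should sum to roughly 19 Gbps
iperf -c 10.2.0.9 -t 60 -d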

Mike Scherbakov (mihgen)
summary: - Mirantis OpenStack Network Performance is 50% bellow wire speed
+ Fuel Network Performance is 50% bellow wire speed
Mike Scherbakov (mihgen)
summary: - Fuel Network Performance is 50% bellow wire speed
+ VM-to-VM 4 Gpbs, expected 9-10
Changed in fuel:
milestone: none → 6.1
description: updated
Mike Scherbakov (mihgen)
summary: - VM-to-VM 4 Gpbs, expected 9-10
+ VM-to-VM 4 Gbps, expected 9-10
Revision history for this message
Nastya Urlapova (aurlapova) wrote :

Sergey, could you provide your opinion on this issue, or should we consult Alex Ignatov about it?

Changed in fuel:
importance: Undecided → High
assignee: nobody → Sergey Vasilenko (xenolog)
Changed in fuel:
status: New → Confirmed
Revision history for this message
Aleksandr Shaposhnikov (alashai8) wrote :

Affected versions:
6.0 release; 6.1 build 309 was also tested and shows the same behavior.

Steps to reproduce:

1. Deploy an OpenStack cluster using Ubuntu + HA + VLAN + Neutron. It is assumed that each compute node has at least one 10Gb NIC for tenant networks.
2. Start two VMs on different compute nodes.
3. Add the required rules to the security group: either allow all TCP (and optionally UDP, if UDP also needs to be tested) traffic within the tenant, or open just the specific TCP/UDP ports (5000, 5001); a CLI sketch is given after the expected-behavior block below.
4. Run iperf -s on one of the VMs and iperf -c <IP of first VM> on the second one.

Observed behavior:
Throughput is around 4-4.5 Gbit/s.

Expected behavior:
The sales team would like to see 90-95% of wire speed here.
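
A hedged sketch of steps 3-4 using the Neutron CLI of that release; the security group name "default" and port 5001 are assumptions:

# open the iperf port (TCP, and UDP if UDP is also tested) in the tenant's default security group
neutron security-group-rule-create --direction ingress --protocol tcp --port-range-min 5001 --port-range-max 5001 default
neutron security-group-rule-create --direction ingress --protocol udp --port-range-min 5001 --port-range-max 5001 default

# on the first VM (server)
iperf -s -p 5001
# on the second VM (client)
iperf -c <IP of first VM> -p 5001 -t 60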

tags: added: l23network
tags: added: scale
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

All the related fixes have been merged:

https://review.openstack.org/#/c/181397/
https://review.openstack.org/#/c/181134/
https://review.openstack.org/#/c/182571/

Test results with these fixes applied show nearly wire-speed performance.

I am marking this bug as Fix Committed.

Changed in fuel:
status: Confirmed → Fix Committed
Revision history for this message
Sergey Vasilenko (xenolog) wrote :

While comparing network performance between 6.0 and 6.1 on a lab with 2 compute nodes and a 10GbE switch, we got the following results for the VM-to-VM cases:

6.0-centos, MTU:1500 -- 2.9 gbit/s
6.0-centos, MTU:9000 -- 7.7 gbit/s

6.0-ubuntu, MTU:1500 -- 7.6 gbit/s
6.0-ubuntu, MTU:9000 -- 9.8 gbit/s

6.1-ubuntu, MTU:1500 -- 9.2 gbit/s
6.1-ubuntu, MTU:9000 -- 9.8 gbit/s

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Folks, we should double-check the results. As the related bug shows (https://bugs.launchpad.net/fuel/+bug/1447619/comments/33), there are places in the networking configuration where fixes should be applied. Alexander Nevenchaniy and Sergii Golovatiuk may provide more details, as they discovered these issues. Briefly:
* the IRQ steering setup is poor,
* some interfaces/bridges have txqueuelen:0, which heavily impacts CPU load,
* the rx/tx queue lengths of the HW interfaces are configured in a non-optimal way.

Revision history for this message
Alexander Nevenchannyy (anevenchannyy) wrote :

Folks,

1) We need to increase txqueuelen on some bridges to at least 1000.
For example:
p_br-prv-0 Link encap:Ethernet HWaddr ca:a0:28:19:8d:bd
          inet6 addr: fe80::c8a0:28ff:fe19:8dbd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:152526367 errors:0 dropped:7590012 overruns:0 frame:0
          TX packets:214336421 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:177639911221 (177.6 GB) TX bytes:256069399236 (256.0 GB)

For my configuration the affected interfaces are: p_br-prv-0, p_br-floating-0, eth2.140, br-storage, br-prv, br-int, br-fw-admin, br-floating, br-ex.

2) Under high network load one CPU sits at 100% (ksoftirqd driven by the net_rx function), so we need to enable packet steering:
for i in `seq 0 *CPU_NUM*`; do echo ff > /sys/class/net/eth2/queues/rx-$i/rps_cpus ; done
More info at: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/network-rps.html
3) As discussed with Vova Kuklin, we need to increase net.core.netdev_max_backlog to at least 262144.
4) At the moment the 10Gb/s Ethernet cards have txqueuelen 512; we should set this to the maximum the hardware supports. A combined sketch of all four items follows below.
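
A combined sketch of items 1-4 above, assuming eth2 is the 10GbE tenant NIC and the interface list matches item 1; the concrete values (1000, 262144, the ring sizes) are starting points to be tuned per node, not verified settings:

# 1) raise txqueuelen on the software bridges/ports that currently report 0
for dev in p_br-prv-0 p_br-floating-0 eth2.140 br-storage br-prv br-int br-fw-admin br-floating br-ex; do
  ip link set dev "$dev" txqueuelen 1000
done

# 2) enable receive packet steering (RPS) on every RX queue of the 10GbE NIC;
#    "ff" spreads softirq processing over the first 8 CPUs
for q in /sys/class/net/eth2/queues/rx-*; do
  echo ff > "$q/rps_cpus"
done

# 3) increase the per-CPU input backlog
sysctl -w net.core.netdev_max_backlog=262144

# 4) raise the NIC txqueuelen and, if supported, the hardware ring sizes
#    (check the maximum with "ethtool -g eth2"; 4096 is an assumption)
ip link set dev eth2 txqueuelen 10000
ethtool -G eth2 rx 4096 tx 4096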

Changed in fuel:
status: Fix Committed → Confirmed
Changed in fuel:
status: Confirmed → Fix Committed
Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

We decided to file a separate bug for this.

Revision history for this message
Leontii Istomin (listomin) wrote :

1. The switches have been configured with jumbo frames.
2. MTU was set to 9000 using the Fuel UI.
3. A dedicated 10G interface is used for the private network.
Test:
netperf-wrapper -H 10.2.0.9 -l 60 -s 1 -f csv tcp_download
Result:
Ping (ms) ICMP:
  max: 1.46470810209
  mean: 1.032960349805072
  unit: ms
  min: 0.39013074506
TCP download:
  max: 9864.03160586
  mean: 9464.499048547048
  unit: Mbps
  min: 8619.53629759

Tested with build 521.
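
A small sketch for verifying that jumbo frames actually survive end-to-end before running the throughput test; 10.2.0.9 is the target from the netperf-wrapper command above, 8972 = 9000 minus 28 bytes of IP/ICMP headers, and the VM interface name eth0 is an assumption:

# inside the VM: confirm the interface MTU
ip link show eth0

# ping with "don't fragment" and a payload that only fits in a 9000-byte frame;
# if this fails, jumbo frames are not configured along the whole path
ping -M do -s 8972 -c 3 10.2.0.9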

Changed in fuel:
status: Fix Committed → Fix Released
Revision history for this message
Leontii Istomin (listomin) wrote :
Revision history for this message
Gregory Elkinbard (gelkinbard) wrote :

Please test with 64-byte and 200-byte (voice) packets.
Please provide the total packet rate, in millions of packets per second,
that OVS can handle.
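
One hedged way to approximate these numbers with the tools already used in this report: iperf in UDP mode with a fixed datagram size reports the datagram count, which divided by the test duration gives packets per second. Note that -l sets the UDP payload, so the on-wire frame is larger than 64/200 bytes; the offered bandwidth below is an assumption:

# on the server VM
iperf -s -u
# on the client VM: small datagrams at a high offered rate
iperf -c <IP of first VM> -u -b 10000M -l 64 -t 60
iperf -c <IP of first VM> -u -b 10000M -l 200 -t 60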

Revision history for this message
Jay Pipes (jaypipes) wrote :

Greg, what does this have to do with OVS?

Revision history for this message
Gregory Elkinbard (gelkinbard) wrote :

Jay, we have different types of workloads on OpenStack, so testing only 1500-byte web packets and 9000-byte storage packets is not enough. NFV workloads will see short frames, especially NFV workloads that serve voice traffic. I've asked to add several other packet frame sizes that are routinely tested by other vendors, so we can see how OpenStack, and specifically OVS, performs.
