[OpenStack-OVN] Poor network performance when use Security Group

Bug #1996593 reported by Son Do Xuan
This bug affects 3 people
Affects         Status      Importance  Assigned to  Milestone
networking-ovn  New         Undecided   Unassigned
neutron         Incomplete  Undecided   Unassigned

Bug Description

Hello everyone, we have a critical problem with OpenStack using ML2/OVN.

We deployed two new OpenStack clusters (Yoga and Victoria), with Neutron using ML2/OVN.

We created a new Geneve network and 3 VMs on that network, named test-1, test-2 and gw-hn (see the video for details).

VMs test-1 and test-2 are on the same compute node. We tested the bandwidth between them with iperf3, with these results:
- >= 18 Gbps if no port in that network has a security group attached
- 6 Gbps if even one port in that network has a security group attached

Please see the details in the video (https://youtu.be/Xnsyo0DZZO8):
- VM gw-hn is unrelated to VMs test-1 and test-2, but if we add a security group to the port of VM gw-hn, the bandwidth between test-1 and test-2 drops sharply

OpenStack Cluster Version Detail:
- OpenStack Yoga: OVN 22.03.0, Open vSwitch 2.17.0
- OpenStack Victoria: OVN 20.03.2, Open vSwitch 2.13.5

Tags: ovn
Bernard Cafarelli (bcafarel) wrote :

Hi, I do not understand which version you installed. Is it Yoga or Victoria? (It cannot be both.)
Also, networking-ovn as a separate project was merged back into neutron in the Ussuri timeframe, so I suppose you mean the mechanism driver (ML2/OVN, and ML2/OVS in the second example).

I suppose this was done on the same hardware and environment? Which firewall driver is used with ML2/OVS?

I need to dig a bit to find relevant numbers, but ML2/OVN versus ML2/OVS should not show a performance degradation.

Son Do Xuan (sondx25) wrote :

Hi Bernard Cafarelli,
I use both OpenStack versions, Yoga and Victoria, on the same hardware and environment. I use ML2/OVS with firewall_driver = iptables_hybrid.

- OpenStack Yoga: OVN 22.03.0, Open vSwitch 2.17.0
- OpenStack Victoria: OVN 20.03.2, Open vSwitch 2.13.5

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Son Do Xuan:

* Can you specify which SG rules you created? What type of traffic is used?
* What commands are you executing on both the client and the server?
* Did you try using ML2/OVS with firewall_driver=openvswitch?
* What is the VM OS? What is the bare-metal OS?

Regards.

Changed in neutron:
status: New → Incomplete
Son Do Xuan (sondx25) wrote :

Hello Rodolfo Alonso,
- With ML2/OVN, whenever any SG rule is used, the bandwidth between the 2 VMs is very low (I used TCP traffic)
- I use iperf3 -s for the server, and iperf3 -c <IP_address> for the client
- I tried ML2/OVS with firewall_driver=openvswitch -> traffic is good.
- VM OS is Ubuntu 20.04
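
To make iperf3 runs like the ones above easier to compare, the client can emit JSON (`iperf3 -c <IP_address> -J`) and the receiver-side throughput can be read from the report. The following is only a minimal sketch, assuming a TCP test whose summary sits under `end.sum_received.bits_per_second`; the function name and the sample report are illustrative and not taken from the reporter's runs.

```python
import json


def received_gbps(iperf3_json: str) -> float:
    """Return receiver-side throughput in Gbit/s from `iperf3 -c <host> -J` output."""
    report = json.loads(iperf3_json)
    # Assumption: TCP test, so the summary is under end.sum_received.
    bits_per_second = report["end"]["sum_received"]["bits_per_second"]
    return bits_per_second / 1e9


# Hypothetical, heavily trimmed report used only to exercise the function.
sample = json.dumps({"end": {"sum_received": {"bits_per_second": 6.0e9}}})
print(f"{received_gbps(sample):.2f} Gbit/s")
```

Running the same script against the output of each scenario (security group attached or not) gives directly comparable numbers.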

summary: - [OpenStack-OVN] Poor network performance
+ [OpenStack-OVN] Poor network performance when use Security Group
description: updated
Nguyen Thanh Cong (congnt95) wrote :

Hi Bernard Cafarelli, Rodolfo Alonso,
Please see the bug description again. We have just uploaded a video with the details.

description: updated
yatin (yatinkarel) wrote :

We discussed this with Son Do and Rodolfo on IRC; adding the details here for visibility:

- Versions involved: OpenStack Yoga, OVN 22.03.0, Open vSwitch 2.17.0
- Test VMs were created on a Geneve network with a security group attached.
- When the compute node has just two VMs, the bandwidth between them is around 20 Gbps.
- When the compute node has more than 100 VMs, the bandwidth drops to 6 Gbps.
- Bandwidth is reduced irrespective of whether a remote group or a remote CIDR is used in the security group rules.
- Bandwidth is good, i.e. around 20 Gbps, when port security is disabled.
- Bandwidth is good, i.e. around 20 Gbps, when a stateless security group is used.
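
For scale, the figures above can be expressed as a relative drop. This is just a small arithmetic sketch using the approximate numbers quoted in this report (about 20 Gbps as baseline, 6 Gbps degraded); the helper name is made up for illustration.

```python
def relative_drop_percent(baseline_gbps: float, degraded_gbps: float) -> float:
    """Percentage of the baseline bandwidth that is lost in the degraded case."""
    if baseline_gbps <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline_gbps - degraded_gbps) / baseline_gbps


# Approximate figures from the report: ~20 Gbps baseline, 6 Gbps degraded.
print(f"{relative_drop_percent(20.0, 6.0):.0f}% drop")
```

With these figures the drop works out to roughly 70% of the baseline bandwidth.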

We also came across https://developers.redhat.com/articles/2022/11/17/benchmarking-improved-conntrack-performance-ovs-300#performance_statistics, which mentions a significant improvement with OVS 3.0.0 and the userspace datapath, but the reporter is using the kernel datapath.

This should be checked with the OVS/OVN teams: is this a known issue, and are any improvements done or planned around it?

@sondx25 if I missed anything, feel free to add.

Son Do Xuan (sondx25) wrote :

Hi yatin,
I will test with OVS 3.0.0 and update the results later. Thank you and Rodolfo for your enthusiastic support.
