We discussed this with Son Do and Rodolfo on IRC; adding the details here for visibility:
- Versions involved: OpenStack Yoga, OVN 22.03.0, Open vSwitch 2.17.0.
- Test VMs were created on a Geneve network with a security group attached.
- When the compute node hosts only two VMs, the bandwidth between them is around 20 Gbps.
- When the compute node hosts more than 100 VMs, the bandwidth drops to 6 Gbps.
- The bandwidth drops regardless of whether the security group rules use a remote group or a remote CIDR.
- Bandwidth is good, i.e. around 20 Gbps, when port security is disabled.
- Bandwidth is good, i.e. around 20 Gbps, when a stateless security group is used.
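
To put the numbers above side by side, here is a minimal sketch that computes the relative drop against the two-VM baseline. The figures are the approximate ones reported in this comment; the scenario labels and the `relative_drop` helper are only my own shorthand, not anything from OVS/OVN tooling.

```python
# Approximate throughput figures (Gbps) as reported in this comment.
BASELINE_GBPS = 20.0  # two VMs on the compute node, stateful security group

reported_gbps = {
    "2 VMs, stateful SG": 20.0,
    ">100 VMs, stateful SG": 6.0,
    ">100 VMs, port security disabled": 20.0,
    ">100 VMs, stateless SG": 20.0,
}


def relative_drop(baseline: float, measured: float) -> float:
    """Return the throughput loss relative to the baseline, in percent."""
    return (baseline - measured) / baseline * 100.0


if __name__ == "__main__":
    for scenario, gbps in reported_gbps.items():
        drop = relative_drop(BASELINE_GBPS, gbps)
        print(f"{scenario}: {gbps:.0f} Gbps ({drop:.0f}% below baseline)")
```

Only the stateful security group case with many VMs deviates from the baseline, which is why conntrack is the suspect here.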
We also came across https://developers.redhat.com/articles/2022/11/17/benchmarking-improved-conntrack-performance-ovs-300#performance_statistics, which mentions a significant improvement with OVS 3.0.0 and the userspace datapath, but the reporter is using the kernel datapath.
This needs to be checked with the OVS/OVN teams: is this a known issue, and have any improvements been made or planned around it?
@sondx25 if I missed anything, feel free to add.