[OVN] ARP/MAC handling for routers connected to external network is scaling poorly

Bug #1897095 reported by Krzysztof Klimonda
This bug affects 2 people
Affects: neutron
Status: New
Importance: High
Assigned to: Unassigned

Bug Description

With the router configuration currently set by neutron, the number of logical flows in lr_in_arp_resolve appears to scale as O(n^2), where n is the number of routers connected to the external network. For example, this is from our test where we created 800 routers (I believe it was 800, and not 400 as stated in the linked discussion):

--8<--8<--8<--
# cat lflows.txt |grep -v Datapath |cut -d'(' -f 2 | cut -d ')' -f1 |sort | uniq -c |sort -n | tail -10
   3264 lr_in_learn_neighbor
   3386 ls_out_port_sec_l2
   4112 lr_in_admission
   4202 ls_in_port_sec_l2
   4898 lr_in_lookup_neighbor
   4900 lr_in_ip_routing
   9144 ls_in_l2_lkup
   9160 ls_in_arp_rsp
  22136 lr_in_ip_input
 671656 lr_in_arp_resolve
#
--8<--8<--8<--
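
For anyone who wants to repeat the per-stage count on their own dump, the pipeline above can be wrapped in a small function. This is only a sketch: `count_lflows_by_stage` is a hypothetical name, and it assumes every flow line in the `ovn-sbctl lflow-list` dump contains `table=` followed by the stage name in parentheses. Unlike the original one-liner, it strips the padding spaces inside the parentheses and lists every stage instead of the last ten.

```shell
#!/bin/sh
# Hypothetical helper: count logical flows per pipeline stage in a saved
# `ovn-sbctl lflow-list` dump. Output is sorted by count, smallest first.
count_lflows_by_stage() {
    # $1: path to the dump file.
    # Keep only flow lines (assumed to contain "table="), extract the text
    # between the first "(" and the first ")", and drop the padding spaces.
    grep 'table=' "$1" \
        | cut -d'(' -f2 \
        | cut -d')' -f1 \
        | tr -d ' ' \
        | sort | uniq -c | sort -n
}
```

Running `count_lflows_by_stage lflows.txt | tail -10` should give the same kind of listing as above. As a rough sanity check of the quadratic estimate, 800 * 800 is 640000, which is the same order of magnitude as the 671656 flows counted in lr_in_arp_resolve.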

I've opened a review that sets `always_learn_from_arp_request=false` and `dynamic_neigh_routers=true` on all routers, which significantly reduces the number of logical flows:

--8<--8<--8<--
# cat lflows-new.txt |grep -v Datapath |cut -d'(' -f 2 | cut -d ')' -f1 |sort | uniq -c |sort -n | tail -10
   2170 ls_out_port_sec_l2
   2172 lr_in_learn_neighbor
   2666 lr_in_admission
   2690 ls_in_port_sec_l2
   3190 lr_in_ip_routing
   4276 lr_in_lookup_neighbor
   4873 lr_in_arp_resolve
   5864 ls_in_arp_rsp
   5873 ls_in_l2_lkup
  14343 lr_in_ip_input
# ovn-sbctl --timeout=120 lflow-list > lflows-new.txt
--8<--8<--8<--
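
To try the same two options by hand on a test deployment, without the neutron change, something along these lines may work. It is a sketch, not what the review does: `set_neigh_options` and the `OVN_NBCTL` override are hypothetical names introduced here, and the function assumes router names arrive one per line on stdin.

```shell
#!/bin/sh
# Sketch for a manual experiment: set both options on every logical router.
# OVN_NBCTL is a hypothetical override so the loop can be dry-run with `echo`.
OVN_NBCTL="${OVN_NBCTL:-ovn-nbctl}"

set_neigh_options() {
    # Read logical router names from stdin, one per line; skip blank lines.
    while IFS= read -r lr; do
        [ -n "$lr" ] || continue
        $OVN_NBCTL set Logical_Router "$lr" \
            options:always_learn_from_arp_request=false \
            options:dynamic_neigh_routers=true
    done
}

# Usage against a live northbound DB (not executed here):
#   ovn-nbctl --bare --columns=name list Logical_Router | set_neigh_options
```

Setting `OVN_NBCTL=echo` before calling the function prints the commands instead of running them, which is a safer first step on a shared environment.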

There is, however, some performance penalty, which as I understand it affects east-west traffic between routers. I'm not sure how large the effect is, so it may be a good idea to make the change optional, as mentioned in the mailing list discussion.

See https://mail.openvswitch.org/pipermail/ovs-discuss/2020-May/049994.html and http://lists.openstack.org/pipermail/openstack-discuss/2020-September/017370.html for related discussions.

Changed in neutron:
assignee: nobody → Krzysztof Klimonda (kklimonda)
status: New → In Progress
Brian Haley (brian-haley) wrote :
tags: added: ovn
Changed in neutron:
importance: Undecided → High
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: Krzysztof Klimonda (kklimonda) → nobody
status: In Progress → New
tags: added: timeout-abandon
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/752678
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
