With many VMs on the same tenant, the L3 ip neigh add is too slow

Bug #1807396 reported by Thomas Goirand
This bug affects 1 person
Affects: neutron
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

In our setup, we run DVR with a very large number of VMs in the same tenant/project (currently between 1500 and 2000). In such a setup, the internal function _set_subnet_arp_info in neutron/agent/l3/dvr_local_router.py takes far too long. On each compute node (since we run a Neutron L3 router on each compute node), it runs operations like:

ip neigh add

for every VM in the project. As we have both IPv4 and IPv6, the L3 agent does this twice. In our setup, this results in about 4000 Python processes being spawned to execute the "ip neigh add" command. This takes between 20 and 30 minutes each time we either:

- Add a first VM from the tenant to the host
- Restart the compute node
- Restart the L3 agent
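
To put the numbers above in perspective: about 2000 VMs times 2 address families gives roughly 4000 separate invocations, one process each. The following is a minimal sketch of one way that per-entry cost could be avoided, assuming iproute2's batch mode ("ip -batch -", which reads commands from standard input) is available on the compute node. It is not Neutron's implementation; Neighbor, build_neigh_batch, apply_neigh_batch, and the device and namespace names are hypothetical and only serve the illustration. It uses "neigh replace" rather than "neigh add" so that re-running it does not fail on entries that already exist.

```python
import subprocess
from typing import Iterable, NamedTuple


class Neighbor(NamedTuple):
    """One IP-to-MAC mapping to install (hypothetical helper type)."""
    ip: str
    mac: str


def build_neigh_batch(device: str, neighbors: Iterable[Neighbor]) -> str:
    """Return the text of an iproute2 batch file, one command per line.

    Batch lines omit the leading "ip", so each line starts with "neigh".
    """
    lines = [
        f"neigh replace {n.ip} lladdr {n.mac} dev {device} nud permanent"
        for n in neighbors
    ]
    return "\n".join(lines) + "\n" if lines else ""


def apply_neigh_batch(namespace: str, batch: str) -> None:
    """Feed the whole batch to a single ip process inside a namespace.

    Assumption: requires root and an existing network namespace.
    -force keeps going if an individual line fails.
    """
    if not batch:
        return
    subprocess.run(
        ["ip", "netns", "exec", namespace, "ip", "-force", "-batch", "-"],
        input=batch,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical entries covering both address families.
    entries = [
        Neighbor("10.0.0.5", "fa:16:3e:00:00:01"),
        Neighbor("2001:db8::5", "fa:16:3e:00:00:01"),
    ]
    print(build_neigh_batch("qr-example", entries), end="")
```

With this shape, the number of spawned processes depends on the number of routers or subnets rather than on the number of VMs, since each call to apply_neigh_batch handles an arbitrary number of entries.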

So, there's this issue with "ip neigh add", though there's also the same kind of issue when OVS is doing:

ovs-vsctl add-flows

about 2000 times as well.

In other words, this doesn't scale, and it needs to be addressed so that the L3 agent can react in a reasonable manner to operations on the DVRs when there are many VMs in the same project.

Ryan Tidwell (ryan-tidwell) wrote :

For some reason it isn't linked from this bug report, but here is the review in progress: https://review.openstack.org/#/c/581360/

Changed in neutron:
assignee: nobody → Thomas Goirand (thomas-goirand)
status: New → In Progress
Changed in neutron:
importance: Undecided → Medium
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: Thomas Goirand (thomas-goirand) → nobody
status: In Progress → New
tags: added: timeout-abandon
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.openstack.org/581360
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Bug closed due to lack of activity, please feel free to reopen if needed.

Changed in neutron:
status: New → Won't Fix