With many VMs in the same tenant, the L3 agent's "ip neigh add" is too slow
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Won't Fix | Medium | Unassigned |
Bug Description
In our setup, we run with DVR and a very large number of VMs in the same tenant/project (we currently have between 1500 and 2000 VMs). In such a setup, the internal function _set_subnet_… runs "ip neigh add" for every VM in the project. As we have both IPv4 and IPv6, the L3 agent does this twice. In our setup, this results in about 4000 Python processes being spawned to execute the "ip neigh add" command. This takes between 20 and 30 minutes each time we do any of the following:
- Add the first VM from the tenant to the host
- Restart the compute node
- Restart the L3 agent
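Every entry currently costs its own process. One way to amortise that spawn overhead is iproute2's batch mode: `ip -batch FILE` reads one command per line (without the leading `ip`), and `-` as FILE means stdin, so a single process can apply every entry. The Python sketch below only renders that script. `NeighEntry`, `render_ip_batch` and the device name `qr-example` are hypothetical, and this illustrates the technique. It is not the change proposed in the linked review. It uses `neigh replace` rather than `neigh add`, so the script can be re-applied after a restart without failing on entries that already exist.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class NeighEntry:
    """One static neighbour entry (hypothetical container, not a Neutron class)."""

    ip: str      # IPv4 or IPv6 address of the VM port
    mac: str     # MAC address of the VM port
    device: str  # router interface inside the router namespace


def render_ip_batch(entries: Iterable[NeighEntry]) -> str:
    """Render one `ip -batch` line per entry.

    Batch lines are ordinary `ip` arguments without the leading "ip".
    "replace" instead of "add" keeps the script idempotent, so re-running it
    after an agent restart does not fail on entries that already exist.
    """
    return "".join(
        f"neigh replace {e.ip} lladdr {e.mac} dev {e.device} nud permanent\n"
        for e in entries
    )


if __name__ == "__main__":
    demo = [
        NeighEntry("10.0.0.5", "fa:16:3e:00:00:05", "qr-example"),
        NeighEntry("2001:db8::5", "fa:16:3e:00:00:05", "qr-example"),
    ]
    # This text would be piped into a single `ip -batch -` process.
    print(render_ip_batch(demo), end="")
```

The rendered text could be fed to one `ip -batch -` run inside the router namespace, for example under `ip netns exec`. Adding `-force` makes `ip` continue past a failing line instead of aborting the whole batch.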
So there is this issue with "ip neigh add", but there is also the same kind of issue with OVS, where "ovs-ofctl add-flows" is run about 2000 times as well.
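Repeated flow additions can also be collapsed into one process. `ovs-ofctl add-flows BRIDGE FILE` reads one flow per line from FILE, or from stdin when FILE is `-`, so the ~2000 flows could be installed in a single call. The helper names and the example flow strings in the test are hypothetical illustrations. This is not Neutron's OVS agent code.

```python
import subprocess
from typing import Iterable, List


def render_flow_file(flows: Iterable[str]) -> str:
    """Join flow specs into the one-flow-per-line text `ovs-ofctl add-flows` reads.

    Blank entries are dropped so they do not become empty lines.
    """
    return "".join(f"{flow.strip()}\n" for flow in flows if flow.strip())


def add_flows_cmd(bridge: str) -> List[str]:
    # A file name of "-" makes ovs-ofctl read the flows from stdin.
    return ["ovs-ofctl", "add-flows", bridge, "-"]


def add_flows_batched(bridge: str, flows: Iterable[str]) -> None:
    """Install every flow with a single ovs-ofctl process (needs root and OVS)."""
    subprocess.run(
        add_flows_cmd(bridge),
        input=render_flow_file(flows),
        text=True,
        check=True,
    )
```

For example, `add_flows_batched("br-int", ["priority=10,dl_dst=fa:16:3e:00:00:05,actions=normal"])` would start one `ovs-ofctl` process regardless of how many flows are in the list.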
In other words, this doesn't scale. It needs to be addressed so that the L3 agent can react within a reasonable time to operations on the DVRs when there are many VMs in the same project.
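A simple cost model makes the scaling problem concrete. For illustration, assume that each of the N = 2V neighbour entries (V VMs, one entry per address family) is applied by its own process. Each process costs a spawn overhead s plus the actual work c. The symbols s and c are assumptions, not values measured in this report. Plugging in the figures reported above gives the per-entry cost:

$$
\begin{aligned}
N &= 2V, \qquad T_{\text{serial}} = N\,(s + c), \qquad T_{\text{batch}} \approx s + N\,c \\
V &\approx 2000 \;\Rightarrow\; N \approx 4000, \qquad T_{\text{serial}} \in [1200,\ 1800]\ \text{s} \;\Rightarrow\; s + c \in [0.30,\ 0.45]\ \text{s}
\end{aligned}
$$

Under this model the serial cost grows linearly with V, and every entry pays s again. A single process applying all entries would pay s only once.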
Changed in neutron:
- assignee: nobody → Thomas Goirand (thomas-goirand)
- status: New → In Progress

Changed in neutron:
- importance: Undecided → Medium
It's not linked here in the bug report for some reason, but here's a link to the review in progress: https://review.openstack.org/#/c/581360/