Batch DVR ARP updates
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Won't Fix | Undecided | Unassigned |
Bug Description
The L3 agent currently issues ARP updates one at a time while processing a DVR router. Each ARP update spawns an external process, which has to go through the neutron-rootwrap helper and run "ip netns exec <qrouter namespace>" every time.
The ip command has a "-batch <FILENAME>" option that could batch all of the "ip neigh replace" commands into one external process per qrouter namespace. This would greatly reduce the time it takes the L3 agent to update large numbers of ARP entries, particularly as the number of VMs in a deployment rises.
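As a rough illustration of the idea, the sketch below builds the contents of a batch file and the single command line that would consume it. It is not the L3 agent's code: the function names, the (ip, mac, device) tuple layout, and the "nud permanent" state are assumptions made for this example. Lines in an "ip -batch" file are ordinary ip subcommands without the leading "ip".

```python
import ipaddress
import re

# Accepts colon-separated MAC addresses such as fa:16:3e:00:00:01.
_MAC_RE = re.compile(r"^[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}$")


def build_neigh_batch(entries):
    """Return batch-file text with one "neigh replace" line per entry.

    Each entry is a hypothetical (ip_address, mac_address, device) tuple.
    """
    lines = []
    for ip_address, mac_address, device in entries:
        ipaddress.ip_address(ip_address)  # raises ValueError if malformed
        if not _MAC_RE.match(mac_address):
            raise ValueError("bad MAC address: %r" % mac_address)
        # "nud permanent" is an assumption for this sketch.
        lines.append(
            "neigh replace %s lladdr %s nud permanent dev %s"
            % (ip_address, mac_address, device)
        )
    return "".join(line + "\n" for line in lines)


def batch_command(namespace, batch_path):
    """Return the argv for one "ip -batch" run inside a namespace."""
    return ["ip", "netns", "exec", namespace, "ip", "-batch", batch_path]
```

The text would be written to a temporary file and the command run once per qrouter namespace with root privileges, so the process-spawn and rootwrap cost is paid once instead of once per ARP entry.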
The benefit of batching ip commands can be seen in this simple bash example:
$ time for i in {0..50}; do sudo ip netns exec qrouter-
real 0m2.437s
user 0m0.183s
sys 0m0.359s
$ for i in {0..50}; do echo a >> /tmp/ip_batch_test; done
$ time sudo ip netns exec qrouter-
real 0m0.046s
user 0m0.003s
sys 0m0.007s
With just 50 ARP updates batched together, the speedup is about 50x (2.437s vs. 0.046s). Repeating this test with 500 commands showed a speedup of 250x (disclaimer: this was a rudimentary test just to get a rough estimate of the performance benefit).
Note: see comments #1-3 for less-artificial performance data.
Changed in neutron:
assignee: nobody → Rawlin Peters (rawlin-peters)
Changed in neutron:
status: New → In Progress
description: updated
tags: added: neutron-proactive-backport-potential
Here is some initial, less-artificial performance data, gathered by applying the following diff and restarting the L3 agent 10 times (all times are in seconds):
diff --git a/neutron/agent/l3/agent.py b/neutron/agent/l3/agent.py
index 8191c5a..428ee79 100644
--- a/neutron/agent/l3/agent.py
+++ b/neutron/agent/l3/agent.py
@@ -498,7 +498,12 @@ class L3NATAgent(firewall_l3_agent.FWaaSL3AgentRpcCallback,
                     continue
 
                 try:
+                    import time
+                    t0 = time.time()
                     self._process_router_if_compatible(router)
+                    t1 = time.time()
+                    delta = t1 - t0
+                    LOG.debug("RAWLIN: _process_router delta: %s" % delta)
                 except n_exc.RouterNotCompatibleWithAgent as e:
                     LOG.exception(e.msg)
                     # Was the router previously handled by this agent?
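For repeated measurements like this, the same timing could also be wrapped in a small helper instead of being patched inline. The following is only a sketch: log_elapsed, its samples argument, and the injectable clock are hypothetical names for this example, not anything in neutron. It uses time.monotonic, which is not affected by system clock adjustments.

```python
import logging
import time
from contextlib import contextmanager

LOG = logging.getLogger(__name__)


@contextmanager
def log_elapsed(label, samples=None, clock=time.monotonic):
    """Log how long the enclosed block took, even if it raises.

    If a list is passed as samples, the elapsed time is appended to it
    so the values can be averaged afterwards.
    """
    start = clock()
    try:
        yield
    finally:
        delta = clock() - start
        if samples is not None:
            samples.append(delta)
        LOG.debug("%s delta: %s", label, delta)
```

The instrumented call would then sit inside a block such as "with log_elapsed('_process_router'):", and any exception raised by the call still propagates after the time is recorded.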
WITH batched ARP updates:
4.00962495804
4.05432415009
3.92502999306
3.85153913498
3.89367389679
3.91031813622
3.93485879898
3.99531412125
3.91884207726
3.98265600204
Average: 3.94761812687
WITHOUT batched ARP updates:
4.10144209862
4.33488178253
4.28370594978
4.1496078968
4.27167916298
4.32324385643
4.16499876976
3.97995710373
4.2998650074
4.12419891357
Average: 4.20335805416
Batching the ARP updates saves about 0.26 seconds per run here (roughly 6%), and this was on a devstack with only 5 nova instances with floating IPs (on one net/subnet attached to one router).
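The averages and the saving quoted above can be rechecked from the listed samples with a few lines of Python. The variable names are just for this check; the values are copied from the two lists.

```python
from statistics import mean

# _process_router deltas in seconds, copied from the lists above.
with_batching = [
    4.00962495804, 4.05432415009, 3.92502999306, 3.85153913498,
    3.89367389679, 3.91031813622, 3.93485879898, 3.99531412125,
    3.91884207726, 3.98265600204,
]
without_batching = [
    4.10144209862, 4.33488178253, 4.28370594978, 4.1496078968,
    4.27167916298, 4.32324385643, 4.16499876976, 3.97995710373,
    4.2998650074, 4.12419891357,
]

avg_with = mean(with_batching)
avg_without = mean(without_batching)
saved = avg_without - avg_with            # seconds saved per run
percent = 100.0 * saved / avg_without     # relative saving

print("with: %.4f  without: %.4f  saved: %.4f s (%.1f%%)"
      % (avg_with, avg_without, saved, percent))
```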