On an installation with 7 compute nodes and distributed routing enabled, the time the controller takes to process a sync_routers request grows linearly with the number of floating IPs. With 120 floating IPs associated with VM instances distributed across the compute nodes (all attached to one router), associating or disassociating a floating IP has been observed to take over 40 seconds. This is with 3 load-balanced controller nodes, each configured with 10 RPC worker threads and 10 API worker threads.
Tracing the logs shows that, as the number of floating IPs associated with VMs increases, the largest share of the time is spent in the '_process_floating_ips' method in l3_dvr_db.py, which makes multiple DB requests per floating IP. The second largest share is spent in the get_sync_data method itself, before the call to _process_floating_ips, where get_vm_port_hostid is called once for each floating IP.
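To illustrate why per-floating-IP lookups scale linearly, here is a minimal sketch that uses an in-memory SQLite database. It is not Neutron code and not the change proposed in the review; the table layout and the helper names (make_db, hosts_per_fip, hosts_batched) are hypothetical and chosen only for this example. It compares one lookup per floating IP with a single joined query that returns the same mapping.

```python
import sqlite3


def make_db(n_fips, n_hosts=7):
    """Build a toy schema: each floating IP points at a port bound to a host."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ports (id TEXT PRIMARY KEY, host TEXT)")
    conn.execute("CREATE TABLE floatingips (id TEXT PRIMARY KEY, port_id TEXT)")
    for i in range(n_fips):
        conn.execute("INSERT INTO ports VALUES (?, ?)",
                     (f"port-{i}", f"compute-{i % n_hosts}"))
        conn.execute("INSERT INTO floatingips VALUES (?, ?)",
                     (f"fip-{i}", f"port-{i}"))
    return conn


def hosts_per_fip(conn):
    """Look up the host once per floating IP: N + 1 queries in total."""
    fips = conn.execute("SELECT id, port_id FROM floatingips").fetchall()
    queries = 1
    result = {}
    for fip_id, port_id in fips:
        row = conn.execute("SELECT host FROM ports WHERE id = ?",
                           (port_id,)).fetchone()
        queries += 1
        result[fip_id] = row[0]
    return result, queries


def hosts_batched(conn):
    """Fetch the same mapping with one joined query, whatever N is."""
    rows = conn.execute(
        "SELECT f.id, p.host FROM floatingips f "
        "JOIN ports p ON p.id = f.port_id").fetchall()
    return dict(rows), 1


if __name__ == "__main__":
    conn = make_db(120)
    slow, slow_queries = hosts_per_fip(conn)
    fast, fast_queries = hosts_batched(conn)
    print(slow == fast, slow_queries, fast_queries)
```

With 120 floating IPs the per-item version issues 121 queries while the batched version issues one. Against a real database, each of those queries also pays a network round trip, which is consistent with the linear growth described above.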
Fix proposed to branch: master
Review: https://review.openstack.org/150110