dvr router update does not scale for many fips

Bug #1413314 reported by Erik Colnick
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Erik Colnick
Milestone: 2015.1.0

Bug Description

On an installation with 7 compute nodes and distributed routing enabled, the time it takes to process a sync_routers request on the controller grows linearly with the number of floating ips. With 120 floating ips associated with vm instances distributed across the compute nodes (all attached to one router), associating or disassociating a floating ip with an instance has been observed to take over 40 seconds. This is with 3 load-balanced controller nodes, each configured with 10 rpc worker threads and 10 api worker threads.

Tracing the logs shows that, as the number of floating ips associated with vms increases, the largest share of time is spent in the '_process_floating_ips' method of l3_dvr_db.py, which makes multiple DB requests per floating ip. The second-largest share is spent in the get_sync_data method itself, before the call to _process_floating_ips, where get_vm_port_hostid is called once for each floating ip.
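
The linear growth described above can be illustrated with a rough sketch. This is not the neutron code: CountingStore, build_sync_data and make_floating_ips are hypothetical names, and the get_vm_port_hostid method here is a simplified stand-in that only borrows the name from the report. It assumes one database round trip per floating ip and simply counts them.

```python
class CountingStore:
    """Hypothetical stand-in for the database layer; counts lookups."""

    def __init__(self, port_to_host):
        self._port_to_host = port_to_host
        self.lookups = 0

    def get_vm_port_hostid(self, port_id):
        # One simulated database round trip per call (an assumption).
        self.lookups += 1
        return self._port_to_host[port_id]


def build_sync_data(store, floating_ips):
    """Resolve the host of every floating ip, one lookup at a time."""
    return {fip["id"]: store.get_vm_port_hostid(fip["port_id"])
            for fip in floating_ips}


def make_floating_ips(count, hosts=7):
    """Spread `count` floating ips round-robin over `hosts` compute nodes."""
    fips = [{"id": f"fip-{i}", "port_id": f"port-{i}"} for i in range(count)]
    port_to_host = {f"port-{i}": f"compute-{i % hosts}" for i in range(count)}
    return fips, port_to_host


if __name__ == "__main__":
    for count in (30, 60, 120):
        fips, port_to_host = make_floating_ips(count)
        store = CountingStore(port_to_host)
        build_sync_data(store, fips)
        print(count, "floating ips ->", store.lookups, "lookups")
```

In this model the lookup count equals the floating ip count for every sync request, whichever agent sent it, which matches the scaling behaviour reported here.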

Changed in neutron:
assignee: nobody → Erik Colnick (erikcolnick)
Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/150110

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/150576
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f10f1f16777572ca22d43670095c6d73ae9dcce5
Submitter: Jenkins
Branch: master

commit f10f1f16777572ca22d43670095c6d73ae9dcce5
Author: Erik Colnick <email address hidden>
Date: Tue Jan 27 13:21:18 2015 -0700

    Refactor to facilitate DVR scale performance

    To facilitate work for DVR scale performance improvements, refactor
    l3_agentschedulers_db.list_active_sync_routers_on_active_l3_agent
    to pull out logic that will be overridden into the private method
    _get_active_l3_agent_routers_sync_data

    Partial-Bug: 1413314

    Change-Id: Ia454b037529c9b6f3750a67dc0eabb7e01419ea8

Changed in neutron:
milestone: none → kilo-rc1
Kyle Mestery (mestery)
Changed in neutron:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/150110
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b96a22661290ce2ea747537512eab2fb767679e6
Submitter: Jenkins
Branch: master

commit b96a22661290ce2ea747537512eab2fb767679e6
Author: Erik Colnick <email address hidden>
Date: Fri Jan 23 12:16:28 2015 -0700

    Improve DVR scale performance

    Only process floating ips on a router that are relevant to the agent
    hosting the router (don't process floating ips assigned to a router
    if the associated vm is not hosted on the compute node requesting the
    router sync). In this way, the number of database calls made during
    the DVR router updates is optimized to eliminate unnecessary
    duplication of calls which return the same data or are made to get
    data for routers which are not relevant to the sync_routers request
    from the agent.

    Change-Id: I4e8477bb61ffff164d2f3bbebb94e95a25838ce0
    Partial-Bug: #1413314
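
The idea in the commit message above, processing only the floating ips whose vm is hosted on the node that requested the sync, can be sketched as follows. This is a minimal illustration under assumed data shapes, not the merged change: floating_ips_for_host and the dictionaries below are hypothetical.

```python
def floating_ips_for_host(floating_ips, port_to_host, agent_host):
    """Keep only floating ips whose vm port lives on `agent_host`."""
    return [fip for fip in floating_ips
            if port_to_host.get(fip["port_id"]) == agent_host]


# Assumed sample data: 120 floating ips spread round-robin over 7 nodes.
sample_fips = [{"id": f"fip-{i}", "port_id": f"port-{i}"} for i in range(120)]
sample_port_to_host = {f"port-{i}": f"compute-{i % 7}" for i in range(120)}

if __name__ == "__main__":
    relevant = floating_ips_for_host(
        sample_fips, sample_port_to_host, "compute-0")
    print(len(relevant), "of", len(sample_fips), "floating ips are relevant")
```

With the floating ips spread evenly, each agent's sync then touches roughly one seventh of them rather than all of them.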

Changed in neutron:
status: In Progress → Fix Committed
Armando Migliaccio (armando-migliaccio) wrote :

It looks like the two targeted bugs merged. If there are any loose ends, we should raise another, more specific bug and leave this one completed.

Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-rc1 → 2015.1.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (neutron-pecan)

Fix proposed to branch: neutron-pecan
Review: https://review.openstack.org/185072

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/231950

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/juno)

Change abandoned by Jakub Libosvar (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/231950
Reason: Abandoned as per reviewers comment - we don't have a tool in this branch to validate the change.
