remove_router_interface doesn't scale well with dvr routers

Bug #1420032 reported by Ed Bak
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
neutron | Fix Released | High | ZongKai LI |
Juno | Fix Released | Undecided | Unassigned |

Bug Description

With DVR enabled, the response time of neutron remove-router-interface degrades significantly as the number of l3_agents and the number of routers increase. A significant contributor to the poor performance is check_ports_exist_on_l3agent. Its call to get_subnet_ids_on_router returns an empty list because the port has already been deleted by this point. The empty subnet list is then used as a filter in the subsequent call to core_plugin.get_ports, which unexpectedly returns all ports instead of an empty list of ports. Erroneously looping through the entire list of ports is the biggest contributor to the poor scalability.
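
As a rough illustration of the failure mode described above, the sketch below uses a toy in-memory stand-in for core_plugin.get_ports. The PORTS table, the flat filter shape, and the ports_scanned helper are assumptions made for illustration only; they are not Neutron's actual code or filter format. The sketch only shows how an empty filter list that is treated as "no filter" makes the caller iterate over every port.

```python
from typing import Dict, List, Optional

# Hypothetical port table standing in for the ports known to the core plugin.
PORTS = [
    {"id": "p1", "subnet_id": "s1"},
    {"id": "p2", "subnet_id": "s2"},
    {"id": "p3", "subnet_id": "s2"},
]


def get_ports(filters: Optional[Dict[str, List[str]]] = None) -> List[dict]:
    """Toy stand-in for core_plugin.get_ports (assumed behavior, not the real API)."""
    subnet_ids = (filters or {}).get("subnet_id")
    if not subnet_ids:
        # An empty filter value is ignored, so every port is returned.
        return list(PORTS)
    return [p for p in PORTS if p["subnet_id"] in subnet_ids]


def ports_scanned(subnet_ids: List[str]) -> int:
    """Number of ports the caller would have to loop over for these subnets."""
    return len(get_ports({"subnet_id": subnet_ids}))
```

With this stand-in, ports_scanned([]) covers the whole table while ports_scanned(["s1"]) covers a single port, which is consistent with the cost growing with the total number of ports in the deployment.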

Ed Bak (ed-bak2)
Changed in neutron:
assignee: nobody → Ed Bak (ed-bak2)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/154289

Changed in neutron:
status: New → In Progress
tags: added: l3-dvr-backlog
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/159623

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Ed Bak (<email address hidden>) on branch: master
Review: https://review.openstack.org/159623
Reason: This approach causes too many issues with other dvr functions.

Changed in neutron:
assignee: Ed Bak (ed-bak2) → ZongKai LI (lzklibj)
Kyle Mestery (mestery)
Changed in neutron:
milestone: none → kilo-3
ZongKai LI (zongkai) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/159338
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e99f6e00cfd397bb74d44c9462dfcfa545dbed8c
Submitter: Jenkins
Branch: master

commit e99f6e00cfd397bb74d44c9462dfcfa545dbed8c
Author: lzklibj <email address hidden>
Date: Wed Feb 25 21:19:07 2015 -0800

    fix check_ports_exist_on_l3agent in no subnet case

    If no subnets attached to the given router, this check
    should return False.

    Currently, if no subnets attached to given router, the
    following process in this method will fetch all ports
    to continue its checking, consider those ports are not
    related to the given router, the following checking
    should be invalid.

    To issue #1378066, after running "router-gateway-clear",
    _schedule_router will be triggered, and the invalid
    checking will make processing in get_candidates believe
    that all l3-agents are valid to schedule this router,
    and finally, invalid records are inserted into table
    RouterL3AgentBindings.

    Closes-Bug: #1378066
    Closes-Bug: #1417908
    Closes-Bug: #1420032

    Change-Id: If96d866c831330cca68a5fe5a0f27f178bbf40a6

Changed in neutron:
status: In Progress → Fix Committed
ZongKai LI (zongkai) wrote :
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
Changed in neutron:
milestone: kilo-3 → kilo-rc1
importance: Undecided → High
Armando Migliaccio (armando-migliaccio) wrote :

The initial patch had to be reverted. The new submission is https://review.openstack.org/#/c/154289/

Changed in neutron:
status: Fix Released → In Progress
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/154289
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=760fe6a8fabc921e75367b5f02bab4fc326b8115
Submitter: Jenkins
Branch: master

commit 760fe6a8fabc921e75367b5f02bab4fc326b8115
Author: Ed Bak <email address hidden>
Date: Mon Feb 9 23:13:18 2015 +0000

    Return from check_ports_exist_on_l3agent if no subnet found

    The call to get_subnet_ids_on_router can return an empty list.
    If the subnet_ids list is empty, the subsequent call to get
    the ports on a subnet returns all ports. If this occurs
    when doing a remove_router_interface, the performance
    of a remove_router_interface degrades significantly. This change
    returns immediately from check_ports_exist_on_l3agents if no
    subnet is found. A new unit test has been added to cover
    the specific case of returning immediately without calling
    get_ports when a remove_router_interface operation is performed.

    Change-Id: I247d3bae152ab4f8ab7e00bd24d878eb08dca1ba
    Closes-Bug: #1420032
    Depends-On: I15bbf16fd4378c6431e9da8942d0968e7a012a91
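
The early return and the accompanying unit test described in the commit message above can be sketched roughly as follows. This is a simplified illustration, not the merged patch: check_ports_exist_sketch, its parameters, the flat filter shape, and run_no_subnet_check are hypothetical names chosen here, and the real method takes different arguments. The sketch assumes only that the check should report False, without querying ports, when the router has no subnets.

```python
from unittest import mock


def check_ports_exist_sketch(core_plugin, subnet_ids, agent_host):
    """Simplified sketch: is any port on the router's subnets bound to agent_host?"""
    if not subnet_ids:
        # No subnets on the router: nothing can match, so skip the port query.
        return False
    ports = core_plugin.get_ports(filters={"subnet_id": subnet_ids})
    return any(p.get("binding:host_id") == agent_host for p in ports)


def run_no_subnet_check():
    """Verify that the empty-subnet case returns without calling get_ports."""
    plugin = mock.Mock()
    result = check_ports_exist_sketch(plugin, [], "host-1")
    plugin.get_ports.assert_not_called()
    return result
```

Using a mock for the plugin keeps the check focused on the property the commit message calls out: in the no-subnet case, get_ports is never invoked.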

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/172234

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/juno)

Reviewed: https://review.openstack.org/172234
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=96d472d732b0b484e97e644abc79e6a4c6855526
Submitter: Jenkins
Branch: stable/juno

commit 96d472d732b0b484e97e644abc79e6a4c6855526
Author: Ed Bak <email address hidden>
Date: Mon Feb 9 23:13:18 2015 +0000

    Return from check_ports_exist_on_l3agent if no subnet found

    The call to get_subnet_ids_on_router can return an empty list.
    If the subnet_ids list is empty, the subsequent call to get
    the ports on a subnet returns all ports. If this occurs
    when doing a remove_router_interface, the performance
    of a remove_router_interface degrades significantly. This change
    returns immediately from check_ports_exist_on_l3agents if no
    subnet is found. A new unit test has been added to cover
    the specific case of returning immediately without calling
    get_ports when a remove_router_interface operation is performed.

    This allows the DVR job on stable/juno to go back to normal.

    Conflicts:
            neutron/tests/unit/test_l3_schedulers.py

    Change-Id: I247d3bae152ab4f8ab7e00bd24d878eb08dca1ba
    Closes-Bug: #1420032
    Depends-On: I15bbf16fd4378c6431e9da8942d0968e7a012a91
    (cherry picked from commit 760fe6a8fabc921e75367b5f02bab4fc326b8115)

tags: added: in-stable-juno
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-rc1 → 2015.1.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (neutron-pecan)

Fix proposed to branch: neutron-pecan
Review: https://review.openstack.org/185072
