Large number of FIPs causes slow sync_routers response

Bug #2028185 reported by Adam Oswick
This bug affects 1 person
Affects   Status        Importance   Assigned to   Milestone
neutron   Fix Released  High         Adam Oswick

Bug Description

Description
-----------

When running in DVR mode, the sync_routers RPC call (specifically the _get_dvr_sync_data function) becomes very slow if a large number of floating IPs (FIPs) are configured for a router.

This appears to be because it fetches every FIP on the network and then narrows them down to the ones it needs in Python (sometimes via additional DB calls), rather than fetching only the required FIPs from the database.
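The cost difference between the two strategies can be illustrated with a toy model. The sketch below is hypothetical and is not Neutron's code: the `floatingips` and `port_bindings` tables, the `CountingConn` wrapper and both helper functions are assumptions, using an in-memory SQLite database where one extra per-row lookup stands in for the "additional DB calls".

```python
import sqlite3


def make_db(num_fips: int, num_hosts: int) -> sqlite3.Connection:
    """Hypothetical toy schema: every FIP points at a port bound to one host."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE floatingips (id INTEGER PRIMARY KEY, fixed_port_id INTEGER)")
    conn.execute("CREATE TABLE port_bindings (port_id INTEGER PRIMARY KEY, host TEXT)")
    for i in range(num_fips):
        conn.execute("INSERT INTO floatingips VALUES (?, ?)", (i, i))
        conn.execute("INSERT INTO port_bindings VALUES (?, ?)", (i, f"compute-{i % num_hosts}"))
    return conn


class CountingConn:
    """Wraps a connection and counts the queries issued through it."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.queries = 0

    def execute(self, sql: str, params: tuple = ()) -> sqlite3.Cursor:
        self.queries += 1
        return self.conn.execute(sql, params)


def fips_filtered_in_python(db: CountingConn, host: str) -> list[int]:
    # Fetch every FIP, then issue one extra query per FIP to look up its host.
    result = []
    for fip_id, port_id in db.execute("SELECT id, fixed_port_id FROM floatingips").fetchall():
        (bound_host,) = db.execute(
            "SELECT host FROM port_bindings WHERE port_id = ?", (port_id,)
        ).fetchone()
        if bound_host == host:
            result.append(fip_id)
    return result


def fips_filtered_in_db(db: CountingConn, host: str) -> list[int]:
    # Let the database join and filter, so only relevant rows come back.
    rows = db.execute(
        "SELECT f.id FROM floatingips f "
        "JOIN port_bindings b ON b.port_id = f.fixed_port_id "
        "WHERE b.host = ? ORDER BY f.id",
        (host,),
    ).fetchall()
    return [fip_id for (fip_id,) in rows]


if __name__ == "__main__":
    slow = CountingConn(make_db(1000, 10))
    fast = CountingConn(make_db(1000, 10))
    print(len(fips_filtered_in_python(slow, "compute-3")), slow.queries)  # 100 1001
    print(len(fips_filtered_in_db(fast, "compute-3")), fast.queries)  # 100 1
```

With 1000 FIPs spread over 10 hosts, the Python-side approach issues 1001 queries and reads every row, while the filtered query issues one and returns only the 100 relevant rows. Real ORM object mapping adds further per-row cost that this sketch does not model.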

Preconditions
-------------
* Neutron is set up with DVR and multiple hosts
* A network is created with a significant number of FIPs (thousands should be enough to make this issue visible)

Steps to reproduce
------------------
* Restart the neutron_l3_agent and note the logged duration of the sync_routers RPC call

Expected output
---------------
* This RPC method returns in a reasonable amount of time (10s or less)

Actual output
-------------
* This RPC method takes 40s or more to return, causing unnecessary load on the Neutron server

Version
-------
* OpenStack Zed

Adam Oswick (adamoswick) wrote :

Already created https://review.opendev.org/c/openstack/neutron/+/888915 to address this

Created this bug report for tracking/backporting purposes

tags: added: l3-dvr-backlog
Changed in neutron:
assignee: nobody → Adam Oswick (adamoswick)
Changed in neutron:
status: New → In Progress
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/888915
Committed: https://opendev.org/openstack/neutron/commit/96fd203a1461a74318af0e89b3d049f618c32fde
Submitter: "Zuul (22348)"
Branch: master

commit 96fd203a1461a74318af0e89b3d049f618c32fde
Author: Adam Oswick <email address hidden>
Date: Wed Jul 19 12:59:39 2023 +0100

    For hosts in DVR mode, only fetch bound FIPs

    Currently, agents in DVR mode requesting a router update fetch all the
    FIPs on a network from the DB rather than just the FIPs that are
    relevant to the specific host requesting the update.

    While not noticeable in smaller networks with a limited number of
    floating IPs, this can add significant overhead in larger networks
    with many FIPs and hosts.

    That overhead comes from Python mapping the responses from the DB into
    objects, making extra DB calls per FIP returned and adding additional
    iterations to the loop in _get_dvr_sync_data. These objects are mostly
    discarded later on and neither updated nor included in the RPC response.

    This change ensures that we only fetch FIPs from the DB that are bound
    to the host requesting the update or those which are in a pre-live
    migration state (as they may be migrated to the host in question).

    Closes-Bug: #2028185
    Change-Id: I199b0b1456aa15dadcc24cafc89db1072d224efd

Changed in neutron:
status: In Progress → Fix Released
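As a rough illustration of the filtering described in the commit message above, the sketch below pushes the "bound to this host, or about to be live-migrated to it" condition into a single SQL query. It is a hypothetical simplification, not the actual Neutron query: the schema, the `router_id` filter, the `fips_for_host` helper and the `migrating_to` column (assumed here to record the destination host of a pending live migration) are all illustrative.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Hypothetical toy schema: FIPs, their ports, and where each port is bound."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE floatingips (id TEXT PRIMARY KEY, fixed_port_id TEXT, router_id TEXT);
        CREATE TABLE port_bindings (port_id TEXT PRIMARY KEY, host TEXT, migrating_to TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO floatingips VALUES (?, ?, ?)",
        [
            ("fip-a", "port-1", "r1"),  # bound to compute-1
            ("fip-b", "port-2", "r1"),  # bound to compute-2
            ("fip-c", "port-3", "r1"),  # on compute-2, migrating to compute-1
            ("fip-d", "port-4", "r2"),  # on compute-1, but a different router
        ],
    )
    conn.executemany(
        "INSERT INTO port_bindings VALUES (?, ?, ?)",
        [
            ("port-1", "compute-1", None),
            ("port-2", "compute-2", None),
            ("port-3", "compute-2", "compute-1"),
            ("port-4", "compute-1", None),
        ],
    )
    return conn


def fips_for_host(conn: sqlite3.Connection, router_ids: list[str], host: str) -> list[str]:
    """Return FIPs on the given routers bound to `host` or migrating to it."""
    if not router_ids:
        return []
    placeholders = ", ".join("?" for _ in router_ids)
    sql = (
        "SELECT f.id FROM floatingips f "
        "JOIN port_bindings b ON b.port_id = f.fixed_port_id "
        f"WHERE f.router_id IN ({placeholders}) "
        # A NULL migrating_to never equals the host, so only the host check applies.
        "AND (b.host = ? OR b.migrating_to = ?) "
        "ORDER BY f.id"
    )
    rows = conn.execute(sql, (*router_ids, host, host)).fetchall()
    return [fip_id for (fip_id,) in rows]


if __name__ == "__main__":
    print(fips_for_host(build_db(), ["r1"], "compute-1"))  # ['fip-a', 'fip-c']
```

Here fip-b is dropped because it is bound elsewhere and not migrating, fip-c is kept because it is migrating to compute-1, and fip-d belongs to a router that was not requested.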
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/891397

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/891398

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 23.0.0.0b3

This issue was fixed in the openstack/neutron 23.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/891398
Committed: https://opendev.org/openstack/neutron/commit/fd06c7366594f218e24d1e13b10bfa0cd96cf360
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit fd06c7366594f218e24d1e13b10bfa0cd96cf360
Author: Adam Oswick <email address hidden>
Date: Wed Jul 19 12:59:39 2023 +0100

    Closes-Bug: #2028185
    Change-Id: I199b0b1456aa15dadcc24cafc89db1072d224efd
    (cherry picked from commit 96fd203a1461a74318af0e89b3d049f618c32fde)

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/zed)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/891397
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
