[OVN] Avoid nb_cfg update notification flooding during agents health check

Bug #1892477 reported by Lucas Alvares Gomes
Affects   Status         Importance   Assigned to           Milestone
neutron   Fix Released   High         Lucas Alvares Gomes

Bug Description

The nb_cfg mechanism for "pinging" the OVN control plane is useful in many ways. However, the current implementation floods the whole control plane with update notifications. Each HV writes its nb_cfg number to the SB DB, and every one of these updates is notified to all the other HVs, which is O(n^2). Although the updates are batched into fewer than n^2 notifications, they still generate significant load on the SB DB and on the ovn-controllers.
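
To make the O(n^2) growth concrete, here is a minimal sketch that counts the unbatched, worst-case number of notifications delivered to peer hypervisors for a single nb_cfg bump. It is a toy model, not OVN code: the function name, the `hvN` chassis names, and the `monitored_rows` mapping are assumptions made up for illustration, and real deployments see fewer notifications because of batching.

```python
def count_peer_notifications(chassis, monitored_rows):
    """Count updates delivered to chassis other than the writer.

    chassis: list of chassis names; each one writes its nb_cfg once.
    monitored_rows: hypothetical mapping of chassis name -> set of chassis
        rows that chassis monitors in the SB DB.
    """
    delivered = 0
    for writer in chassis:
        for watcher in chassis:
            # A watcher is notified only if it monitors the writer's row.
            if watcher != writer and writer in monitored_rows[watcher]:
                delivered += 1
    return delivered


chassis = [f"hv{i}" for i in range(100)]

# Every HV monitors every row: each write fans out to all other HVs.
monitor_all = {c: set(chassis) for c in chassis}

# Every HV monitors only its own row: writes do not reach any peer.
monitor_own = {c: {c} for c in chassis}

print(count_peer_notifications(chassis, monitor_all))  # 100 * 99 = 9900
print(count_peer_notifications(chassis, monitor_own))  # 0
```

With 100 hypervisors the first case yields n * (n - 1) = 9900 peer notifications per round, which is the quadratic behaviour described above.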

To solve this problem, the core OVN team created a new table called Chassis_Private, which holds each chassis's private information and is conditionally monitored by that chassis. That way, updates to the nb_cfg column will not affect all the other hypervisors.

We need to make use of this new mechanism in the OVN driver.
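
One way a driver could adopt the new table while still working against older OVN versions is to check whether the SB schema exposes Chassis_Private and fall back to Chassis otherwise. The sketch below only illustrates that idea; `pick_nb_cfg_table` and its `sb_tables` argument are hypothetical names, not the actual Neutron implementation or an ovsdbapp API.

```python
def pick_nb_cfg_table(sb_tables):
    """Return the SB table to read nb_cfg from for agent health checks.

    sb_tables: iterable of table names present in the connected SB schema
        (hypothetical input; how the schema is obtained is out of scope).
    """
    # Prefer Chassis_Private when the schema provides it; older OVN
    # versions only have the Chassis table.
    if "Chassis_Private" in set(sb_tables):
        return "Chassis_Private"
    return "Chassis"


print(pick_nb_cfg_table(["Chassis", "Chassis_Private", "Port_Binding"]))
print(pick_nb_cfg_table(["Chassis", "Port_Binding"]))
```

Keeping the decision in one place means the rest of the health check logic does not need to know which OVN version it is talking to.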

Lucas Alvares Gomes (lucasagomes) wrote :

This is the commit addressing the problem: https://review.opendev.org/#/c/707626

Changed in neutron:
importance: Undecided → High
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/707626
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1dddbbfc92b115bfd62e15081134a5cadbc15212
Submitter: Zuul
Branch: master

commit 1dddbbfc92b115bfd62e15081134a5cadbc15212
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed Feb 12 14:36:56 2020 +0000

    [OVN] Use the Chassis_Private table for agents healthcheck

    The core OVN team has introduced a new table called Chassis_Private to
    avoid nb_cfg flooding when checking for the Chassis' status. The OVN
    driver does rely on that mechanism for the agent liveness mechanism.

    This patch makes use of this new table but it's also backward
    compatible.

    For more information, check the core OVN changes at:
    https://patchwork.ozlabs.org/patch/1254394.

    Closes-Bug: #1892477
    Change-Id: Iea4263b852d1e3f81eb2557918ea3cbb7adb8016
    Signed-off-by: Lucas Alvares Gomes <email address hidden>

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/747648

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/747372
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=4a2cc2beecf213960b95240e7d471f671c44bfec
Submitter: Zuul
Branch: stable/ussuri

commit 4a2cc2beecf213960b95240e7d471f671c44bfec
Author: Terry Wilson <email address hidden>
Date: Fri Jun 19 18:30:50 2020 -0500

    Clean up some of the OVN agent API methods

    Adds ControllerAgent and MetadataAgent classes to organize some of
    the OVN Agent API code.

    Note: This is being backported because the change to fix the bug #1892477
    relies on these modifications. As #1892477 is a high priority bug I am
    backporting this work as part of effort to backport the fix for that
    issue.

    Partial-Bug: #1892477
    Change-Id: I9224edab2d27184ed96be1cdea2bd865aadff065
    (cherry picked from commit 6723f4a485e2c60e45f45732c0881c126de06608)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/747648
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c13d6f527c78d1e47b0e2ecd273152fc26f8c0ab
Submitter: Zuul
Branch: stable/ussuri

commit c13d6f527c78d1e47b0e2ecd273152fc26f8c0ab
Author: Lucas Alvares Gomes <email address hidden>
Date: Wed Feb 12 14:36:56 2020 +0000

    [OVN] Use the Chassis_Private table for agents healthcheck

    The core OVN team has introduced a new table called Chassis_Private to
    avoid nb_cfg flooding when checking for the Chassis' status. The OVN
    driver does rely on that mechanism for the agent liveness mechanism.

    This patch makes use of this new table but it's also backward
    compatible.

    For more information, check the core OVN changes at:
    https://patchwork.ozlabs.org/patch/1254394.

    Closes-Bug: #1892477
    Change-Id: Iea4263b852d1e3f81eb2557918ea3cbb7adb8016
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
    (cherry picked from commit 1dddbbfc92b115bfd62e15081134a5cadbc15212)

tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 7.4.0

This issue was fixed in the openstack/networking-ovn 7.4.0 release.
