OVN metadata agent liveness system generates OVN SBDB usage peak

Bug #1991817 reported by Krzysztof Tomaszewski
Affects: neutron
Status: Fix Committed
Importance: Medium
Assigned to: Krzysztof Tomaszewski
Milestone: (none)

Bug Description

OpenStack Ussuri + OVN 21.06 using Chassis_Private table.

On larger deployments (150+ compute hosts), the neutron-ovn-metadata-agent liveness system generates a CPU usage peak on the OVN Southbound DB at a regular interval (agent_down_time / 2). This CPU saturation can last dozens of seconds and introduces significant latency in OVN service responses.

The problem is that every neutron-ovn-metadata-agent responds instantly to an event on the SB_Global table and updates the external_ids property of its corresponding Chassis/Chassis_Private row.
This generates a flood of OVN SBDB updates.

A similar issue can be observed with other neutron agents that use the oslo.messaging system to deliver their heartbeats (such as the neutron OVS agent), but in those cases the load generated by the liveness system is distributed in time simply because the agents were started at different times.

The neutron-ovn-metadata-agent heartbeat does not depend on when the agent was started; instead, it is triggered by a general OVN event.

A solution could be to spread the neutron-ovn-metadata-agent heartbeat updates over time by postponing each agent's response by a random delay (where the delay does not exceed agent_down_time / 2).
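
To illustrate the effect of the proposed random delay, here is a minimal simulation sketch. It is not taken from neutron: the function names, the 150-agent count (from the scale mentioned above), the agent_down_time value of 75 seconds, and the choice of whole-second delays are all assumptions made for illustration. It compares how many SBDB updates land in the same one-second slot with and without the delay.

```python
import random
from collections import Counter


def heartbeat_delays(num_agents, agent_down_time, rng):
    """Pick one random delay per agent, in whole seconds.

    Each delay lies in [0, agent_down_time // 2], so no agent answers
    later than half of agent_down_time (assumed upper bound).
    """
    max_delay = agent_down_time // 2
    return [rng.randint(0, max_delay) for _ in range(num_agents)]


def peak_updates_per_second(delays):
    """Return the largest number of updates sharing a one-second slot."""
    return max(Counter(delays).values())


# Hypothetical deployment: 150 agents, agent_down_time of 75 seconds.
rng = random.Random(1991817)  # fixed seed so the run is repeatable
immediate = [0] * 150         # every agent answers the event at once
spread = heartbeat_delays(150, 75, rng)

print("peak without delay:", peak_updates_per_second(immediate))
print("peak with random delay:", peak_updates_per_second(spread))
```

Without the delay, all 150 updates fall into the same slot. With the delay, they are scattered across the 38 possible slots, so the per-second peak seen by the Southbound DB should be much lower.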

Krzysztof Tomaszewski (labedz) wrote :
description: updated
description: updated
Lucas Alvares Gomes (lucasagomes) wrote :

Hi Krzysztof Tomaszewski,

Thanks for reporting this issue and proposing a patch. I am assigning you to the LP as you are currently working on it.

Changed in neutron:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Krzysztof Tomaszewski (labedz)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/860471
Committed: https://opendev.org/openstack/neutron/commit/628442aed7400251f12809a45605bd717f494c4e
Submitter: "Zuul (22348)"
Branch: master

commit 628442aed7400251f12809a45605bd717f494c4e
Author: labedz <email address hidden>
Date: Wed Sep 28 10:42:38 2022 +0000

    Spread OVN metadata agent heartbeat response in time

    To avoid mass response of OVN metadata agents on
    heartbeat update - event on OVN Southbound
    SB_Global table nb_cfg entry increment, this patch postpone
    Chassis/Chassis_Private table update for random number
    of seconds in range of ( cfg.CONF.agent_down_time // 2 ).

    Related-Bug: #1991817
    Change-Id: I6373a3c213b24ec957e4d2ea7fc42524517d10d5
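
The mechanism described in the commit message above can be sketched roughly as follows. This is an illustrative sketch only, not the code from the patch: the `DelayedHeartbeat` class, its `on_nb_cfg_update` method, and the `write_heartbeat` callback are hypothetical names, and the real agent writes to the Chassis/Chassis_Private table through the OVN IDL rather than through a plain callback. The sketch assumes a `threading.Timer` is an acceptable way to postpone the write.

```python
import random
import threading


class DelayedHeartbeat:
    """Hypothetical helper that postpones a heartbeat write.

    write_heartbeat stands in for the real Chassis/Chassis_Private
    update; here it is any callable that accepts the nb_cfg value.
    """

    def __init__(self, agent_down_time, write_heartbeat, rng=None):
        # Upper bound of the delay, as in the commit message:
        # agent_down_time // 2 (integer seconds).
        self.max_delay = agent_down_time // 2
        self.write_heartbeat = write_heartbeat
        self.rng = rng or random.Random()

    def on_nb_cfg_update(self, nb_cfg):
        """Schedule the heartbeat write after a random delay."""
        delay = self.rng.randint(0, self.max_delay)
        timer = threading.Timer(delay, self.write_heartbeat, args=(nb_cfg,))
        timer.daemon = True  # do not keep the process alive for a timer
        timer.start()
        return delay, timer


# Usage: a tiny agent_down_time keeps the demonstration short.
written = []
heartbeat = DelayedHeartbeat(agent_down_time=2, write_heartbeat=written.append)
delay, timer = heartbeat.on_nb_cfg_update(42)
timer.join()  # wait for the postponed write to run
print("delay:", delay, "written:", written)
```

Because each agent draws its own delay, agents that receive the same SB_Global event at the same moment write their heartbeat at different times.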

Changed in neutron:
status: Confirmed → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/yoga)

Related fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/892745

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/zed)

Related fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/898939

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/898939
Committed: https://opendev.org/openstack/neutron/commit/a7e91a84aac506504090f02ba40c32ee4f0b6f36
Submitter: "Zuul (22348)"
Branch: stable/zed

commit a7e91a84aac506504090f02ba40c32ee4f0b6f36
Author: labedz <email address hidden>
Date: Wed Sep 28 10:42:38 2022 +0000

    Spread OVN metadata agent heartbeat response in time

    To avoid mass response of OVN metadata agents on
    heartbeat update - event on OVN Southbound
    SB_Global table nb_cfg entry increment, this patch postpone
    Chassis/Chassis_Private table update for random number
    of seconds in range of ( cfg.CONF.agent_down_time // 2 ).

    Related-Bug: #1991817
    Change-Id: I6373a3c213b24ec957e4d2ea7fc42524517d10d5
    (cherry picked from commit 628442aed7400251f12809a45605bd717f494c4e)

tags: added: in-stable-zed
tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/892745
Committed: https://opendev.org/openstack/neutron/commit/373f155cb72d8d5053e0914af07bfc7e24627043
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 373f155cb72d8d5053e0914af07bfc7e24627043
Author: labedz <email address hidden>
Date: Wed Sep 28 10:42:38 2022 +0000

    Spread OVN metadata agent heartbeat response in time

    To avoid mass response of OVN metadata agents on
    heartbeat update - event on OVN Southbound
    SB_Global table nb_cfg entry increment, this patch postpone
    Chassis/Chassis_Private table update for random number
    of seconds in range of ( cfg.CONF.agent_down_time // 2 ).

    Related-Bug: #1991817
    Change-Id: I6373a3c213b24ec957e4d2ea7fc42524517d10d5
    (cherry picked from commit 628442aed7400251f12809a45605bd717f494c4e)
