[ovn] Agent liveness checks create too many writes into OVN db

Bug #1883554 reported by Daniel Alvarez
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Terry Wilson

Bug Description

Every time the agent liveness check is triggered (via the API, or periodically every agent_down_time / 2 seconds), it produces many writes to the Chassis table in the SB database.
These writes trigger recomputation in ovn-controller on every node, which causes a considerable performance hit, especially under stress.

After this commit was merged [0], we avoid bumping nb_cfg too frequently, but we are still writing to the Chassis table too often, from all the workers.

We should use the same logic as in [1] to skip the write when one has already happened recently.
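
A minimal sketch of that kind of throttling, not the actual Neutron code: the function name `should_write` and its parameters are hypothetical, and the only inputs are the last recorded liveness timestamp (an ISO 8601 string, as stored in external_ids) and the configured agent_down_time. A worker would skip the Chassis write whenever this returns False.

```python
from datetime import datetime, timedelta, timezone


def should_write(last_check_iso, agent_down_time, now=None):
    """Return True if a new liveness timestamp should be written.

    Hypothetical helper: the write is skipped while fewer than
    agent_down_time / 2 seconds have passed since the last recorded check.
    """
    if not last_check_iso:
        # No previous check recorded, so the write is needed.
        return True
    # Same interval as the periodic check, never less than one second.
    interval = max(agent_down_time // 2, 1)
    if now is None:
        now = datetime.now(timezone.utc)
    next_allowed = (datetime.fromisoformat(last_check_iso)
                    + timedelta(seconds=interval))
    return now >= next_allowed
```

For instance, with an illustrative agent_down_time of 75 the interval is 37 seconds, so a second check arriving about 28 seconds after the previous one would not produce a write.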

[0] https://opendev.org/openstack/neutron/commit/647b7f63f9dafedfa9fb6e09e3d92d66fb512f0b
[1] https://github.com/openstack/neutron/blob/4de18104ae88a835544cefbf30c878aa49efc31f/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py#L1075

tags: added: ovn
Daniel Alvarez (dalvarezs) wrote :

Example:

{"Chassis":{"5fc4cb32-5c72-436c-9762-fffda5fc166f":{"nb_cfg":8760}},"_date":1592234432031,"_comment":"ovn-controller: registering chassis '3cab3e81-7cd1-45b3-b99a-4f21e626df6e'"}
{"Chassis":{"5fc4cb32-5c72-436c-9762-fffda5fc166f":{"external_ids":["map",[["datapath-type","system"],["iface-types","erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan"],["is-interconn","false"],["neutron-metadata-proxy-networks",""],["neutron:liveness_check_at","2020-06-15T15:20:03.666124+00:00"],["neutron:metadata_liveness_check_at","2020-06-15T15:20:03.672324+00:00"],["neutron:ovn-metadata-id","f9240536-3953-4298-a455-dd1fdc8dd123"],["neutron:ovn-metadata-sb-cfg","8760"],["ovn-bridge-mappings","datacentre:br-ex,tenant:br-isolated"],["ovn-chassis-mac-mappings",""],["ovn-cms-options",""]]]}},"_date":1592234432034}
{"Chassis":{"5fc4cb32-5c72-436c-9762-fffda5fc166f":{"external_ids":["map",[["datapath-type","system"],["iface-types","erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan"],["is-interconn","false"],["neutron-metadata-proxy-networks",""],["neutron:liveness_check_at","2020-06-15T15:20:32.062659+00:00"],["neutron:metadata_liveness_check_at","2020-06-15T15:20:03.672324+00:00"],["neutron:ovn-metadata-id","f9240536-3953-4298-a455-dd1fdc8dd123"],["neutron:ovn-metadata-sb-cfg","8760"],["ovn-bridge-mappings","datacentre:br-ex,tenant:br-isolated"],["ovn-chassis-mac-mappings",""],["ovn-cms-options",""]]]}},"_date":1592234432064}
{"Chassis":{"5fc4cb32-5c72-436c-9762-fffda5fc166f":{"external_ids":["map",[["datapath-type","system"],["iface-types","erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan"],["is-interconn","false"],["neutron-metadata-proxy-networks",""],["neutron:liveness_check_at","2020-06-15T15:20:32.062659+00:00"],["neutron:metadata_liveness_check_at","2020-06-15T15:20:32.067323+00:00"],["neutron:ovn-metadata-id","f9240536-3953-4298-a455-dd1fdc8dd123"],["neutron:ovn-metadata-sb-cfg","8760"],["ovn-bridge-mappings","datacentre:br-ex,tenant:br-isolated"],["ovn-chassis-mac-mappings",""],["ovn-cms-options",""]]]}},"_date":1592234432069}
{"Chassis":{"5fc4cb32-5c72-436c-9762-fffda5fc166f":{"external_ids":["map",[["datapath-type","system"],["iface-types","erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan"],["is-interconn","false"],["neutron-metadata-proxy-networks",""],["neutron:liveness_check_at","2020-06-15T15:20:33.821259+00:00"],["neutron:metadata_liveness_check_at","2020-06-15T15:20:32.067323+00:00"],["neutron:ovn-metadata-id","f9240536-3953-4298-a455-dd1fdc8dd123"],["neutron:ovn-metadata-sb-cfg","8760"],["ovn-bridge-mappings","datacentre:br-ex,tenant:br-isolated"],["ovn-chassis-mac-mappings",""],["ovn-cms-options",""]]]}},"_date":1592234433823}
{"Chassis":{"5fc4cb32-5c72-436c-9762-fffda5fc166f":{"external_ids":["map",[["datapath-type","system"],["iface-types","erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan"],["is-interconn","false"],["neutron-metadata-proxy-networks",""],["neutron:liveness_check_at","2020-06-15T15:20:33.821259+00:00"],["neutron:metadata_liveness_check_at","2020-06-15T15:20:33.828736+00:00"],["neutron:ovn-m...
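
To see which keys each of these transactions actually changes, records like the ones above can be compared pairwise. The following is a rough sketch under assumptions: the helper names are hypothetical, the input is one JSON record per line in the shape shown above, and the `sample` records are shortened to two external_ids keys for readability.

```python
import json


def ext_ids(record):
    """Return {row_uuid: {key: value}} for Chassis rows that set external_ids."""
    out = {}
    for uuid, cols in record.get("Chassis", {}).items():
        value = cols.get("external_ids")
        if value and value[0] == "map":
            out[uuid] = dict(value[1])
    return out


def changed_keys(lines):
    """List (date, row_uuid, changed keys) for consecutive external_ids writes."""
    last = {}
    result = []
    for line in lines:
        record = json.loads(line)
        for uuid, current in ext_ids(record).items():
            previous = last.get(uuid)
            if previous is not None:
                diff = sorted(k for k in current
                              if previous.get(k) != current[k])
                result.append((record.get("_date"), uuid, diff))
            last[uuid] = current
    return result


# Shortened records (only two keys kept) based on the excerpt above.
ROW = "5fc4cb32-5c72-436c-9762-fffda5fc166f"
sample = [
    json.dumps({"Chassis": {ROW: {"external_ids": ["map", [
        ["neutron:liveness_check_at", "2020-06-15T15:20:03.666124+00:00"],
        ["neutron:metadata_liveness_check_at", "2020-06-15T15:20:03.672324+00:00"],
    ]]}}, "_date": 1592234432034}),
    json.dumps({"Chassis": {ROW: {"external_ids": ["map", [
        ["neutron:liveness_check_at", "2020-06-15T15:20:32.062659+00:00"],
        ["neutron:metadata_liveness_check_at", "2020-06-15T15:20:03.672324+00:00"],
    ]]}}, "_date": 1592234432064}),
    json.dumps({"Chassis": {ROW: {"external_ids": ["map", [
        ["neutron:liveness_check_at", "2020-06-15T15:20:32.062659+00:00"],
        ["neutron:metadata_liveness_check_at", "2020-06-15T15:20:32.067323+00:00"],
    ]]}}, "_date": 1592234432069}),
]

for date, uuid, keys in changed_keys(sample):
    print(date, uuid, keys)
```

On the shortened sample, each write differs from the previous one in a single liveness timestamp key, which matches the pattern visible in the full records.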


Changed in neutron:
status: New → Confirmed
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/735618

Changed in neutron:
assignee: nobody → Daniel Alvarez (dalvarezs)
status: Confirmed → In Progress
Changed in neutron:
assignee: Daniel Alvarez (dalvarezs) → Terry Wilson (otherwiseguy)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/735618
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e09d4d6dd0f6570400d28531e56d2166141357cb
Submitter: Zuul
Branch: master

commit e09d4d6dd0f6570400d28531e56d2166141357cb
Author: Daniel Alvarez <email address hidden>
Date: Mon Jun 15 17:17:53 2020 +0200

    [OVN] Avoid unnecessary DB writes during agent liveness check

    As stated in the bug description, there are many writes of the
    agent liveness external_ids into the Chassis table. There was a
    protection to avoid bumping nb_cfg too frequently.

    The same protection is reused to avoid writing into the Chassis
    external_ids.

    This patch reduces the number of transactions to the SB database
    and, therefore, the recomputations that it causes to ovn-controller
    in all nodes.

    Change-Id: I5db90fde8e7394772ec23c6384c711096c246621
    Closes-Bug: #1883554
    Signed-off-by: Daniel Alvarez <email address hidden>

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/737252

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/737252
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=2ba53430f1841c2c5e741073b5180799f6225d2d
Submitter: Zuul
Branch: stable/ussuri

commit 2ba53430f1841c2c5e741073b5180799f6225d2d
Author: Daniel Alvarez <email address hidden>
Date: Mon Jun 15 17:17:53 2020 +0200

    [OVN] Avoid unnecessary DB writes during agent liveness check

    As stated in the bug description, there are many writes of the
    agent liveness external_ids into the Chassis table. There was a
    protection to avoid bumping nb_cfg too frequently.

    The same protection is reused to avoid writing into the Chassis
    external_ids.

    This patch reduces the number of transactions to the SB database
    and, therefore, the recomputations that it causes to ovn-controller
    in all nodes.

    Change-Id: I5db90fde8e7394772ec23c6384c711096c246621
    Closes-Bug: #1883554
    Signed-off-by: Daniel Alvarez <email address hidden>
    (cherry picked from commit e09d4d6dd0f6570400d28531e56d2166141357cb)

tags: added: in-stable-ussuri
tags: added: neutron-proactive-backport-potential