[OVN] GW rescheduling mechanism is triggered unnecessarily on every Chassis update

Bug #1861510 reported by Daniel Alvarez
Affects   Status         Importance   Assigned to        Milestone
neutron   Fix Released   High         Maciej Jozefczyk

Bug Description

Whenever a chassis is updated, for whatever reason, we trigger the rescheduling mechanism [0]. Since the current agent liveness check updates the Chassis table quite frequently, we should avoid rescheduling gateways for those checks (i.e. when either nb_cfg or external_ids changes).

[0] https://github.com/openstack/neutron/blob/4689564fa29915b042547bdeb3dcb44bca54e20c/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L87

Tags: ovn
Changed in neutron:
importance: Undecided → High
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
Maciej Jozefczyk (maciejjozefczyk) wrote :

I think we can remove waiting for UPDATE events there.

If a chassis is stopped, the corresponding row is deleted from the SBDB.
When a chassis is killed (e.g. with SIGKILL), its row remains in the list but should be deleted; that is something to be added to ovn-core.

Maciej Jozefczyk (maciejjozefczyk) wrote :

Humm, not really.

stack@mjozefcz-devstack-neutron-ovn-qos:~/neutron$ ovn-sbctl list chassis
_uuid : 79767b9b-dcf1-45e4-9ec2-81814e060272
encaps : [622301fe-d33e-4363-8c6d-fe6fd24e1fe7]
external_ids : {datapath-type="", iface-types="erspan,geneve,gre,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", "neutron:liveness_check_at"="2020-01-31T16:09:51.166607+00:00", ovn-bridge-mappings="public:br-ex", ovn-chassis-mac-mappings="", ovn-cms-options=enable-chassis-as-gw}
hostname : mjozefcz-devstack-neutron-ovn-qos
name : "62071d0a-9eee-4008-85c1-0697b50ca84c"
nb_cfg : 134
transport_zones : []

If there is a change in external_ids, such as to ovn-bridge-mappings or enable-chassis-as-gw, that means the LRP should be moved off the given chassis.

Without handling UPDATE events in such cases, we could end up lacking LRP candidates.

Maybe we could analyze what happened (which fields changed) and trigger rescheduling based on that?
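
As a rough illustration of "analyze which fields changed", here is a minimal sketch that diffs two external_ids maps and only asks for rescheduling when a scheduling-relevant key differs. The key names come from the ovn-sbctl output above; the helper names (changed_keys, needs_reschedule) and the choice of SCHEDULING_KEYS are assumptions made for this example, not the neutron implementation.

```python
# Keys of Chassis.external_ids that this sketch ASSUMES matter for
# gateway scheduling. Everything else (e.g. the liveness timestamp)
# is treated as noise.
SCHEDULING_KEYS = frozenset({"ovn-bridge-mappings", "ovn-cms-options"})


def changed_keys(old_ext_ids, new_ext_ids):
    """Return the set of keys whose value differs between the two maps."""
    all_keys = old_ext_ids.keys() | new_ext_ids.keys()
    return {k for k in all_keys if old_ext_ids.get(k) != new_ext_ids.get(k)}


def needs_reschedule(old_ext_ids, new_ext_ids):
    """True only if a scheduling-relevant key was added, removed or changed."""
    return bool(changed_keys(old_ext_ids, new_ext_ids) & SCHEDULING_KEYS)


if __name__ == "__main__":
    before = {
        "neutron:liveness_check_at": "2020-01-31T16:09:51.166607+00:00",
        "ovn-bridge-mappings": "public:br-ex",
        "ovn-cms-options": "enable-chassis-as-gw",
    }
    # Liveness check only: the timestamp moves, nothing else does.
    liveness_only = dict(before, **{"neutron:liveness_check_at": "later"})
    # Operator removed the gateway option from the chassis.
    no_longer_gw = dict(before, **{"ovn-cms-options": ""})

    print(needs_reschedule(before, liveness_only))
    print(needs_reschedule(before, no_longer_gw))
```

With this shape, a liveness-only update is ignored while a change to the bridge mappings or CMS options still reaches the scheduler.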

Terry Wilson (otherwiseguy) wrote :

Yeah, pretty easy to add a match_fn() that returns false for fields we don't care about (or only returns true for fields we specifically do care about).
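
For readers unfamiliar with the hook, the following is a self-contained sketch of what such a match_fn() could look like. It does not import ovsdbapp or neutron: the event constants, the ChassisEventSketch class and the is_gateway_chassis helper are hypothetical stand-ins, and the sketch assumes that on UPDATE the `old` row only carries the columns that actually changed.

```python
from types import SimpleNamespace

# Stand-in event names for this sketch only.
ROW_CREATE, ROW_UPDATE, ROW_DELETE = "create", "update", "delete"


def is_gateway_chassis(row):
    """Hypothetical helper: is enable-chassis-as-gw set in ovn-cms-options?"""
    options = row.external_ids.get("ovn-cms-options", "")
    return "enable-chassis-as-gw" in options.split(",")


class ChassisEventSketch:
    """Illustrative event filter; not the class used in neutron."""

    def match_fn(self, event, row, old):
        # Creation and deletion of a chassis always matter.
        if event != ROW_UPDATE:
            return True
        # ASSUMPTION: `old` holds only changed columns, so a missing
        # external_ids attribute means e.g. only nb_cfg was bumped.
        if not hasattr(old, "external_ids"):
            return False
        # A bridge-mappings change alters which networks are reachable.
        if (old.external_ids.get("ovn-bridge-mappings") !=
                row.external_ids.get("ovn-bridge-mappings")):
            return True
        # Otherwise only react if the gateway role was toggled.
        return is_gateway_chassis(old) != is_gateway_chassis(row)


if __name__ == "__main__":
    row = SimpleNamespace(
        nb_cfg=135,
        external_ids={"ovn-bridge-mappings": "public:br-ex",
                      "ovn-cms-options": "enable-chassis-as-gw"})
    only_nb_cfg = SimpleNamespace(nb_cfg=134)
    print(ChassisEventSketch().match_fn(ROW_UPDATE, row, only_nb_cfg))
```

Returning True only for the fields we care about (rather than False for a deny-list) means that new, unrelated external_ids keys are ignored by default.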

Terry Wilson (otherwiseguy) wrote :

Maciej: I've put up a PoC patch with the match_fn change for you to look at here: https://review.opendev.org/#/c/705331/. I haven't tested it other than running the test_router functional tests, but they pass even if we don't do any UPDATE processing, so I'm sure we need new tests. I'm not that familiar with the code for updating L3 gateways/physnet changes; I just wanted to get an example up for you to look at. Feel free to take over or reject it, since you'll be awake before me on Monday.

Changed in neutron:
status: New → Triaged
tags: added: ovn
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/705660

Changed in neutron:
assignee: Lucas Alvares Gomes (lucasagomes) → Maciej Jozefczyk (maciej.jozefczyk)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/705660
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a1735c46d8aa3a3df50818ce176e0632592924a7
Submitter: Zuul
Branch: master

commit a1735c46d8aa3a3df50818ce176e0632592924a7
Author: Maciej Józefczyk <email address hidden>
Date: Fri Mar 20 11:11:28 2020 +0000

    Don't reschedule hosts unless we need to

    Only reschedule gateways/update segments when things have changed
    that would require those actions.

    Co-Authored-By: Terry Wilson <email address hidden>

    Change-Id: I62f53dbd862c0f38af4a1434d453e97c18777eb4
    Closes-bug: #1861510
    Closes-bug: #1861509

Changed in neutron:
status: In Progress → Fix Released
tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn queens-eol

This issue was fixed in the openstack/networking-ovn queens-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn rocky-eol

This issue was fixed in the openstack/networking-ovn rocky-eol release.
