OVN revision_number infinite update loop

Bug #1973347 reported by Renat Nurgaliyev
This bug affects 3 people
Affects               Status        Importance  Assigned to     Milestone
Ubuntu Cloud Archive  New           Undecided   Unassigned
Ussuri                New           Undecided   Unassigned
Victoria              New           Undecided   Unassigned
Wallaby               Fix Released  Undecided   Unassigned
neutron               Fix Released  High        Rodolfo Alonso
neutron (Ubuntu)      New           Undecided   Unassigned
Focal                 New           Undecided   Unassigned
Jammy                 Fix Released  Undecided   Unassigned

Bug Description

After the change described in https://mail.openvswitch.org/pipermail/ovs-dev/2022-May/393966.html was merged and released in stable OVN 22.03, Neutron can enter an endless loop of revision_number updates in the external_ids of ports and router_ports. We have confirmed the bug in Ussuri and Yoga. When the problem occurs, the Neutron log looks like this:

2022-05-13 09:30:56.318 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: router_ports) to 4815
2022-05-13 09:30:56.366 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:56.367 25 ... Running txn n=1 command(idx=1): SetLSwitchPortCommand(...)
2022-05-13 09:30:56.367 25 ... Running txn n=1 command(idx=2): PgDelPortCommand(...)
2022-05-13 09:30:56.467 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: ports) to 4815
2022-05-13 09:30:56.880 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:56.881 25 ... Running txn n=1 command(idx=1): UpdateLRouterPortCommand(...)
2022-05-13 09:30:56.881 25 ... Running txn n=1 command(idx=2): SetLRouterPortInLSwitchPortCommand(...)
2022-05-13 09:30:56.984 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: router_ports) to 4816
2022-05-13 09:30:57.057 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:57.057 25 ... Running txn n=1 command(idx=1): SetLSwitchPortCommand(...)
2022-05-13 09:30:57.058 25 ... Running txn n=1 command(idx=2): PgDelPortCommand(...)
2022-05-13 09:30:57.159 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: ports) to 4816
2022-05-13 09:30:57.523 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:57.523 25 ... Running txn n=1 command(idx=1): UpdateLRouterPortCommand(...)
2022-05-13 09:30:57.524 25 ... Running txn n=1 command(idx=2): SetLRouterPortInLSwitchPortCommand(...)
2022-05-13 09:30:57.627 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: router_ports) to 4817
2022-05-13 09:30:57.674 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)
2022-05-13 09:30:57.674 25 ... Running txn n=1 command(idx=1): SetLSwitchPortCommand(...)
2022-05-13 09:30:57.675 25 ... Running txn n=1 command(idx=2): PgDelPortCommand(...)
2022-05-13 09:30:57.765 25 ... Successfully bumped revision number for resource 8af189bd-c5bf-48a9-b072-3fb6c69ae592 (type: ports) to 4817

(full version here: https://pastebin.com/raw/NLP1b6Qm).
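
For anyone who wants to check their own neutron-server log for the same pattern, here is a minimal sketch that counts the "Successfully bumped revision number" lines per resource. The regular expression is taken from the lines quoted above; the function names and the threshold are hypothetical and only for illustration.

```python
import re
from collections import Counter

# Pattern derived from the "Successfully bumped revision number" lines above.
BUMP_RE = re.compile(
    r"Successfully bumped revision number for resource "
    r"(?P<resource>[0-9a-f-]{36}) \(type: (?P<type>\w+)\) to (?P<rev>\d+)"
)


def count_bumps(lines):
    """Return a Counter mapping (resource, type) to the number of bumps seen."""
    counts = Counter()
    for line in lines:
        match = BUMP_RE.search(line)
        if match:
            counts[(match["resource"], match["type"])] += 1
    return counts


def suspected_loops(lines, threshold=3):
    """Hypothetical heuristic: keep resources bumped at least `threshold` times."""
    return {key: n for key, n in count_bumps(lines).items() if n >= threshold}


# A few of the log lines from this report, used as sample input.
RID = "8af189bd-c5bf-48a9-b072-3fb6c69ae592"
SAMPLE_LOG = [
    f"2022-05-13 09:30:56.318 25 ... Successfully bumped revision number for resource {RID} (type: router_ports) to 4815",
    "2022-05-13 09:30:56.366 25 ... Running txn n=1 command(idx=0): CheckRevisionNumberCommand(...)",
    f"2022-05-13 09:30:56.467 25 ... Successfully bumped revision number for resource {RID} (type: ports) to 4815",
    f"2022-05-13 09:30:56.984 25 ... Successfully bumped revision number for resource {RID} (type: router_ports) to 4816",
    f"2022-05-13 09:30:57.159 25 ... Successfully bumped revision number for resource {RID} (type: ports) to 4816",
    f"2022-05-13 09:30:57.627 25 ... Successfully bumped revision number for resource {RID} (type: router_ports) to 4817",
    f"2022-05-13 09:30:57.765 25 ... Successfully bumped revision number for resource {RID} (type: ports) to 4817",
]

if __name__ == "__main__":
    for (resource, res_type), n in suspected_loops(SAMPLE_LOG).items():
        print(f"{resource} ({res_type}): {n} bumps")
```

A high count for the same UUID under both "ports" and "router_ports" within a short time window is the pattern shown in the log above; a suitable threshold depends on the deployment.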

In our lab environment we have confirmed that the problem goes away after the mentioned change is rolled back.

tags: added: ovn
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Renat:

I've installed OVN 22.03.
  ovn-nbctl 22.03.90
  Open vSwitch Library 2.17.90
  (NB) DB Schema 6.3.0
  (SB) DB Schema 20.22.0

I don't see this loop in my environment. I've created a router and then added a subnet to this router. The NB logical_router_port is [1] and the related DB port_binding register is [2]. The "external_ids" of both registers match (thus I guess the OVN patch is in place).

I've tried updating the LRP, for example by changing its name. The revision number is bumped in both "external_ids", but I don't see any loop in the Neutron code.

Do you have a reproducer?

Regards.

[1]https://paste.opendev.org/show/bAQhY4HJFvk469DJ8pr4/
[2]https://paste.opendev.org/show/bAqJLFWnxztxKcGiRjxe/

Renat Nurgaliyev (rnurgaliyev) wrote :

Hello Rodolfo,

thanks for having a look into this issue. I could see in my experiments that the loop starts when an OpenStack router is configured with an external gateway network. After a port is attached to the router, we see the behavior described above. Maybe it is important to mention that in our lab setup we have a flat provider external network, with two gateway nodes, both of which are configured in OVN with enable-chassis-as-gw in ovn-cms-options.

If this still does not reveal the issue, I will build a clean reproducer setup today or tomorrow and share it with you.

Thanks!

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Renat:

Confirmed: when the external GW is added to the router, the Neutron server starts this infinite loop you described: https://paste.opendev.org/show/bC9lC6yIb1wc5gl3hezP/

I'll check what is happening there.

Regards.

Changed in neutron:
importance: Undecided → High
status: New → Incomplete
status: Incomplete → Confirmed
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/842147

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/842147
Committed: https://opendev.org/openstack/neutron/commit/32e8303b3b21e047abcf365c3999cb7379467b0c
Submitter: "Zuul (22348)"
Branch: master

commit 32e8303b3b21e047abcf365c3999cb7379467b0c
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Fri May 13 00:54:27 2022 +0000

    Skip "PortBindingChassisEvent" if revision number changes

    Since [1], the "external_ids" of the NB Logical_Router_Port register
    are copied into the SB Port_Binding "external_ids". When a change
    in a Port_Binding register is received, if only the
    "external_ids:revision_number" is changed, we skip any update on the
    related Logical_Router_Port.

    If not, that will lead to an infinite loop: Neutron will update
    the Logical_Router_Port with a new revision number and OVN will
    copy this new revision number to the SB register, triggering again
    the update of the NB Logical_Router_Port

    [1]https://<email address hidden>/msg62836.html

    Closes-Bug: #1973347
    Change-Id: Ib51764778a666050c42de0dfeb9bf9b185d44bb7
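
The check described in the commit message above can be sketched roughly as follows. This is not the code from the patch, only an illustration of the idea: compare the old and new "external_ids" with the revision number key removed. The key name `neutron:revision_number` is an assumption (the commit message only refers to "external_ids:revision_number"), and the function names are hypothetical.

```python
# Assumed key name; the commit message only says "external_ids:revision_number".
REVISION_KEY = "neutron:revision_number"


def _without_revision(external_ids, key=REVISION_KEY):
    """Return a copy of external_ids with the revision number entry removed."""
    return {k: v for k, v in external_ids.items() if k != key}


def only_revision_number_changed(old_external_ids, new_external_ids,
                                 key=REVISION_KEY):
    """True if the two dicts differ and the only difference is the revision."""
    if old_external_ids == new_external_ids:
        return False  # nothing changed at all
    return (_without_revision(old_external_ids, key) ==
            _without_revision(new_external_ids, key))


if __name__ == "__main__":
    # "neutron:router_name" and its value are made-up sample data.
    old = {REVISION_KEY: "4815", "neutron:router_name": "r1"}
    new = {REVISION_KEY: "4816", "neutron:router_name": "r1"}
    # In this sketch, a Port_Binding update like this one would be skipped
    # instead of triggering another Logical_Router_Port update.
    print(only_revision_number_changed(old, new))
```

Skipping such updates breaks the cycle because the revision number copied from NB to SB no longer causes a new write to the NB Logical_Router_Port.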

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/845547

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/845550

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/yoga)

Related fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/845551

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/845552

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/845553

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/845555

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/845556

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845547
Committed: https://opendev.org/openstack/neutron/commit/6890204765c5de1a91284b9b0b6bf0565673f53f
Submitter: "Zuul (22348)"
Branch: master

commit 6890204765c5de1a91284b9b0b6bf0565673f53f
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu May 19 22:02:04 2022 +0000

    Move ``PortBindingChassisEvent`` checks to ``match_fn``

    Moved the event checks to ``match_fn`` method, that is the correct
    place to execute them, before the ``run`` method is called.

    This is a follow-up of
    https://review.opendev.org/c/openstack/neutron/+/842147.

    Related-Bug: #1973347
    Change-Id: I3b7c5d73d2b0d20fb06527ade30af8939b249d75
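
To illustrate why the checks belong in ``match_fn``, here is a self-contained sketch of the pattern. It does not use the real ovsdbapp or Neutron classes: `SketchRowEvent`, `SketchPortBindingChassisEvent` and `dispatch` are hypothetical stand-ins, rows are plain dicts, `old` holds only the previous values of the changed columns, and the key name `neutron:revision_number` is an assumption.

```python
REVISION_KEY = "neutron:revision_number"  # assumed key name


def _without_revision(external_ids):
    """Return external_ids without the revision number entry."""
    return {k: v for k, v in external_ids.items() if k != REVISION_KEY}


class SketchRowEvent:
    """Hypothetical stand-in for an OVSDB row event base class."""

    def match_fn(self, event, row, old):
        # Default: every event matches.
        return True

    def run(self, event, row, old):
        raise NotImplementedError


class SketchPortBindingChassisEvent(SketchRowEvent):
    """Filters in match_fn, so run() only sees events worth handling."""

    def __init__(self):
        self.handled = []

    def match_fn(self, event, row, old):
        old_ids = old.get("external_ids")
        if old_ids is None:
            return True  # external_ids untouched, another column changed
        only_ext_ids = set(old) == {"external_ids"}
        same_besides_rev = (_without_revision(old_ids) ==
                            _without_revision(row["external_ids"]))
        # Skip updates where nothing but the revision number moved.
        return not (only_ext_ids and same_besides_rev)

    def run(self, event, row, old):
        # A real handler would update the Logical_Router_Port here.
        self.handled.append(row["logical_port"])


def dispatch(handler, event, row, old):
    """Call run() only when match_fn() accepts the event."""
    if handler.match_fn(event, row, old):
        handler.run(event, row, old)
        return True
    return False


if __name__ == "__main__":
    handler = SketchPortBindingChassisEvent()
    # "lrp-1", "gw-1" and "gw-2" are made-up sample values.
    row = {"logical_port": "lrp-1", "chassis": "gw-1",
           "external_ids": {REVISION_KEY: "4816"}}
    print(dispatch(handler, "update", row,
                   {"external_ids": {REVISION_KEY: "4815"}}))
    print(dispatch(handler, "update", row, {"chassis": "gw-2"}))
```

With the filter in ``match_fn``, a rejected event never reaches ``run``, so the handler body does not need its own early-return checks.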

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845550
Committed: https://opendev.org/openstack/neutron/commit/b7fb963d91af8247c5624d7f1753d5969658b616
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit b7fb963d91af8247c5624d7f1753d5969658b616
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Fri May 13 00:54:27 2022 +0000

    Skip "PortBindingChassisEvent" if revision number changes

    Since [1], the "external_ids" of the NB Logical_Router_Port register
    are copied into the SB Port_Binding "external_ids". When a change
    in a Port_Binding register is received, if only the
    "external_ids:revision_number" is changed, we skip any update on the
    related Logical_Router_Port.

    If not, that will lead to an infinite loop: Neutron will update
    the Logical_Router_Port with a new revision number and OVN will
    copy this new revision number to the SB register, triggering again
    the update of the NB Logical_Router_Port

    [1]https://<email address hidden>/msg62836.html

    Conflicts:
        neutron/tests/functional/base.py

    Closes-Bug: #1973347
    Change-Id: Ib51764778a666050c42de0dfeb9bf9b185d44bb7
    (cherry picked from commit 32e8303b3b21e047abcf365c3999cb7379467b0c)

tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845555
Committed: https://opendev.org/openstack/neutron/commit/511f09215f2b25e6a7d1b648f3c4ddbd17937e0a
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 511f09215f2b25e6a7d1b648f3c4ddbd17937e0a
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Fri May 13 00:54:27 2022 +0000

    Skip "PortBindingChassisEvent" if revision number changes

    Since [1], the "external_ids" of the NB Logical_Router_Port register
    are copied into the SB Port_Binding "external_ids". When a change
    in a Port_Binding register is received, if only the
    "external_ids:revision_number" is changed, we skip any update on the
    related Logical_Router_Port.

    If not, that will lead to an infinite loop: Neutron will update
    the Logical_Router_Port with a new revision number and OVN will
    copy this new revision number to the SB register, triggering again
    the update of the NB Logical_Router_Port

    [1]https://<email address hidden>/msg62836.html

    Conflicts:
        neutron/tests/functional/base.py
        neutron/tests/functional/plugins/ml2/drivers/ovn/mech_driver/ovsdb/test_ovsdb_monitor.py

    Closes-Bug: #1973347
    Change-Id: Ib51764778a666050c42de0dfeb9bf9b185d44bb7
    (cherry picked from commit 32e8303b3b21e047abcf365c3999cb7379467b0c)
    (cherry picked from commit b7fb963d91af8247c5624d7f1753d5969658b616)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845551
Committed: https://opendev.org/openstack/neutron/commit/987cb8ab3ae11c9f9c85f5d6738eb9a90508d46b
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 987cb8ab3ae11c9f9c85f5d6738eb9a90508d46b
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu May 19 22:02:04 2022 +0000

    Move ``PortBindingChassisEvent`` checks to ``match_fn``

    Moved the event checks to ``match_fn`` method, that is the correct
    place to execute them, before the ``run`` method is called.

    This is a follow-up of
    https://review.opendev.org/c/openstack/neutron/+/842147.

    Related-Bug: #1973347
    Change-Id: I3b7c5d73d2b0d20fb06527ade30af8939b249d75
    (cherry picked from commit 6890204765c5de1a91284b9b0b6bf0565673f53f)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845556
Committed: https://opendev.org/openstack/neutron/commit/c49a3fba3a67f1fd0aea1646dbf743e7b7793062
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit c49a3fba3a67f1fd0aea1646dbf743e7b7793062
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu May 19 22:02:04 2022 +0000

    Move ``PortBindingChassisEvent`` checks to ``match_fn``

    Moved the event checks to ``match_fn`` method, that is the correct
    place to execute them, before the ``run`` method is called.

    This is a follow-up of
    https://review.opendev.org/c/openstack/neutron/+/842147.

    Related-Bug: #1973347
    Change-Id: I3b7c5d73d2b0d20fb06527ade30af8939b249d75
    (cherry picked from commit 6890204765c5de1a91284b9b0b6bf0565673f53f)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845552
Committed: https://opendev.org/openstack/neutron/commit/ace772787e9d00608c30b2dd321469c1ced8318c
Submitter: "Zuul (22348)"
Branch: stable/xena

commit ace772787e9d00608c30b2dd321469c1ced8318c
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Fri May 13 00:54:27 2022 +0000

    Skip "PortBindingChassisEvent" if revision number changes

    Since [1], the "external_ids" of the NB Logical_Router_Port register
    are copied into the SB Port_Binding "external_ids". When a change
    in a Port_Binding register is received, if only the
    "external_ids:revision_number" is changed, we skip any update on the
    related Logical_Router_Port.

    If not, that will lead to an infinite loop: Neutron will update
    the Logical_Router_Port with a new revision number and OVN will
    copy this new revision number to the SB register, triggering again
    the update of the NB Logical_Router_Port

    [1]https://<email address hidden>/msg62836.html

    Conflicts:
        neutron/tests/functional/base.py

    Closes-Bug: #1973347
    Change-Id: Ib51764778a666050c42de0dfeb9bf9b185d44bb7
    (cherry picked from commit 32e8303b3b21e047abcf365c3999cb7379467b0c)
    (cherry picked from commit b7fb963d91af8247c5624d7f1753d5969658b616)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/845553
Committed: https://opendev.org/openstack/neutron/commit/bb25af021ab45c98b1f623fc7d55c5d50bfd13a8
Submitter: "Zuul (22348)"
Branch: stable/xena

commit bb25af021ab45c98b1f623fc7d55c5d50bfd13a8
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Thu May 19 22:02:04 2022 +0000

    Move ``PortBindingChassisEvent`` checks to ``match_fn``

    Moved the event checks to ``match_fn`` method, that is the correct
    place to execute them, before the ``run`` method is called.

    This is a follow-up of
    https://review.opendev.org/c/openstack/neutron/+/842147.

    Related-Bug: #1973347
    Change-Id: I3b7c5d73d2b0d20fb06527ade30af8939b249d75
    (cherry picked from commit 6890204765c5de1a91284b9b0b6bf0565673f53f)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.5.0

This issue was fixed in the openstack/neutron 18.5.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 19.4.0

This issue was fixed in the openstack/neutron 19.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 20.2.0

This issue was fixed in the openstack/neutron 20.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 21.0.0.0rc1

This issue was fixed in the openstack/neutron 21.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/903235

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/903237

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/903491

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/903492

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/victoria)

Change abandoned by "Nicolas Bock <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/903237

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Nicolas Bock <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/903235

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/ussuri)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/903491
Reason: This branch of the project transitioned to End of Life. Open patches need to be abandoned.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/903492
Reason: This branch of the project transitioned to End of Life. Open patches need to be abandoned.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/victoria)

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/903235
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron (Ubuntu Jammy):
status: New → Fix Released