OVN DB Sync utility cannot find NB DB Port Group

Bug #2008943 reported by Miro Tomaska
This bug affects 2 people
Affects               Status         Importance  Assigned to   Milestone
Ubuntu Cloud Archive  Fix Released   Undecided   Unassigned
  Ussuri              Fix Released   High        Unassigned
  Victoria            Fix Released   High        Unassigned
  Wallaby             Fix Committed  High        Unassigned
  Xena                Fix Released   High        Unassigned
neutron               In Progress    Medium      Miro Tomaska
neutron (Ubuntu)      Fix Released   Undecided   Unassigned
  Focal               Fix Released   High        Unassigned

Bug Description

Runtime exception:

ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Port_Group with name=pg_aa9f203b_ec51_4893_9bda_cfadbff9f800

can occur while performing a database sync between the Neutron DB and the OVN NB DB using neutron-ovn-db-sync-util.
This exception occurs when the `sync_networks_ports_and_dhcp_opts()` function ends up implicitly creating a new default security group for a tenant/project ID. This is normally fine, but the problem is that `sync_port_groups()` was already called, so the port group for the new security group does not exist in the NB DB. When `sync_acls()` is called later, the port group is not found and the exception occurs.

Quick way to reproduce on ML2/OVN:
- openstack project create test_project
- openstack network create --project test_project test_network
- openstack port delete $(openstack port list --network test_network -c ID -f value) # since this is an empty network only the metadata port should get listed and subsequently deleted
- openstack security group delete test_project

You now have a network without a metadata port and no default security group for the project/tenant that the network belongs to. Run:

neutron-ovn-db-sync-util --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --ovn-neutron_sync_mode migrate

The exception should occur.

Here is a more realistic scenario showing how we can run into this during an ML2/OVS to ML2/OVN migration, along with an explanation of why the code hits it.

1. ML2/OVS environment with a network but no default security group for the project/tenant associated with the network.
2. Perform the ML2/OVS to ML2/OVN migration. The migration process runs neutron-ovn-db-sync-util with --ovn-neutron_sync_mode migrate.
3. During the sync we first sync port groups [1] from the Neutron DB to the OVN NB DB.
4. Then we sync networks and ports [2]. The process detects that the network in question is not part of the OVN NB DB. It creates the network in the OVN NB DB and, along with it, a metadata port (an OVN network requires a metadata port). The port create call implicitly notifies _ensure_default_security_group_handler [3], which does not find a security group for that tenant/project ID and creates one. Now you have a new security group with 4 new default security group rules.
5. When sync_acls [4] runs, it picks up those 4 new rules, but the commit to the NB DB fails because the port group (i.e. the security group) does not exist in the NB DB.

[1] https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_db_sync.py#L104
[2] https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_db_sync.py#L10
[3] https://opendev.org/openstack/neutron/src/branch/master/neutron/db/securitygroups_db.py#L915
[4] https://opendev.org/openstack/neutron/src/branch/master/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_db_sync.py#L107
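
The ordering problem in steps 3 to 5 can be illustrated with a small in-memory model. This is only a sketch and not Neutron code: `FakeNeutronDb`, `FakeNbDb`, the `pg_default_<project>` naming, and the local `RowNotFound` class are hypothetical stand-ins, and the three sync functions only borrow the names of the real ones to mirror the call order.

```python
class RowNotFound(Exception):
    """Local stand-in for ovsdbapp.backend.ovs_idl.idlutils.RowNotFound."""


class FakeNeutronDb:
    """Hypothetical Neutron DB: one network, no security groups yet."""

    def __init__(self):
        self.security_groups = set()
        self.networks = {"test_network": "test_project"}  # network -> project

    def ensure_default_security_group(self, project):
        # Mimics the implicit default security group creation on port create.
        self.security_groups.add(f"pg_default_{project}")


class FakeNbDb:
    """Hypothetical OVN NB DB holding port groups and ACLs."""

    def __init__(self):
        self.port_groups = set()
        self.acls = []

    def add_acl(self, port_group):
        if port_group not in self.port_groups:
            raise RowNotFound(
                f"Cannot find Port_Group with name={port_group}")
        self.acls.append(port_group)


def sync_port_groups(neutron, nb):
    # Step 3: copy the security groups known *right now* into the NB DB.
    nb.port_groups |= neutron.security_groups


def sync_networks_ports_and_dhcp_opts(neutron, nb):
    # Step 4: creating the metadata port implicitly creates the project's
    # default security group, after port groups were already synced.
    for project in neutron.networks.values():
        neutron.ensure_default_security_group(project)


def sync_acls(neutron, nb):
    # Step 5: rules of the new security group reference a missing port group.
    for sg in sorted(neutron.security_groups):
        nb.add_acl(sg)


def run_sync(neutron, nb):
    sync_port_groups(neutron, nb)
    sync_networks_ports_and_dhcp_opts(neutron, nb)
    sync_acls(neutron, nb)


if __name__ == "__main__":
    try:
        run_sync(FakeNeutronDb(), FakeNbDb())
    except RowNotFound as exc:
        print(f"sync failed: {exc}")
```

In this model the failure disappears if `sync_port_groups` is run again after the network and port sync, which is the idea behind the retry described in the fix.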

===== Ubuntu SRU Details =====
[Impact]
See bug description.

[Test Case]
Deploy openstack with OVN. Follow steps in "Quick way to reproduce on ML2/OVN" from bug description.

[Where problems could occur]
The fix mitigates the occurrence of the runtime exception. However, because the fix retries the port group sync only one more time, the same runtime exception could still be raised.
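
The retry-once behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual patch to ovn_db_sync.py: `sync_acls_with_one_retry` and `make_fake_sync` are hypothetical helpers, and `RowNotFound` is a local stand-in for the ovsdbapp exception.

```python
class RowNotFound(Exception):
    """Local stand-in for ovsdbapp.backend.ovs_idl.idlutils.RowNotFound."""


def sync_acls_with_one_retry(sync_port_groups, sync_acls):
    """Run sync_acls; on RowNotFound, resync all port groups and retry once.

    Returns the number of ACL sync attempts made. A RowNotFound raised by
    the second attempt is not caught and propagates to the caller.
    """
    try:
        sync_acls()
        return 1
    except RowNotFound:
        sync_port_groups()
        sync_acls()
        return 2


def make_fake_sync(security_groups):
    """Hypothetical in-memory replacement for the two sync steps."""
    port_groups = set()

    def sync_port_groups():
        port_groups.update(security_groups)

    def sync_acls():
        missing = sorted(set(security_groups) - port_groups)
        if missing:
            raise RowNotFound(
                f"Cannot find Port_Group with name={missing[0]}")

    return sync_port_groups, sync_acls


if __name__ == "__main__":
    resync, acls = make_fake_sync({"pg_default"})
    print("attempts:", sync_acls_with_one_retry(resync, acls))
```

Because only one retry is made, a security group that appears between the resync and the second ACL sync would still surface the original exception, which matches the risk noted in this section.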

Miro Tomaska (mtomaska)
Changed in neutron:
assignee: nobody → Miro Tomaska (mtomaska)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/875989

Changed in neutron:
status: New → In Progress
Lajos Katona (lajos-katona) wrote :
tags: added: ovn
Changed in neutron:
importance: Undecided → Medium
Miro Tomaska (mtomaska)
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 20.4.0

This issue was fixed in the openstack/neutron 20.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/890118
Committed: https://opendev.org/openstack/neutron/commit/da9401aa7455a88f65953efedb9e0ce797674445
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit da9401aa7455a88f65953efedb9e0ce797674445
Author: Miro Tomaska <email address hidden>
Date: Wed Mar 1 16:32:50 2023 -0600

    Fix ACL sync when default sg group is created

    Port group not being available in NB DB during ACL sync
    is a bit of a corner case but possible during the ML2/OVS
    to ML2/OVN migration sync. It can also happen in an ML2/OVN
    only environment. See my detailed description of both
    scenarios in the linked Bug.
    The easiest fix is to just retry ALL port groups sync
    one more time if ACL sync can't find a port group row. This
    additional resync is really quick.

    Closes-Bug: #2008943
    Change-Id: Iac1472f7f896ea434deacb6d236ab469f4f6ed56
    (cherry picked from commit 33cf2cdc83a8cee9ee075eb371f779c3d356cf48)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/888651
Committed: https://opendev.org/openstack/neutron/commit/a6f5160b6c20144db5293b615f3c9e35c8fc59c7
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit a6f5160b6c20144db5293b615f3c9e35c8fc59c7
Author: Miro Tomaska <email address hidden>
Date: Wed Mar 1 16:32:50 2023 -0600

    Fix ACL sync when default sg group is created

    Port group not being available in NB DB during ACL sync
    is a bit of a corner case but possible during the ML2/OVS
    to ML2/OVN migration sync. It can also happen in an ML2/OVN
    only environment. See my detailed description of both
    scenarios in the linked Bug.
    The easiest fix is to just retry ALL port groups sync
    one more time if ACL sync can't find a port group row. This
    additional resync is really quick.

    Conflicts:
      neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_db_sync.py

    Closes-Bug: #2008943
    Change-Id: Iac1472f7f896ea434deacb6d236ab469f4f6ed56
    (cherry picked from commit 33cf2cdc83a8cee9ee075eb371f779c3d356cf48)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 23.0.0.0b3

This issue was fixed in the openstack/neutron 23.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 22.1.0

This issue was fixed in the openstack/neutron 22.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 21.2.0

This issue was fixed in the openstack/neutron 21.2.0 release.

Changed in neutron (Ubuntu Focal):
status: New → Triaged
importance: Undecided → High
description: updated
Corey Bryant (corey.bryant) wrote :

A new package version with this fix has been uploaded to the focal unapproved queue and victoria/wallaby/xena staging PPAs.

Changed in neutron (Ubuntu):
status: New → Fix Released
Changed in cloud-archive:
status: New → Fix Released
Andreas Hasenack (ahasenack) wrote : Please test proposed package

Hello Miro, or anyone else affected,

Accepted neutron into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:16.4.2-0ubuntu6.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in neutron (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-focal
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron ussuri-eol

This issue was fixed in the openstack/neutron ussuri-eol release.

Brian Haley (brian-haley) wrote :

Verified that this is fixed on a juju-deployed focal/ussuri cloud: no errors or exceptions occur when following the reproduction steps, whereas they did occur with the previous 6.3 version.

tags: added: verification-done-focal
removed: verification-needed verification-needed-focal
tags: added: verification-done
Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for neutron has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 2:16.4.2-0ubuntu6.4

---------------
neutron (2:16.4.2-0ubuntu6.4) focal; urgency=medium

  [ Corey Bryant ]
  * d/p/ovn-db-sync-continue-on-duplicate-normalise.patch: Cherry-picked
    from upstream to allow ovn_db_sync to continue on duplicate normalised
    CIDR (LP: #1961112).
  * d/p/ovn-db-sync-check-for-router-port-differences.patch:
    Cherry-picked from upstream to ensure router ports are marked
    for needing updates only if they have changed (LP: #2030773).
  * d/p/ovn-specify-port-type-if-router-port-when-updating.patch:
    Specify port type if it's a router port when updating to avoid
    port flapping (LP: #1955578).
  * d/p/fix-acl-sync-when-default-sg-group-created.patch:
    Cherry-picked from upstream to fix ACL sync when default security
    group is created (LP: #2008943).

  [ Mustafa Kemal GILOR ]
  * d/p/add_uplink_status_propagation.patch: Add the
    'uplink-status-propagation' extension to ML2/OVN (LP: #2032770).

 -- Corey Bryant <email address hidden> Wed, 08 Nov 2023 11:41:21 -0500

Changed in neutron (Ubuntu Focal):
status: Fix Committed → Fix Released
James Page (james-page) wrote : Please test proposed package

Hello Miro, or anyone else affected,

Accepted neutron into ussuri-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:ussuri-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-ussuri-needed to verification-ussuri-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-ussuri-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-ussuri-needed
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron victoria-eom

This issue was fixed in the openstack/neutron victoria-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron wallaby-eom

This issue was fixed in the openstack/neutron wallaby-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron xena-eom

This issue was fixed in the openstack/neutron xena-eom release.

Brian Haley (brian-haley) wrote :

I have tested neutron version 2:16.4.2-0ubuntu6.4~cloud0 from the cloud-archive:ussuri-proposed repository and can verify the code has this change, and the failure does not occur. I followed the steps from the bug description:

Quick way to reproduce on ML2/OVN:
- openstack project create test_project
- openstack network create --project test_project test_network
- openstack port delete $(openstack port list --network test_network -c ID -f value) # since this is an empty network only the metadata port should get listed and subsequently deleted
- openstack security group delete test_project

You now have a network without a metadata port and no default security group for the project/tenant that the network belongs to. Run:

neutron-ovn-db-sync-util --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --ovn-neutron_sync_mode migrate

tags: added: verification-ussuri-done
removed: verification-ussuri-needed
James Page (james-page) wrote : Update Released

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package neutron - 2:16.4.2-0ubuntu6.4~cloud0
---------------

 neutron (2:16.4.2-0ubuntu6.4~cloud0) bionic-ussuri; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:16.4.2-0ubuntu6.4) focal; urgency=medium
 .
   [ Corey Bryant ]
   * d/p/ovn-db-sync-continue-on-duplicate-normalise.patch: Cherry-picked
     from upstream to allow ovn_db_sync to continue on duplicate normalised
     CIDR (LP: #1961112).
   * d/p/ovn-db-sync-check-for-router-port-differences.patch:
     Cherry-picked from upstream to ensure router ports are marked
     for needing updates only if they have changed (LP: #2030773).
   * d/p/ovn-specify-port-type-if-router-port-when-updating.patch:
     Specify port type if it's a router port when updating to avoid
     port flapping (LP: #1955578).
   * d/p/fix-acl-sync-when-default-sg-group-created.patch:
     Cherry-picked from upstream to fix ACL sync when default security
     group is created (LP: #2008943).
 .
   [ Mustafa Kemal GILOR ]
   * d/p/add_uplink_status_propagation.patch: Add the
     'uplink-status-propagation' extension to ML2/OVN (LP: #2032770).

James Page (james-page) wrote : Please test proposed package

Hello Miro, or anyone else affected,

Accepted neutron into wallaby-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:wallaby-proposed
  sudo apt-get update

Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-wallaby-needed to verification-wallaby-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-wallaby-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-wallaby-needed