[OVN] neutron_pg_drop port group table creation race condition

Bug #1866068 reported by Jakub Libosvar
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
neutron | Fix Released | Medium | Unassigned |

Bug Description

With HA controllers, when the first two ports are created simultaneously and each request is handled by a different neutron-server, one of the port creations can fail because creating the neutron_pg_drop Port Group entry in OVN fails.

This is because the neutron_pg_drop entry is unique across the whole cloud and is created on the first port creation attempt if it does not already exist.
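To make the check-then-create race concrete, here is a minimal Python simulation. It is a sketch under stated assumptions, not neutron or ovsdbapp code: FakePortGroupTable, ConstraintViolation, create_port and run_race are hypothetical names, the in-memory table stands in for the OVN Northbound Port_Group table with its unique index on "name", and threads stand in for API workers on different controllers. A threading.Barrier forces both workers to finish their existence check before either one inserts, which reproduces the unlucky interleaving every time.

```python
"""Hypothetical simulation of the neutron_pg_drop check-then-create race.

Assumptions (not neutron or ovsdbapp code): an in-memory table enforces a
unique "name" like the OVN NB Port_Group index, and two threads play the
role of API workers that create the drop group on demand.
"""
import threading


class ConstraintViolation(Exception):
    """Stands in for OVSDB's "constraint violation" error."""


class FakePortGroupTable:
    def __init__(self):
        self._lock = threading.Lock()
        self._names = set()

    def exists(self, name):
        with self._lock:
            return name in self._names

    def insert(self, name):
        # Uniqueness is enforced atomically at commit time, like an OVSDB index.
        with self._lock:
            if name in self._names:
                raise ConstraintViolation(f"duplicate Port_Group name {name!r}")
            self._names.add(name)


def create_port(table, barrier, results, worker):
    # Step 1: the worker checks whether the drop port group already exists.
    missing = not table.exists("neutron_pg_drop")
    # Both workers must finish the check before either inserts.
    barrier.wait()
    try:
        if missing:
            # Step 2: both saw "missing", so both try to create the group.
            table.insert("neutron_pg_drop")
        results[worker] = "port created"
    except ConstraintViolation as exc:
        results[worker] = f"port creation failed: {exc}"


def run_race():
    table = FakePortGroupTable()
    barrier = threading.Barrier(2)
    results = {}
    workers = [
        threading.Thread(target=create_port, args=(table, barrier, results, i))
        for i in range(2)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return results
```

run_race() returns one "port created" entry and one "port creation failed: ..." entry; which worker wins varies between runs. The trace later in this report shows the same failure even though PgAddCommand was called with may_exist=True, which is consistent with each worker deciding based on its own view of the database before its commit reaches the server.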

One possible solution is to create the entry during server start.

OpenStack Infra (hudson-openstack) wrote: Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/711404

Changed in neutron:
status: New → In Progress

Bence Romsics (bence-romsics) wrote:

I managed to reproduce this with a single q-svc configured with api_workers=2.

I added a time.sleep(5) just before the pg_add() call here:

https://opendev.org/openstack/neutron/src/commit/77616fd1773ebca4a6261eec91b2fc5c4ece8e77/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py#L2170

I rebuilt my OVN devstack with NEUTRON_CREATE_INITIAL_NETWORKS=False to prevent the 'neutron_pg_drop' port group from being created during stack.sh.

These commands triggered the error:

openstack network create net0
openstack port create port0 --network net0 &
openstack port create port1 --network net0 &
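The two backgrounded `openstack port create` commands can also be driven from Python when you want to repeat the experiment many times. The helper below is a hypothetical sketch: create_ports_concurrently is not part of any client library, and create_port is whatever callable you pass in, so the block has no dependency on a particular SDK.

```python
"""Fire several create-port requests at the same moment and report outcomes.

Hypothetical helper mirroring the two backgrounded CLI calls; create_port is
any callable you supply, so nothing here depends on a specific client.
"""
import threading
from concurrent.futures import ThreadPoolExecutor


def create_ports_concurrently(create_port, names):
    """Call create_port(name) for every name in parallel.

    Returns {name: "ok" | "error: <message>"} so that one failure (such as an
    HTTP 500) does not hide the outcome of the other requests.
    """
    # Hold every request at the barrier, then release them together.
    start = threading.Barrier(len(names))

    def attempt(name):
        start.wait()
        try:
            create_port(name)
            return name, "ok"
        except Exception as exc:  # record any failure instead of raising
            return name, f"error: {exc}"

    # One thread per name, so all of them can reach the barrier at once.
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return dict(pool.map(attempt, names))
```

With openstacksdk, for example, you could pass something like `lambda name: conn.network.create_port(network_id=net.id, name=name)`, where conn comes from openstack.connect() and net from conn.network.find_network("net0"); treat that wiring as an assumption to check against your environment.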

One port creation succeeded; the other failed with:

HttpException: 500: Server Error for url: http://192.168.122.169:9696/v2.0/ports, Request Failed: internal server error while processing your request.

In the logs:

Mar 05 16:22:10 devstack0 neutron-server[11115]: ERROR ovsdbapp.backend.ovs_idl.transaction [None req-6c61fbbe-b81d-4d3d-9e16-2782c64e2128 admin admin] Traceback (most recent call last):
Mar 05 16:22:10 devstack0 neutron-server[11115]: File "/usr/local/lib/python3.6/dist-packages/ovsdbapp/backend/ovs_idl/connection.py", line 122, in run
Mar 05 16:22:10 devstack0 neutron-server[11115]: txn.results.put(txn.do_commit())
Mar 05 16:22:10 devstack0 neutron-server[11115]: File "/usr/local/lib/python3.6/dist-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 123, in do_commit
Mar 05 16:22:10 devstack0 neutron-server[11115]: self.post_commit(txn)
Mar 05 16:22:10 devstack0 neutron-server[11115]: File "/usr/local/lib/python3.6/dist-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 70, in post_commit
Mar 05 16:22:10 devstack0 neutron-server[11115]: command.post_commit(txn)
Mar 05 16:22:10 devstack0 neutron-server[11115]: File "/usr/local/lib/python3.6/dist-packages/ovsdbapp/backend/ovs_idl/command.py", line 79, in post_commit
Mar 05 16:22:10 devstack0 neutron-server[11115]: row = self.api.tables[self.table_name].rows[real_uuid]
Mar 05 16:22:10 devstack0 neutron-server[11115]: File "/usr/lib/python3.6/collections/__init__.py", line 991, in __getitem__
Mar 05 16:22:10 devstack0 neutron-server[11115]: raise KeyError(key)
Mar 05 16:22:10 devstack0 neutron-server[11115]: KeyError: <ovsdbapp.backend.ovs_idl.rowview.RowView object at 0x7ff6eae66860>
Mar 05 16:22:10 devstack0 neutron-server[11115]:
Mar 05 16:22:10 devstack0 neutron-server[11115]: ERROR neutron.plugins.ml2.managers [None req-6c61fbbe-b81d-4d3d-9e16-2782c64e2128 admin admin] Mechanism driver 'ovn' failed in create_port_postcommit: KeyError: <ovsdbapp.backend.ovs_idl.rowview.RowView object at 0x7ff6eae66860>
Mar 05 16:22:10 devstack0 neutron-server[11115]: ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
Mar 05 16:22:10 devstack0 neutron-server[11115]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/managers.py", line 477, in _call_on_drivers
Mar 05 16:22:10 devstack0 neutron-server[11115]: ERROR neutron.plugins.ml2.managers getattr(driver.obj, method_name)(context)
Mar 05 16:22:10 devstack0 neutron-server[11115]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/driv...

Changed in neutron:
status: In Progress → Triaged
importance: Undecided → Medium

Bence Romsics (bence-romsics) wrote:

I also verified with 'ovn-nbctl list Port_Group' that 'neutron_pg_drop' did not exist before the port creations but did exist afterwards.

Jakub Libosvar (libosvar) wrote:

Adding the original trace for search purposes:

2020-02-29 02:54:15.684 22 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=1): PgAddCommand(name=neutron_pg_drop, may_exist=True, columns={'acls': []}) do_commit /usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
...
2020-02-29 02:54:15.690 22 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: {"details":"Transaction causes multiple rows in \"Port_Group\" table to have identical values (neutron_pg_drop) for index on column \"name\". First row, with UUID 2025ff8b-f8b0-4f39-9fdf-9b48b1a02b7a, was inserted by this transaction. Second row, with UUID f516334e-d700-4f65-8aaa-abb5c4fd465a, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
2020-02-29 02:54:15.691 22 ERROR ovsdbapp.backend.ovs_idl.transaction [req-ab175c93-b83b-4345-bec6-deef1829aec6 3cbd95c02380422ca96bc3de8418bcc0 99ee3dee19754d168d47c35e9337db20 - default default] Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 122, in run
    txn.results.put(txn.do_commit())
  File "/usr/lib/python3.6/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 115, in do_commit
    raise RuntimeError(msg)
RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Port_Group\" table to have identical values (neutron_pg_drop) for index on column \"name\". First row, with UUID 2025ff8b-f8b0-4f39-9fdf-9b48b1a02b7a, was inserted by this transaction. Second row, with UUID f516334e-d700-4f65-8aaa-abb5c4fd465a, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}

Bence Romsics (bence-romsics) wrote:

As discussed in the review of the proposed fix, my reproduction in comment #2 likely triggers a different bug. For the relevant details, see the stack trace posted by Jakub in comment #4.

Changed in neutron:
status: Triaged → In Progress
Changed in neutron:
assignee: Jakub Libosvar (libosvar) → Terry Wilson (otherwiseguy)

OpenStack Infra (hudson-openstack) wrote: Fix merged to neutron (master)

Reviewed: https://review.opendev.org/711404
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d7c23431ad3959eb5fd74e42ea95d446e4e7566d
Submitter: Zuul
Branch: master

commit d7c23431ad3959eb5fd74e42ea95d446e4e7566d
Author: Jakub Libosvar <email address hidden>
Date: Wed Mar 18 14:27:17 2020 +0000

    [ovn]: Create neutron_pg_drop Port Group on init

    The patch adds a short living connection in pre-fork routine that
    creates neutron_pg_drop Port Group. Later after workers are spawned,
    each worker also creates a short living connection and waits for an
    event that the Port Group was created.

    The short living IDLs limit its tables only for relevant tables so it
    doesn't fetch the whole OVS DB to the local copy.

    Closes-bug: #1866068

    Change-Id: I1f5af36b8c3d5650f890edfed3c33dc206869824
    Signed-off-by: Jakub Libosvar <email address hidden>
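The ordering that the commit message describes (create the Port Group once in the pre-fork routine, then have each worker wait for it instead of creating it) can be sketched in Python. This is an illustration under stated assumptions, not the merged patch: PortGroupTable, pre_fork_init, worker_init and start_server are hypothetical, workers are threads rather than forked processes, and a threading.Condition stands in for each worker's short-lived IDL connection waiting for a "Port Group created" event. The may_exist flag mirrors the PgAddCommand argument visible in the trace earlier in this report.

```python
"""Hypothetical sketch of the create-on-init ordering from the commit message.

Not neutron code: the "database" is an in-memory table and workers are
threads instead of forked API workers. Only the parent creates the group;
workers merely wait until they can see it.
"""
import threading


class PortGroupTable:
    def __init__(self):
        self._cond = threading.Condition()
        self._names = set()

    def add(self, name, may_exist=True):
        with self._cond:
            if name in self._names:
                if may_exist:
                    return  # already there: treat as success
                raise ValueError(f"Port_Group {name!r} already exists")
            self._names.add(name)
            self._cond.notify_all()  # wake any worker waiting for this row

    def wait_for(self, name, timeout):
        # Stands in for a worker waiting on a "row created" event.
        with self._cond:
            return self._cond.wait_for(lambda: name in self._names, timeout)


def pre_fork_init(table):
    # Runs once in the parent, before any worker is started.
    table.add("neutron_pg_drop", may_exist=True)


def worker_init(table, results, idx, timeout=5.0):
    # Workers never create the group; they only confirm that it exists.
    if table.wait_for("neutron_pg_drop", timeout):
        results[idx] = "ready"
    else:
        results[idx] = "timed out waiting for neutron_pg_drop"


def start_server(num_workers=2):
    table = PortGroupTable()
    pre_fork_init(table)
    results = {}
    threads = [threading.Thread(target=worker_init, args=(table, results, i))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because only the parent ever inserts the row in this sketch, workers cannot collide on the unique name; the timeout gives a worker a bounded wait if the group never appears.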

Changed in neutron:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote: Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/802528

OpenStack Infra (hudson-openstack) wrote: Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/802528
Committed: https://opendev.org/openstack/neutron/commit/a6106ac2bd4ab38a0e3a5e80e44da6f02d5322ce
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit a6106ac2bd4ab38a0e3a5e80e44da6f02d5322ce
Author: Jakub Libosvar <email address hidden>
Date: Wed Mar 18 14:27:17 2020 +0000

    [ovn]: Create neutron_pg_drop Port Group on init

    The patch adds a short living connection in pre-fork routine that
    creates neutron_pg_drop Port Group. Later after workers are spawned,
    each worker also creates a short living connection and waits for an
    event that the Port Group was created.

    The short living IDLs limit its tables only for relevant tables so it
    doesn't fetch the whole OVS DB to the local copy.

    Closes-bug: #1866068

     Conflicts:
            neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py
            neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py
            neutron/tests/functional/plugins/ml2/drivers/ovn/mech_driver/test_mech_driver.py

    Change-Id: I1f5af36b8c3d5650f890edfed3c33dc206869824
    Signed-off-by: Jakub Libosvar <email address hidden>
    (cherry picked from commit d7c23431ad3959eb5fd74e42ea95d446e4e7566d)

tags: added: in-stable-ussuri

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/neutron 16.4.1

This issue was fixed in the openstack/neutron 16.4.1 release.

Slawek Kaplonski (slaweq) wrote: auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: Terry Wilson (otherwiseguy) → nobody
tags: added: timeout-abandon