Maintenance Task: Race condition while creating resources

Bug #1833445 reported by Lucas Alvares Gomes
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
networking-ovn
Fix Released
High
Lucas Alvares Gomes

Bug Description

The maintenance task check_for_inconsistencies() method should take into account how old is the inconsistency to avoid a race between the commit() in the neutron database and the commit in the OVN database.

The logs below shows a race that resulted in the following error:

[root@overcloud-controller-2 ~]# zgrep "Transaction causes multiple rows" /var/log/containers/neutron/server.log*
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:59.917 29 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\") for index on column \"name\". First row, with UUID 79530ab2-5929-4cf5-9509-c4ef841c1c11, was inserted by this transaction. Second row, with UUID 721ce5f9-c88d-4d96-a532-20a1c7d142b3, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}: RevisionConflict: OVN revision number for d8164b13-bc8d-47a0-ac5a-2382c534b195 (type: ports) is equal or higher than the given resource. Skipping update
/var/log/containers/neutron/server.log.3.gz:RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\") for index on column \"name\". First row, with UUID 79530ab2-5929-4cf5-9509-c4ef841c1c11, was inserted by this transaction. Second row, with UUID 721ce5f9-c88d-4d96-a532-20a1c7d142b3, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:59.919 29 ERROR neutron.plugins.ml2.managers [req-86e63ddd-1721-4cc4-90df-1fd41fd12dd9 0a35a35d6b2d4c40a42907adf3d0c535 c569705ecaf049de940f02db5f59aeae - default default] Mechanism driver 'ovn' failed in create_port_postcommit: RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\") for index on column \"name\". First row, with UUID 79530ab2-5929-4cf5-9509-c4ef841c1c11, was inserted by this transaction. Second row, with UUID 721ce5f9-c88d-4d96-a532-20a1c7d142b3, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:59.919 29 ERROR neutron.plugins.ml2.managers RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\") for index on column \"name\". First row, with UUID 79530ab2-5929-4cf5-9509-c4ef841c1c11, was inserted by this transaction. Second row, with UUID 721ce5f9-c88d-4d96-a532-20a1c7d142b3, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}

This was caused because when the port 89ebd403-68af-43a2-a4e4-5fa86d57d2a2 was created by the mainteance task (in controller-0, see logs below) before the create_port_postcommit() method finished (in controller-2, also see logs below).

[root@overcloud-controller-2 ~]# zgrep AddLSwitchPort /var/log/containers/neutron/server.log* | grep 89ebd403-68af-43a2-a4e4-5fa86d57d2a2
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:48.835 29 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): AddLSwitchPortCommand(lswitch=neutron-1cc70e0d-0242-43b0-9c3c-f86
73944d483, may_exist=True, lport=89ebd403-68af-43a2-a4e4-5fa86d57d2a2, columns={'parent_name': [], 'addresses': [u'fa:16:3e:a7:33:a8 10.10.0.123 fd5b:1fd5:8295:5339::3b'], 'enabled': True, 'port_security': [u'fa
:16:3e:a7:33:a8 10.10.0.123 fd5b:1fd5:8295:5339::3b 0.0.0.0/0 ::/0'], 'dhcpv6_options': [UUID('375a50a6-1d6b-4122-88ea-fd833daaad03')], 'external_ids': {'neutron:cidrs': '10.10.0.123/21 fd5b:1fd5:8295:5339::3b/6
4', 'neutron:device_id': u'', 'neutron:revision_number': '8', 'neutron:port_name': u'ieatenmc12a01_msap_6cc23192-c59f-4cc8-8ebc-3d4832a8b6ec-msap-jdk4e5seknru-1-sfkllkac7mzi-msap_definition_vm_interface1_port-5o
hzqseeiipw', 'neutron:network_name': u'neutron-1cc70e0d-0242-43b0-9c3c-f8673944d483', 'neutron:project_id': u'c569705ecaf049de940f02db5f59aeae', 'neutron:device_owner': u'', 'neutron:security_group_ids': u'18ab8
fcd-9aa1-471e-b781-06891985ebe0 6dd2edbc-1413-4b2d-8639-90b9362afbac'}, 'tag': [], 'type': '', 'options': {'requested-chassis': ''}, 'dhcpv4_options': [UUID('0e5ab835-1579-404d-8a32-c155767d2713')]}) do_commit /
usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
[root@overcloud-controller-2 ~]# zgrep -B3 AddLSwitchPort /var/log/containers/neutron/server.log* | grep -B3 89ebd403-68af-43a2-a4e4-5fa86d57d2a2
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:48.814 39 INFO neutron.wsgi [-] 10.151.110.7 "OPTIONS / HTTP/1.0" status: 200 len: 249 time: 0.0011911
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:48.820 37 DEBUG neutron.wsgi [-] (37) accepted ('10.151.110.8', 51504) server /usr/lib/python2.7/site-packages/eventlet/wsgi.py:883
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:48.823 37 INFO neutron.wsgi [-] 10.151.110.8 "OPTIONS / HTTP/1.0" status: 200 len: 249 time: 0.0015821
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:48.835 29 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): AddLSwitchPortCommand(lswitch=neutron-1cc70e0d-0242-43b0-9c3c-f8673944d483, may_exist=True, lport=89ebd403-68af-43a2-a4e4-5fa86d57d2a2, columns={'parent_name': [], 'addresses': [u'fa:16:3e:a7:33:a8 10.10.0.123 fd5b:1fd5:8295:5339::3b'], 'enabled': True, 'port_security': [u'fa:16:3e:a7:33:a8 10.10.0.123 fd5b:1fd5:8295:5339::3b 0.0.0.0/0 ::/0'], 'dhcpv6_options': [UUID('375a50a6-1d6b-4122-88ea-fd833daaad03')], 'external_ids': {'neutron:cidrs': '10.10.0.123/21 fd5b:1fd5:8295:5339::3b/64', 'neutron:device_id': u'', 'neutron:revision_number': '8', 'neutron:port_name': u'ieatenmc12a01_msap_6cc23192-c59f-4cc8-8ebc-3d4832a8b6ec-msap-jdk4e5seknru-1-sfkllkac7mzi-msap_definition_vm_interface1_port-5ohzqseeiipw', 'neutron:network_name': u'neutron-1cc70e0d-0242-43b0-9c3c-f8673944d483', 'neutron:project_id': u'c569705ecaf049de940f02db5f59aeae', 'neutron:device_owner': u'', 'neutron:security_group_ids': u'18ab8fcd-9aa1-471e-b781-06891985ebe0 6dd2edbc-1413-4b2d-8639-90b9362afbac'}, 'tag': [], 'type': '', 'options': {'requested-chassis': ''}, 'dhcpv4_options': [UUID('0e5ab835-1579-404d-8a32-c155767d2713')]}) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84

[root@overcloud-controller-0 ~]# zgrep -B3 AddLSwitchPort /var/log/containers/neutron/server.log* | grep -B3 89ebd403-68af-43a2-a4e4-5fa86d57d2a2
/var/log/containers/neutron/server.log.3:2019-06-19 13:27:47.981 36 DEBUG neutron.wsgi [-] (36) accepted ('10.151.110.7', 55898) server /usr/lib/python2.7/site-packages/eventlet/wsgi.py:883
/var/log/containers/neutron/server.log.3:2019-06-19 13:27:48.157 53 DEBUG futurist.periodics [req-7f04858b-7d73-4640-9883-4e1a41d844bc - - - - -] Submitting periodic callback 'networking_ovn.common.maintenance.DBInconsistenciesPeriodics.check_for_inconsistencies' _process_scheduled /usr/lib/python2.7/site-packages/futurist/periodics.py:639
/var/log/containers/neutron/server.log.3:2019-06-19 13:27:48.174 53 WARNING networking_ovn.common.maintenance [req-7f04858b-7d73-4640-9883-4e1a41d844bc - - - - -] Inconsistencies found in the database!
/var/log/containers/neutron/server.log.3:2019-06-19 13:27:48.681 53 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): AddLSwitchPortCommand(lswitch=neutron-1cc70e0d-0242-43b0-9c3c-f8673944d483, may_exist=True, lport=89ebd403-68af-43a2-a4e4-5fa86d57d2a2, columns={'parent_name': [], 'addresses': [u'fa:16:3e:a7:33:a8 10.10.0.123 fd5b:1fd5:8295:5339::3b'], 'enabled': True, 'port_security': [u'fa:16:3e:a7:33:a8 10.10.0.123 fd5b:1fd5:8295:5339::3b 0.0.0.0/0 ::/0'], 'dhcpv6_options': [UUID('375a50a6-1d6b-4122-88ea-fd833daaad03')], 'external_ids': {'neutron:cidrs': '10.10.0.123/21 fd5b:1fd5:8295:5339::3b/64', 'neutron:device_id': u'', 'neutron:revision_number': '8', 'neutron:port_name': u'ieatenmc12a01_msap_6cc23192-c59f-4cc8-8ebc-3d4832a8b6ec-msap-jdk4e5seknru-1-sfkllkac7mzi-msap_definition_vm_interface1_port-5ohzqseeiipw', 'neutron:network_name': u'neutron-1cc70e0d-0242-43b0-9c3c-f8673944d483', 'neutron:project_id': u'c569705ecaf049de940f02db5f59aeae', 'neutron:device_owner': u'', 'neutron:security_group_ids': u'18ab8fcd-9aa1-471e-b781-06891985ebe0 6dd2edbc-1413-4b2d-8639-90b9362afbac'}, 'tag': [], 'type': '', 'options': {'requested-chassis': u''}, 'dhcpv4_options': [UUID('0e5ab835-1579-404d-8a32-c155767d2713')]}) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84

Changed in networking-ovn:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Lucas Alvares Gomes (lucasagomes)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (master)

Fix proposed to branch: master
Review: https://review.opendev.org/666549

Changed in networking-ovn:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/666825

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/666826

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-ovn (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/666827

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (master)

Reviewed: https://review.opendev.org/666549
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=1e0c9370b23315d7e734cf5ebc960e76c454bb61
Submitter: Zuul
Branch: master

commit 1e0c9370b23315d7e734cf5ebc960e76c454bb61
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Jun 20 10:53:43 2019 +0100

    Maintenance task: Fix race condition at resource creation

    When the maintenance task runs (every 5 minutes) it should take in
    consideration how old is the resource before considering it to be an
    inconsitency.

    Prior to this patch, what could happen is that a resource being created
    was committed to the neutron database but not yet to the OVN database and
    ovn_revision_numbers cache table. In the meantime, the maintenance
    task could run and create that resource before the create_*_postcommit()
    completed resulting in a race condition.

    Using a port as an example, the following error would be seeing when
    that happens (more detailed example in the LP ticket):

    ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error:
    {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\"
    table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\")
    for index on column \"name\".

    This patch adds a new filter to the get_inconsistent_resources() method
    which checks the "created_at" column of the resource to see if it's older
    than 60 seconds and ignore if it's newer.

    Closes-Bug: #1833445
    Change-Id: If157e9d4660cc6c566ecad44aa6734af2bd14db1
    Signed-off-by: Lucas Alvares Gomes <email address hidden>

Changed in networking-ovn:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/stein)

Reviewed: https://review.opendev.org/666825
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=ff0458122f136df6912d0c68a967c60eefe3f1c7
Submitter: Zuul
Branch: stable/stein

commit ff0458122f136df6912d0c68a967c60eefe3f1c7
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Jun 20 10:53:43 2019 +0100

    Maintenance task: Fix race condition at resource creation

    When the maintenance task runs (every 5 minutes) it should take in
    consideration how old is the resource before considering it to be an
    inconsitency.

    Prior to this patch, what could happen is that a resource being created
    was committed to the neutron database but not yet to the OVN database and
    ovn_revision_numbers cache table. In the meantime, the maintenance
    task could run and create that resource before the create_*_postcommit()
    completed resulting in a race condition.

    Using a port as an example, the following error would be seeing when
    that happens (more detailed example in the LP ticket):

    ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error:
    {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\"
    table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\")
    for index on column \"name\".

    This patch adds a new filter to the get_inconsistent_resources() method
    which checks the "created_at" column of the resource to see if it's older
    than 60 seconds and ignore if it's newer.

    Closes-Bug: #1833445
    Change-Id: If157e9d4660cc6c566ecad44aa6734af2bd14db1
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
    (cherry picked from commit 1e0c9370b23315d7e734cf5ebc960e76c454bb61)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/queens)

Reviewed: https://review.opendev.org/666827
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=ef5a81d1274d46bd77c2ac4639e47a706dbe33b3
Submitter: Zuul
Branch: stable/queens

commit ef5a81d1274d46bd77c2ac4639e47a706dbe33b3
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Jun 20 10:53:43 2019 +0100

    Maintenance task: Fix race condition at resource creation

    When the maintenance task runs (every 5 minutes) it should take in
    consideration how old is the resource before considering it to be an
    inconsitency.

    Prior to this patch, what could happen is that a resource being created
    was committed to the neutron database but not yet to the OVN database and
    ovn_revision_numbers cache table. In the meantime, the maintenance
    task could run and create that resource before the create_*_postcommit()
    completed resulting in a race condition.

    Using a port as an example, the following error would be seeing when
    that happens (more detailed example in the LP ticket):

    ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error:
    {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\"
    table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\")
    for index on column \"name\".

    This patch adds a new filter to the get_inconsistent_resources() method
    which checks the "created_at" column of the resource to see if it's older
    than 60 seconds and ignore if it's newer.

    Closes-Bug: #1833445
    Change-Id: If157e9d4660cc6c566ecad44aa6734af2bd14db1
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
    (cherry picked from commit 1e0c9370b23315d7e734cf5ebc960e76c454bb61)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-ovn (stable/rocky)

Reviewed: https://review.opendev.org/666826
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=72c57893f3f7b382d10c20b0db7a030bfa11be3b
Submitter: Zuul
Branch: stable/rocky

commit 72c57893f3f7b382d10c20b0db7a030bfa11be3b
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Jun 20 10:53:43 2019 +0100

    Maintenance task: Fix race condition at resource creation

    When the maintenance task runs (every 5 minutes) it should take in
    consideration how old is the resource before considering it to be an
    inconsitency.

    Prior to this patch, what could happen is that a resource being created
    was committed to the neutron database but not yet to the OVN database and
    ovn_revision_numbers cache table. In the meantime, the maintenance
    task could run and create that resource before the create_*_postcommit()
    completed resulting in a race condition.

    Using a port as an example, the following error would be seeing when
    that happens (more detailed example in the LP ticket):

    ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error:
    {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\"
    table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\")
    for index on column \"name\".

    This patch adds a new filter to the get_inconsistent_resources() method
    which checks the "created_at" column of the resource to see if it's older
    than 60 seconds and ignore if it's newer.

    Closes-Bug: #1833445
    Change-Id: If157e9d4660cc6c566ecad44aa6734af2bd14db1
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
    (cherry picked from commit 1e0c9370b23315d7e734cf5ebc960e76c454bb61)

tags: added: in-stable-rocky
tags: added: networking-ovn-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 7.0.0.0b1

This issue was fixed in the openstack/networking-ovn 7.0.0.0b1 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 4.0.4

This issue was fixed in the openstack/networking-ovn 4.0.4 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 6.0.1

This issue was fixed in the openstack/networking-ovn 6.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-ovn 5.1.0

This issue was fixed in the openstack/networking-ovn 5.1.0 release.

tags: removed: networking-ovn-proactive-backport-potential
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.