Maintenance Task: Race condition while creating resources

Bug #1833445 reported by Lucas Alvares Gomes on 2019-06-19
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
networking-ovn
High
Lucas Alvares Gomes

Bug Description

The maintenance task check_for_inconsistencies() method should take into account how old is the inconsistency to avoid a race between the commit() in the neutron database and the commit in the OVN database.

The logs below shows a race that resulted in the following error:

[root@overcloud-controller-2 ~]# zgrep "Transaction causes multiple rows" /var/log/containers/neutron/server.log*
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:59.917 29 ERROR ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\") for index on column \"name\". First row, with UUID 79530ab2-5929-4cf5-9509-c4ef841c1c11, was inserted by this transaction. Second row, with UUID 721ce5f9-c88d-4d96-a532-20a1c7d142b3, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}: RevisionConflict: OVN revision number for d8164b13-bc8d-47a0-ac5a-2382c534b195 (type: ports) is equal or higher than the given resource. Skipping update
/var/log/containers/neutron/server.log.3.gz:RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\") for index on column \"name\". First row, with UUID 79530ab2-5929-4cf5-9509-c4ef841c1c11, was inserted by this transaction. Second row, with UUID 721ce5f9-c88d-4d96-a532-20a1c7d142b3, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:59.919 29 ERROR neutron.plugins.ml2.managers [req-86e63ddd-1721-4cc4-90df-1fd41fd12dd9 0a35a35d6b2d4c40a42907adf3d0c535 c569705ecaf049de940f02db5f59aeae - default default] Mechanism driver 'ovn' failed in create_port_postcommit: RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\") for index on column \"name\". First row, with UUID 79530ab2-5929-4cf5-9509-c4ef841c1c11, was inserted by this transaction. Second row, with UUID 721ce5f9-c88d-4d96-a532-20a1c7d142b3, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:59.919 29 ERROR neutron.plugins.ml2.managers RuntimeError: OVSDB Error: {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\" table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\") for index on column \"name\". First row, with UUID 79530ab2-5929-4cf5-9509-c4ef841c1c11, was inserted by this transaction. Second row, with UUID 721ce5f9-c88d-4d96-a532-20a1c7d142b3, existed in the database before this transaction and was not modified by the transaction.","error":"constraint violation"}

This was caused because when the port 89ebd403-68af-43a2-a4e4-5fa86d57d2a2 was created by the mainteance task (in controller-0, see logs below) before the create_port_postcommit() method finished (in controller-2, also see logs below).

[root@overcloud-controller-2 ~]# zgrep AddLSwitchPort /var/log/containers/neutron/server.log* | grep 89ebd403-68af-43a2-a4e4-5fa86d57d2a2
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:48.835 29 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): AddLSwitchPortCommand(lswitch=neutron-1cc70e0d-0242-43b0-9c3c-f86
73944d483, may_exist=True, lport=89ebd403-68af-43a2-a4e4-5fa86d57d2a2, columns={'parent_name': [], 'addresses': [u'fa:16:3e:a7:33:a8 10.10.0.123 fd5b:1fd5:8295:5339::3b'], 'enabled': True, 'port_security': [u'fa
:16:3e:a7:33:a8 10.10.0.123 fd5b:1fd5:8295:5339::3b 0.0.0.0/0 ::/0'], 'dhcpv6_options': [UUID('375a50a6-1d6b-4122-88ea-fd833daaad03')], 'external_ids': {'neutron:cidrs': '10.10.0.123/21 fd5b:1fd5:8295:5339::3b/6
4', 'neutron:device_id': u'', 'neutron:revision_number': '8', 'neutron:port_name': u'ieatenmc12a01_msap_6cc23192-c59f-4cc8-8ebc-3d4832a8b6ec-msap-jdk4e5seknru-1-sfkllkac7mzi-msap_definition_vm_interface1_port-5o
hzqseeiipw', 'neutron:network_name': u'neutron-1cc70e0d-0242-43b0-9c3c-f8673944d483', 'neutron:project_id': u'c569705ecaf049de940f02db5f59aeae', 'neutron:device_owner': u'', 'neutron:security_group_ids': u'18ab8
fcd-9aa1-471e-b781-06891985ebe0 6dd2edbc-1413-4b2d-8639-90b9362afbac'}, 'tag': [], 'type': '', 'options': {'requested-chassis': ''}, 'dhcpv4_options': [UUID('0e5ab835-1579-404d-8a32-c155767d2713')]}) do_commit /
usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84
[root@overcloud-controller-2 ~]# zgrep -B3 AddLSwitchPort /var/log/containers/neutron/server.log* | grep -B3 89ebd403-68af-43a2-a4e4-5fa86d57d2a2
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:48.814 39 INFO neutron.wsgi [-] 10.151.110.7 "OPTIONS / HTTP/1.0" status: 200 len: 249 time: 0.0011911
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:48.820 37 DEBUG neutron.wsgi [-] (37) accepted ('10.151.110.8', 51504) server /usr/lib/python2.7/site-packages/eventlet/wsgi.py:883
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:48.823 37 INFO neutron.wsgi [-] 10.151.110.8 "OPTIONS / HTTP/1.0" status: 200 len: 249 time: 0.0015821
/var/log/containers/neutron/server.log.3.gz:2019-06-19 13:27:48.835 29 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): AddLSwitchPortCommand(lswitch=neutron-1cc70e0d-0242-43b0-9c3c-f8673944d483, may_exist=True, lport=89ebd403-68af-43a2-a4e4-5fa86d57d2a2, columns={'parent_name': [], 'addresses': [u'fa:16:3e:a7:33:a8 10.10.0.123 fd5b:1fd5:8295:5339::3b'], 'enabled': True, 'port_security': [u'fa:16:3e:a7:33:a8 10.10.0.123 fd5b:1fd5:8295:5339::3b 0.0.0.0/0 ::/0'], 'dhcpv6_options': [UUID('375a50a6-1d6b-4122-88ea-fd833daaad03')], 'external_ids': {'neutron:cidrs': '10.10.0.123/21 fd5b:1fd5:8295:5339::3b/64', 'neutron:device_id': u'', 'neutron:revision_number': '8', 'neutron:port_name': u'ieatenmc12a01_msap_6cc23192-c59f-4cc8-8ebc-3d4832a8b6ec-msap-jdk4e5seknru-1-sfkllkac7mzi-msap_definition_vm_interface1_port-5ohzqseeiipw', 'neutron:network_name': u'neutron-1cc70e0d-0242-43b0-9c3c-f8673944d483', 'neutron:project_id': u'c569705ecaf049de940f02db5f59aeae', 'neutron:device_owner': u'', 'neutron:security_group_ids': u'18ab8fcd-9aa1-471e-b781-06891985ebe0 6dd2edbc-1413-4b2d-8639-90b9362afbac'}, 'tag': [], 'type': '', 'options': {'requested-chassis': ''}, 'dhcpv4_options': [UUID('0e5ab835-1579-404d-8a32-c155767d2713')]}) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84

[root@overcloud-controller-0 ~]# zgrep -B3 AddLSwitchPort /var/log/containers/neutron/server.log* | grep -B3 89ebd403-68af-43a2-a4e4-5fa86d57d2a2
/var/log/containers/neutron/server.log.3:2019-06-19 13:27:47.981 36 DEBUG neutron.wsgi [-] (36) accepted ('10.151.110.7', 55898) server /usr/lib/python2.7/site-packages/eventlet/wsgi.py:883
/var/log/containers/neutron/server.log.3:2019-06-19 13:27:48.157 53 DEBUG futurist.periodics [req-7f04858b-7d73-4640-9883-4e1a41d844bc - - - - -] Submitting periodic callback 'networking_ovn.common.maintenance.DBInconsistenciesPeriodics.check_for_inconsistencies' _process_scheduled /usr/lib/python2.7/site-packages/futurist/periodics.py:639
/var/log/containers/neutron/server.log.3:2019-06-19 13:27:48.174 53 WARNING networking_ovn.common.maintenance [req-7f04858b-7d73-4640-9883-4e1a41d844bc - - - - -] Inconsistencies found in the database!
/var/log/containers/neutron/server.log.3:2019-06-19 13:27:48.681 53 DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn command(idx=0): AddLSwitchPortCommand(lswitch=neutron-1cc70e0d-0242-43b0-9c3c-f8673944d483, may_exist=True, lport=89ebd403-68af-43a2-a4e4-5fa86d57d2a2, columns={'parent_name': [], 'addresses': [u'fa:16:3e:a7:33:a8 10.10.0.123 fd5b:1fd5:8295:5339::3b'], 'enabled': True, 'port_security': [u'fa:16:3e:a7:33:a8 10.10.0.123 fd5b:1fd5:8295:5339::3b 0.0.0.0/0 ::/0'], 'dhcpv6_options': [UUID('375a50a6-1d6b-4122-88ea-fd833daaad03')], 'external_ids': {'neutron:cidrs': '10.10.0.123/21 fd5b:1fd5:8295:5339::3b/64', 'neutron:device_id': u'', 'neutron:revision_number': '8', 'neutron:port_name': u'ieatenmc12a01_msap_6cc23192-c59f-4cc8-8ebc-3d4832a8b6ec-msap-jdk4e5seknru-1-sfkllkac7mzi-msap_definition_vm_interface1_port-5ohzqseeiipw', 'neutron:network_name': u'neutron-1cc70e0d-0242-43b0-9c3c-f8673944d483', 'neutron:project_id': u'c569705ecaf049de940f02db5f59aeae', 'neutron:device_owner': u'', 'neutron:security_group_ids': u'18ab8fcd-9aa1-471e-b781-06891985ebe0 6dd2edbc-1413-4b2d-8639-90b9362afbac'}, 'tag': [], 'type': '', 'options': {'requested-chassis': u''}, 'dhcpv4_options': [UUID('0e5ab835-1579-404d-8a32-c155767d2713')]}) do_commit /usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py:84

Changed in networking-ovn:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Lucas Alvares Gomes (lucasagomes)

Fix proposed to branch: master
Review: https://review.opendev.org/666549

Changed in networking-ovn:
status: Confirmed → In Progress

Reviewed: https://review.opendev.org/666549
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=1e0c9370b23315d7e734cf5ebc960e76c454bb61
Submitter: Zuul
Branch: master

commit 1e0c9370b23315d7e734cf5ebc960e76c454bb61
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Jun 20 10:53:43 2019 +0100

    Maintenance task: Fix race condition at resource creation

    When the maintenance task runs (every 5 minutes) it should take in
    consideration how old is the resource before considering it to be an
    inconsitency.

    Prior to this patch, what could happen is that a resource being created
    was committed to the neutron database but not yet to the OVN database and
    ovn_revision_numbers cache table. In the meantime, the maintenance
    task could run and create that resource before the create_*_postcommit()
    completed resulting in a race condition.

    Using a port as an example, the following error would be seeing when
    that happens (more detailed example in the LP ticket):

    ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error:
    {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\"
    table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\")
    for index on column \"name\".

    This patch adds a new filter to the get_inconsistent_resources() method
    which checks the "created_at" column of the resource to see if it's older
    than 60 seconds and ignore if it's newer.

    Closes-Bug: #1833445
    Change-Id: If157e9d4660cc6c566ecad44aa6734af2bd14db1
    Signed-off-by: Lucas Alvares Gomes <email address hidden>

Changed in networking-ovn:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/666825
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=ff0458122f136df6912d0c68a967c60eefe3f1c7
Submitter: Zuul
Branch: stable/stein

commit ff0458122f136df6912d0c68a967c60eefe3f1c7
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Jun 20 10:53:43 2019 +0100

    Maintenance task: Fix race condition at resource creation

    When the maintenance task runs (every 5 minutes) it should take in
    consideration how old is the resource before considering it to be an
    inconsitency.

    Prior to this patch, what could happen is that a resource being created
    was committed to the neutron database but not yet to the OVN database and
    ovn_revision_numbers cache table. In the meantime, the maintenance
    task could run and create that resource before the create_*_postcommit()
    completed resulting in a race condition.

    Using a port as an example, the following error would be seeing when
    that happens (more detailed example in the LP ticket):

    ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error:
    {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\"
    table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\")
    for index on column \"name\".

    This patch adds a new filter to the get_inconsistent_resources() method
    which checks the "created_at" column of the resource to see if it's older
    than 60 seconds and ignore if it's newer.

    Closes-Bug: #1833445
    Change-Id: If157e9d4660cc6c566ecad44aa6734af2bd14db1
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
    (cherry picked from commit 1e0c9370b23315d7e734cf5ebc960e76c454bb61)

tags: added: in-stable-stein

Reviewed: https://review.opendev.org/666827
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=ef5a81d1274d46bd77c2ac4639e47a706dbe33b3
Submitter: Zuul
Branch: stable/queens

commit ef5a81d1274d46bd77c2ac4639e47a706dbe33b3
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Jun 20 10:53:43 2019 +0100

    Maintenance task: Fix race condition at resource creation

    When the maintenance task runs (every 5 minutes) it should take in
    consideration how old is the resource before considering it to be an
    inconsitency.

    Prior to this patch, what could happen is that a resource being created
    was committed to the neutron database but not yet to the OVN database and
    ovn_revision_numbers cache table. In the meantime, the maintenance
    task could run and create that resource before the create_*_postcommit()
    completed resulting in a race condition.

    Using a port as an example, the following error would be seeing when
    that happens (more detailed example in the LP ticket):

    ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error:
    {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\"
    table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\")
    for index on column \"name\".

    This patch adds a new filter to the get_inconsistent_resources() method
    which checks the "created_at" column of the resource to see if it's older
    than 60 seconds and ignore if it's newer.

    Closes-Bug: #1833445
    Change-Id: If157e9d4660cc6c566ecad44aa6734af2bd14db1
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
    (cherry picked from commit 1e0c9370b23315d7e734cf5ebc960e76c454bb61)

tags: added: in-stable-queens

Reviewed: https://review.opendev.org/666826
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=72c57893f3f7b382d10c20b0db7a030bfa11be3b
Submitter: Zuul
Branch: stable/rocky

commit 72c57893f3f7b382d10c20b0db7a030bfa11be3b
Author: Lucas Alvares Gomes <email address hidden>
Date: Thu Jun 20 10:53:43 2019 +0100

    Maintenance task: Fix race condition at resource creation

    When the maintenance task runs (every 5 minutes) it should take in
    consideration how old is the resource before considering it to be an
    inconsitency.

    Prior to this patch, what could happen is that a resource being created
    was committed to the neutron database but not yet to the OVN database and
    ovn_revision_numbers cache table. In the meantime, the maintenance
    task could run and create that resource before the create_*_postcommit()
    completed resulting in a race condition.

    Using a port as an example, the following error would be seeing when
    that happens (more detailed example in the LP ticket):

    ovsdbapp.backend.ovs_idl.transaction [-] OVSDB Error:
    {"details":"Transaction causes multiple rows in \"Logical_Switch_Port\"
    table to have identical values (\"89ebd403-68af-43a2-a4e4-5fa86d57d2a2\")
    for index on column \"name\".

    This patch adds a new filter to the get_inconsistent_resources() method
    which checks the "created_at" column of the resource to see if it's older
    than 60 seconds and ignore if it's newer.

    Closes-Bug: #1833445
    Change-Id: If157e9d4660cc6c566ecad44aa6734af2bd14db1
    Signed-off-by: Lucas Alvares Gomes <email address hidden>
    (cherry picked from commit 1e0c9370b23315d7e734cf5ebc960e76c454bb61)

tags: added: in-stable-rocky
tags: added: networking-ovn-proactive-backport-potential
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers