Possible double row.delete() call in ACL code

Bug #1857016 reported by Terry Wilson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
networking-ovn
New
Undecided
Unassigned
neutron
Fix Released
Undecided
Unassigned

Bug Description

In the field, we've seen:

2019-11-11 14:10:04.765 54 ERROR ovsdbapp.backend.ovs_idl.transaction [req-bb3c2ce5-6a46-47d0-b544-772efa71158f - - - - -] Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/connection.py", line 99, in run
    txn.results.put(txn.do_commit())
  File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 86, in do_commit
    command.run_idl(txn)
  File "/usr/lib/python2.7/site-packages/networking_ovn/ovsdb/commands.py", line 616, in run_idl
    acl_del_obj.delete()
  File "/usr/lib64/python2.7/site-packages/ovs/db/idl.py", line 970, in delete
    assert self._changes is not None
AssertionError

2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance [req-bb3c2ce5-6a46-47d0-b544-772efa71158f - - - - -] Failed to fix resource ebc7e039-81af-4f18-babf-8750fe24d5f0 (type: ports): AssertionError
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance Traceback (most recent call last):
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance File "/usr/lib/python2.7/site-packages/networking_ovn/common/maintenance.py", line 232, in check_for_inconsistencies
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance self._fix_create_update(row)
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance File "/usr/lib/python2.7/site-packages/networking_ovn/common/maintenance.py", line 178, in _fix_create_update
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance res_map['ovn_update'](n_obj)
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance File "/usr/lib/python2.7/site-packages/networking_ovn/common/ovn_client.py", line 495, in update_port
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance self.add_txns_to_remove_port_dns_records(txn, port_object)
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance self.gen.next()
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance File "/usr/lib/python2.7/site-packages/networking_ovn/ovsdb/impl_idl_ovn.py", line 139, in transaction
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance yield t
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance self.gen.next()
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 102, in transaction
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance del self._nested_txns_map[cur_thread_id]
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance File "/usr/lib/python2.7/site-packages/ovsdbapp/api.py", line 59, in __exit__
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance self.result = self.commit()
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance File "/usr/lib/python2.7/site-packages/ovsdbapp/backend/ovs_idl/transaction.py", line 62, in commit
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance raise result.ex
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance AssertionError
2019-11-11 14:10:04.767 54 ERROR networking_ovn.common.maintenance

A cursory look at the python-ovs code makes it look like the maintenance thread might be trying to delete the same row twice since the only way Row._changes = None is in delete() already.

The ACL code passes around dicts which, being unhashable, can't be added to sets to ensure uniqueness. In addition, from a db-schema perspective the same ACL could be referenced from multiple objects. Ultimately this code should be refactored, but a simple workaround for now would be to do a try/except AssertionError around the row.delete() since ignoring a 2nd attempted delete of the same row in a transaction is safe.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/700001

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/700008

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/700063

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.opendev.org/700064

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-ovn (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/700065

Revision history for this message
Bence Romsics (bence-romsics) wrote :

Hi Terry,

Thank you for reporting this bug.

Do you know a way to reproduce this error? If yes, please describe the steps here.

The proposed code change is in a file that seems to be migrated to the neutron repo already:

neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/commands.py

How do we track changes to the ovn codebase during the networking-ovn -> neutron migration? My naive expectation would have been that if the file was migrated to neutron then changes only should be made to it in neutron and no longer in networking-ovn. How do we make sure we don't lose changes like the proposed one?

Unless you are 1000% sure that your proposed fix to master will merge without any change, it usually spares you some work if you only upload backports after the fix merged to master.

Revision history for this message
Bence Romsics (bence-romsics) wrote :

Okay I found the neutron version: https://review.opendev.org/700007

I don't know why launchpad does not link to it. Now I see that we are making the same changes in both places.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/700007
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=474bff078c9c9e736f86b7394320661bde70eaf4
Submitter: Zuul
Branch: master

commit 474bff078c9c9e736f86b7394320661bde70eaf4
Author: Terry Wilson <email address hidden>
Date: Thu Dec 19 09:17:50 2019 -0600

    Work around potential double row.delete() call

    The update_acls code can potentially iterate over the same ACL
    twice. This temporary workaround silently ignores an attempt to
    delete the same row twice in the same transaction, since that
    should be safe. Ultimately, refactoring the ACL code to use sets
    and possibly handle the fact that an ACL could be referenced from
    multiple rows should be done.

    Change-Id: I895eaf4006583fedc2657a4eb527df1ff992c5bc
    Related-bug: #1857016

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on networking-ovn (master)

Change abandoned by Terry Wilson (<email address hidden>) on branch: master
Review: https://review.opendev.org/700001
Reason: this was just here to make it easy to backport.

tags: added: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-ovn (stable/train)

Reviewed: https://review.opendev.org/700008
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=82c74cba4b74ef1798cea03bbc98cd6482053933
Submitter: Zuul
Branch: stable/train

commit 82c74cba4b74ef1798cea03bbc98cd6482053933
Author: Terry Wilson <email address hidden>
Date: Thu Dec 19 08:32:44 2019 -0600

    Work around potential double row.delete() call

    The update_acls code can potentially iterate over the same ACL
    twice. This temporary workaround silently ignores an attempt to
    delete the same row twice in the same transaction, since that
    should be safe. Ultimately, refactoring the ACL code to use sets
    and possibly handle the fact that an ACL could be referenced from
    multiple rows should be done.

    Change-Id: I259c92116f7d3186ae5af45f1407052eb57ac0ba
    Related-bug: #1857016
    (cherry-picked from Neutron I895eaf4006583fedc2657a4eb527df1ff992c5bc)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-ovn (stable/stein)

Reviewed: https://review.opendev.org/700063
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=0589a9cf9a9fa1415bea89d266ce6ad19c0f4b74
Submitter: Zuul
Branch: stable/stein

commit 0589a9cf9a9fa1415bea89d266ce6ad19c0f4b74
Author: Terry Wilson <email address hidden>
Date: Thu Dec 19 08:32:44 2019 -0600

    Work around potential double row.delete() call

    The update_acls code can potentially iterate over the same ACL
    twice. This temporary workaround silently ignores an attempt to
    delete the same row twice in the same transaction, since that
    should be safe. Ultimately, refactoring the ACL code to use sets
    and possibly handle the fact that an ACL could be referenced from
    multiple rows should be done.

    Change-Id: I259c92116f7d3186ae5af45f1407052eb57ac0ba
    Related-bug: #1857016
    (cherry-picked from Neutron I895eaf4006583fedc2657a4eb527df1ff992c5bc)

tags: added: in-stable-stein
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-ovn (stable/queens)

Reviewed: https://review.opendev.org/700065
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=dce03899e3da2b87aec256d0d68cc1e58ca38e70
Submitter: Zuul
Branch: stable/queens

commit dce03899e3da2b87aec256d0d68cc1e58ca38e70
Author: Terry Wilson <email address hidden>
Date: Thu Dec 19 08:32:44 2019 -0600

    Work around potential double row.delete() call

    The update_acls code can potentially iterate over the same ACL
    twice. This temporary workaround silently ignores an attempt to
    delete the same row twice in the same transaction, since that
    should be safe. Ultimately, refactoring the ACL code to use sets
    and possibly handle the fact that an ACL could be referenced from
    multiple rows should be done.

    Change-Id: I259c92116f7d3186ae5af45f1407052eb57ac0ba
    Related-bug: #1857016
    (cherry-picked from Neutron I895eaf4006583fedc2657a4eb527df1ff992c5bc)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-ovn (stable/rocky)

Reviewed: https://review.opendev.org/700064
Committed: https://git.openstack.org/cgit/openstack/networking-ovn/commit/?id=140a1505b132328ff748b0ef279d9c2bbf76ef59
Submitter: Zuul
Branch: stable/rocky

commit 140a1505b132328ff748b0ef279d9c2bbf76ef59
Author: Terry Wilson <email address hidden>
Date: Thu Dec 19 08:32:44 2019 -0600

    Work around potential double row.delete() call

    The update_acls code can potentially iterate over the same ACL
    twice. This temporary workaround silently ignores an attempt to
    delete the same row twice in the same transaction, since that
    should be safe. Ultimately, refactoring the ACL code to use sets
    and possibly handle the fact that an ACL could be referenced from
    multiple rows should be done.

    Change-Id: I259c92116f7d3186ae5af45f1407052eb57ac0ba
    Related-bug: #1857016
    (cherry-picked from Neutron I895eaf4006583fedc2657a4eb527df1ff992c5bc)

tags: added: in-stable-rocky
Revision history for this message
Brian Haley (brian-haley) wrote :

The UpdateACLsCommand class, where the code in question was, has been completely removed in commit e748f3f2d800de6c84b6fe835edfa1385bc223b1 so we can close this bug.

Changed in neutron:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.