DB session commit error in resource_registry.set_resources_dirty

Bug #1943714 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Slawek Kaplonski

Bug Description

It seems that patch https://review.opendev.org/c/openstack/neutron/+/805031 introduced a new error during the call to resource_registry.set_resources_dirty() in https://github.com/openstack/neutron/blob/6db261962894b1667dd213b116e89246a3e54386/neutron/api/v2/base.py#L506

I didn't see this issue in our CI jobs on the master branch, but we noticed it in the downstream jobs on OSP-16, which is based on Train. The error looks like this:

2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource six.reraise(self.type_, self.value, self.tb)
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource raise value
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/oslo_db/api.py", line 142, in wrapper
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource return f(*args, **kwargs)
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/neutron_lib/db/api.py", line 183, in wrapped
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource LOG.debug("Retry wrapper got retriable exception: %s", e)
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource self.force_reraise()
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource six.reraise(self.type_, self.value, self.tb)
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource raise value
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/neutron_lib/db/api.py", line 179, in wrapped
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource return f(*dup_args, **dup_kwargs)
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/neutron/api/v2/base.py", line 558, in _create
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource obj)})
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/neutron/api/v2/base.py", line 500, in notify
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource resource_registry.set_resources_dirty(request.context)
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource next(self.gen)
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 1065, in _transaction_scope
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource yield resource
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib64/python3.6/contextlib.py", line 88, in __exit__
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource next(self.gen)
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 667, in _session
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource self.session.rollback()
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource self.force_reraise()
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource six.reraise(self.type_, self.value, self.tb)
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/six.py", line 675, in reraise
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource raise value
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 664, in _session
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource self._end_session_transaction(self.session)
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib/python3.6/site-packages/oslo_db/sqlalchemy/enginefacade.py", line 692, in _end_session_transaction
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource session.commit()
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 1026, in commit
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource self.transaction.commit()
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 491, in commit
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource self._assert_active(prepared_ok=True)
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource File "/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 294, in _assert_active
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource % self._rollback_exception
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource sqlalchemy.exc.InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (pymysql.err.OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction')
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource [SQL: DELETE FROM reservations WHERE reservations.id = %(id)s]
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource [parameters: {'id': '3644bc07-a2b6-47b2-9767-bbf89f9606e2'}]
2021-09-15 09:50:09.540 15 ERROR neutron.api.v2.resource (Background on this error at: http://sqlalche.me/e/e3q8)

I guess this is a race condition that can be hit under specific conditions and, IMHO, it can happen in the master branch as well.

Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/809191

Changed in neutron:
status: New → In Progress
Changed in neutron:
milestone: none → xena-rc2
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/809983

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/809191
Committed: https://opendev.org/openstack/neutron/commit/f8f50397ca1e4ab7f5f31b19dde255ab70b4ccaf
Submitter: "Zuul (22348)"
Branch: master

commit f8f50397ca1e4ab7f5f31b19dde255ab70b4ccaf
Author: Slawek Kaplonski <email address hidden>
Date: Wed Sep 15 15:47:35 2021 +0200

    Rollback db session in case of error during releasing quota reservation

    Patch [1] changed the code so that it no longer fails if a DBError
    happens while releasing a quota reservation. That may lead to errors
    while committing the DB transaction in the neutron/api/v2/base.py
    module, where Neutron commits the reservation (which removes the
    reservation from the DB) and then sets the resources dirty in the
    same transaction. If a DB error happens in commit_reservation() and
    we simply ignore it and move on, the transaction cannot be committed
    without a rollback.

    This patch adds handling of such DBErrors to the remove_reservation
    function so that the transaction can be rolled back when a DB error
    happens.

    [1] https://review.opendev.org/c/openstack/neutron/+/805031

    Closes-Bug: #1943714
    Change-Id: I295a4f0eb1eaf0286f0e34b96db29c8f08340b84
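
The failure mode described in this commit message can be illustrated with a small toy model. This is only a sketch: ToySession, DBError and InvalidRequestError below are hypothetical stand-ins written for this example, not the SQLAlchemy, oslo.db or Neutron classes, and the rollback_on_error flag exists only to compare the two behaviours.

```python
class DBError(Exception):
    """Stand-in for a database error such as a deadlock."""


class InvalidRequestError(Exception):
    """Stand-in for the error raised when committing a failed transaction."""


class ToySession:
    """Toy session: after a failed statement it refuses to commit
    until rollback() has been called."""

    def __init__(self):
        self.needs_rollback = False
        self.pending = []
        self.committed = []

    def delete(self, item, fail=False):
        if fail:
            # The statement failed, so the transaction is now unusable.
            self.needs_rollback = True
            raise DBError("Deadlock found when trying to get lock")
        self.pending.append(("delete", item))

    def rollback(self):
        self.pending.clear()
        self.needs_rollback = False

    def commit(self):
        if self.needs_rollback:
            raise InvalidRequestError("issue rollback() first")
        self.committed.extend(self.pending)
        self.pending.clear()


def remove_reservation(session, reservation_id, fail=False,
                       rollback_on_error=True):
    """Delete a reservation and ignore DB errors.

    With rollback_on_error=False the error is only swallowed, which leaves
    the session unusable. With rollback_on_error=True the session is rolled
    back so that later work can still be committed.
    """
    try:
        session.delete(reservation_id, fail=fail)
        return True
    except DBError:
        if rollback_on_error:
            session.rollback()
        return False


def commit_fails(session):
    """Return True if committing the session raises InvalidRequestError."""
    try:
        session.commit()
    except InvalidRequestError:
        return True
    return False
```

In this model, swallowing the error without a rollback makes the later commit fail, while rolling back inside remove_reservation lets the commit go through.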

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/810873

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/810873
Committed: https://opendev.org/openstack/neutron/commit/78cba51af3f60cc0c2eefaf4a93c109d7eb98f2c
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 78cba51af3f60cc0c2eefaf4a93c109d7eb98f2c
Author: Slawek Kaplonski <email address hidden>
Date: Wed Sep 15 15:47:35 2021 +0200

    Rollback db session in case of error during releasing quota reservation

    Patch [1] changed the code so that it no longer fails if a DBError
    happens while releasing a quota reservation. That may lead to errors
    while committing the DB transaction in the neutron/api/v2/base.py
    module, where Neutron commits the reservation (which removes the
    reservation from the DB) and then sets the resources dirty in the
    same transaction. If a DB error happens in commit_reservation() and
    we simply ignore it and move on, the transaction cannot be committed
    without a rollback.

    This patch adds handling of such DBErrors to the remove_reservation
    function so that the transaction can be rolled back when a DB error
    happens.

    [1] https://review.opendev.org/c/openstack/neutron/+/805031

    Closes-Bug: #1943714
    Change-Id: I295a4f0eb1eaf0286f0e34b96db29c8f08340b84
    (cherry picked from commit f8f50397ca1e4ab7f5f31b19dde255ab70b4ccaf)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/811124

Changed in neutron:
status: Fix Released → Confirmed
Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/811124
Committed: https://opendev.org/openstack/neutron/commit/23f956ab37618d5ec6b1b2bf0d50dea7a601513c
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 23f956ab37618d5ec6b1b2bf0d50dea7a601513c
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Sep 20 09:27:39 2021 +0000

    Execute the quota reservation removal in an isolated DB txn

    The goal of [1] is to continue the operation if removing the quota
    reservation fails. Any expired reservation will be removed
    automatically in any driver.

    If the DB transaction fails, it should affect only the reservation
    being deleted. This is why this patch isolates the
    "remove_reservation" method and guarantees that it is called outside
    an active DB session. That guarantees that, in case of failure, no
    other DB operation will be affected.

    This patch also partially reverts [2] but still checks the security
    group rule quota when a new security group is created. Instead of
    creating and releasing a quota reservation for the security group
    rules created, now only the available quota limit is checked before
    creating them. That won't prevent another operation from creating
    security group rules in parallel and exceeding the available quota.
    However, this is not guaranteed even with the current quota driver.

    [1]https://review.opendev.org/c/openstack/neutron/+/805031
    [2]https://review.opendev.org/c/openstack/neutron/+/701565

    Closes-Bug: #1943714

    Conflicts:
        neutron/tests/unit/db/quota/test_driver.py
        neutron/db/quota/driver.py

    Change-Id: Id73368576a948f78a043d7cf0be16661a65626a9
    (cherry picked from commit 603abeb977d4018963beade5c858b53f990ef32a)
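
The idea of running the reservation removal in its own transaction can be sketched with the standard-library sqlite3 module. This is only an illustration under stated assumptions, not the Neutron code: the table layout, the row ids and the trigger that simulates a failing DELETE are all invented for this example, and this remove_reservation is a standalone function rather than the quota driver method.

```python
import sqlite3


def remove_reservation(conn, reservation_id):
    """Delete one reservation in its own transaction.

    Returns True on success. On a database error the transaction is
    rolled back and False is returned, so the caller can carry on.
    """
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.execute(
                "DELETE FROM reservations WHERE id = ?", (reservation_id,))
        return True
    except sqlite3.Error:
        return False


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE reservations (id TEXT PRIMARY KEY);
        CREATE TABLE ports (id TEXT PRIMARY KEY);
        INSERT INTO reservations VALUES ('r-ok'), ('r-stuck');
        -- Hypothetical trigger: makes the DELETE of one row fail.
        CREATE TRIGGER fail_delete BEFORE DELETE ON reservations
        WHEN OLD.id = 'r-stuck'
        BEGIN
            SELECT RAISE(ABORT, 'simulated deadlock');
        END;
    """)
    removed_ok = remove_reservation(conn, "r-ok")
    removed_stuck = remove_reservation(conn, "r-stuck")
    # A later, unrelated transaction still commits normally.
    with conn:
        conn.execute("INSERT INTO ports VALUES ('p1')")
    left = [row[0] for row in conn.execute("SELECT id FROM reservations")]
    ports = conn.execute("SELECT COUNT(*) FROM ports").fetchone()[0]
    conn.close()
    return removed_ok, removed_stuck, left, ports
```

Here the failed removal leaves the stale reservation row in place (to be cleaned up later, as the commit message says happens for expired reservations), and the unrelated insert that follows is unaffected.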

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/809983
Committed: https://opendev.org/openstack/neutron/commit/603abeb977d4018963beade5c858b53f990ef32a
Submitter: "Zuul (22348)"
Branch: master

commit 603abeb977d4018963beade5c858b53f990ef32a
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Sep 20 09:27:39 2021 +0000

    Execute the quota reservation removal in an isolated DB txn

    The goal of [1] is to continue the operation if removing the quota
    reservation fails. Any expired reservation will be removed
    automatically in any driver.

    If the DB transaction fails, it should affect only the reservation
    being deleted. This is why this patch isolates the
    "remove_reservation" method and guarantees that it is called outside
    an active DB session. That guarantees that, in case of failure, no
    other DB operation will be affected.

    This patch also partially reverts [2] but still checks the security
    group rule quota when a new security group is created. Instead of
    creating and releasing a quota reservation for the security group
    rules created, now only the available quota limit is checked before
    creating them. That won't prevent another operation from creating
    security group rules in parallel and exceeding the available quota.
    However, this is not guaranteed even with the current quota driver.

    [1]https://review.opendev.org/c/openstack/neutron/+/805031
    [2]https://review.opendev.org/c/openstack/neutron/+/701565

    Closes-Bug: #1943714

    Change-Id: Id73368576a948f78a043d7cf0be16661a65626a9

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 19.0.0.0rc2

This issue was fixed in the openstack/neutron 19.0.0.0rc2 release candidate.

tags: added: neutron-proactive-backport-potential
tags: removed: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 20.0.0.0rc1

This issue was fixed in the openstack/neutron 20.0.0.0rc1 release candidate.
