db deadlock on service_update()

Bug #1370191 reported by Russell Bryant
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Russell Bryant
Havana
Fix Released
Medium
Russell Bryant
Icehouse
Fix Released
Medium
Russell Bryant

Bug Description

Several methods in nova.db.sqlalchemy.api are decorated with @_retry_on_deadlock. service_update() is not currently one of them, but it should be based on the following backtrace:

4-09-15 15:40:22.574 34384 ERROR nova.servicegroup.drivers.db [-] model
server went away
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db Traceback
(most recent call last):
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib/python2.7/site-packages/nova/servicegroup/drivers/db.py", line
95, in _report_state
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
service.service_ref, state_catalog)
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib/python2.7/site-packages/nova/conductor/api.py", line 218, in
service_update
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
return self._manager.service_update(context, service, values)
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib/python2.7/site-packages/nova/utils.py", line 967, in wrapper
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
return func(*args, **kwargs)
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib/python2.7/site-packages/oslo/messaging/rpc/server.py", line 139,
in inner
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
return func(*args, **kwargs)
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 491, in
service_update
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db svc =
self.db.service_update(context, service['id'], values)
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib/python2.7/site-packages/nova/db/api.py", line 148, in
service_update
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
return IMPL.service_update(context, service_id, values)
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 146, in
wrapper
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
return f(*args, **kwargs)
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 533, in
service_update
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
service_ref.update(values)
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 447,
in __exit__
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
self.rollback()
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line
58, in __exit__
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
compat.reraise(exc_type, exc_value, exc_tb)
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 444,
in __exit__
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
self.commit()
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 358,
in commit
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
t[1].commit()
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1195,
in commit
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
self._do_commit()
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1226,
in _do_commit
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
self.connection._commit_impl()
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 491,
in _commit_impl
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
self._handle_dbapi_exception(e, None, None, None, None)
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1024,
in _handle_dbapi_exception
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
exc_info
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 196,
in raise_from_cause
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
reraise(type(exception), exception, tb=exc_tb)
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 488,
in _commit_impl
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
self.engine.dialect.do_commit(self.connection)
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db File
"/usr/lib64/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py",
line 2030, in do_commit
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
dbapi_connection.commit()
2014-09-15 15:40:22.574 34384 TRACE nova.servicegroup.drivers.db
OperationalError: (OperationalError) (1213, 'Deadlock found when trying to
get lock; try restarting transaction') None None

Changed in nova:
assignee: nobody → Russell Bryant (russellb)
status: New → In Progress
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/121940

Revision history for this message
Michael Still (mikal) wrote :

Probably not release critical, but we should fix it.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/121940
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=61d54f2652424279af81c4084e87651044fac4bd
Submitter: Jenkins
Branch: master

commit 61d54f2652424279af81c4084e87651044fac4bd
Author: Russell Bryant <email address hidden>
Date: Tue Sep 16 18:10:47 2014 +0000

    db: Add @_retry_on_deadlock to service_update()

    Add the _retry_on_deadlock decorator to the service_update() method of
    nova's sqlalchemy API. This decorator is scattered throughout the
    sqlalchemy API for other methods that may encounter this error. The
    referenced bug report shows a trace from where it occurred on this
    method.

    Change-Id: I93b370d6457d2e85493be01a62a76404d228a6fa
    Closes-bug: #1370191

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/122146

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/122147

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/122147
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a35a9fd9122b1a2972e032b2f77a73c81e8c47bc
Submitter: Jenkins
Branch: stable/havana

commit a35a9fd9122b1a2972e032b2f77a73c81e8c47bc
Author: Russell Bryant <email address hidden>
Date: Tue Sep 16 18:10:47 2014 +0000

    db: Add @_retry_on_deadlock to service_update()

    Add the _retry_on_deadlock decorator to the service_update() method of
    nova's sqlalchemy API. This decorator is scattered throughout the
    sqlalchemy API for other methods that may encounter this error. The
    referenced bug report shows a trace from where it occurred on this
    method.

    Change-Id: I93b370d6457d2e85493be01a62a76404d228a6fa
    Closes-bug: #1370191
    (cherry picked from commit 61d54f2652424279af81c4084e87651044fac4bd)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/122146
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=5d5970a77a79a11864363d8e3e4fd66862157d19
Submitter: Jenkins
Branch: stable/icehouse

commit 5d5970a77a79a11864363d8e3e4fd66862157d19
Author: Russell Bryant <email address hidden>
Date: Tue Sep 16 18:10:47 2014 +0000

    db: Add @_retry_on_deadlock to service_update()

    Add the _retry_on_deadlock decorator to the service_update() method of
    nova's sqlalchemy API. This decorator is scattered throughout the
    sqlalchemy API for other methods that may encounter this error. The
    referenced bug report shows a trace from where it occurred on this
    method.

    Change-Id: I93b370d6457d2e85493be01a62a76404d228a6fa
    Closes-bug: #1370191
    (cherry picked from commit 61d54f2652424279af81c4084e87651044fac4bd)

Revision history for this message
Mike Bayer (zzzeek) wrote :

I've submitted https://review.openstack.org/#/c/123248/ to also add wrapping to session.commit(). the stacktrace here otherwise won't be caught using the oslo-incubator version of the code.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/124112

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/icehouse)

Change abandoned by Michael Bayer (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/123248
Reason: merged to the same change-id at https://review.openstack.org/#/c/124112/

Revision history for this message
Mike Bayer (zzzeek) wrote :

we're still open here. there's another path to commit(), patch is coming

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/124515

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/124112
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3371ad81ba7f2e8b1a9391dae3f0844d3dba916f
Submitter: Jenkins
Branch: stable/icehouse

commit 3371ad81ba7f2e8b1a9391dae3f0844d3dba916f
Author: Mike Bayer <email address hidden>
Date: Mon Sep 22 16:53:55 2014 -0400

    Add _wrap_db_error() support to Session.commit()

    This patch adds _wrap_db_error() to session.commit(),
    which has been observed to be a common point of failure for
    deadlock exceptions. In order to achieve this, the
    _wrap_db_error() decorator itself also needed to propagate an
    existing DBError, as it is the case that SQLAlchemy's
    session.commit() calls into the session.flush() method.
    Tests are added to exercise both the nesting of _wrap_db_error()
    when a flush() inside commit() raises an exception, as well
    as when commit() alone raises an exception that the error
    is wrapped as expected.

    Tests are omitted here as we are relying upon the tests
    that were added to the corresponding oslo-incubator code.

    Closes-bug: #1370191
    Change-Id: I91510a2b864f0c1b73cfae18f271e94334714dce

Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-rc1 → 2014.2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/124515
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=cac6472fd2ad495d22fe6fad89a9485cf025c437
Submitter: Jenkins
Branch: stable/icehouse

commit cac6472fd2ad495d22fe6fad89a9485cf025c437
Author: Mike Bayer <email address hidden>
Date: Fri Sep 26 17:27:53 2014 -0400

    Add _wrap_db_error() support to SessionTransaction.commit()

    This patch adds _wrap_db_error() to the commit()
    and rollback() methods of sqlalchemy.orm.session.SessionTransaction,
    which is the object that is dealt with when one invokes
    code of the form "with session.begin():". The context manager
    form does not invoke the commit() method on the
    SQLAlchemy Session directly, and instead calls the one
    local to the SessionTransaction.

    In order to intercept this, we must build a subclass
    of SessionTransaction with the appropriate wrapping, and
    then patch it into the object that is returned by
    Session.begin(), ensuring that it is compatible with
    _wrap_db_error().

    This whole approach is legacy; newer oslo.db versions
    intercept errors at the point at which they occur
    via engine events.

    Tests are omitted here as we are relying upon the tests
    that were added to the corresponding oslo-incubator code.

    Closes-bug: #1370191

    Change-Id: Ie0456e6daa86c99cf6fbe56ca5dfd8a618f14232

tags: added: in-stable-icehouse
Revision history for this message
Jian Wen (wenjianhn) wrote :

The above 2 patches have not been backported stable/havana.
Is Havana end of life?

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.