Query Deadlock when creating >200 servers at once in sqlalchemy

Bug #1283987 reported by wingwj
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
Eugeniya Kudryashova
oslo.db
Fix Released
Medium
wingwj

Bug Description

Query Deadlock when creating >200 servers at once in sqlalchemy.

--------

This bug occurred when I test this bug:
https://bugs.launchpad.net/nova/+bug/1270725

The original info is logged here:
http://paste.openstack.org/show/61534/

--------------

After checking the error-log, we can notice that the deadlock function is 'all()' in sqlalchemy framework.

Previously, we use '@retry_on_dead_lock' function to retry requests when deadlock occurs.

But it's only available for session deadlock(query/flush/execute). It doesn't cover some 'Query' actions in sqlalchemy.

So, we need to add the same protction for 'all()' in sqlalchemy.

wingwj (wingwj)
Changed in nova:
assignee: nobody → wingwj (wingwj)
Changed in oslo:
assignee: nobody → wingwj (wingwj)
Revision history for this message
wingwj (wingwj) wrote :

One question left:

Should we need to protect more functions in 'Query' used in sqlalchemy?
Like count(), join(), delete()?

Now, only 'all()' will occur deadlock during my test.

tags: added: havana-backport-potential
tags: added: deadlock
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo-incubator (master)

Fix proposed to branch: master
Review: https://review.openstack.org/75802

Changed in oslo:
status: New → In Progress
Revision history for this message
Vish Ishaya (vishvananda) wrote :

Note i got this same error trying to create a lot of security groups, testing fix

Changed in nova:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Vish Ishaya (vishvananda) wrote :
Download full text (5.2 KiB)

here is a traceback not using postgres:

2014-04-16 17:37:00.180 4815 TRACE nova.api.openstack
2014-04-16 17:37:31.052 4810 TRACE nova.quota Traceback (most recent call last):
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/nova/quota.py", line 1289, in commit
2014-04-16 17:37:31.052 4810 TRACE nova.quota user_id=user_id)
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/nova/quota.py", line 536, in commit
2014-04-16 17:37:31.052 4810 TRACE nova.quota user_id=user_id)
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/nova/db/api.py", line 1112, in reservation_commit
2014-04-16 17:37:31.052 4810 TRACE nova.quota user_id=user_id)
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 164, in wrapper
2014-04-16 17:37:31.052 4810 TRACE nova.quota return f(*args, **kwargs)
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 202, in wrapped
2014-04-16 17:37:31.052 4810 TRACE nova.quota return f(*args, **kwargs)
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 3291, in reservation_commit
2014-04-16 17:37:31.052 4810 TRACE nova.quota usages = _get_user_quota_usages(context, session, project_id, user_id)
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/nova/db/sqlalchemy/api.py", line 3038, in _get_user_quota_usages
2014-04-16 17:37:31.052 4810 TRACE nova.quota with_lockmode('update').\
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2237, in all
2014-04-16 17:37:31.052 4810 TRACE nova.quota return list(self)
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2349, in __iter__
2014-04-16 17:37:31.052 4810 TRACE nova.quota return self._execute_and_instances(context)
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2364, in _execute_and_instances
2014-04-16 17:37:31.052 4810 TRACE nova.quota result = conn.execute(querycontext.statement, self._params)
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 662, in execute
2014-04-16 17:37:31.052 4810 TRACE nova.quota params)
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 761, in _execute_clauseelement
2014-04-16 17:37:31.052 4810 TRACE nova.quota compiled_sql, distilled_params
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 874, in _execute_context
2014-04-16 17:37:31.052 4810 TRACE nova.quota context)
2014-04-16 17:37:31.052 4810 TRACE nova.quota File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1024, in _handle_dbapi_exception
2014-04-16 17:37:31.052 4810 TRACE nova.quota ...

Read more...

Revision history for this message
Vish Ishaya (vishvananda) wrote :

Related bug for the deadlock in nova: https://bugs.launchpad.net/nova/+bug/1308715

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote : Fix proposed to oslo.db (master)

Fix proposed to branch: master
Review: https://review.openstack.org/91995

Revision history for this message
Openstack Gerrit (openstack-gerrit) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/92002

Changed in nova:
status: Triaged → In Progress
tags: added: db
Changed in oslo:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.db (master)

Change abandoned by Roman Podoliaka (<email address hidden>) on branch: master
Review: https://review.openstack.org/92002
Reason: no longer needed due to https://review.openstack.org/#/c/105891/

Revision history for this message
Jay Pipes (jaypipes) wrote :

wingwj,

Pavel Kholkin has been working on refactoring the quota DB code to remove the use of SELECT FOR UPDATE (which triggers this behaviour behind the scenes in both MySQL and PostgreSQL).

If you are not currently working on this bug, can I transfer ownership of the bug (and bug 1270725) to Pavel?

Please let me know at your earliest!
-jay

Revision history for this message
Jay Pipes (jaypipes) wrote :
affects: oslo-incubator → oslo.db
Changed in nova:
assignee: wingwj (wingwj) → Eugeniya Kudryashova (ekudryashova)
Changed in oslo.db:
status: In Progress → Fix Released
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

Looks like this will be fixed when nova switches to use oslo.db:
https://review.openstack.org/#/c/101901/

Which is in the gate queue as of this moment :)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/101901
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1b83b2f11054ddd3f72fe75c5cef496060afcb3a
Submitter: Jenkins
Branch: master

commit 1b83b2f11054ddd3f72fe75c5cef496060afcb3a
Author: Andrey Kurilin <email address hidden>
Date: Sat Jun 21 13:56:42 2014 +0300

    Move to oslo.db

    Replace common oslo code nova.openstack.common.db by usage
    of oslo.db library and remove common code.

    Replaced catching of raw sqlalchemy exceptions by catches
    of oslo.db exceptions(such as DBError, DBDuplicateEntry, etc).

    Co-Authored-By: Eugeniya Kudryashova <email address hidden>

    Closes-Bug: #1364986
    Closes-Bug: #1353131
    Closes-Bug: #1283987
    Closes-Bug: #1274523
    Change-Id: I0649539e071b2318ec85ed5d70259c949408e64b

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-rc1 → 2014.2
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.