Cinder volume failed due to a server failure “Caught error: (TransactionRollbackError) deadlock detected” when running simultaneous threads

Bug #1294855 reported by Yehia Beyh
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Cinder
Invalid
Undecided
Unassigned

Bug Description

Cinder volume failed due to a server failure “Caught error: (TransactionRollbackError) deadlock detected” when running simultaneous threads.

To reproduce the problem, I created a simple test.

OpenStack cinder version:

openstack-cinder.noarch 2013.2.1-8 @isc_product_local
python-cinder.noarch 2013.2.1-8 @isc_product_local
python-cinderclient.noarch 1.0.7-6 @isc_product_local

 To reproduce the problem, I created a simple test.

# cat > create-vol-test.sh
echo "creating volumes $1 number of volumes $2 in vol type $3"
for((i=1;i<=$2;i+=1)); do cinder create --volume-type $3 --display-name vol$1-$i 1; done

# cat > test
sh create-vol-test.sh testa 30 vt1 &
sh create-vol-test.sh testb 30 vt1 &
sh create-vol-test.sh testc 30 vt1 &
sh create-vol-test.sh testd 30 vt1 &
sh create-vol-test.sh teste 30 vt1 &
sh create-vol-test.sh testf 30 vt1 &
sh create-vol-test.sh testg 30 vt1 &

# sh test

Cinder create volume Error:
ERROR: The server has either erred or is incapable of performing the requested operation. (HTTP 500) (Request-ID: req-f7574fff-ce92-4f1e-931e-9d40ea64c332)

Cinder/api.log exception:
2014-03-19 20:33:40.360 9249 ERROR cinder.api.middleware.fault [req-f7574fff-ce92-4f1e-931e-9d40ea64c332 798f1573ff0f4ee49b235a8847b30ede 32ed6e26275e44148385526adfe04450] Caught error: (TransactionRollbackError) deadlock detected
DETAIL: Process 3960 waits for ShareLock on transaction 715553; blocked by process 4042.
Process 4042 waits for ShareLock on transaction 715552; blocked by process 3960.
HINT: See server log for query details.
 'SELECT quota_usages.created_at AS quota_usages_created_at, quota_usages.updated_at AS quota_usages_updated_at, quota_usages.deleted_at AS quota_usages_deleted_at, quota_usages.deleted AS quota_usages_deleted, quota_usages.id AS quota_usages_id, quota_usages.project_id AS quota_usages_project_id, quota_usages.resource AS quota_usages_resource, quota_usages.in_use AS quota_usages_in_use, quota_usages.reserved AS quota_usages_reserved, quota_usages.until_refresh AS quota_usages_until_refresh \nFROM quota_usages \nWHERE quota_usages.deleted = %(deleted_1)s AND quota_usages.project_id = %(project_id_1)s FOR UPDATE' {'project_id_1': u'32ed6e26275e44148385526adfe04450', 'deleted_1': False}
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault Traceback (most recent call last):
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/api/middleware/fault.py", line 77, in __call__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault return req.get_response(self.application)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/webob/request.py", line 1296, in send
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault application, catch_exc_info=False)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/webob/request.py", line 1260, in call_application
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault app_iter = application(self.environ, start_response)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/webob/dec.py", line 144, in __call__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault return resp(environ, start_response)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/keystoneclient/middleware/auth_token.py", line 574, in __call__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault return self.app(env, start_response)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/webob/dec.py", line 144, in __call__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault return resp(environ, start_response)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/webob/dec.py", line 144, in __call__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault return resp(environ, start_response)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/routes/middleware.py", line 131, in __call__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault response = self.app(environ, start_response)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/webob/dec.py", line 144, in __call__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault return resp(environ, start_response)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/webob/dec.py", line 130, in __call__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault resp = self.call_func(req, *args, **self.kwargs)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/webob/dec.py", line 195, in call_func
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault return self.func(req, *args, **kwargs)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/api/openstack/wsgi.py", line 898, in __call__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault content_type, body, accept)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/api/openstack/wsgi.py", line 946, in _process_stack
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault action_result = self.dispatch(meth, request, action_args)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/api/openstack/wsgi.py", line 1022, in dispatch
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault return method(req=request, **action_args)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/api/v1/volumes.py", line 422, in create
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault **kwargs)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/volume/api.py", line 174, in create
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault flow.run(context)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/taskflow/decorators.py", line 105, in wrapper
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault return f(self, *args, **kwargs)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/taskflow/patterns/linear_flow.py", line 232, in run
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault run_it(r)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/taskflow/patterns/linear_flow.py", line 212, in run_it
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault self.rollback(context, cause)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib64/python2.6/contextlib.py", line 23, in __exit__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault self.gen.next()
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/taskflow/patterns/linear_flow.py", line 172, in run_it
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault result = runner(context, *args, **kwargs)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/taskflow/utils.py", line 260, in __call__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault self.result = self.task(*args, **kwargs)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/volume/flows/create_volume/__init__.py", line 615, in __call__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault reservations = QUOTAS.reserve(context, **reserve_opts)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/quota.py", line 760, in reserve
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault project_id=project_id)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/quota.py", line 363, in reserve
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault project_id=project_id)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/db/api.py", line 729, in quota_reserve
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault until_refresh, max_age, project_id=project_id)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/db/sqlalchemy/api.py", line 138, in wrapper
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault return f(*args, **kwargs)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/db/sqlalchemy/api.py", line 772, in quota_reserve
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault usages = _get_quota_usages(context, session, project_id)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib/python2.6/site-packages/cinder/db/sqlalchemy/api.py", line 757, in _get_quota_usages
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault with_lockmode('update').\
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2115, in all
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault return list(self)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2227, in __iter__
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault return self._execute_and_instances(context)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib64/python2.6/site-packages/sqlalchemy/orm/query.py", line 2242, in _execute_and_instances
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault result = conn.execute(querycontext.statement, self._params)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 1449, in execute
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault params)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 1584, in _execute_clauseelement
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault compiled_sql, distilled_params
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 1698, in _execute_context
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault context)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/base.py", line 1691, in _execute_context
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault context)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault File "/usr/lib64/python2.6/site-packages/sqlalchemy/engine/default.py", line 331, in do_execute
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault cursor.execute(statement, parameters)
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault DBAPIError: (TransactionRollbackError) deadlock detected
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault DETAIL: Process 3960 waits for ShareLock on transaction 715553; blocked by process 4042.
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault Process 4042 waits for ShareLock on transaction 715552; blocked by process 3960.
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault HINT: See server log for query details.
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault 'SELECT quota_usages.created_at AS quota_usages_created_at, quota_usages.updated_at AS quota_usages_updated_at, quota_usages.deleted_at AS quota_usages_deleted_at, quota_usages.deleted AS quota_usages_deleted, quota_usages.id AS quota_usages_id, quota_usages.project_id AS quota_usages_project_id, quota_usages.resource AS quota_usages_resource, quota_usages.in_use AS quota_usages_in_use, quota_usages.reserved AS quota_usages_reserved, quota_usages.until_refresh AS quota_usages_until_refresh \nFROM quota_usages \nWHERE quota_usages.deleted = %(deleted_1)s AND quota_usages.project_id = %(project_id_1)s FOR UPDATE' {'project_id_1': u'32ed6e26275e44148385526adfe04450', 'deleted_1': False}
2014-03-19 20:33:40.360 9249 TRACE cinder.api.middleware.fault

Liyingjun (liyingjun)
Changed in cinder:
status: New → Confirmed
assignee: nobody → Liyingjun (liyingjun)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/82025

Changed in cinder:
status: Confirmed → In Progress
Revision history for this message
haruka tanizawa (h-tanizawa) wrote :

What is the real status of this bugreport and the above patch ?
Does this problem still happen?

Revision history for this message
Takahiro Shida (shida) wrote :

It looks heavy impact for concurrency because of it raise deadlock.
Can you tell us the current status of this bug report? or , share the script for reproduce.

Revision history for this message
Takahiro Shida (shida) wrote :

Oops, I didn't notice about script.

echo "creating volumes $1 number of volumes $2 in vol type $3"
for((i=1;i<=$2;i+=1)); do cinder create --volume-type $3 --display-name vol$1-$i 1; done

Revision history for this message
Marios Andreou (marios-b) wrote :

was brought here from (possibly) related neutron bug 1333084 [1] - this could be postgresql specific as discussed in bug 1270725 [2] and fixed in oslo.db [3]

[1] https://bugs.launchpad.net/neutron/+bug/1333084
[2] https://bugs.launchpad.net/nova/+bug/1270725
[3] https://review.openstack.org/#/c/92001/2

Revision history for this message
SamP (sampath-priyankara) wrote :

This is not reproducible in stable juno.
cinder db access mechanism has changed since then.
Didnt have time to look how we got there, sorry

In openstack cinder 2013.2.1, we only had this.
@require_context
def quota_reserve(context, resources, quotas, deltas, expire,
                  until_refresh, max_age, project_id=None):

In cinder stable/juno there is a retry mechanism
@require_context
@_retry_on_deadlock
def quota_reserve(context, resources, quotas, deltas, expire,
                  until_refresh, max_age, project_id=None):
   .
   .
   .
def _retry_on_deadlock(f):
    """Decorator to retry a DB API call if Deadlock was received."""
    @functools.wraps(f)
    def wrapped(*args, **kwargs):
        while True:
            try:
                return f(*args, **kwargs)
            except db_exc.DBDeadlock:
                LOG.warn(_("Deadlock detected when running "
                           "'%(func_name)s': Retrying..."),
                         dict(func_name=f.__name__))
                # Retry!
                time.sleep(0.5)
                continue
    functools.update_wrapper(wrapped, f)
    return wrapped

I did a simple test as it in reproduce script in top of the report.
create 200 NFS spared volumes without problem.
cinder logs are also attached.

Revision history for this message
SamP (sampath-priyankara) wrote :

Forget the paste,
here it is, http://paste.openstack.org/show/163917/

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Automatically unassigning due to inactivity.

Changed in cinder:
assignee: Liyingjun (liyingjun) → nobody
status: In Progress → Triaged
Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Unable to reproduce with current code. Please reopen if that's just on my machine and it's found to still be an issue.

Changed in cinder:
status: Triaged → Invalid
Revision history for this message
LIU Yulong (dragon889) wrote :

Run rally test got following log:
http://paste.openstack.org/show/567037/

Rally input:

{
    "CinderVolumes.create_and_delete_volume": [
        {
            "args": {
                "size": 100
            },
            "runner": {
                "type": "constant",
                "times": 2000,
                "concurrency": 100
            },
            "context": {
                "users": {
                    "tenants": 100,
                    "users_per_tenant": 2
                },
                "quotas": {
                    "cinder": {
                        "gigabytes": -1,
                        "volumes": -1,
                        "snapshots": -1,
                    }
                }
            }
        },
    ]
}

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.