SQL deadlock in quota code

Bug #1026709 reported by Johannes Erdfelt on 2012-07-19
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Undecided
Johannes Erdfelt

Bug Description

The quota reservations code acquires SQL locks inconsistently which can result in a deadlock.

Because of a bug in MySQLdb it can result in this exception:

[...]
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/api/openstack/compute/servers.py", line 707, in create
2012-07-16 22:16:03 TRACE nova.api.openstack scheduler_hints=scheduler_hints)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/compute/api.py", line 780, in create
2012-07-16 22:16:03 TRACE nova.api.openstack scheduler_hints=scheduler_hints)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/compute/api.py", line 372, in _create_instance
2012-07-16 22:16:03 TRACE nova.api.openstack context, instance_type, min_count, max_count)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/compute/api.py", line 178, in _check_num_instances_quota
2012-07-16 22:16:03 TRACE nova.api.openstack cores=req_cores, ram=req_ram)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/quota.py", line 695, in reserve
2012-07-16 22:16:03 TRACE nova.api.openstack expire=expire)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/quota.py", line 338, in reserve
2012-07-16 22:16:03 TRACE nova.api.openstack FLAGS.until_refresh, FLAGS.max_age)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/db/api.py", line 1050, in quota_reserve
2012-07-16 22:16:03 TRACE nova.api.openstack until_refresh, max_age)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/db/sqlalchemy/api.py", line 132, in wrapper
2012-07-16 22:16:03 TRACE nova.api.openstack return f(*args, **kwargs)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/db/sqlalchemy/api.py", line 2584, in quota_reserve
2012-07-16 22:16:03 TRACE nova.api.openstack usages = _get_quota_usages(context, session, deltas.keys())
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/db/sqlalchemy/api.py", line 2572, in _get_quota_usages
2012-07-16 22:16:03 TRACE nova.api.openstack with_lockmode('update').\
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 1729, in all
2012-07-16 22:16:03 TRACE nova.api.openstack return list(self)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 1953, in instances
2012-07-16 22:16:03 TRACE nova.api.openstack fetch = cursor.fetchall()
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/sqlalchemy/engine/base.py", line 2979, in fetchall
2012-07-16 22:16:03 TRACE nova.api.openstack l = self.process_rows(self._fetchall_impl())
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/sqlalchemy/engine/base.py", line 2948, in _fetchall_impl
2012-07-16 22:16:03 TRACE nova.api.openstack self._non_result()
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/sqlalchemy/engine/base.py", line 2953, in _non_result
2012-07-16 22:16:03 TRACE nova.api.openstack "This result object does not return rows. "
2012-07-16 22:16:03 TRACE nova.api.openstack ResourceClosedError: This result object does not return rows. It has been closed automatically.

However, the real bug is a deadlock (hidden by the aforementioned MySQLdb bug), which should show as an OperationError similar to this:

"Deadlock found when trying to get lock; try restarting transaction"

The problem is a result of the quota_reserve code acquiring the quota_usages lock before the reservations lock, whereas reservation_commit and reservation_rollback acquire the locks in reverse order.

Changed in nova:
assignee: nobody → Johannes Erdfelt (johannes.erdfelt)
status: New → In Progress
description: updated
description: updated

Reviewed: https://review.openstack.org/10026
Committed: http://github.com/openstack/nova/commit/b244f6fde2e4b85a01a8e0a340d12a1fa9073236
Submitter: Jenkins
Branch: master

commit b244f6fde2e4b85a01a8e0a340d12a1fa9073236
Author: Johannes Erdfelt <email address hidden>
Date: Thu Jul 19 17:11:24 2012 +0000

    Fix SQL deadlock in quota reservations

    Fixes bug 1026709

    The code in quota_reserve acquires SQL locks in a different order than
    the code in reservation_commit/reservation_rollback. This can result in
    an SQL deadlock under heavy load. Due to an (unrelated) bug in
    SQLAlchemy, this can result in this exception:

    ResourceClosedError: This result object does not return rows. It has
    been closed automatically.

    This patch reorganizes the code to always fetch (and thusly lock) the
    quota_usages table before the reservations table.

    Change-Id: Ia364496a996870d754094915ea0501ff19052037

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-08-16
Changed in nova:
milestone: none → folsom-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2012-09-27
Changed in nova:
milestone: folsom-3 → 2012.2
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers