SQL deadlock in quota code

Bug #1026709 reported by Johannes Erdfelt
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Undecided
Johannes Erdfelt

Bug Description

The quota reservations code acquires SQL locks inconsistently which can result in a deadlock.

Because of a bug in MySQLdb it can result in this exception:

[...]
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/api/openstack/compute/servers.py", line 707, in create
2012-07-16 22:16:03 TRACE nova.api.openstack scheduler_hints=scheduler_hints)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/compute/api.py", line 780, in create
2012-07-16 22:16:03 TRACE nova.api.openstack scheduler_hints=scheduler_hints)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/compute/api.py", line 372, in _create_instance
2012-07-16 22:16:03 TRACE nova.api.openstack context, instance_type, min_count, max_count)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/compute/api.py", line 178, in _check_num_instances_quota
2012-07-16 22:16:03 TRACE nova.api.openstack cores=req_cores, ram=req_ram)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/quota.py", line 695, in reserve
2012-07-16 22:16:03 TRACE nova.api.openstack expire=expire)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/quota.py", line 338, in reserve
2012-07-16 22:16:03 TRACE nova.api.openstack FLAGS.until_refresh, FLAGS.max_age)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/db/api.py", line 1050, in quota_reserve
2012-07-16 22:16:03 TRACE nova.api.openstack until_refresh, max_age)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/db/sqlalchemy/api.py", line 132, in wrapper
2012-07-16 22:16:03 TRACE nova.api.openstack return f(*args, **kwargs)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/db/sqlalchemy/api.py", line 2584, in quota_reserve
2012-07-16 22:16:03 TRACE nova.api.openstack usages = _get_quota_usages(context, session, deltas.keys())
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/nova/db/sqlalchemy/api.py", line 2572, in _get_quota_usages
2012-07-16 22:16:03 TRACE nova.api.openstack with_lockmode('update').\
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 1729, in all
2012-07-16 22:16:03 TRACE nova.api.openstack return list(self)
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/sqlalchemy/orm/query.py", line 1953, in instances
2012-07-16 22:16:03 TRACE nova.api.openstack fetch = cursor.fetchall()
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/sqlalchemy/engine/base.py", line 2979, in fetchall
2012-07-16 22:16:03 TRACE nova.api.openstack l = self.process_rows(self._fetchall_impl())
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/sqlalchemy/engine/base.py", line 2948, in _fetchall_impl
2012-07-16 22:16:03 TRACE nova.api.openstack self._non_result()
2012-07-16 22:16:03 TRACE nova.api.openstack File "/usr/lib/python2.6/dist-packages/sqlalchemy/engine/base.py", line 2953, in _non_result
2012-07-16 22:16:03 TRACE nova.api.openstack "This result object does not return rows. "
2012-07-16 22:16:03 TRACE nova.api.openstack ResourceClosedError: This result object does not return rows. It has been closed automatically.

However, the real bug is a deadlock (hidden by the aforementioned MySQLdb bug), which should show as an OperationError similar to this:

"Deadlock found when trying to get lock; try restarting transaction"

The problem is a result of the quota_reserve code acquiring the quota_usages lock before the reservations lock, whereas reservation_commit and reservation_rollback acquire the locks in reverse order.

Changed in nova:
assignee: nobody → Johannes Erdfelt (johannes.erdfelt)
status: New → In Progress
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/10026

description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/10026
Committed: http://github.com/openstack/nova/commit/b244f6fde2e4b85a01a8e0a340d12a1fa9073236
Submitter: Jenkins
Branch: master

commit b244f6fde2e4b85a01a8e0a340d12a1fa9073236
Author: Johannes Erdfelt <email address hidden>
Date: Thu Jul 19 17:11:24 2012 +0000

    Fix SQL deadlock in quota reservations

    Fixes bug 1026709

    The code in quota_reserve acquires SQL locks in a different order than
    the code in reservation_commit/reservation_rollback. This can result in
    an SQL deadlock under heavy load. Due to an (unrelated) bug in
    SQLAlchemy, this can result in this exception:

    ResourceClosedError: This result object does not return rows. It has
    been closed automatically.

    This patch reorganizes the code to always fetch (and thusly lock) the
    quota_usages table before the reservations table.

    Change-Id: Ia364496a996870d754094915ea0501ff19052037

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → folsom-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: folsom-3 → 2012.2
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.