deadlock in scheduler expire reservation periodic task

Bug #1350466 reported by Joe Gordon
22
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Cinder
Fix Released
High
John Griffith
Icehouse
Fix Released
High
Vish Ishaya
OpenStack Compute (nova)
Fix Released
High
Vish Ishaya
Icehouse
Fix Released
High
Vish Ishaya

Bug Description

http://logs.openstack.org/54/105554/4/check/gate-tempest-dsvm-neutron-large-ops/45501af/logs/screen-n-sch.txt.gz?level=TRACE#_2014-07-30_16_26_20_158

2014-07-30 16:26:20.158 17209 ERROR nova.openstack.common.periodic_task [-] Error during SchedulerManager._expire_reservations: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE reservations SET updated_at=updated_at, deleted_at=%s, deleted=id WHERE reservations.deleted = %s AND reservations.expire < %s' (datetime.datetime(2014, 7, 30, 16, 26, 20, 152098), 0, datetime.datetime(2014, 7, 30, 16, 26, 20, 149665))
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task Traceback (most recent call last):
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/opt/stack/new/nova/nova/openstack/common/periodic_task.py", line 198, in run_periodic_tasks
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task task(self, context)
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/opt/stack/new/nova/nova/scheduler/manager.py", line 157, in _expire_reservations
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task QUOTAS.expire(context)
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/opt/stack/new/nova/nova/quota.py", line 1401, in expire
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task self._driver.expire(context)
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/opt/stack/new/nova/nova/quota.py", line 651, in expire
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task db.reservation_expire(context)
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/opt/stack/new/nova/nova/db/api.py", line 1173, in reservation_expire
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task return IMPL.reservation_expire(context)
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 149, in wrapper
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task return f(*args, **kwargs)
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/opt/stack/new/nova/nova/db/sqlalchemy/api.py", line 3394, in reservation_expire
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task reservation_query.soft_delete(synchronize_session=False)
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/opt/stack/new/nova/nova/openstack/common/db/sqlalchemy/session.py", line 694, in soft_delete
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task synchronize_session=synchronize_session)
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2690, in update
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task update_op.exec_()
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 816, in exec_
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task self._do_exec()
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/persistence.py", line 913, in _do_exec
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task update_stmt, params=self.query._params)
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/opt/stack/new/nova/nova/openstack/common/db/sqlalchemy/session.py", line 444, in _wrap
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task _raise_if_deadlock_error(e, self.bind.dialect.name)
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task File "/opt/stack/new/nova/nova/openstack/common/db/sqlalchemy/session.py", line 427, in _raise_if_deadlock_error
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task raise exception.DBDeadlock(operational_error)
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task DBDeadlock: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE reservations SET updated_at=updated_at, deleted_at=%s, deleted=id WHERE reservations.deleted = %s AND reservations.expire < %s' (datetime.datetime(2014, 7, 30, 16, 26, 20, 152098), 0, datetime.datetime(2014, 7, 30, 16, 26, 20, 149665))
2014-07-30 16:26:20.158 17209 TRACE nova.openstack.common.periodic_task

message:"nova.openstack.common.periodic_task" AND message:"Deadlock found" AND NOT tags:"console" AND tags:"screen-n-sch.txt"

Revision history for this message
Sean Dague (sdague) wrote :

Something that did change recently is indexes got added to some of these columns, which might be making the deletes / updates take longer... thus deadlocking.

Revision history for this message
Joe Gordon (jogo) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Vish Ishaya (vishvananda) wrote :

So my read of this is that the old slow version of the code was less likely to hit a race with delete but now that it is optimised it goes quiclkly enought to race with reservation deleted. I think we need to add a retry_on_deadlock decorator to the expiration

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/111084

Changed in nova:
assignee: nobody → Vish Ishaya (vishvananda)
status: New → In Progress
Matt Riedemann (mriedem)
Changed in nova:
importance: Undecided → High
tags: added: db
Revision history for this message
Matt Riedemann (mriedem) wrote :

Also showing up in the conductor logs:

http://logs.openstack.org/26/109626/1/gate/gate-tempest-dsvm-full/6b1b6c2/logs/screen-n-cond.txt.gz#_2014-08-01_09_38_05_447

2014-08-01 09:38:05.447 25490 TRACE nova.quota OperationalError: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'SELECT reservations.created_at AS reservations_created_at, reservations.updated_at AS reservations_updated_at, reservations.deleted_at AS reservations_deleted_at, reservations.deleted AS reservations_deleted, reservations.id AS reservations_id, reservations.uuid AS reservations_uuid, reservations.usage_id AS reservations_usage_id, reservations.project_id AS reservations_project_id, reservations.user_id AS reservations_user_id, reservations.resource AS reservations_resource, reservations.delta AS reservations_delta, reservations.expire AS reservations_expire \nFROM reservations \nWHERE reservations.deleted = %s AND reservations.uuid IN (%s, %s, %s) FOR UPDATE' (0, '23e6e41f-4267-484b-91d3-3db30f01df8c', '89015f24-fc2d-4ddb-8e6e-a5bfb9bfc413', '416317c4-8b0b-460e-8ead-48f1496db695')
2014-08-01 09:38:05.447 25490 TRACE nova.quota

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/111084
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3619c1914e9925e18c062bc11bc3e708137f8322
Submitter: Jenkins
Branch: master

commit 3619c1914e9925e18c062bc11bc3e708137f8322
Author: Vishvananda Ishaya <email address hidden>
Date: Thu Jul 31 14:00:54 2014 -0700

    Add a retry_on_deadlock to reservations_expire

    Now that we have an index for the reservations_expire query, we
    have uncovered a race with quota usage updates due to the foreign
    key on the table, so just retry if there is a deadlock. This is
    a fix to unblock the gate while we try to remove the deadlocks
    completely.

    Change-Id: I1993c3bd1f81facf1d98a4e29f9e8df4858a7d66
    Partial-bug: #1350466

Changed in cinder:
status: New → Triaged
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/111410

Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
status: Triaged → In Progress
Jay Bryant (jsbryant)
Changed in cinder:
milestone: none → juno-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/112109

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/112418

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/111410
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=201adf31a57ea4d456ef5daf1eca2ed3b64fca85
Submitter: Jenkins
Branch: master

commit 201adf31a57ea4d456ef5daf1eca2ed3b64fca85
Author: John Griffith <email address hidden>
Date: Fri Aug 1 17:10:55 2014 -0600

    Add retry_on_deadlock to db update methods

    There's some known races that have been addressed in
    Nova via a retry_on_deadlock deocrator. Some known
    cases in Cinder exist as well when dealing with updates
    to quotas and reservations_expire.

    This patch introduces the decorator to Cinder and hits
    the methods known to be susceptible to deadlock as well
    as some others that seem as though they'd fall into the
    same category.

    Leverages work that's already been done in Nova

    Co-Authored-By: Vish Ishaya <email address hidden>

    Change-Id: Ic807f4f8a333e4d5477f86e41eea40191637087c
    Partial-bug: #1350466

Changed in cinder:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/112109
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d8b9ba5ef77ec4af7a8b67ef380d5cd2947de51f
Submitter: Jenkins
Branch: stable/icehouse

commit d8b9ba5ef77ec4af7a8b67ef380d5cd2947de51f
Author: Vishvananda Ishaya <email address hidden>
Date: Thu Jul 31 14:00:54 2014 -0700

    Add a retry_on_deadlock to reservations_expire

    Now that we have an index for the reservations_expire query, we
    have uncovered a race with quota usage updates due to the foreign
    key on the table, so just retry if there is a deadlock. This is
    a fix to unblock the gate while we try to remove the deadlocks
    completely.

    Change-Id: I1993c3bd1f81facf1d98a4e29f9e8df4858a7d66
    Partial-bug: #1350466
    (cherry picked from commit 3619c1914e9925e18c062bc11bc3e708137f8322)

tags: added: in-stable-icehouse
Revision history for this message
Davanum Srinivas (DIMS) (dims-v) wrote :

https://review.openstack.org/#/c/111084/ for Nova is already merged. marking accordingly

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/icehouse)

Reviewed: https://review.openstack.org/112418
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=69ade1064c15f63b66906e97e95122926d6e4775
Submitter: Jenkins
Branch: stable/icehouse

commit 69ade1064c15f63b66906e97e95122926d6e4775
Author: John Griffith <email address hidden>
Date: Fri Aug 1 17:10:55 2014 -0600

    Add retry_on_deadlock to db update methods

    There's some known races that have been addressed in
    Nova via a retry_on_deadlock deocrator. Some known
    cases in Cinder exist as well when dealing with updates
    to quotas and reservations_expire.

    This patch introduces the decorator to Cinder and hits
    the methods known to be susceptible to deadlock as well
    as some others that seem as though they'd fall into the
    same category.

    Leverages work that's already been done in Nova

    Co-Authored-By: Vish Ishaya <email address hidden>

    Change-Id: Ic807f4f8a333e4d5477f86e41eea40191637087c
    Partial-bug: #1350466
    (cherry picked from commit 201adf31a57ea4d456ef5daf1eca2ed3b64fca85)

Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-rc1 → 2014.2
Thierry Carrez (ttx)
Changed in cinder:
milestone: juno-3 → 2014.2
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.