OpenStack Compute (nova)

Database deadlocks not handled

Bug #1090016 reported by David Busby on 2012-12-13

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	High	Russell Bryant	OpenStack Compute (nova) 2013.1 "grizzly"

Bug Description

I have a setup where each compute node addresses the same load-balanced MySQL cluster (Percona XtraDB, which uses Galera for replication). When attempting to launch many instances via Horizon, the following occurs:

---
2012-12-13 16:16:09 TRACE nova.rpc.amqp DBError: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE floating_ips SET updated_at=%s, fixed_ip_id=%s, host=%s WHERE floating_ips.id = %s' (datetime.datetime(2012, 12, 13, 16, 16, 8, 976213), 782L, 'nova_node', 11L)
2012-12-13 16:16:09 TRACE nova.rpc.amqp
2012-12-13 16:16:09 ERROR nova.rpc.amqp [req-9ffd60f6-b8b4-453e-b689-807c02112c27 aa861f92dcb840e28568980e737daff8 a3f95ee1d0d44d8daeac5e27d403c2f1] Returning exception (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE floating_ips SET updated_at=%s, fixed_ip_id=%s, host=%s WHERE floating_ips.id = %s' (datetime.datetime(2012, 12, 13, 16, 16, 8, 976213), 782L, 'nova_node', 11L) to caller
2012-12-13 16:16:09 ERROR nova.rpc.amqp [req-9ffd60f6-b8b4-453e-b689-807c02112c27 aa861f92dcb840e28568980e737daff8 a3f95ee1d0d44d8daeac5e27d403c2f1] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/site-packages/nova/rpc/amqp.py", line 253, in _process_data\n rval = node_func(context=ctxt, **node_args)\n', ' File "/usr/lib/python2.7/site-packages/nova/network/manager.py", line 479, in _associate_floating_ip\n self.host)\n', ' File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 284, in floating_ip_fixed_ip_associate\n host)\n', ' File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 120, in wrapper\n return f(*args, **kwargs)\n', ' File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 679, in floating_ip_fixed_ip_associate\n floating_ip_ref.save(session=session)\n', ' File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/models.py", line 58, in save\n session.flush()\n', ' File "/usr/lib/python2.7/site-packages/nova/exception.py", line 95, in _wrap\n raise DBError(e)\n', "DBError: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE floating_ips SET updated_at=%s, fixed_ip_id=%s, host=%s WHERE floating_ips.id = %s' (datetime.datetime(2012, 12, 13, 16, 16, 8, 976213), 782L, 'nova_node', 11L)\n"]
---

In the case of a deadlock, could the instance not be "queued" for a subsequent retry, as opposed to dropping straight into an error state?
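
For illustration, a retry path first needs a way to tell a deadlock apart from other database failures. The sketch below is not nova code: it assumes the driver exception carries the MySQL error code as its first argument, as the (1213, 'Deadlock found ...') tuple in the trace above suggests. The names FakeOperationalError and is_deadlock are hypothetical.

```python
# MySQL error code for "Deadlock found when trying to get lock",
# as it appears in the trace above.
MYSQL_ER_LOCK_DEADLOCK = 1213


class FakeOperationalError(Exception):
    """Hypothetical stand-in for a DB driver's OperationalError."""


def is_deadlock(exc):
    """Return True if exc carries error code 1213 as its first argument.

    Assumption: the exception's args are laid out as (code, message).
    """
    args = getattr(exc, "args", ())
    return bool(args) and args[0] == MYSQL_ER_LOCK_DEADLOCK
```

A caller could use such a check to decide whether to restart the transaction or to let the error propagate as before.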

Tags:

aeva black (tenbrae) on 2013-02-07

tags:

added: db

Russell Bryant (russellb) wrote on 2013-02-26:

A related recent commit:

commit e53e22271ed7e7a9b919d817a8eb50a1ecce16f8
Author: Chris Behrens <email address hidden>
Date: Tue Feb 19 01:04:37 2013 +0000

Retry bw_usage_update() on innodb Deadlock

Adds a new decorator _retry_on_deadlock() to sqlalchemy api. This patch
makes bw_usage_update() use it.

Fixes bug 1129622

Change-Id: I0293c62d2dd5ac036445bc639cabbd05ba016e83
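
As a rough sketch of the idea behind a retry-on-deadlock decorator (not the actual _retry_on_deadlock() from the commit above, whose implementation is not shown in this report), the wrapped function is simply called again when it fails with a deadlock. DeadlockError, retry_on_deadlock, max_attempts, delay, and flaky_update are all hypothetical names chosen for this example.

```python
import functools
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the deadlock error raised by the DB layer."""


def retry_on_deadlock(max_attempts=5, delay=0.0):
    """Retry the wrapped function when it raises DeadlockError.

    max_attempts and delay are illustrative parameters, not nova options.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    # Out of attempts: let the caller see the error.
                    if attempt == max_attempts:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


# Example: a function that deadlocks twice before succeeding.
calls = {"count": 0}


@retry_on_deadlock(max_attempts=3)
def flaky_update():
    calls["count"] += 1
    if calls["count"] < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "ok"
```

The retry has to wrap the whole transaction, since the database rolls back the transaction that lost the deadlock and the statements in it must be reissued.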

Changed in nova:
status:	New → Confirmed
importance:	Undecided → High

OpenStack Infra (hudson-openstack) wrote on 2013-02-26: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/22955

Changed in nova:
assignee:	nobody → Russell Bryant (russellb)
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2013-02-27: Fix merged to nova (master)

Reviewed: https://review.openstack.org/22955
Committed: http://github.com/openstack/nova/commit/6e9a2f42616859963034bab6c21c793f05a5ba8d
Submitter: Jenkins
Branch: master

commit 6e9a2f42616859963034bab6c21c793f05a5ba8d
Author: Russell Bryant <email address hidden>
Date: Tue Feb 26 01:47:42 2013 -0500

Retry floating_ip_fixed_ip_associate on deadlock.

Update the floating_ip_fixed_ip_associate method of the sqlalchemy db
API to retry if it fails because of a deadlock. The decorator that
handles this was introduced in e53e22271ed7e7a9b919d817a8eb50a1ecce16f8.
The related bug report shows that this method could benefit from the use
of this decorator.

Fix bug 1090016.

Change-Id: I0159470e6eb2f8017bcb3e659e7b119194dda920

Changed in nova:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2013-03-20

Changed in nova:
milestone:	none → grizzly-rc1
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2013-04-04

Changed in nova:
milestone:	grizzly-rc1 → 2013.1
