Database deadlocks not handled

Bug #1090016 reported by David Busby
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Russell Bryant

Bug Description

I have a setup where each compute node addresses the same load-balanced MySQL cluster (Percona XtraDB Cluster, which uses Galera for replication). When attempting to launch many instances via Horizon, the following occurs:

---
2012-12-13 16:16:09 TRACE nova.rpc.amqp DBError: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE floating_ips SET updated_at=%s, fixed_ip_id=%s, host=%s WHERE floating_ips.id = %s' (datetime.datetime(2012, 12, 13, 16, 16, 8, 976213), 782L, 'nova_node', 11L)
2012-12-13 16:16:09 TRACE nova.rpc.amqp
2012-12-13 16:16:09 ERROR nova.rpc.amqp [req-9ffd60f6-b8b4-453e-b689-807c02112c27 aa861f92dcb840e28568980e737daff8 a3f95ee1d0d44d8daeac5e27d403c2f1] Returning exception (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE floating_ips SET updated_at=%s, fixed_ip_id=%s, host=%s WHERE floating_ips.id = %s' (datetime.datetime(2012, 12, 13, 16, 16, 8, 976213), 782L, 'nova_node', 11L) to caller
2012-12-13 16:16:09 ERROR nova.rpc.amqp [req-9ffd60f6-b8b4-453e-b689-807c02112c27 aa861f92dcb840e28568980e737daff8 a3f95ee1d0d44d8daeac5e27d403c2f1] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/site-packages/nova/rpc/amqp.py", line 253, in _process_data\n rval = node_func(context=ctxt, **node_args)\n', ' File "/usr/lib/python2.7/site-packages/nova/network/manager.py", line 479, in _associate_floating_ip\n self.host)\n', ' File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 284, in floating_ip_fixed_ip_associate\n host)\n', ' File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 120, in wrapper\n return f(*args, **kwargs)\n', ' File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 679, in floating_ip_fixed_ip_associate\n floating_ip_ref.save(session=session)\n', ' File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/models.py", line 58, in save\n session.flush()\n', ' File "/usr/lib/python2.7/site-packages/nova/exception.py", line 95, in _wrap\n raise DBError(e)\n', "DBError: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE floating_ips SET updated_at=%s, fixed_ip_id=%s, host=%s WHERE floating_ips.id = %s' (datetime.datetime(2012, 12, 13, 16, 16, 8, 976213), 782L, 'nova_node', 11L)\n"]
---

In the case of a deadlock, could the instance not be "queued" for a subsequent retry, as opposed to dropping straight into an error state?
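
Any retry would first need to tell a deadlock apart from other database failures. The trace above carries MySQL error code 1213, so a minimal sketch of that check might look like the following. It assumes the driver exception exposes `(code, message)` in its `args`; `is_deadlock` and `FakeOperationalError` are hypothetical names used only for illustration and are not part of nova.

```python
MYSQL_DEADLOCK_CODE = 1213  # error code shown in the trace above


def is_deadlock(exc):
    """Return True if the exception's first arg is MySQL error code 1213.

    Assumption: the exception carries (code, message) in exc.args.
    """
    args = getattr(exc, "args", ())
    return bool(args) and args[0] == MYSQL_DEADLOCK_CODE


class FakeOperationalError(Exception):
    """Hypothetical stand-in for a driver error with (code, message) args."""


deadlock = FakeOperationalError(
    1213, "Deadlock found when trying to get lock; try restarting transaction")
other = FakeOperationalError(1062, "Duplicate entry")
```

Only errors for which `is_deadlock` returns True would be candidates for a retry; anything else should still surface to the caller.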

Tags: db
aeva black (tenbrae)
tags: added: db
Russell Bryant (russellb) wrote :

A related recent commit:

commit e53e22271ed7e7a9b919d817a8eb50a1ecce16f8
Author: Chris Behrens <email address hidden>
Date: Tue Feb 19 01:04:37 2013 +0000

    Retry bw_usage_update() on innodb Deadlock

    Adds a new decorator _retry_on_deadlock() to sqlalchemy api. This patch
    makes bw_usage_update() use it.

    Fixes bug 1129622

    Change-Id: I0293c62d2dd5ac036445bc639cabbd05ba016e83
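
For readers unfamiliar with the pattern, the following is a rough sketch of what a retry-on-deadlock decorator can look like. It is not the code from the commit above: `DeadlockError`, `retry_on_deadlock`, `max_attempts`, and `delay` are hypothetical names chosen for illustration, and the deadlock is simulated rather than raised by a real database.

```python
import functools
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the database error raised on a deadlock."""


def retry_on_deadlock(max_attempts=5, delay=0.0):
    """Re-run the wrapped function when it raises DeadlockError."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    if attempt == max_attempts:
                        raise  # out of attempts: let the caller see the error
                    time.sleep(delay)
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on_deadlock(max_attempts=3)
def update_usage():
    # Simulates a transaction that deadlocks twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise DeadlockError("Deadlock found when trying to get lock")
    return "updated"


result = update_usage()
```

Bounding the number of attempts keeps a persistent deadlock from looping forever; once the limit is reached the original error propagates unchanged.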

Changed in nova:
status: New → Confirmed
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/22955

Changed in nova:
assignee: nobody → Russell Bryant (russellb)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/22955
Committed: http://github.com/openstack/nova/commit/6e9a2f42616859963034bab6c21c793f05a5ba8d
Submitter: Jenkins
Branch: master

commit 6e9a2f42616859963034bab6c21c793f05a5ba8d
Author: Russell Bryant <email address hidden>
Date: Tue Feb 26 01:47:42 2013 -0500

    Retry floating_ip_fixed_ip_associate on deadlock.

    Update the floating_ip_fixed_ip_associate method of the sqlalchemy db
    API to retry if it fails because of a deadlock. The decorator that
    handles this was introduced in e53e22271ed7e7a9b919d817a8eb50a1ecce16f8.
    The related bug report shows that this method could benefit from the use
    of this decorator.

    Fix bug 1090016.

    Change-Id: I0159470e6eb2f8017bcb3e659e7b119194dda920

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → grizzly-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: grizzly-rc1 → 2013.1