I have a setup where every compute node talks to the same load-balanced MySQL cluster (Percona XtraDB Cluster, which uses Galera for replication). When launching many instances at once via Horizon, the following error occurs:
---
2012-12-13 16:16:09 TRACE nova.rpc.amqp DBError: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE floating_ips SET updated_at=%s, fixed_ip_id=%s, host=%s WHERE floating_ips.id = %s' (datetime.datetime(2012, 12, 13, 16, 16, 8, 976213), 782L, 'nova_node', 11L)
2012-12-13 16:16:09 TRACE nova.rpc.amqp
2012-12-13 16:16:09 ERROR nova.rpc.amqp [req-9ffd60f6-b8b4-453e-b689-807c02112c27 aa861f92dcb840e28568980e737daff8 a3f95ee1d0d44d8daeac5e27d403c2f1] Returning exception (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE floating_ips SET updated_at=%s, fixed_ip_id=%s, host=%s WHERE floating_ips.id = %s' (datetime.datetime(2012, 12, 13, 16, 16, 8, 976213), 782L, 'nova_node', 11L) to caller
2012-12-13 16:16:09 ERROR nova.rpc.amqp [req-9ffd60f6-b8b4-453e-b689-807c02112c27 aa861f92dcb840e28568980e737daff8 a3f95ee1d0d44d8daeac5e27d403c2f1] ['Traceback (most recent call last):\n', ' File "/usr/lib/python2.7/site-packages/nova/rpc/amqp.py", line 253, in _process_data\n rval = node_func(context=ctxt, **node_args)\n', ' File "/usr/lib/python2.7/site-packages/nova/network/manager.py", line 479, in _associate_floating_ip\n self.host)\n', ' File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 284, in floating_ip_fixed_ip_associate\n host)\n', ' File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 120, in wrapper\n return f(*args, **kwargs)\n', ' File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 679, in floating_ip_fixed_ip_associate\n floating_ip_ref.save(session=session)\n', ' File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/models.py", line 58, in save\n session.flush()\n', ' File "/usr/lib/python2.7/site-packages/nova/exception.py", line 95, in _wrap\n raise DBError(e)\n', "DBError: (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE floating_ips SET updated_at=%s, fixed_ip_id=%s, host=%s WHERE floating_ips.id = %s' (datetime.datetime(2012, 12, 13, 16, 16, 8, 976213), 782L, 'nova_node', 11L)\n"]
---
In the case of a deadlock, could the instance not be "queued" for a subsequent retry, as opposed to dropping straight into an error state?
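Any retry first needs to recognise the deadlock reliably. MySQL reports it as error 1213 (ER_LOCK_DEADLOCK), and by the time it reaches the RPC layer in the trace above it has been wrapped in a DBError whose message still carries the original `(1213, ...)` text. With Galera behind a load balancer, write conflicts between nodes are, as far as I know, also reported to the losing client as 1213, so a check like this would catch those too. The sketch below assumes the driver exception exposes the error code as `args[0]` (as MySQLdb does) and that wrappers either chain the original via `orig`/`__cause__` or keep its text; `is_deadlock`, `FakeOperationalError` and `DBError` are hypothetical stand-ins, not nova's actual classes.

```python
import re

MYSQL_ER_LOCK_DEADLOCK = 1213  # error code shown in the log above
_DEADLOCK_CODE_RE = re.compile(r"\(\s*1213\s*,")


def is_deadlock(exc):
    """Return True if exc, or an exception it wraps, looks like a MySQL 1213 deadlock.

    Hypothetical helper: checks the structured error code first, then falls back
    to the message text, because wrappers may only keep the original message.
    """
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        args = getattr(exc, "args", ())
        if args and args[0] == MYSQL_ER_LOCK_DEADLOCK:
            return True
        text = str(exc)
        if _DEADLOCK_CODE_RE.search(text) or "Deadlock found" in text:
            return True
        # Follow explicit wrapping (SQLAlchemy-style .orig) or exception chaining.
        exc = getattr(exc, "orig", None) or exc.__cause__ or exc.__context__
    return False


class FakeOperationalError(Exception):
    """Hypothetical stand-in for the driver's OperationalError(code, message)."""


class DBError(Exception):
    """Hypothetical stand-in for the wrapper exception seen in the trace."""


driver_err = FakeOperationalError(
    1213, "Deadlock found when trying to get lock; try restarting transaction")
wrapped = DBError(
    "(OperationalError) (1213, 'Deadlock found when trying to get lock; "
    "try restarting transaction')")
other = FakeOperationalError(1062, "Duplicate entry '11' for key 'PRIMARY'")

print(is_deadlock(driver_err), is_deadlock(wrapped), is_deadlock(other))  # True True False
```

Only errors classified this way would be candidates for a retry; anything else (a duplicate key, a lost connection) should still fail the request as it does today.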
A related recent commit:
commit e53e22271ed7e7a9b919d817a8eb50a1ecce16f8
Author: Chris Behrens <email address hidden>
Date: Tue Feb 19 01:04:37 2013 +0000
Retry bw_usage_update() on innodb Deadlock
Adds a new decorator _retry_on_deadlock() to sqlalchemy api. This patch
makes bw_usage_update() use it.
Fixes bug 1129622
Change-Id: I0293c62d2dd5ac036445bc639cabbd05ba016e83
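Applying the same idea to floating_ip_fixed_ip_associate() would look roughly like the decorator below. This is a minimal sketch, not the code from that commit: `retry_on_deadlock`, `DeadlockError` and the bounded-attempt/backoff parameters are my own assumptions. It also assumes the decorated function opens and commits its own transaction, since InnoDB rolls back the whole victim transaction on a deadlock, so re-running the entire call is the safe unit of retry.

```python
import functools
import logging
import random
import time

LOG = logging.getLogger(__name__)


class DeadlockError(Exception):
    """Hypothetical stand-in for the DB layer's exception on MySQL error 1213."""


def retry_on_deadlock(max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Hypothetical decorator: re-run the wrapped DB call when it deadlocks.

    Gives up after max_attempts and re-raises, so a persistent conflict still
    surfaces as an error instead of looping forever.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except DeadlockError:
                    if attempt == max_attempts:
                        raise  # out of attempts: caller sees the original error
                    # Exponential backoff with jitter so competing retries spread out.
                    delay = base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5)
                    LOG.warning("Deadlock in %s (attempt %d/%d), retrying in %.2fs",
                                func.__name__, attempt, max_attempts, delay)
                    sleep(delay)
        return wrapper
    return decorator


calls = {"n": 0}


@retry_on_deadlock(max_attempts=3, base_delay=0.0)
def associate_floating_ip():
    """Hypothetical DB call that deadlocks twice before succeeding."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError("(1213, 'Deadlock found when trying to get lock')")
    return "associated"


print(associate_floating_ip(), calls["n"])  # associated 3
```

Bounding the attempts is a deliberate choice here: with many instances launching at once, an unbounded retry could keep an RPC worker busy indefinitely, whereas a few jittered retries should absorb the occasional 1213 and still report a real, persistent conflict.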