Compute node stats update may lead to DBDeadlock

Bug #1253455 reported by Vui Lam
This bug affects 1 person
Affects                   Status        Importance  Assigned to    Milestone
OpenStack Compute (nova)  Fix Released  High        Gary Kotton
Havana                    Fix Released  High        Rohit Karajgi

Bug Description

During a tempest run, when a compute node's usage stats are updated in the DB as part of claiming resources for an instance spawn, we hit a DBDeadlock exception:

  File ".../nova/compute/manager.py", line 1002, in _build_instance
    with rt.instance_claim(context, instance, limits):
  File ".../nova/openstack/common/lockutils.py", line 248, in inner
    return f(*args, **kwargs)
  File ".../nova/compute/resource_tracker.py", line 126, in instance_claim
    self._update(elevated, self.compute_node)
  File ".../nova/compute/resource_tracker.py", line 429, in _update
    context, self.compute_node, values, prune_stats)
  File ".../nova/conductor/api.py", line 240, in compute_node_update
    prune_stats)
  File ".../nova/conductor/rpcapi.py", line 363, in compute_node_update
    prune_stats=prune_stats)
  File ".../nova/rpcclient.py", line 85, in call
    return self._invoke(self.proxy.call, ctxt, method, **kwargs)
  File ".../nova/rpcclient.py", line 63, in _invoke
    return cast_or_call(ctxt, msg, **self.kwargs)
  File ".../nova/openstack/common/rpc/proxy.py", line 126, in call
    result = rpc.call(context, real_topic, msg, timeout)
  File ".../nova/openstack/common/rpc/__init__.py", line 139, in call
    return _get_impl().call(CONF, context, topic, msg, timeout)
  File ".../nova/openstack/common/rpc/impl_kombu.py", line 816, in call
    rpc_amqp.get_connection_pool(conf, Connection))
  File ".../nova/openstack/common/rpc/amqp.py", line 574, in call
    rv = list(rv)
  File ".../nova/openstack/common/rpc/amqp.py", line 539, in __iter__
    raise result
RemoteError: Remote error: DBDeadlock (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE compute_nodes SET updated_at=%s, hypervisor_version=%s WHERE compute_nodes.id = %s' (datetime.datetime(2013, 11, 20, 18, 28, 19, 525920), u'5.1.0 1)

(A more complete log is at http://paste.openstack.org/raw/53702/)

Can someone characterize the conditions under which this type of error can occur?

Perhaps sqlalchemy.api.compute_node_update() needs the @_retry_on_deadlock treatment?
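
For illustration, the retry-on-deadlock idea could look roughly like the sketch below. This is not nova's actual decorator: the DBDeadlock class here is a local stand-in for the exception in the traceback, and the retry_on_deadlock name, its max_retries and delay parameters, and the toy compute_node_update body are all assumptions made up for the example.

```python
import functools
import time


class DBDeadlock(Exception):
    """Hypothetical stand-in for the DBDeadlock error seen in the traceback."""


def retry_on_deadlock(max_retries=5, delay=0.5):
    """Re-run the wrapped DB call when it raises DBDeadlock.

    max_retries and delay are illustrative parameters, not nova options.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    attempt += 1
                    if attempt > max_retries:
                        # Give up and surface the deadlock to the caller.
                        raise
                    # Back off briefly so the competing transaction can finish.
                    time.sleep(delay)
        return wrapper
    return decorator


# Toy usage: the first two calls simulate a deadlock, the third succeeds.
calls = {"count": 0}


@retry_on_deadlock(max_retries=3, delay=0)
def compute_node_update(node_id, values):
    calls["count"] += 1
    if calls["count"] < 3:
        raise DBDeadlock("Deadlock found when trying to get lock")
    return dict(values, id=node_id)
```

A retry like this only makes sense if the wrapped call runs the whole transaction again from the start, since MySQL rolls back the transaction it picks as the deadlock victim.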

Tags: db
Vui Lam (vui)
description: updated
Vui Lam (vui) wrote :

A similar issue has been reported here:

https://bugs.launchpad.net/nova/+bug/1250836

Gary Kotton (garyk)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Gary Kotton (garyk)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/58215

Changed in nova:
status: Confirmed → In Progress
Gary Kotton (garyk)
Changed in nova:
milestone: none → icehouse-1
tags: added: db grizzly-backport-potential havana-backport-potential
Changed in nova:
milestone: icehouse-1 → icehouse-2
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-2 → icehouse-3
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/58215
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=74268d329fccea295b3a5e64f6b8637c661481c0
Submitter: Jenkins
Branch: master

commit 74268d329fccea295b3a5e64f6b8637c661481c0
Author: Gary Kotton <email address hidden>
Date: Mon Nov 25 00:06:30 2013 -0800

    Enable compute_node_update to tolerate deadlocks

    When running the CI there were cases when the aformentioned method would
    throw a DBDeadlock exception.

    Change-Id: I98d1a804c51e1bf3bb96193d82ffe7e5d064e134
    Closes-bug: #1253455

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/75825

Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/75825
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=656452548b2d8d654a25fc27476cd77a9b8a9172
Submitter: Jenkins
Branch: stable/havana

commit 656452548b2d8d654a25fc27476cd77a9b8a9172
Author: Gary Kotton <email address hidden>
Date: Mon Nov 25 00:06:30 2013 -0800

    Enable compute_node_update to tolerate deadlocks

    When running the CI there were cases when the aformentioned method would
    throw a DBDeadlock exception.

    Change-Id: I98d1a804c51e1bf3bb96193d82ffe7e5d064e134
    Closes-bug: #1253455

tags: added: in-stable-havana
Alan Pevec (apevec)
tags: removed: grizzly-backport-potential havana-backport-potential in-stable-havana
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-3 → 2014.1