OpenStack Compute (nova)

Compute node stats update may lead to DBDeadlock

Bug #1253455 reported by Vui Lam on 2013-11-20

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Compute (nova)	Fix Released	High	Gary Kotton	OpenStack Compute (nova) 2014.1 "icehouse"
	Havana	Fix Released	High	Rohit Karajgi	OpenStack Compute (nova) 2013.2.3

Bug Description

During a tempest run, when a compute node's usage stats are updated on the DB as part of resource claiming for an instance spawn, we hit a DBDeadlock exception:

  File ".../nova/compute/manager.py", line 1002, in _build_instance
    with rt.instance_claim(context, instance, limits):
  File ".../nova/openstack/common/lockutils.py", line 248, in inner
    return f(*args, **kwargs)
  File ".../nova/compute/resource_tracker.py", line 126, in instance_claim
    self._update(elevated, self.compute_node)
  File ".../nova/compute/resource_tracker.py", line 429, in _update
    context, self.compute_node, values, prune_stats)
  File ".../nova/conductor/api.py", line 240, in compute_node_update
    prune_stats)
  File ".../nova/conductor/rpcapi.py", line 363, in compute_node_update
    prune_stats=prune_stats)
  File ".../nova/rpcclient.py", line 85, in call
    return self._invoke(self.proxy.call, ctxt, method, **kwargs)
  File ".../nova/rpcclient.py", line 63, in _invoke
    return cast_or_call(ctxt, msg, **self.kwargs)
  File ".../nova/openstack/common/rpc/proxy.py", line 126, in call
    result = rpc.call(context, real_topic, msg, timeout)
  File ".../nova/openstack/common/rpc/__init__.py", line 139, in call
    return _get_impl().call(CONF, context, topic, msg, timeout)
  File ".../nova/openstack/common/rpc/impl_kombu.py", line 816, in call
    rpc_amqp.get_connection_pool(conf, Connection))
  File ".../nova/openstack/common/rpc/amqp.py", line 574, in call
    rv = list(rv)
  File ".../nova/openstack/common/rpc/amqp.py", line 539, in __iter__
    raise result
RemoteError: Remote error: DBDeadlock (OperationalError) (1213, 'Deadlock found when trying to get lock; try restarting transaction') 'UPDATE compute_nodes SET updated_at=%s, hypervisor_version=%s WHERE compute_nodes.id = %s' (datetime.datetime(2013, 11, 20, 18, 28, 19, 525920), u'5.1.0 1)

(A more complete log is at http://paste.openstack.org/raw/53702/)
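The error above is MySQL/InnoDB error 1213 (ER_LOCK_DEADLOCK): InnoDB detected a lock cycle between concurrent transactions and rolled one of them back, which the DB layer then surfaced as DBDeadlock. A minimal sketch of that translation step follows, assuming a regex match on the driver's message text; the `OperationalError` and `DBDeadlock` classes and the `translate_db_error` helper are hypothetical stand-ins, not nova's or SQLAlchemy's actual code.

```python
import re


class OperationalError(Exception):
    """Hypothetical stand-in for a raw driver error (not SQLAlchemy's class)."""


class DBDeadlock(Exception):
    """Hypothetical stand-in for the DB layer's deadlock exception."""


# MySQL/InnoDB deadlock: error code 1213 (ER_LOCK_DEADLOCK) plus its message text.
_DEADLOCK_RE = re.compile(r"\b1213\b.*Deadlock found when trying to get lock")


def translate_db_error(exc):
    """Return a DBDeadlock for MySQL deadlock errors, otherwise exc unchanged."""
    if isinstance(exc, OperationalError) and _DEADLOCK_RE.search(str(exc)):
        translated = DBDeadlock(str(exc))
        translated.__cause__ = exc  # keep the original error for debugging
        return translated
    return exc


raw = OperationalError(
    "(1213, 'Deadlock found when trying to get lock; try restarting transaction')"
)
print(type(translate_db_error(raw)).__name__)  # prints DBDeadlock
```

Giving deadlocks their own exception type lets callers retry just that failure mode without swallowing other operational errors such as lock-wait timeouts.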

Can someone characterize the conditions under which this type of error can occur?

Perhaps sqlalchemy.api.compute_node_update() needs the @_retry_on_deadlock treatment?
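The retry idea amounts to re-running the whole DB call when it fails with DBDeadlock, as MySQL's own message ("try restarting transaction") suggests. A minimal sketch, assuming a `DBDeadlock` exception class and a hypothetical `retry_on_deadlock` decorator factory; the bounded retry count is a choice made for this sketch and is not a claim about how nova's `_retry_on_deadlock` is written. The `compute_node_update` below is a hypothetical stub that deadlocks twice before succeeding.

```python
import functools
import time


class DBDeadlock(Exception):
    """Hypothetical stand-in for the DB layer's deadlock exception."""


def retry_on_deadlock(max_retries=5, delay=0.5):
    """Re-run the wrapped DB call when it raises DBDeadlock.

    Each attempt calls the function from scratch, so the function must open
    a fresh transaction per call. After max_retries retries the last
    DBDeadlock is re-raised.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            retries = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except DBDeadlock:
                    retries += 1
                    if retries > max_retries:
                        raise
                    time.sleep(delay)  # brief back-off before retrying
        return wrapper
    return decorator


calls = {"n": 0}


@retry_on_deadlock(max_retries=3, delay=0)
def compute_node_update(values):
    # Hypothetical stub: fails with a deadlock on the first two attempts.
    calls["n"] += 1
    if calls["n"] < 3:
        raise DBDeadlock("1213")
    return values


print(compute_node_update({"vcpus_used": 2}))
```

Retrying is only safe because InnoDB rolls back the transaction it picks as the deadlock victim, so re-running the full update starts from a clean state.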

Vui Lam (vui) on 2013-11-20

description:

updated

Vui Lam (vui) wrote on 2013-11-20:

A similar issue has been reported in:

https://bugs.launchpad.net/nova/+bug/1250836

Gary Kotton (garyk) on 2013-11-25

Changed in nova:
status:	New → Confirmed
importance:	Undecided → High
assignee:	nobody → Gary Kotton (garyk)

OpenStack Infra (hudson-openstack) wrote on 2013-11-25: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/58215

Changed in nova:
status:	Confirmed → In Progress

Gary Kotton (garyk) on 2013-11-25

Changed in nova:
milestone:	none → icehouse-1
tags:	added: db grizzly-backport-potential havana-backport-potential

Russell Bryant (russellb) on 2013-12-03

Changed in nova:
milestone:	icehouse-1 → icehouse-2

Thierry Carrez (ttx) on 2014-01-22

Changed in nova:
milestone:	icehouse-2 → icehouse-3

OpenStack Infra (hudson-openstack) wrote on 2014-01-25: Fix merged to nova (master)

Reviewed: https://review.openstack.org/58215
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=74268d329fccea295b3a5e64f6b8637c661481c0
Submitter: Jenkins
Branch: master

commit 74268d329fccea295b3a5e64f6b8637c661481c0
Author: Gary Kotton <email address hidden>
Date: Mon Nov 25 00:06:30 2013 -0800

Enable compute_node_update to tolerate deadlocks

When running the CI there were cases when the aforementioned method would
throw a DBDeadlock exception.

Change-Id: I98d1a804c51e1bf3bb96193d82ffe7e5d064e134
Closes-bug: #1253455

Changed in nova:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2014-02-24: Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/75825

Thierry Carrez (ttx) on 2014-03-05

Changed in nova:
status:	Fix Committed → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2014-03-14: Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/75825
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=656452548b2d8d654a25fc27476cd77a9b8a9172
Submitter: Jenkins
Branch: stable/havana

commit 656452548b2d8d654a25fc27476cd77a9b8a9172
Author: Gary Kotton <email address hidden>
Date: Mon Nov 25 00:06:30 2013 -0800

Enable compute_node_update to tolerate deadlocks

When running the CI there were cases when the aforementioned method would
throw a DBDeadlock exception.

Change-Id: I98d1a804c51e1bf3bb96193d82ffe7e5d064e134
Closes-bug: #1253455

tags:

added: in-stable-havana

Alan Pevec (apevec) on 2014-03-30

tags:

removed: grizzly-backport-potential havana-backport-potential in-stable-havana

Thierry Carrez (ttx) on 2014-04-17

Changed in nova:
milestone:	icehouse-3 → 2014.1
