nova-compute wedged by deleting an in-use baremetal node
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | High | Telles Mota Vidal Nóbrega |
tripleo | Fix Released | Medium | Telles Mota Vidal Nóbrega |
Bug Description
We deleted a baremetal node that still had an instance (sort of) on it, because we thought the node was dead.
Symptoms:
nova-compute stops checking in:
$ nova service-list
+------
| Binary | Host | Zone | Status | State | Updated_at |
+------
| nova-cert | foo.novalocal | internal | enabled | up | 2013-05-
| nova-compute | foo.novalocal | nova | enabled | down | 2013-05-
| nova-conductor | foo.novalocal | internal | enabled | up | 2013-05-
| nova-consoleauth | foo.novalocal | internal | enabled | up | 2013-05-
| nova-scheduler | foo.novalocal | internal | enabled | up | 2013-05-
+------
and its log is full of repeated start attempts, each of which dies (upstart keeps restarting the service):
2013-05-24 01:48:52,227.227 28649 INFO nova.openstack.
2013-05-24 01:48:52,291.291 28649 INFO nova.virt.driver [-] Loading compute driver 'baremetal.
2013-05-24 01:48:52,346.346 28649 INFO nova.openstack.
2013-05-24 01:48:52,409.409 28649 AUDIT nova.service [-] Starting compute node (version 2013.2)
2013-05-24 01:48:52,817.817 28649 ERROR nova.compute.
2013-05-24 01:48:52,817.817 28649 ERROR nova.compute.
2013-05-24 01:48:52,817.817 28649 ERROR nova.compute.
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.
2013-05-24 01:48:52,819.819 28649 ERROR nova.compute.
2013-05-24 01:48:52,819.819 28649 ERROR nova.compute.
2013-05-24 01:48:52,819.819 28649 ERROR nova.compute.
2013-05-24 01:48:52,819.819 28649 ERROR nova.compute.
2013-05-24 01:48:52,820.820 28649 ERROR nova.compute.
Traceback (most recent call last):
File "/usr/lib/
timer()
File "/usr/lib/
cb(*args, **kw)
File "/usr/lib/
result = function(*args, **kwargs)
File "/opt/stack/
server.start()
File "/opt/stack/
self.
File "/opt/stack/
self.
File "/opt/stack/
self.
File "/opt/stack/
self.
File "/opt/stack/
node = _get_baremetal_
File "/opt/stack/
node = db.bm_node_
File "/opt/stack/
instance_uuid)
File "/opt/stack/
return f(*args, **kwargs)
File "/opt/stack/
raise exception.
InstanceNotFound: Instance 9dc0aba0-
2013-05-24 01:48:54,716.716 28649 CRITICAL nova [-] Instance 9dc0aba0-
2013-05-24 01:48:54,716.716 28649 TRACE nova Traceback (most recent call last):
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova load_entry_
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova service.wait()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova _launcher.wait()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova super(ServiceLa
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova service.wait()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/usr/lib/
2013-05-24 01:48:54,716.716 28649 TRACE nova return self._exit_
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/usr/lib/
2013-05-24 01:48:54,716.716 28649 TRACE nova return hubs.get_
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/usr/lib/
2013-05-24 01:48:54,716.716 28649 TRACE nova return self.greenlet.
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/usr/lib/
2013-05-24 01:48:54,716.716 28649 TRACE nova result = function(*args, **kwargs)
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova server.start()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova self.manager.
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova self._init_
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova self.driver.
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova self._plug_
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova node = _get_baremetal_
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova node = db.bm_node_
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova instance_uuid)
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova return f(*args, **kwargs)
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/
2013-05-24 01:48:54,716.716 28649 TRACE nova raise exception.
2013-05-24 01:48:54,716.716 28649 TRACE nova InstanceNotFound: Instance 9dc0aba0-
2013-05-24 01:48:54,716.716 28649 TRACE nova
Note that the last line is truncated in the logs themselves; it's not content lost in the copy-paste.
tags: | added: baremetal |
Changed in nova: | |
importance: | Undecided → High |
status: | New → Triaged |
Changed in nova: | |
assignee: | nobody → ugvddm (271025598-9) |
Changed in nova: | |
assignee: | ugvddm (271025598-9) → nobody |
Changed in tripleo: | |
importance: | High → Medium |
Changed in tripleo: | |
assignee: | nobody → Telles Mota Vidal Nóbrega (tellesmvn) |
Changed in nova: | |
assignee: | nobody → Telles Mota Vidal Nóbrega (tellesmvn) |
This code looks to be the issue:
    def init_host(self):
        """Initialization for a standalone compute service."""
        self.driver.init_host(host=self.host)
        context = nova.context.get_admin_context()
        instances = self.conductor_api.instance_get_all_by_host(context,
                                                                self.host)
        if CONF.defer_iptables_apply:
            self.driver.filter_defer_apply_on()
        try:
            # checking that instance was not already evacuated to other host
            self._destroy_evacuated_instances(context)
            for instance in instances:
                self._init_instance(context, instance)
        finally:
            if CONF.defer_iptables_apply:
                self.driver.filter_defer_apply_off()
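init_host tries to bring up every non-evacuated instance, but _init_instance raises rather than catching the error, so a single instance whose baremetal node is gone aborts startup of the whole service.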
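One possible direction, sketched here as a standalone toy and not as the actual nova patch: catch the failure per instance inside the loop, log it, and carry on with the rest. Everything below is hypothetical and only illustrates the pattern. `FakeDriver`, its `attach` method, and `init_instances` are made-up names, and the local `InstanceNotFound` class merely stands in for the nova exception seen in the traceback.

```python
import logging

LOG = logging.getLogger(__name__)


class InstanceNotFound(Exception):
    """Local stand-in for the nova exception seen in the traceback."""


class FakeDriver:
    """Hypothetical driver: attach() fails for instances it has no node for."""

    def __init__(self, known_uuids):
        self.known_uuids = set(known_uuids)

    def attach(self, instance):
        if instance["uuid"] not in self.known_uuids:
            raise InstanceNotFound(instance["uuid"])


def init_instances(driver, instances):
    """Initialise every instance; log and skip the ones that fail.

    Returns the UUIDs that could not be initialised, so the caller can
    decide what to do with them (for example, mark them as errored).
    """
    failed = []
    for instance in instances:
        try:
            driver.attach(instance)
        except InstanceNotFound:
            # One broken instance must not prevent the service from starting.
            LOG.exception("Skipping instance %s", instance["uuid"])
            failed.append(instance["uuid"])
    return failed


if __name__ == "__main__":
    driver = FakeDriver(known_uuids=["aaa"])
    print(init_instances(driver, [{"uuid": "aaa"}, {"uuid": "gone"}]))
```

Whether the real fix should swallow the error at this level, or handle it lower down in the baremetal driver, is a design question this sketch does not settle.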