nova-compute wedged by deleting an in-use baremetal node

Bug #1183633 reported by Robert Collins
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | High | Telles Mota Vidal Nóbrega |
tripleo | Fix Released | Medium | Telles Mota Vidal Nóbrega |

Bug Description

We deleted a baremetal node that still had an instance (sort of) on it, because we thought the node was dead.

Symptoms:
nova-compute stops checking in:
$ nova service-list
+------------------+---------------+----------+---------+-------+----------------------------+
| Binary | Host | Zone | Status | State | Updated_at |
+------------------+---------------+----------+---------+-------+----------------------------+
| nova-cert | foo.novalocal | internal | enabled | up | 2013-05-24T01:45:02.000000 |
| nova-compute | foo.novalocal | nova | enabled | down | 2013-05-23T22:16:26.000000 |
| nova-conductor | foo.novalocal | internal | enabled | up | 2013-05-24T01:45:04.000000 |
| nova-consoleauth | foo.novalocal | internal | enabled | up | 2013-05-24T01:45:03.000000 |
| nova-scheduler | foo.novalocal | internal | enabled | up | 2013-05-24T01:45:03.000000 |
+------------------+---------------+----------+---------+-------+----------------------------+

and its log is full of repeated startup attempts, each of which dies (upstart keeps restarting it):
2013-05-24 01:48:52,227.227 28649 INFO nova.openstack.common.periodic_task [-] Skipping periodic task _periodic_update_dns because its interval is negative
2013-05-24 01:48:52,291.291 28649 INFO nova.virt.driver [-] Loading compute driver 'baremetal.driver.BareMetalDriver'
2013-05-24 01:48:52,346.346 28649 INFO nova.openstack.common.rpc.common [req-08ee3ab5-fc1f-4550-a203-c0fb37d9a9e3 None None] Connected to AMQP server on 127.0.0.1:5672
2013-05-24 01:48:52,409.409 28649 AUDIT nova.service [-] Starting compute node (version 2013.2)
2013-05-24 01:48:52,817.817 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance test-332aac1b-8987-488d-9ea2-19f26a16907d found in the hypervisor, but not in the database
2013-05-24 01:48:52,817.817 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance bootstack-vm.notcompute found in the hypervisor, but not in the database
2013-05-24 01:48:52,817.817 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance bootstack-vm-4.notcompute found in the hypervisor, but not in the database
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance bootstack-vm-testing.notcompute found in the hypervisor, but not in the database
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance compute-test.novacompute-0 found in the hypervisor, but not in the database
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance test-8b80ff43-ba81-44c6-a22a-fffd6034579a found in the hypervisor, but not in the database
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance test-cd715548-afd7-4342-8c74-b4d5e5984dd6 found in the hypervisor, but not in the database
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance test-bb35ffdf-9fae-4e23-8e46-ec76b89c1ce4 found in the hypervisor, but not in the database
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance test-d01059f8-97ab-4f0a-968b-7411b2ab717c found in the hypervisor, but not in the database
2013-05-24 01:48:52,818.818 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance test-f7862b82-268d-4971-b961-a8fe51488b21 found in the hypervisor, but not in the database
2013-05-24 01:48:52,819.819 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance test-3f0cdb8f-70ae-43f7-bb98-83c48f5da317 found in the hypervisor, but not in the database
2013-05-24 01:48:52,819.819 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance test-d3d7d58f-408c-47ff-993a-4b8327f27541 found in the hypervisor, but not in the database
2013-05-24 01:48:52,819.819 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance test-30405362-c307-428a-94c5-dbe6284b8f28 found in the hypervisor, but not in the database
2013-05-24 01:48:52,819.819 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance test-54fb06f0-325c-4d98-9a54-2ab4d3ab9794 found in the hypervisor, but not in the database
2013-05-24 01:48:52,820.820 28649 ERROR nova.compute.manager [req-76be156c-c2d9-4c7c-aa00-3ce45b3b49a8 None None] Instance test-091264f9-830b-4279-92e3-20ff56375973 found in the hypervisor, but not in the database
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 336, in fire_timers
    timer()
  File "/usr/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 56, in __call__
    cb(*args, **kw)
  File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
    result = function(*args, **kwargs)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/service.py", line 148, in run_server
    server.start()
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/service.py", line 430, in start
    self.manager.init_host()
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 631, in init_host
    self._init_instance(context, instance)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 520, in _init_instance
    self.driver.plug_vifs(instance, legacy_net_info)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/driver.py", line 460, in plug_vifs
    self._plug_vifs(instance, network_info)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/driver.py", line 465, in _plug_vifs
    node = _get_baremetal_node_by_instance_uuid(instance['uuid'])
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/driver.py", line 88, in _get_baremetal_node_by_instance_uuid
    node = db.bm_node_get_by_instance_uuid(ctx, instance_uuid)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/db/api.py", line 101, in bm_node_get_by_instance_uuid
    instance_uuid)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 97, in wrapper
    return f(*args, **kwargs)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/db/sqlalchemy/api.py", line 151, in bm_node_get_by_instance_uuid
    raise exception.InstanceNotFound(instance_id=instance_uuid)
InstanceNotFound: Instance 9dc0aba0-27a5-47cb-a85a-574763e8243e could not be found.
2013-05-24 01:48:54,716.716 28649 CRITICAL nova [-] Instance 9dc0aba0-27a5-47cb-a85a-574763e8243e could not be found.
2013-05-24 01:48:54,716.716 28649 TRACE nova Traceback (most recent call last):
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/bin/nova-compute", line 8, in <module>
2013-05-24 01:48:54,716.716 28649 TRACE nova load_entry_point('nova==2013.2.a2.gaf90386', 'console_scripts', 'nova-compute')()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/cmd/compute.py", line 65, in main
2013-05-24 01:48:54,716.716 28649 TRACE nova service.wait()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/service.py", line 690, in wait
2013-05-24 01:48:54,716.716 28649 TRACE nova _launcher.wait()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/service.py", line 210, in wait
2013-05-24 01:48:54,716.716 28649 TRACE nova super(ServiceLauncher, self).wait()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/service.py", line 180, in wait
2013-05-24 01:48:54,716.716 28649 TRACE nova service.wait()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 166, in wait
2013-05-24 01:48:54,716.716 28649 TRACE nova return self._exit_event.wait()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
2013-05-24 01:48:54,716.716 28649 TRACE nova return hubs.get_hub().switch()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 177, in switch
2013-05-24 01:48:54,716.716 28649 TRACE nova return self.greenlet.switch()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 192, in main
2013-05-24 01:48:54,716.716 28649 TRACE nova result = function(*args, **kwargs)
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/service.py", line 148, in run_server
2013-05-24 01:48:54,716.716 28649 TRACE nova server.start()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/service.py", line 430, in start
2013-05-24 01:48:54,716.716 28649 TRACE nova self.manager.init_host()
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 631, in init_host
2013-05-24 01:48:54,716.716 28649 TRACE nova self._init_instance(context, instance)
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 520, in _init_instance
2013-05-24 01:48:54,716.716 28649 TRACE nova self.driver.plug_vifs(instance, legacy_net_info)
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/driver.py", line 460, in plug_vifs
2013-05-24 01:48:54,716.716 28649 TRACE nova self._plug_vifs(instance, network_info)
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/driver.py", line 465, in _plug_vifs
2013-05-24 01:48:54,716.716 28649 TRACE nova node = _get_baremetal_node_by_instance_uuid(instance['uuid'])
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/driver.py", line 88, in _get_baremetal_node_by_instance_uuid
2013-05-24 01:48:54,716.716 28649 TRACE nova node = db.bm_node_get_by_instance_uuid(ctx, instance_uuid)
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/db/api.py", line 101, in bm_node_get_by_instance_uuid
2013-05-24 01:48:54,716.716 28649 TRACE nova instance_uuid)
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 97, in wrapper
2013-05-24 01:48:54,716.716 28649 TRACE nova return f(*args, **kwargs)
2013-05-24 01:48:54,716.716 28649 TRACE nova File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/db/sqlalchemy/api.py", line 151, in bm_node_get_by_instance_uuid
2013-05-24 01:48:54,716.716 28649 TRACE nova raise exception.InstanceNotFound(instance_id=instance_uuid)
2013-05-24 01:48:54,716.716 28649 TRACE nova InstanceNotFound: Instance 9dc0aba0-27a5-47cb-a85a-574763e8243e could not be found.
2013-05-24 01:48:54,716.716 28649 TRACE nova

Note that the last line is truncated in the logs; it's not missing content from the copy-paste.

Tags: baremetal
Robert Collins (lifeless) wrote :

This code looks to be the issue:
    def init_host(self):
        """Initialization for a standalone compute service."""
        self.driver.init_host(host=self.host)
        context = nova.context.get_admin_context()
        instances = self.conductor_api.instance_get_all_by_host(context,
                                                                self.host)

        if CONF.defer_iptables_apply:
            self.driver.filter_defer_apply_on()

        self.init_virt_events()

        try:
            # checking that instance was not already evacuated to other host
            self._destroy_evacuated_instances(context)
            for instance in instances:
                self._init_instance(context, instance)
        finally:
            if CONF.defer_iptables_apply:
                self.driver.filter_defer_apply_off()

init_host tries to bring up every non-evacuated instance, but _init_instance raises rather than catching the error, so a single failing instance aborts service startup.
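
A minimal sketch of the per-instance handling that would avoid the crash loop, under stated assumptions: `init_instances` and the local `InstanceNotFound` class are hypothetical stand-ins written for illustration (the real exception lives in nova.exception), and this is not the change that was merged into nova. It only shows the idea of logging a failing instance and moving on to the next one.

```python
import logging

LOG = logging.getLogger(__name__)


class InstanceNotFound(Exception):
    """Hypothetical stand-in for nova.exception.InstanceNotFound."""


def init_instances(instances, init_instance):
    """Initialize each instance; log failures instead of aborting startup.

    `init_instance` is any callable taking one instance dict. Returns the
    list of instances whose initialization raised InstanceNotFound.
    """
    failed = []
    for instance in instances:
        try:
            init_instance(instance)
        except InstanceNotFound:
            # One broken instance should not keep the whole service down.
            LOG.exception("Skipping instance %s: backing node not found",
                          instance["uuid"])
            failed.append(instance)
    return failed
```

With a loop like this, the instance whose baremetal node was deleted would be reported in the log while the remaining instances on the host are still initialized.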

Robert Collins (lifeless) wrote :

The instance's vm_state is set to error, so we might want to skip initializing such instances.
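
The suggestion above could look roughly like the following filter. This is only a sketch: `instances_to_init` is a hypothetical helper, instances are modelled as plain dicts, and the `ERROR` constant is assumed to mirror the value of nova.compute.vm_states.ERROR.

```python
# Assumed to mirror nova.compute.vm_states.ERROR.
ERROR = "error"


def instances_to_init(instances):
    """Return only the instances whose vm_state is not 'error'."""
    return [inst for inst in instances if inst.get("vm_state") != ERROR]
```

Filtering before the init loop would leave errored instances untouched at startup, at the cost of never re-plugging VIFs for them until their state is reset.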

tags: added: baremetal
aeva black (tenbrae)
Changed in nova:
importance: Undecided → High
status: New → Triaged
ugvddm (271025598-9)
Changed in nova:
assignee: nobody → ugvddm (271025598-9)
ugvddm (271025598-9)
Changed in nova:
assignee: ugvddm (271025598-9) → nobody
Changed in tripleo:
importance: High → Medium
Changed in tripleo:
assignee: nobody → Telles Mota Vidal Nóbrega (tellesmvn)
Changed in nova:
assignee: nobody → Telles Mota Vidal Nóbrega (tellesmvn)
Telles Mota Vidal Nóbrega (tellesmvn) wrote :

I talked to Robert Collins and tried to reproduce the bug, but it didn't happen again. I tried several times and finally ran a test he suggested, and the bug still didn't occur, so he said I could mark this bug as Fix Released.

Changed in nova:
status: Triaged → Fix Released
Changed in tripleo:
status: Triaged → Fix Released