nova refuses to start if there are baremetal instances with no associated node

Bug #1272623 reported by Robert Collins
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Fix Released  High        Unassigned   -
tripleo                   Fix Released  High        Unassigned   -

Bug Description

This can happen if a deployment is interrupted at just the wrong time.

2014-01-25 06:53:38,781.781 14556 DEBUG nova.compute.manager [req-e1958f79-b0c0-4c80-b284-85bb56f1541d None None] [instance: e21e6bca-b528-4922-9f59-7a1a6534ec8d] Current state is 1, state in DB is 1. _init_instance /opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py:720
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 346, in fire_timers
    timer()
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 56, in __call__
    cb(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 194, in main
    result = function(*args, **kwargs)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/openstack/common/service.py", line 480, in run_service
    service.start()
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/service.py", line 172, in start
    self.manager.init_host()
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 805, in init_host
    self._init_instance(context, instance)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/compute/manager.py", line 684, in _init_instance
    self.driver.plug_vifs(instance, net_info)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/driver.py", line 538, in plug_vifs
    self._plug_vifs(instance, network_info)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/driver.py", line 543, in _plug_vifs
    node = _get_baremetal_node_by_instance_uuid(instance['uuid'])
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/driver.py", line 85, in _get_baremetal_node_by_instance_uuid
    node = db.bm_node_get_by_instance_uuid(ctx, instance_uuid)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/db/api.py", line 101, in bm_node_get_by_instance_uuid
    instance_uuid)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 112, in wrapper
    return f(*args, **kwargs)
  File "/opt/stack/venvs/nova/local/lib/python2.7/site-packages/nova/virt/baremetal/db/sqlalchemy/api.py", line 152, in bm_node_get_by_instance_uuid
    raise exception.InstanceNotFound(instance_id=instance_uuid)
InstanceNotFound: Instance 84c6090b-bf42-4c6a-b2ff-afb22b5ff156 could not be found.

If there is no allocated node, we can just skip that part of delete.
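
As an illustration only (not the actual patch), "skip that part" could look roughly like the sketch below inside nova/virt/baremetal/driver.py: catch the InstanceNotFound seen in the traceback around the node lookup and treat a missing association as nothing to tear down. The wrapper name _node_teardown_if_associated is hypothetical; _get_baremetal_node_by_instance_uuid, exception.InstanceNotFound and instance['uuid'] are taken from the traceback above.

    # Sketch, not nova's actual code.  Assumes the module context of
    # nova/virt/baremetal/driver.py, where LOG, exception and
    # _get_baremetal_node_by_instance_uuid already exist.
    def _node_teardown_if_associated(instance):
        try:
            node = _get_baremetal_node_by_instance_uuid(instance['uuid'])
        except exception.InstanceNotFound:
            # No bm_nodes row ever pointed at this instance, so there is
            # no node-side state to clean up; skip instead of blowing up.
            LOG.warning('Instance %s has no associated baremetal node, '
                        'skipping node teardown', instance['uuid'])
            return None
        # ... proceed with the normal node teardown using `node` ...
        return node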

Tags: baremetal
summary: - nova refuses to delete baremetal instances if there is no associated
- node
+ nova refuses to start if there are baremetal instances with no
+ associated node
description: updated
Revision history for this message
Robert Collins (lifeless) wrote :

We're running a monkeypatch to avoid this at the moment.

Changed in tripleo:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Robert Collins (lifeless) wrote :

Questions from jog:
 - How does this happen?
 - Is this only nova-bm?

Revision history for this message
Robert Collins (lifeless) wrote :

On startup nova-compute attempts to restore the state of the node to its internal model, e.g. start VMs that are meant to be running, fully delete VMs that are meant to be purged from disk, etc.

We also try to start VMs in state 'ERROR' here, which AFAICT doesn't happen in any other circumstance. This is conceptually problematic because ERROR is used to indicate that nova has given up on the VM, rather than it being in the middle of an operation which needs resuming.

One particular thing that can happen is that once a VM is in state ERROR, there is no guarantee that the axioms for it are maintained - it might not have had networking allocated, for instance.

The thing that caused this particular backtrace was an instance of that: nova-compute errored the VM before writing the instance id to the bm_nodes table (which is what captures the association of instance to node). This happened quite legitimately: the scheduler was trying to schedule to an already-used node (due to a different issue, but the scheduler is intrinsically racy, so this should be expected in general). Then, when restarted, nova-compute attempted to restart the ERROR state VM and threw an exception (rightly so; attempting to power on nothing is an error).
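
To make the failure mode concrete, here is a toy, standalone model (plain Python, not nova code) of the lookup in the traceback: bm_node_get_by_instance_uuid finds the bm_nodes row whose instance_uuid matches, and when the association was never written it raises InstanceNotFound, which kills init_host and therefore the whole compute service.

    # Toy stand-in for nova.virt.baremetal.db (the real code queries the
    # bm_nodes table via SQLAlchemy).  UUIDs are the ones from the log above.
    class InstanceNotFound(Exception):
        pass

    bm_nodes = [
        # Healthy association written by a successful deploy.
        {'id': 1, 'instance_uuid': 'e21e6bca-b528-4922-9f59-7a1a6534ec8d'},
        # Node the ERROR'd instance raced for: the association was never
        # written, so instance_uuid stayed NULL.
        {'id': 2, 'instance_uuid': None},
    ]

    def bm_node_get_by_instance_uuid(instance_uuid):
        for node in bm_nodes:
            if node['instance_uuid'] == instance_uuid:
                return node
        raise InstanceNotFound(instance_uuid)

    # What init_host effectively asks for while "restoring" the ERROR'd VM:
    bm_node_get_by_instance_uuid('84c6090b-bf42-4c6a-b2ff-afb22b5ff156')
    # -> InstanceNotFound, and nova-compute refuses to start.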

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/69108
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=6534a89de9cabc274cbdb7d2ecee3d851c456a87
Submitter: Jenkins
Branch: master

commit 6534a89de9cabc274cbdb7d2ecee3d851c456a87
Author: Steve Kowalik <email address hidden>
Date: Sat Jan 25 20:00:19 2014 +1300

    Don't try to restore VM's in state ERROR.

    We don't try to restore VM's that are in a failed BUILDING state, so
    attempting to start ERROR VMs is more than a little weird. The one
    exception to this rule are VMs that are in RESIZE_MIGRATING, since
    recovery is already attempted. It's also a problem, because many ERROR
    states aren't recoverable from (at the moment anyhow).

    Closes-Bug: #1272623
    Change-Id: I0599b83a82ad3ee67a92126d3b57df5b02e20539
    Co-Authored-By: Robert Collins <email address hidden>
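
In essence, the change guards _init_instance so that instances in vm_state ERROR are left alone on startup unless their task_state is RESIZE_MIGRATING. A simplified sketch of that guard (not the verbatim diff, and the exact condition in the merged change may differ; see the review linked above):

    # Simplified sketch of the guard added to ComputeManager._init_instance
    # in nova/compute/manager.py; vm_states and task_states are nova's real
    # modules, but the surrounding code is elided.
    from nova.compute import task_states
    from nova.compute import vm_states

    def _init_instance(self, context, instance):
        if (instance['vm_state'] == vm_states.ERROR and
                instance['task_state'] != task_states.RESIZE_MIGRATING):
            # Nova has already given up on this VM; trying to "restore" it
            # here is what tripped the InstanceNotFound in this bug.
            return
        # ... existing restore logic (plug VIFs, resume power state, etc.) ...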

Changed in nova:
status: Triaged → Fix Committed
Changed in nova:
milestone: none → icehouse-3
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/81690

Revision history for this message
wangpan (hzwangpan) wrote :

I believe this bug also affects the libvirt driver (qemu hypervisor) in havana, so I cherry-picked it to havana.
Please see the trace below: nova-compute tried to restore an error instance, and it failed to start in the end.

2014-03-19 23:02:57.783 24757 DEBUG nova.virt.libvirt.vif [req-d5ae9690-179e-4661-928f-9c3febaeda5f None None] vif_type=binding_failed instance=<nova.objects.instance.Instance object at 0x3f38750> vif=VIF({'ovs_interfaceid': None, 'network': Network({'bridge': None, 'subnets': [Subnet({'ips': [FixedIP({'meta': {}, 'version': 4, 'type': u'fixed', 'floating_ips': [], 'address': u'10.0.17.4'})], 'version': 4, 'meta': {u'dhcp_server': u'10.0.17.3'}, 'dns': [], 'routes': [], 'cidr': u'10.0.17.0/24', 'gateway': IP({'meta': {}, 'version': 4, 'type': u'gateway', 'address': u'10.0.17.1'})})], 'meta': {u'injected': False, u'tenant_id': u'e1caab985d1e4418a8b0e4d869afdd25'}, 'id': u'001361ce-ebbf-44de-aa7a-2943682bfa3a', 'label': u'admin-test'}), 'devname': u'tapa13b662a-24', 'qbh_params': None, 'meta': {}, 'address': u'fa:16:3e:15:ac:a5', 'type': u'binding_failed', 'id': u'a13b662a-24d3-4ccd-8788-f59151376e6f', 'qbg_params': None}) plug /usr/lib/python2.7/dist-packages/nova/virt/libvirt/vif.py:544
2014-03-19 23:02:57.786 24757 ERROR nova.openstack.common.threadgroup [-] Unexpected vif_type=binding_failed
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup Traceback (most recent call last):
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/threadgroup.py", line 117, in wait
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup x.wait()
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/threadgroup.py", line 49, in wait
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup return self.thread.wait()
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 168, in wait
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup return self._exit_event.wait()
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/event.py", line 116, in wait
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup return hubs.get_hub().switch()
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 187, in switch
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup return self.greenlet.switch()
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/eventlet/greenthread.py", line 194, in main
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup result = function(*args, **kwargs)
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/dist-packages/nova/openstack/common/service.py", line 65, in run_service
2014-03-19 23:02:57.786 24757 TRACE nova.openstack.common.threadgroup service....


Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-3 → 2014.1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/havana)

Change abandoned by Wangpan (<email address hidden>) on branch: stable/havana
Review: https://review.openstack.org/81690

Revision history for this message
Ben Nemec (bnemec) wrote :

It appears this has been fixed in Nova for a long time.

Changed in tripleo:
status: Triaged → Fix Released