nova-compute cannot restart if _init_instance failed

Bug #1324041 reported by zhangjialong
This bug affects 5 people
Affects                    Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)   Fix Released  High        wangpan
  Icehouse                 Fix Released  High        Artom Lifshitz
  Juno                     Fix Released  High        Artom Lifshitz

Bug Description

In my OpenStack deployment, my compute nodes crashed because of a power supply interruption. After powering the nodes back on, I tried to start the nova-compute service, but it failed to start. I checked compute.log and found errors like the following:

2014-05-28 16:21:12.558 2724 DEBUG nova.compute.manager [-] [instance: ac57aab0-1864-4335-aa4a-bbfcc75a9624] Checking state _get_power_state /usr/lib/python2.6/site-packages/nova/compute/manager.py:1043
2014-05-28 16:21:12.563 2724 DEBUG nova.compute.manager [-] [instance: ac57aab0-1864-4335-aa4a-bbfcc75a9624] Checking state _get_power_state /usr/lib/python2.6/site-packages/nova/compute/manager.py:1043
2014-05-28 16:21:12.567 2724 DEBUG nova.virt.libvirt.vif [-] vif_type=bridge instance=<nova.objects.instance.Instance object at 0x3fcead0> vif=VIF({'ovs_interfaceid': None, 'network': Network({'bridge': u'brqf29d33d2-7c', 'subnets': [Subnet({'ips': [FixedIP({'meta': {}, 'version': 4, 'type': u'fixed', 'floating_ips': [IP({'meta': {}, 'version': 4, 'type': u'floating', 'address': u'10.0.0.101'})], 'address': u'192.168.0.2'})], 'version': 4, 'meta': {u'dhcp_server': u'192.168.0.3'}, 'dns': [], 'routes': [], 'cidr': u'192.168.0.0/24', 'gateway': IP({'meta': {}, 'version': 4, 'type': u'gateway', 'address': u'192.168.0.1'})})], 'meta': {u'injected': False, u'tenant_id': u'5d56667c799c46ef81b87455445af457', u'should_create_bridge': True}, 'id': u'f29d33d2-7c70-456a-96b0-03a59fe0b40f', 'label': u'admin_net'}), 'devname': u'tap0780a643-9a', 'qbh_params': None, 'meta': {}, 'details': {u'port_filter': True}, 'address': u'fa:16:3e:dc:23:66', 'active': True, 'type': u'bridge', 'id': u'0780a643-9ad4-4388-a51d-3456a1e88ae6', 'qbg_params': None}) plug /usr/lib/python2.6/site-packages/nova/virt/libvirt/vif.py:592
2014-05-28 16:21:12.568 2724 DEBUG nova.virt.libvirt.vif [-] [instance: ac57aab0-1864-4335-aa4a-bbfcc75a9624] Ensuring bridge brqf29d33d2-7c plug_bridge /usr/lib/python2.6/site-packages/nova/virt/libvirt/vif.py:408
2014-05-28 16:21:12.568 2724 DEBUG nova.openstack.common.lockutils [-] Got semaphore "lock_bridge" lock /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py:168
2014-05-28 16:21:12.569 2724 DEBUG nova.openstack.common.lockutils [-] Attempting to grab file lock "lock_bridge" lock /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py:178
2014-05-28 16:21:12.569 2724 DEBUG nova.openstack.common.lockutils [-] Got file lock "lock_bridge" at /var/lib/nova/tmp/nova-lock_bridge lock /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py:206
2014-05-28 16:21:12.569 2724 DEBUG nova.openstack.common.lockutils [-] Got semaphore / lock "ensure_bridge" inner /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py:248
2014-05-28 16:21:12.570 2724 DEBUG nova.openstack.common.lockutils [-] Released file lock "lock_bridge" at /var/lib/nova/tmp/nova-lock_bridge lock /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py:210
2014-05-28 16:21:12.570 2724 DEBUG nova.openstack.common.lockutils [-] Semaphore / lock released "ensure_bridge" inner /usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py:252
2014-05-28 16:21:12.570 2724 DEBUG nova.compute.manager [-] [instance: ac57aab0-1864-4335-aa4a-bbfcc75a9624] Checking state _get_power_state /usr/lib/python2.6/site-packages/nova/compute/manager.py:1043
2014-05-28 16:21:12.575 2724 DEBUG nova.compute.manager [-] [instance: ac57aab0-1864-4335-aa4a-bbfcc75a9624] Current state is 4, state in DB is 1. _init_instance /usr/lib/python2.6/site-packages/nova/compute/manager.py:920
2014-05-28 16:21:12.575 2724 DEBUG nova.compute.manager [-] [instance: 8047e688-d189-4d35-a9c8-634f34cdda86] Checking state _get_power_state /usr/lib/python2.6/site-packages/nova/compute/manager.py:1043
2014-05-28 16:21:12.579 2724 DEBUG nova.compute.manager [-] [instance: 8047e688-d189-4d35-a9c8-634f34cdda86] Checking state _get_power_state /usr/lib/python2.6/site-packages/nova/compute/manager.py:1043
2014-05-28 16:21:12.584 2724 DEBUG nova.virt.libvirt.vif [-] vif_type=binding_failed instance=<nova.objects.instance.Instance object at 0x3fcef50> vif=VIF({'ovs_interfaceid': None, 'network': Network({'bridge': None, 'subnets': [Subnet({'ips': [FixedIP({'meta': {}, 'version': 4, 'type': u'fixed', 'floating_ips': [IP({'meta': {}, 'version': 4, 'type': u'floating', 'address': u'10.0.0.112'})], 'address': u'172.16.0.180'})], 'version': 4, 'meta': {u'dhcp_server': u'172.16.0.3'}, 'dns': [], 'routes': [], 'cidr': u'172.16.0.0/24', 'gateway': IP({'meta': {}, 'version': 4, 'type': u'gateway', 'address': u'172.16.0.1'})})], 'meta': {u'injected': False, u'tenant_id': u'b0df2063f0ae4830880ce544643c15e2'}, 'id': u'1ddbf12d-9adc-4371-9741-a2d89cd40686', 'label': u'plcloud_net'}), 'devname': u'tap75168900-ca', 'qbh_params': None, 'meta': {}, 'details': {}, 'address': u'fa:16:3e:35:47:1e', 'active': True, 'type': u'binding_failed', 'id': u'75168900-cafe-4cb1-9e20-1ed9cd33de44', 'qbg_params': None}) plug /usr/lib/python2.6/site-packages/nova/virt/libvirt/vif.py:592
2014-05-28 16:21:12.588 2724 ERROR nova.openstack.common.threadgroup [-] Unexpected vif_type=binding_failed
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup Traceback (most recent call last):
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/openstack/common/threadgroup.py", line 117, in wait
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup x.wait()
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/openstack/common/threadgroup.py", line 49, in wait
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup return self.thread.wait()
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/greenthread.py", line 168, in wait
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup return self._exit_event.wait()
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/event.py", line 116, in wait
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup return hubs.get_hub().switch()
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 187, in switch
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup return self.greenlet.switch()
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/eventlet/greenthread.py", line 194, in main
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup result = function(*args, **kwargs)
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/openstack/common/service.py", line 483, in run_service
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup service.start()
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/service.py", line 163, in start
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup self.manager.init_host()
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1026, in init_host
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup self._init_instance(context, instance)
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 884, in _init_instance
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup self.driver.plug_vifs(instance, net_info)
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 855, in plug_vifs
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup self.vif_driver.plug(instance, vif)
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/vif.py", line 616, in plug
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup _("Unexpected vif_type=%s") % vif_type)
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup NovaException: Unexpected vif_type=binding_failed
2014-05-28 16:21:12.588 2724 TRACE nova.openstack.common.threadgroup

The nova-compute service then exits.

There is no doubt that we should check and recover the instances when a crashed compute node starts back up, but I do not think the nova-compute service should exit when a call to _init_instance fails.
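The failure mode described above can be sketched as follows (an illustrative Python sketch, not the actual nova code; the names mirror the traceback, but the data structures are assumptions): init_host iterates over every instance on the host, and an unhandled exception from a single plug_vifs call propagates up through the service thread and takes down the whole nova-compute process.

```python
# Illustrative sketch of the pre-fix failure mode (not actual nova code).
# One instance left with vif_type=binding_failed aborts init for every
# instance and kills the nova-compute service thread.

class NovaException(Exception):
    pass

def plug_vifs(instance):
    # Mirrors nova/virt/libvirt/vif.py:plug rejecting an unknown vif_type.
    if instance["vif_type"] not in ("bridge", "ovs"):
        raise NovaException("Unexpected vif_type=%s" % instance["vif_type"])

def init_host(instances):
    # Pre-fix behaviour: no per-instance error handling, so the first
    # failure propagates out of init_host and the service exits.
    for instance in instances:
        plug_vifs(instance)

instances = [
    {"uuid": "ac57aab0", "vif_type": "bridge"},
    {"uuid": "8047e688", "vif_type": "binding_failed"},
]
try:
    init_host(instances)
    survived = True
except NovaException:
    survived = False  # this is where nova-compute would exit
```

Here `survived` ends up False: the healthy first instance is initialized, but the second one's exception escapes init_host, which is exactly why one bad port binding prevents the whole service from starting.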

zhangjialong (zhangjl)
Changed in nova:
assignee: nobody → zhangjialong (zhangjl)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/96150

Changed in nova:
status: New → In Progress
Revision history for this message
Peng Gu (gp-will) wrote :

Adding a general try/except block to keep nova-compute running is a good point. However, I hope to resolve this bug inside _init_instance itself: I have encountered the same problem and am currently working on it.

Revision history for this message
haruka tanizawa (h-tanizawa) wrote :

Hi Peng!
Could you post a link to your patch once it is ready?

Revision history for this message
Sean Dague (sdague) wrote :

The upstream patch is stalled with three -1 reviews.

Changed in nova:
status: In Progress → Confirmed
importance: Undecided → High
assignee: zhangjialong (zhangjl) → nobody
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Joe Gordon (<email address hidden>) on branch: master
Review: https://review.openstack.org/96150
Reason: Patch is stalled waiting for the author, looks like this has been abandoned. Feel free to restore.

wangpan (hzwangpan)
Changed in nova:
assignee: nobody → wangpan (hzwangpan)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/129158

Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/130096

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Wangpan (<email address hidden>) on branch: master
Review: https://review.openstack.org/130096

Revision history for this message
Eli Qiao (taget-9) wrote :

Interesting, I hit a similar bug: https://launchpad.net/bugs/1390336
I have proposed a fix for it, but I think the two bugs fix different scenarios, so both fixes are needed.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/129158
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=16ac50b1e760b7d20b840763b271a497b66ad5a5
Submitter: Jenkins
Branch: master

commit 16ac50b1e760b7d20b840763b271a497b66ad5a5
Author: Wangpan <email address hidden>
Date: Thu Nov 13 06:10:40 2014 +0000

    Compute: Catch binding failed exception while init host

    While compute starts it will init all instances,
    if an exception is raised from one instance
    (e.g NovaException during plug_vifs), then the
    compute process exits unexpectedly because of
    this unhandled exception.
    This commit changes the NovaException to more
    appropriate VirtualInterfacePlugException and
    catches it during init host, as well as the
    instance is set to error state, with this change
    the compute process can be started normally even
    if this VirtualInterfacePlugException is raised.

    Closes-bug: #1324041

    Change-Id: Ia584dba66affb86787e3069df19bd17b89cb5c49
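The merged change can be paraphrased as the following sketch (illustrative and heavily simplified from the commit message above; the data structures are assumptions, not the actual nova code): plug_vifs now raises the more specific VirtualInterfacePlugException, and init_host catches it per instance, marks that instance ERROR, and keeps initializing the rest.

```python
# Sketch of the merged fix (illustrative; not the actual nova code).
# plug_vifs raises the specific VirtualInterfacePlugException, and
# init_host handles it per instance instead of letting it escape.

class VirtualInterfacePlugException(Exception):
    """Raised when a VIF cannot be plugged (e.g. vif_type=binding_failed)."""

def plug_vifs(instance):
    if instance["vif_type"] not in ("bridge", "ovs"):
        raise VirtualInterfacePlugException(
            "Unexpected vif_type=%s" % instance["vif_type"])

def init_host(instances):
    for instance in instances:
        try:
            plug_vifs(instance)
            instance["vm_state"] = "active"
        except VirtualInterfacePlugException:
            # Post-fix behaviour: the failed instance goes to ERROR and
            # nova-compute keeps starting instead of exiting.
            instance["vm_state"] = "error"
    return instances

instances = [
    {"uuid": "ac57aab0", "vif_type": "bridge", "vm_state": None},
    {"uuid": "8047e688", "vif_type": "binding_failed", "vm_state": None},
]
result = init_host(instances)
```

With this pattern the healthy instance comes up as active, the one with the failed port binding is set to error, and the service itself survives startup, which is the behaviour the bug reporter asked for.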

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-1
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/160527

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/160541

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/juno)

Reviewed: https://review.openstack.org/160527
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b904b0b294211a0875ee2714137772f5ad4012c0
Submitter: Jenkins
Branch: stable/juno

commit b904b0b294211a0875ee2714137772f5ad4012c0
Author: Wangpan <email address hidden>
Date: Thu Nov 13 06:10:40 2014 +0000

    Compute: Catch binding failed exception while init host

    While compute starts it will init all instances,
    if an exception is raised from one instance
    (e.g NovaException during plug_vifs), then the
    compute process exits unexpectedly because of
    this unhandled exception.
    This commit changes the NovaException to more
    appropriate VirtualInterfacePlugException and
    catches it during init host, as well as the
    instance is set to error state, with this change
    the compute process can be started normally even
    if this VirtualInterfacePlugException is raised.

    Closes-bug: #1324041

    Conflicts:
     nova/tests/unit/compute/test_compute_mgr.py

    Change-Id: Ia584dba66affb86787e3069df19bd17b89cb5c49
    (cherry picked from commit 16ac50b1e760b7d20b840763b271a497b66ad5a5)

tags: added: in-stable-juno
Alan Pevec (apevec)
tags: removed: in-stable-juno
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/160541
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e9cf07b96f57fed0d9f46bd8f24aac491b6cb976
Submitter: Jenkins
Branch: stable/icehouse

commit e9cf07b96f57fed0d9f46bd8f24aac491b6cb976
Author: Wangpan <email address hidden>
Date: Thu Nov 13 06:10:40 2014 +0000

    Compute: Catch binding failed exception while init host

    While compute starts it will init all instances,
    if an exception is raised from one instance
    (e.g NovaException during plug_vifs), then the
    compute process exits unexpectedly because of
    this unhandled exception.
    This commit changes the NovaException to more
    appropriate VirtualInterfacePlugException and
    catches it during init host, as well as the
    instance is set to error state, with this change
    the compute process can be started normally even
    if this VirtualInterfacePlugException is raised.

    Closes-bug: #1324041

    Conflicts:
     nova/tests/unit/compute/test_compute_mgr.py
     nova/virt/ironic/driver.py
     nova/virt/libvirt/vif.py

    Change-Id: Ia584dba66affb86787e3069df19bd17b89cb5c49
    (cherry picked from commit 16ac50b1e760b7d20b840763b271a497b66ad5a5)

Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-1 → 2015.1.0