If an instance's disk is not available when the nova-compute service starts, the service fails to start

Bug #1522667 reported by jingtao liang on 2015-12-04
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Medium
Assigned to: Unassigned

Bug Description

version: 2015.1

details:

When the nova-compute service starts, it runs update_available_resource, which calls get_disk_over_committed_size_total and get_instance_disk_info. If an instance's iscsi_path is unavailable, or the IP is unreachable, the resulting exception leaves the nova-compute service inactive. I think that is not reasonable for users.
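The failure mode described above can be sketched as follows. All names here are illustrative stand-ins, not nova's actual code: the startup resource audit iterates over every instance's disks with no per-instance error handling, so a single failing blockdev call aborts the whole service thread.

```python
# Minimal sketch of the failure mode (hypothetical stand-in names):
# the startup resource audit has no per-instance error handling, so
# one unavailable disk kills the whole service.

class ProcessExecutionError(Exception):
    """Stand-in for oslo_concurrency's command-failure exception."""

def get_volume_size(path):
    # Stand-in for the code path that shells out to "blockdev" via
    # rootwrap; a dead iSCSI device node makes the command exit non-zero.
    raise ProcessExecutionError(
        "blockdev: cannot open %s: No such device or address" % path)

def get_disk_over_committed_size_total(disk_paths):
    total = 0
    for path in disk_paths:
        # No try/except here: the first bad disk aborts the whole audit.
        total += get_volume_size(path)
    return total

def pre_start_hook(disk_paths):
    # Runs in the service's startup thread; an escaping exception
    # leaves nova-compute inactive.
    return get_disk_over_committed_size_total(disk_paths)
```

Because the exception escapes all the way out of pre_start_hook, the service thread dies, which is what the traceback in this report shows.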

logs:

2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup Traceback (most recent call last):
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/commot
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup x.wait()
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/commot
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup return self.thread.wait()
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/greenthreadt
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup return self._exit_event.wait()
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/event.py", t
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup return hubs.get_hub().switch()
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.pyh
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup return self.greenlet.switch()
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/greenthreadn
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup result = function(*args, **kwargs)
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/commoe
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup service.start()
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/service.py", lit
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup self.manager.pre_start_hook()
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/managerk
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup self.update_available_resource(nova.context.get_admin_con)
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/managere
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup rt.update_available_resource(context)
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/resource
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup resources = self.driver.get_available_resource(self.noden)
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/dre
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup disk_over_committed = self._get_disk_over_committed_size_)
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/drl
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup self._get_instance_disk_info(dom.name(), xml))
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/dro
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup dk_size = lvm.get_volume_size(path)
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/lve
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup run_as_root=True)
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/ute
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup return utils.execute(*args, **kwargs)
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/utils.py", linee
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup return processutils.execute(*cmd, **kwargs)
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/oslo_concurrency/proe
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup cmd=sanitized_cmd)
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup ProcessExecutionError: Unexpected error while running command.
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup Command: sudo nova-rootwrap /etc/nova/rootwrap.conf blockdev 0
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup Exit code: 1
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup Stdout: u''
2015-12-03 17:21:23.508 15646 TRACE nova.openstack.common.threadgroup Stderr: u'blockdev: cannot open /dev/disk/by-path/ip-162.161.1.208:3260-iscsi-iqn.2000-09.com.fujitsu:storage-system.eternus-dxl:002859c4-lun-0: No such device or address\n'

added:
I have checked the error. It occurs because ip-162.161.1.208:3260 is unreachable. If the IP is reachable, the service restarts successfully.

Please check this bug, thanks.

Michael Still (mikal) wrote :

Despite this mangled traceback, I was able to re-create this with devstack on a public cloud instance. I followed these steps:

- boot an instance
- stop the instance (nova stop)
- remove the instance's root disk from /opt/stack/data/nova/instances/...
- start the instance (nova start)
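The missing-disk condition from these steps can be simulated outside of nova with the standard library alone (the paths here are throwaway temp files, not nova's instances directory): deleting the backing file and then accessing it produces the same [Errno 2] seen in the trace below.

```python
# Simulating the missing-root-disk condition outside nova: remove the
# backing file, then access it, yielding the same [Errno 2] as the trace.
import errno
import os
import tempfile

instances_dir = tempfile.mkdtemp()          # stand-in for .../nova/instances/<uuid>
disk = os.path.join(instances_dir, "disk")  # the instance's root disk file
open(disk, "w").close()                     # "boot": backing file exists
os.remove(disk)                             # step 3: remove the root disk

try:
    os.stat(disk)                           # what "nova start" runs into
except OSError as e:
    assert e.errno == errno.ENOENT          # [Errno 2] No such file or directory
```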

This is what I get:

2015-12-07 21:59:20.622 ERROR oslo_messaging.rpc.dispatcher [req-cadd058d-3ac7-424e-9e97-8386c8afc079 admin demo] Exception during message handling: [Errno 2] No such file or directory: '/opt/stack/data/nova/instances/f9a712f1-bd23-426c-846d-3d4cda57e342/disk'
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher executor_callback))
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher executor_callback)
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 129, in _do_dispatch
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/exception.py", line 105, in wrapped
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher payload)
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 204, in __exit__
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/exception.py", line 88, in wrapped
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher return f(self, context, *args, **kw)
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 349, in decorated_function
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher LOG.warning(msg, e, instance=instance)
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 204, in __exit__
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 322, in decorated_function
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 399, in decorated_function
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher return function(self, context, *args, **kwargs)
2015-12-07 21:59:20.622 TRACE oslo_messaging.rpc.dispatcher ...


Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Michael Still (mikalstill)
Michael Still (mikal) on 2015-12-07
tags: added: libvirt
Tardis Xu (xiaoxubeii) wrote :

Hi jingtao liang,

Did your instance boot from an iSCSI volume or from local disk? I tested an instance that boots from local disk with an iSCSI cinder volume attached: after disconnecting the iSCSI IP, it can still start successfully. My nova code is master.

Changed in nova:
status: Confirmed → Incomplete
jingtao liang (liang-jingtao) wrote :

My instance boots from an iSCSI volume, and I also attached another iSCSI volume to it. I mean that if the iscsi_path is unavailable, so that the command "blockdev --getsize" cannot get the size of the iSCSI volume, then after restarting the nova-compute service (not the instance), the service cannot become active.

Tardis Xu (xiaoxubeii) wrote :

Hi jingtao liang,

I cannot reproduce your bug on my devstack (master); maybe it has already been fixed.

Michael Still (mikal) wrote :

@Tardis: did you read my comment above? I can recreate this problem, or at least one very similar to it, on devstack.

Fix proposed to branch: master
Review: https://review.openstack.org/258243

Changed in nova:
status: Incomplete → In Progress

Ok, the LVM case here has a try/except wrapper in master. The local file option does not, so I've added a check for that. I've also added an error handler at the compute manager layer so the instance will go into ERROR instead of silently failing.
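A minimal sketch of that mitigation (illustrative names only, not the actual patch under review): catch the per-instance disk error inside the audit so the service survives startup, and hand the failing instances back so the compute manager layer can mark them as errored instead of failing silently.

```python
# Sketch of the mitigation (hypothetical names, not the real review):
# tolerate an unavailable disk during the resource audit and report
# which instances were affected.
import logging

LOG = logging.getLogger(__name__)

class DiskNotFound(Exception):
    """Stand-in for the disk-unavailable error."""

def get_disk_over_committed_size_total(get_disk_size, instances):
    """Sum disk usage, skipping instances whose disk is unavailable."""
    total = 0
    broken = []
    for instance in instances:
        try:
            total += get_disk_size(instance)
        except DiskNotFound:
            # Don't let one broken disk abort the audit (and the service);
            # record the instance so the manager can set it to ERROR.
            LOG.warning("Disk for instance %s unavailable; skipping",
                        instance)
            broken.append(instance)
    return total, broken
```

With this shape, nova-compute finishes its startup audit even when a disk is missing, and the compute manager can flip only the affected instances to ERROR.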

yuyafei (yu-yafei) on 2016-05-24
summary: - when nova-compute.service start .If the instance's disk is not
- avilable.It will cause the service failed.That is not reasonable
+ If instance's disk is not avilable when nova-compute service start which
+ would cause the service failed

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/258254
Reason: This patch is quite old, so I am abandoning it to keep the review queue manageable. Feel free to restore the change if you're still interested in working on it.

Matt Riedemann (mriedem) wrote :

At this point I'm going to consider this expired. If someone can recreate on master then we can revive.

Changed in nova:
status: In Progress → Invalid
assignee: Michael Still (mikal) → nobody

Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https://review.openstack.org/258243
Reason: This patch has been sitting unchanged for more than 12 weeks. I am therefore going to abandon it to keep the nova review queue sane. Please feel free to restore the change if you're still working on it.
