nova-compute restarts over and over again when the volume path is valid but the volume size cannot be retrieved

Bug #1524726 reported by nail-zte
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Low
Assigned to: Unassigned
Milestone: (none)

Bug Description

The method _get_instance_disk_info() in the libvirt driver gets the volume size from lvm.py using the logical volume path. When the volume path does not exist, lvm.py raises "VolumeBDMPathNotFound"; for any other failure it re-raises "ProcessExecutionError" directly. The related code is as follows:

    try:
        out, _err = utils.execute('blockdev', '--getsize64', path,
                                  run_as_root=True)
    except processutils.ProcessExecutionError:
        if not utils.path_exists(path):
            raise exception.VolumeBDMPathNotFound(path=path)
        else:
            raise

So, in this case, if the logical volume path does exist but something goes wrong in the backend while getting the volume size, "ProcessExecutionError" is raised. The point is that the libvirt driver does not catch the "ProcessExecutionError" exception, so the nova-compute service eventually goes down. Then, because of guard.service, nova-compute restarts over and over again.

The code in the libvirt driver is as follows:

    dk_size = lvm.get_volume_size(path)

We think an error that appears in the backend should not affect the nova-compute service.
We think we can catch ProcessExecutionError in the libvirt driver and set dk_size = 0.
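
A minimal sketch of that proposed handling is below. The surrounding _get_instance_disk_info() context is omitted, the import paths assume the Liberty-era module layout, the helper name _disk_size_or_zero is hypothetical, and falling back to 0 bytes is only the reporter's suggestion, not merged behaviour:

    from oslo_concurrency import processutils
    from oslo_log import log as logging

    from nova.virt.libvirt.storage import lvm

    LOG = logging.getLogger(__name__)


    def _disk_size_or_zero(path):
        # Hypothetical helper: instead of letting a backend failure propagate
        # and take down nova-compute, log it and report a size of 0 bytes.
        try:
            return lvm.get_volume_size(path)
        except processutils.ProcessExecutionError as exc:
            LOG.warning("Could not get size of volume %s: %s", path, exc)
            return 0

With a helper like this, the line dk_size = lvm.get_volume_size(path) in _get_instance_disk_info() would become dk_size = _disk_size_or_zero(path).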

This bug appeared in Kilo as well as Liberty.

nail-zte (nail-i)
description: updated
Revision history for this message
Zhihai Song (szhsong) wrote :

Would you please paste the compute.log?

Revision history for this message
Tardis Xu (xiaoxubeii) wrote :

Please specify the steps to reproduce.

Changed in nova:
status: New → Incomplete
Revision history for this message
nail-zte (nail-i) wrote :

Here is the compute.log:

2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup Traceback (most recent call last):
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/common/threadgroup.py", line 145, in wait
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup x.wait()
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/common/threadgroup.py", line 47, in wait
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup return self.thread.wait()
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 175, in wait
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup return self._exit_event.wait()
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/event.py", line 121, in wait
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup return hubs.get_hub().switch()
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in switch
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup return self.greenlet.switch()
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/eventlet/greenthread.py", line 214, in main
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup result = function(*args, **kwargs)
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/openstack/common/service.py", line 497, in run_service
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup service.start()
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/service.py", line 196, in start
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup self.manager.pre_start_hook()
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 1465, in pre_start_hook
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup self.update_available_resource(nova.context.get_admin_context())
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 6888, in update_available_resource
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup rt.update_available_resource(context)
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/nova/compute/resource_tracker.py", line 448, in update_available_resource
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup resources = self.driver.get_available_resource(self.nodename)
2015-12-03 16:39:18.326 1506 TRACE nova.openstack.common.threadgroup File "/usr/lib/python2.7/site-packages/...


nail-zte (nail-i)
Changed in nova:
assignee: nobody → nail-zte (nail-i)
Revision history for this message
Lee Yarwood (lyarwood) wrote :

@nail-i Are you still working on this? IMHO this is valid: a single volume being present but inaccessible shouldn't block nova-compute from starting correctly, and we should at least be logging why blockdev is failing here.
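
A hedged sketch of what that logging could look like inside lvm.get_volume_size(), reusing the utils, processutils, exception and LOG names already in scope in that module; the warning call and message wording are illustrative, not a merged change:

    try:
        out, _err = utils.execute('blockdev', '--getsize64', path,
                                  run_as_root=True)
    except processutils.ProcessExecutionError as exc:
        if not utils.path_exists(path):
            raise exception.VolumeBDMPathNotFound(path=path)
        # The path exists, so blockdev itself failed; record why before
        # re-raising so operators can see the underlying backend error.
        LOG.warning("blockdev --getsize64 failed for existing path %s: %s",
                    path, exc)
        raise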

Changed in nova:
status: Incomplete → Confirmed
Revision history for this message
Sujitha (sujitha-neti) wrote :

There have been no open reviews for this bug report for a long time. To signal that to other contributors who might provide patches for this bug, I'm removing the assignee.

nail-zte: Feel free to add yourself back as assignee and push a review for it.

Changed in nova:
assignee: nail-zte (nail-i) → nobody
tags: added: liberty-backport-potential
tags: added: libvirt
tags: added: volumes
Revision history for this message
John Garbutt (johngarbutt) wrote :

I am curious how we got into the state where the path was invalid. That suggests something else is going wrong here.

Changed in nova:
importance: Undecided → Low