ZFSSA volume driver should catch REST API failure

Bug #1537914 reported by iain MacDonnell
This bug affects 1 person
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: iain MacDonnell

Bug Description

The fix for #1472412 looks quite reckless to me. A transient failure of REST API access, such as a network outage, could cause a volume to be deleted in the cinder database while the LUN is left lying around on the ZFSSA (consuming space) forever.

A better solution would be for get_lun() to check for ret.status == restclient.Status.NOT_FOUND and raise an exception.NotFound(); any other non-OK status should still raise exception.VolumeBackendAPIException(). delete_volume() should then catch exception.NotFound and return happily, but re-raise all other exceptions.
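
A minimal, self-contained sketch of the approach proposed above. It is not the driver code: the exception classes, Status, FakeResponse, and the rest_get/rest_delete callables are hypothetical stand-ins for the cinder and restclient objects, used only to show the control flow.

```python
# Sketch only: every name below is a stand-in, not the real cinder/ZFSSA API.


class VolumeBackendAPIException(Exception):
    """Stand-in for cinder's exception.VolumeBackendAPIException."""


class NotFound(Exception):
    """Stand-in for cinder's exception.NotFound."""


class Status:
    """Stand-in for restclient.Status (only the codes used here)."""
    OK = 200
    NOT_FOUND = 404


class FakeResponse:
    """Hypothetical REST response carrying a status code and a payload."""

    def __init__(self, status, data=None):
        self.status = status
        self.data = data


def get_lun(rest_get, lun_name):
    """Return LUN data, distinguishing "does not exist" from other failures."""
    ret = rest_get(lun_name)
    if ret.status == Status.NOT_FOUND:
        raise NotFound("LUN %s not found" % lun_name)
    if ret.status != Status.OK:
        # Any other non-OK status (e.g. a transient outage) stays fatal.
        raise VolumeBackendAPIException(
            "Error getting LUN %s: status %s" % (lun_name, ret.status))
    return ret.data


def delete_volume(rest_get, rest_delete, lun_name):
    """Delete the LUN; return False if there was nothing to delete."""
    try:
        get_lun(rest_get, lun_name)
    except NotFound:
        # The LUN is already gone, so the volume record can be removed.
        return False
    # Any other exception propagates, so the volume record is kept.
    rest_delete(lun_name)
    return True
```

With this shape, a 404 from the appliance lets the delete succeed, while an error such as a 503 propagates to the caller and the volume is not removed from the database.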

Kedar (kedar-vidvans)
tags: added: oracle-zfssa
tags: added: zfssa
Sean McGinnis (sean-mcginnis) wrote :

Does this still need to be addressed?

iain MacDonnell (imacdonn) wrote :

I think the code has changed a bit since I submitted this report. I need to review the latest version. I think it may still have issues.

iain MacDonnell (imacdonn) wrote :

Actually there appears to be a regression here. In the current code, zfssarest.get_lun() throws an exception.VolumeNotFound if the LUN doesn't exist (https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/zfssa/zfssarest.py#L746) but zfssaiscsi.delete_volume() only catches an exception.VolumeBackendAPIException (https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/zfssa/zfssaiscsi.py#L337).

So we end up with:

2016-10-13 21:37:21.454 26 ERROR cinder.volume.drivers.zfssa.zfssarest [req-abe82fc1-2e99-486b-b146-573db1f7f5ef b2ae6b7bdac142ddb708a3550f61d998 66234fea2ccc42398a1ae5300c594d49 - default default] Error Getting Volume: 3385ece4-0d9f-45d2-8b5b-f2a2096a65e4 on Pool: pool-609 Project: imot04 Return code: 404 Message: Not Found.
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server [req-abe82fc1-2e99-486b-b146-573db1f7f5ef b2ae6b7bdac142ddb708a3550f61d998 66234fea2ccc42398a1ae5300c594d49 - default default] Exception during message handling
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/cinder/coordination.py", line 285, in wrapped
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server return f(*a, **k)
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/cinder/volume/manager.py", line 759, in delete_volume
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server 'error_deleting')
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server self.force_reraise()
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib/python2.7/site-packages/cinder/volume/manager.py", line 745, in delete_volume
2016-10-13 21:37:21.629 26 ERROR oslo_messaging.rpc.server self.driver.delete_volume(volume)
2016-10-13 21:37:21.629 ...
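
The mismatch behind this traceback can be reproduced in isolation. The sketch below assumes that VolumeNotFound and VolumeBackendAPIException are unrelated siblings in the exception hierarchy; the classes are stand-ins for the cinder ones, and the handler body is a placeholder rather than the driver's actual code.

```python
# Sketch only: stand-in classes with an assumed hierarchy, not cinder itself.


class CinderException(Exception):
    """Stand-in base class."""


class VolumeBackendAPIException(CinderException):
    """Stand-in for exception.VolumeBackendAPIException."""


class NotFound(CinderException):
    """Stand-in for exception.NotFound."""


class VolumeNotFound(NotFound):
    """Stand-in for exception.VolumeNotFound."""


def get_lun_missing():
    """Mimic get_lun() for a LUN that does not exist on the appliance."""
    raise VolumeNotFound("LUN not found")


def delete_volume_as_reported():
    """Mimic a delete_volume() that only handles VolumeBackendAPIException."""
    try:
        get_lun_missing()
    except VolumeBackendAPIException:
        # Placeholder handler: never reached for VolumeNotFound.
        return "handled"
    return "deleted"


def outcome():
    """Report whether the exception escaped delete_volume_as_reported()."""
    try:
        return delete_volume_as_reported()
    except VolumeNotFound:
        return "VolumeNotFound escaped"
```

Because the except clause names a sibling class, the VolumeNotFound is not caught and propagates up to the volume manager, which matches the failure logged above.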


Matt Smith (mss-4)
tags: added: drivers
Changed in cinder:
assignee: nobody → iain MacDonnell (imacdonn)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/484956

Changed in cinder:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/484956
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=e68b879ef0fe8668eeac396505328b27a1e00c8b
Submitter: Jenkins
Branch: master

commit e68b879ef0fe8668eeac396505328b27a1e00c8b
Author: iain MacDonnell <email address hidden>
Date: Tue Jul 18 21:35:09 2017 +0000

    ZFSSA iSCSI delete volume with non-existent LUN

    Under some circumstances, a volume can exist in cinder for which there
    is no corresponding LUN on the ZFSSA. This fix allows deletion of the
    volume in such cases, by catching the NOT_FOUND status from the ZFSSA
    REST API and translating it to a VolumeNotFound exception, and then
    considering that exception to be non-fatal, whilst passing up any
    other exception that may be encountered. This way, some other failure
    in using the REST API to obtain LUN information (e.g. transient
    network failure) will not cause the cinder volume to get blindly
    deleted, which would have left behind an orphaned LUN on the ZFSSA.

    Change-Id: I332163fa35a2aa3f6f921b805fa97c803ec10724
    Closes-Bug: #1537914

Changed in cinder:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 11.0.0.0b3

This issue was fixed in the openstack/cinder 11.0.0.0b3 development milestone.
