cinder should not fail to delete a volume already deleted on zfssa

Bug #1472412 reported by Albert White
This bug affects 1 person
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: John Griffith
Milestone: 7.0.0

Bug Description

After running some Rally scenarios on my main OpenStack deployment (Solaris, Juno, a mix of x86 and SPARC, ZFSSA for storage), I noticed in the admin panel in Horizon that several volumes created by the tests were still present, but in an ERROR_DELETING state.

Setting these to AVAILABLE and trying to delete them manually failed; they changed back to the ERROR_DELETING state.
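
For the record, the manual attempt looked roughly like this (a hedged sketch using python-cinderclient; the credentials and endpoint are placeholders, and the same reset can be done from horizon):

    # Sketch of the manual cleanup attempt via python-cinderclient.
    # Credentials/endpoint are placeholders, not the real deployment.
    from cinderclient import client

    cinder = client.Client('2', 'admin', 'PASSWORD', 'admin',
                           'http://keystone.example.com:5000/v2.0')

    vol_id = 'a6dd4dd8-4ab6-41b8-9286-88a58797c5f9'
    cinder.volumes.reset_state(vol_id, 'available')  # back to AVAILABLE
    cinder.volumes.delete(vol_id)  # fails; flips back to ERROR_DELETING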

The volumes had been created on an attached ZFSSA (7320); however, they were no longer present on that device.

Looking in the cinder-volume log on the host that runs the zfssa cinder driver, we find a trace of the error:

2015-07-07 22:35:48.599 19035 INFO cinder.volume.manager [req-f992a7a9-cced-434c-b8f6-8ef06fc55ad5 03227578c38345cea638f864a6491fdb fe45765544ad46b2ac28d1ef0db6e541 - - -] volume a6dd4dd8-4ab6-41b8-9286-88a58797c5f9: deleting
2015-07-07 22:35:50.231 19035 ERROR cinder.volume.drivers.zfssa.zfssarest [req-f992a7a9-cced-434c-b8f6-8ef06fc55ad5 03227578c38345cea638f864a6491fdb fe45765544ad46b2ac28d1ef0db6e541 - - -] Error Setting Volume: volume-a6dd4dd8-4ab6-41b8-9286-88a58797c5f9 to InitiatorGroup: com.sun.ms.vss.hg.maskAll Pool: swimming Project: openstack Return code: 404 Message: Not Found.
2015-07-07 22:35:50.262 19035 ERROR cinder.volume.drivers.zfssa.zfssarest [req-f992a7a9-cced-434c-b8f6-8ef06fc55ad5 03227578c38345cea638f864a6491fdb fe45765544ad46b2ac28d1ef0db6e541 - - -] Error Getting Volume: volume-a6dd4dd8-4ab6-41b8-9286-88a58797c5f9 on Pool: swimming Project: openstack Return code: 404 Message: Not Found.
2015-07-07 22:35:50.315 19035 ERROR oslo.messaging.rpc.dispatcher [req-f992a7a9-cced-434c-b8f6-8ef06fc55ad5 03227578c38345cea638f864a6491fdb fe45765544ad46b2ac28d1ef0db6e541 - - -] Exception during message handling: Bad or unexpected response from the storage volume backend API: Error Getting Volume: volume-a6dd4dd8-4ab6-41b8-9286-88a58797c5f9 on Pool: swimming Project: openstack Return code: 404 Message: Not Found.
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher Traceback (most recent call last):
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/oslo/messaging/rpc/dispatcher.py", line 134, in _dispatch_and_reply
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher incoming.message))
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/oslo/messaging/rpc/dispatcher.py", line 177, in _dispatch
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher return self._do_dispatch(endpoint, method, ctxt, args)
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/oslo/messaging/rpc/dispatcher.py", line 123, in _do_dispatch
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher result = getattr(endpoint, method)(ctxt, **new_args)
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/osprofiler/profiler.py", line 105, in wrapper
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher return f(*args, **kwargs)
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/cinder/volume/manager.py", line 127, in lvo_inner1
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher return lvo_inner2(inst, context, volume_id, **kwargs)
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/cinder/openstack/common/lockutils.py", line 233, in inner
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher retval = f(*args, **kwargs)
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/cinder/volume/manager.py", line 126, in lvo_inner2
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher return f(*_args, **_kwargs)
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/cinder/volume/manager.py", line 455, in delete_volume
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher {'status': 'error_deleting'})
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/cinder/openstack/common/excutils.py", line 82, in __exit__
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher six.reraise(self.type_, self.value, self.tb)
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/cinder/volume/manager.py", line 444, in delete_volume
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher self.driver.delete_volume(volume_ref)
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/osprofiler/profiler.py", line 105, in wrapper
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher return f(*args, **kwargs)
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/cinder/volume/drivers/zfssa/zfssaiscsi.py", line 212, in delete_volume
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher volume['name'])
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.6/vendor-packages/cinder/volume/drivers/zfssa/zfssarest.py", line 446, in get_lun
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher raise exception.VolumeBackendAPIException(data=exception_msg)
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Error Getting Volume: volume-a6dd4dd8-4ab6-41b8-9286-88a58797c5f9 on Pool: swimming Project: openstack Return code: 404 Message: Not Found.
2015-07-07 22:35:50.315 19035 TRACE oslo.messaging.rpc.dispatcher
2015-07-07 22:35:55.551 19035 INFO cinder.volume.manager [req-4af9bdec-7494-4c34-875c-d8d55b8fc3d7 - - - - -] Updating volume status

So there may be a race here somewhere: the volume got deleted on the backend, but cinder still has a record of it.

Anyway, it seems we should handle this a bit better. In this particular case, where we have established that the connection to the zfssa is working correctly, I'd suggest that instead of throwing an exception (which produces a stack trace, leaves the volume in ERROR_DELETING, and forces the user to hunt through logs), zfssarest should detect that the volume is not present (assume it was already deleted) and allow cinder to remove it from its database of current volumes, as sketched below.
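
Concretely, something along these lines in zfssarest.py would do it. This is a hypothetical sketch: the client argument and the service path are illustrative stand-ins for the driver's actual REST plumbing.

    # Hypothetical sketch of the suggested behaviour; `client` and the
    # service path stand in for the driver's real REST plumbing.
    import json

    from cinder import exception

    def get_lun(client, pool, project, lun):
        """Return LUN details, or None if the LUN is already gone."""
        svc = ('/api/storage/v1/pools/%s/projects/%s/luns/%s'
               % (pool, project, lun))
        ret = client.get(svc)
        if ret.status == 404:
            # Already deleted on the appliance: report "not found" rather
            # than raising, so delete_volume() can treat the delete as a
            # no-op and let cinder drop the volume from its database.
            return None
        if ret.status != 200:
            raise exception.VolumeBackendAPIException(
                data='Error Getting Volume: %s Return code: %d'
                     % (lun, ret.status))
        return json.loads(ret.data)

The driver's delete_volume() would then check for None and return quietly instead of raising.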

Tags: drivers zfssa
Albert White (albertw)
tags: added: zfssa
tags: added: drivers
Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
status: New → In Progress
Albert White (albertw)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/199330
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=78e25a577fa4c5e47c871c435898c847423d7ac6
Submitter: Jenkins
Branch: master

commit 78e25a577fa4c5e47c871c435898c847423d7ac6
Author: John Griffith <email address hidden>
Date: Tue Jul 7 16:26:21 2015 -0600

    Handle volume not found on zfssa volume delete

    The zfssa driver doesn't handle cases where a volume may not
    exist on the backend when a delete call is sent. The result
    is a stack-trace in the logs, sometimes a crash of the service
    and an inability to clean up volumes from Cinder's perspective.

    This patch doesn't address the root cause of the issue, but
    it does handle the case more gracefully by catching the
    exception, logging an error and returning.

    This way the volume can be cleaned up on the Cinder side and
    the operator still has an indication that something went wrong.
    This is a common pattern for most of the drivers in Cinder.

    Change-Id: I09725b29effb79450d010949527bd54329919f52
    Closes-Bug: #1472412
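
For reference, the catch/log/return pattern the commit message describes looks roughly like this. It is an illustrative sketch, not the exact merged diff; the zfssa helper names follow the traceback above and their exact signatures are assumptions.

    # Illustrative sketch of the pattern, not the exact merged change.
    # The zfssa helper signatures are assumptions based on the traceback.
    from cinder import exception
    from cinder.openstack.common import log as logging

    LOG = logging.getLogger(__name__)

    def delete_volume(self, volume):
        """Delete a volume, tolerating 'not found' on the backend."""
        pool = self.configuration.zfssa_pool
        project = self.configuration.zfssa_project
        try:
            self.zfssa.get_lun(pool, project, volume['name'])
        except exception.VolumeBackendAPIException as ex:
            # LUN already gone on the array (404): log an error and return
            # so the manager can still remove the volume's DB record.
            LOG.error('Volume %s not found on backend, marking deleted: %s',
                      volume['name'], ex)
            return
        self.zfssa.delete_lun(pool, project, volume['name'])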

Changed in cinder:
status: In Progress → Fix Committed
Changed in cinder:
milestone: none → liberty-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: liberty-2 → 7.0.0