Solidfire Delete Consistency Group Snapshots needs retries

Bug #1791594 reported by Mario Sommer
This bug affects 1 person
Affects: Cinder
Status: In Progress
Importance: Undecided
Assigned to: Fernando Ferraz
Milestone: none

Bug Description

There is a race condition between the SolidFire API calls "ListGroupSnapshots" and "DeleteGroupSnapshot" when multiple Consistency Group Snapshots are deleted concurrently.
The SolidFire API then returns the error "xSnapshotIDDoesNotExist". In the trace below, it surfaces from the driver's _list_group_snapshots call (see the full trace below).
I have already reported this bug to NetApp/SolidFire, but in the meantime I propose simply adding xSnapshotIDDoesNotExist to "retryable_errors" in the SolidFire driver.

2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server [req-f4e4f53b-5f07-4966-92b5-fb3160606dea 53033d9970644283b707c9ebdd82b6bb 15d30213519a4e55aa38821b1dc9f21c - default default] Exception during message handling: SolidFireAPIException: API response: {u'id': None, u'error': {u'message': u'Snapshot 44093 does not exist.', u'code': 500, u'name': u'xSnapshotIDDoesNotExist'}}
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 3858, in delete_group_snapshot
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server snapshot.save()
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server self.force_reraise()
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 3817, in delete_group_snapshot
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server snapshots))
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/solidfire.py", line 1666, in delete_cgsnapshot
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server self._delete_cgsnapshot_by_name(snap_name)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/solidfire.py", line 1579, in _delete_cgsnapshot_by_name
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server target = self._get_group_snapshot_by_name(snap_name)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/solidfire.py", line 1566, in _get_group_snapshot_by_name
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server target_snaps = self._list_group_snapshots()
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/solidfire.py", line 1562, in _list_group_snapshots
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server version='7.0')
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/solidfire.py", line 121, in func_retry
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/solidfire.py", line 492, in _issue_api_request
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server raise exception.SolidFireAPIException(msg)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server SolidFireAPIException: API response: {u'id': None, u'error': {u'message': u'Snapshot 44093 does not exist.', u'code': 500, u'name': u'xSnapshotIDDoesNotExist'}}
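The proposal above (adding xSnapshotIDDoesNotExist to "retryable_errors") can be sketched as a retry wrapper that inspects the "name" field of the API error response and retries only when that name is in a configured set. This is a minimal illustration under assumptions, not the driver's actual code: retry_on_api_error, SolidFireAPIError, RETRYABLE_ERRORS and make_flaky_list are hypothetical names, and the real driver's retry helper (func_retry in the trace) and its retryable_errors list may be structured differently.

```python
import functools
import time


class SolidFireAPIError(Exception):
    """Hypothetical stand-in for the driver's SolidFireAPIException."""

    def __init__(self, response):
        super().__init__("API response: %s" % response)
        # Error name as it appears in the API response, e.g. 'xSnapshotIDDoesNotExist'.
        self.name = response.get("error", {}).get("name")


def retry_on_api_error(retryable_errors, tries=3, delay=0.0):
    """Hypothetical decorator: retry when the API error name is in retryable_errors."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except SolidFireAPIError as exc:
                    # Unlisted error names, or a failure on the last attempt, propagate unchanged.
                    if exc.name not in retryable_errors or attempt == tries:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


# Proposed change (assumption): the error names the driver already retries would
# also be listed here; only the newly proposed name is shown.
RETRYABLE_ERRORS = {"xSnapshotIDDoesNotExist"}


def make_flaky_list(failures):
    """Hypothetical fake ListGroupSnapshots call: fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def list_group_snapshots():
        state["calls"] += 1
        if state["calls"] <= failures:
            # Same error payload as in the trace above.
            raise SolidFireAPIError({
                "id": None,
                "error": {"message": "Snapshot 44093 does not exist.",
                          "code": 500, "name": "xSnapshotIDDoesNotExist"},
            })
        return []  # placeholder for the list of group snapshots

    return list_group_snapshots, state


listing, state = make_flaky_list(failures=2)
result = retry_on_api_error(RETRYABLE_ERRORS, tries=3)(listing)()
print(result, state["calls"])  # [] 3
```

With the name listed, the two transient failures are absorbed and the third call succeeds. With an empty set, the same fake call raises on the first attempt, which is why wrapping a call in a retry helper alone does not help unless the error name is in the retryable list. Bounding the number of tries keeps a persistent error from looping forever.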

Jay Rubenstein (jarbassaidai) wrote:

In the first sentence, you mention concurrent deletion of multiple Consistency Group Snapshots.
Was this done in a high-availability active/active environment, with multiple SolidFire Cinder drivers running against the same ElementOS/SolidFire cluster?

What commands or statements were used to delete the Consistency Group Snapshots concurrently?

Anastasiya Zhyrkevich (anastzhyr) wrote:

It seems that _issue_api_request is already retryable: https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/solidfire.py#L584

Was there another fix expected?

Changed in cinder:
assignee: nobody → Fernando Ferraz (fernando-ferraz)
Changed in cinder:
status: New → In Progress