Solidfire Delete Consistency Group Snapshots needs retries

Bug #1791594 reported by Mario Sommer
This bug affects 1 person
Affects: Cinder
Status: In Progress
Importance: Undecided
Assigned to: Fernando Ferraz

Bug Description

There is a race condition between the SolidFire API calls "ListGroupSnapshots" and "DeleteGroupSnapshot" when multiple Consistency Group Snapshots are deleted concurrently.
When it occurs, the SolidFire API returns the error "xSnapshotIDDoesNotExist" (see the full trace below).
I have already reported this bug to NetApp/SolidFire. In the meantime, I propose simply adding xSnapshotIDDoesNotExist to "retryable_errors" in the SolidFire driver.
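
To illustrate the proposal, here is a minimal, self-contained sketch of retrying an API call only when the error name is in a list of retryable names. It is not the driver's actual code: RETRYABLE_ERRORS, retry_on, FakeCluster, list_with_retry and the stand-in SolidFireAPIException class are all hypothetical names made up for this example, and the contents of the real "retryable_errors" list may differ.

```python
import time

# Hypothetical list for illustration; the real driver's entries may differ.
RETRYABLE_ERRORS = ["xSnapshotIDDoesNotExist"]


class SolidFireAPIException(Exception):
    """Stand-in for the driver's exception; carries the API error name."""

    def __init__(self, name, message):
        super().__init__(message)
        self.name = name


def retry_on(names, tries=3, delay=0.0):
    """Retry the wrapped call while it raises an error whose name is listed."""
    def decorator(func):
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except SolidFireAPIException as exc:
                    # Re-raise unknown errors at once, and listed errors
                    # once the attempts are used up.
                    if exc.name not in names or attempt == tries:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator


class FakeCluster:
    """Simulates a list call that fails once due to a concurrent delete."""

    def __init__(self, error_name="xSnapshotIDDoesNotExist"):
        self.calls = 0
        self.error_name = error_name

    def list_group_snapshots(self):
        self.calls += 1
        if self.calls == 1:
            raise SolidFireAPIException(
                self.error_name, "Snapshot 44093 does not exist.")
        return [{"groupSnapshotID": 7, "name": "cgsnap-a"}]


def list_with_retry(cluster):
    """Call the fake list API with the retry policy applied."""
    @retry_on(RETRYABLE_ERRORS)
    def call():
        return cluster.list_group_snapshots()
    return call()
```

With the error name in the list, the first failed listing is retried and the second attempt succeeds. An error name that is not in the list propagates on the first attempt, which matches the failure shown in the trace below.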

2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server [req-f4e4f53b-5f07-4966-92b5-fb3160606dea 53033d9970644283b707c9ebdd82b6bb 15d30213519a4e55aa38821b1dc9f21c - default default] Exception during message handling: SolidFireAPIException: API response: {u'id': None, u'error': {u'message': u'Snapshot 44093 does not exist.', u'code': 500, u'name': u'xSnapshotIDDoesNotExist'}}
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 3858, in delete_group_snapshot
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server snapshot.save()
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server self.force_reraise()
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/manager.py", line 3817, in delete_group_snapshot
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server snapshots))
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/solidfire.py", line 1666, in delete_cgsnapshot
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server self._delete_cgsnapshot_by_name(snap_name)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/solidfire.py", line 1579, in _delete_cgsnapshot_by_name
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server target = self._get_group_snapshot_by_name(snap_name)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/solidfire.py", line 1566, in _get_group_snapshot_by_name
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server target_snaps = self._list_group_snapshots()
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/solidfire.py", line 1562, in _list_group_snapshots
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server version='7.0')
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/solidfire.py", line 121, in func_retry
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/cinder/volume/drivers/solidfire.py", line 492, in _issue_api_request
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server raise exception.SolidFireAPIException(msg)
2018-09-10 10:40:50.299 9648 ERROR oslo_messaging.rpc.server SolidFireAPIException: API response: {u'id': None, u'error': {u'message': u'Snapshot 44093 does not exist.', u'code': 500, u'name': u'xSnapshotIDDoesNotExist'}}

Jay Rubenstein (jarbassaidai) wrote :

In the first sentence, you mention concurrent deletion of multiple Consistency Group Snapshots.
Was this done in a high-availability active/active environment, with multiple SolidFire Cinder drivers running against the same ElementOS/SolidFire cluster?

What commands or statements were used to delete multiple Consistency Group Snapshots concurrently?

Anastasiya Zhyrkevich (anastzhyr) wrote :

It seems that _issue_api_request is already retryable:
https://github.com/openstack/cinder/blob/master/cinder/volume/drivers/solidfire.py#L584

Was another fix expected?

Changed in cinder:
assignee: nobody → Fernando Ferraz (fernando-ferraz)
Changed in cinder:
status: New → In Progress