NetApp cDOT driver is too strict in delete workflows

Bug #1438893 reported by Yogesh
Affects: OpenStack Shared File Systems Service (Manila)
Status: Fix Released
Importance: Medium
Assigned to: Clinton Knight
Milestone: 2015.1.0

Bug Description

If a share or share server is not created successfully, sometimes Manila will not save the vserver info in the share or share server object. In such cases, when asked to delete such a share or share server, the cDOT driver should not protest about the lack of vserver info but instead should log a warning and not raise an exception (which leaves the object in an error_deleting state).
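
For illustration, a minimal sketch of the desired delete behavior (the class and helper names here are hypothetical, not the actual driver API):

    import logging

    LOG = logging.getLogger(__name__)

    class CmodeDriverSketch(object):
        """Hypothetical stand-in for the cDOT driver class."""

        def delete_share(self, context, share, share_server=None):
            """Delete a share, tolerating vserver info lost by failed creates."""
            # _get_vserver_name is a hypothetical helper; the real driver
            # derives the vserver from the share server's backend details.
            vserver = self._get_vserver_name(share_server)
            if not vserver:
                # The create workflow failed before vserver info was saved,
                # so there is nothing on the backend to clean up. Log and
                # return instead of raising, which would strand the share
                # in 'error_deleting'.
                LOG.warning("Share %s has no vserver info; skipping backend "
                            "cleanup.", share['id'])
                return
            self._delete_share_on_vserver(share, vserver)  # hypothetical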

Steps to reproduce:
1. Create a manila share.
2. Delete the vserver (on which this share resides) from the backend.
3. Now try to delete this share.

Actual result:
The system throws the error below:

2015-03-31 15:38:01.635 ERROR oslo_messaging.rpc.dispatcher (req-0fa64495-cd54-421d-a6ee-336883c800e6 c5822027abb6475e847630ac37f520ba bbf877ebe7f34efe87709b4cb0d0172c) Exception during message handling: Vserver os_1b201723-7cd3-456e-87ab-7013cc734c8d is not available.

Detailed logs:
http://paste.openstack.org/show/197735/

Revision history for this message
Ben Swartzlander (bswartz) wrote :

This is a pretty serious bug because it prevents the development of negative tests. We need to be able to intentionally fail the creation of various objects within Manila, and then reliably clean up the garbage objects that get created.

affects: cinder → manila
Changed in manila:
importance: Undecided → Medium
milestone: none → kilo-rc1
assignee: nobody → NetApp (netapp)
status: New → Confirmed
status: Confirmed → Triaged
Changed in manila:
assignee: NetApp (netapp) → Clinton Knight (clintonk)
summary: - Manila cDOT driver is too strict in delete workflows
+ NetApp cDOT driver is too strict in delete workflows
Revision history for this message
Valeriy Ponomaryov (vponomaryov) wrote :

In most cases, cDOT has already created resources by the time something fails. Merely logging a warning would leave those resources orphaned.

The correct fix would be to force saving of the vserver name.

Here is an example from the master branch: https://github.com/openstack/manila/blob/199692d6/manila/share/drivers/service_instance.py#L546
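
A sketch of that approach, loosely modeled on the service_instance.py pattern linked above (the driver class and helpers are illustrative; attaching 'detail_data' to the exception is how this sketch assumes fail-safe server details get persisted):

    import logging

    LOG = logging.getLogger(__name__)

    class MultiSvmDriverSketch(object):
        """Hypothetical stand-in for the multi-SVM cDOT driver."""

        def setup_server(self, network_info, metadata=None):
            """Create a vserver, persisting its name even if setup fails."""
            vserver_name = self._make_vserver_name(network_info)  # hypothetical
            server_details = {'vserver_name': vserver_name}
            try:
                self._create_vserver(vserver_name, network_info)  # hypothetical
            except Exception as e:
                # Attach the name to the exception so it can be persisted as
                # fail-safe data; a later teardown_server call then receives
                # enough info to find and delete the vserver.
                e.detail_data = {'server_details': server_details}
                raise
            return server_details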

Revision history for this message
Clinton Knight (clintonk) wrote :

If storage resources (shares, share servers, snapshots, etc.) are actually created, then of course we don't want to orphan them during a delete workflow. If the data provided to the driver during a delete call suggests that the storage resource exists (or existed at one time), then the driver should not take that lightly and should do everything it can to locate the resource and force its deletion.

Stepping back a little, there are a few classes of issues here. As part of this bug, I am auditing the cDOT driver for each of the following and making the delete workflows more resilient wherever possible.

1. Storage resources exist now:

Sometimes, a recoverable error during deletion may be safely caught and ignored. For example, if a cDOT share is already offline when manila goes to take it offline in preparation for deleting it, manila should catch the error and continue deleting the storage resource instead of failing the delete workflow.
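
A sketch of that pattern; NaApiError is the driver's real ONTAP API exception class, but the error-code constant and the exact call shape below are assumptions:

    import logging

    from manila.share.drivers.netapp.dataontap.client import api as netapp_api

    LOG = logging.getLogger(__name__)

    def offline_volume_safely(client, volume_name):
        """Take a cDOT volume offline, tolerating already-offline volumes."""
        try:
            # The client wraps the 'volume-offline' ONTAP API; the argument
            # shape here is illustrative.
            client.send_request('volume-offline', {'name': volume_name})
        except netapp_api.NaApiError as e:
            # EVOLUMEOFFLINE is assumed to be the "already offline" code;
            # the real driver checks the specific codes it knows about.
            if e.code == netapp_api.EVOLUMEOFFLINE:
                LOG.info("Volume %s is already offline.", volume_name)
            else:
                raise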

2. Storage resources existed earlier but don't exist now:

Given valid info (share name, snapshot name) as well as a valid share server *that is currently responding to API commands*, the driver can be reasonably certain the resource no longer exists (perhaps deleted by an admin outside of manila, or removed automatically under a cDOT snapshot deletion policy) and should allow the delete workflow to continue, lest the admin have to remove the object from the manila database manually.

3. Storage resource never existed:

It is possible (in some cases, arguably, due to bugs elsewhere in manila) for create workflows to leave manila objects in an error state without any storage resources ever having been created. For example, as of this writing, attempting to create a share on a multi-SVM driver without specifying a share network will leave both a new share server and a new share object in error state with no identifying details (i.e. the share object has no share server ID set, or teardown_server is called with no server details). The lack of identifying info (as opposed to 'fail_safe_data' having been left behind) is useful in distinguishing these cases from cases #1 and #2 above.
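
Putting cases #2 and #3 together, the teardown path can branch on whether identifying info was ever persisted (a sketch; the client call and helpers are hypothetical):

    import logging

    LOG = logging.getLogger(__name__)

    class MultiSvmDriverSketch(object):
        """Hypothetical stand-in for the multi-SVM cDOT driver."""

        def teardown_server(self, server_details, security_services=None):
            """Tear down a share server, tolerating ones that never existed."""
            vserver_name = (server_details or {}).get('vserver_name')
            if not vserver_name:
                # Case #3: creation failed before anything was provisioned,
                # so there is no backend resource to remove.
                LOG.warning("Share server has no vserver name; skipping "
                            "teardown.")
                return
            if not self._client.vserver_exists(vserver_name):  # hypothetical
                # Case #2: the vserver existed once but is gone (e.g. deleted
                # by an admin outside manila); let the delete succeed anyway.
                LOG.warning("Vserver %s no longer exists; skipping teardown.",
                            vserver_name)
                return
            self._delete_vserver(vserver_name)  # hypothetical helper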

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila (master)

Fix proposed to branch: master
Review: https://review.openstack.org/171789

Changed in manila:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to manila (master)

Reviewed: https://review.openstack.org/171789
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=6999c8d26bdef7f3a8795995e5605f8b85a7c584
Submitter: Jenkins
Branch: master

commit 6999c8d26bdef7f3a8795995e5605f8b85a7c584
Author: Clinton Knight <email address hidden>
Date: Fri Apr 3 11:08:58 2015 -0400

    NetApp cDOT driver is too strict in delete workflows

    If a share or share server is not created successfully,
    sometimes Manila will not save the vserver info in the
    share or share server object. In such cases, when asked
    to delete such a share or share server, the cDOT driver
    should not protest about the lack of vserver info but
    instead should log a warning and not raise an exception
    (which leaves the object in an error_deleting state).

    This patch addresses a number of issues in the delete,
    share-server-delete, and snapshot-delete workflows where
    the cDOT driver could unnecessarily raise an exception
    when it should merely do nothing and allow the workflow
    to proceed.

    Change-Id: I54cf96b8a24ac5272b37bce2f5118551504a1699
    Closes-Bug: #1438893

Changed in manila:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in manila:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in manila:
milestone: kilo-rc1 → 2015.1.0