NetApp cDOT driver reports migration cancelation was successful even if it wasn't

Bug #1688620 reported by Goutham Pacha Ravi
This bug affects 1 person
Affects: OpenStack Shared File Systems Service (Manila)
Status: Fix Released
Importance: Low
Assigned to: Daniel Tapia
Milestone: (none listed)

Bug Description

This bug is hard to trigger and is probably easier to reproduce on a very busy NetApp cDOT back end:

* Configure the NetApp cDOT driver (DHSS True/False, doesn't matter)
* Create a share
    $ manila create nfs 1 --share-type gotonetapp --name myshare
* Start migration for the share
    $ manila migration-start myshare pool2000 --nondisruptive True --preserve-metadata True --preserve-snapshots True --writable True
* Cancel the migration
    $ manila migration-cancel myshare
* Delete the share as soon as it transitions to the "available" state
    $ manila delete myshare

The deletion could fail with the following error in m-share:

"share_221a269a_f64a_4d3a_b91e_6217686b978e" in Vserver "vserver_eb31a90e" cannot be offlined because a volume move operation is in progress.

RCA:
volume-move-trigger-abort is an asynchronous API. It returns "passed" (success) as soon as it is invoked, but at that point the abort may still be in progress and the volume on the back end isn't ready for any further interaction yet.
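
One way to close this kind of race is to poll the back end after requesting the abort and report success only once the move is really gone. The following is a minimal Python sketch of that idea, not the Manila driver code: `wait_for_abort`, `MoveAbortTimeout`, and the `get_move_status` callable are hypothetical names introduced for illustration, and the sketch assumes the callable returns None once no volume move is in progress.

```python
import time


class MoveAbortTimeout(Exception):
    """Raised when the abort does not finish within the allowed time."""


def wait_for_abort(get_move_status, timeout=60.0, interval=1.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the back end reports no volume move in progress.

    get_move_status is a hypothetical callable (an assumption, not a
    real driver API). It returns None once the move is gone, or a
    status string while the move or its abort is still active.
    """
    deadline = clock() + timeout
    while True:
        status = get_move_status()
        if status is None:
            # The abort has completed; the volume can be used again.
            return
        if clock() >= deadline:
            raise MoveAbortTimeout(
                "volume move still %r after %.0f s" % (status, timeout))
        sleep(interval)
```

A caller would request the abort first and then call `wait_for_abort(...)` before reporting that the cancelation succeeded. Raising on timeout, rather than returning silently, keeps a stuck abort from being reported as a success.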

Changed in manila:
importance: Undecided → Low
assignee: nobody → Goutham Pacha Ravi (gouthamr)
description: updated
tags: added: netapp
Changed in manila:
assignee: Goutham Pacha Ravi (gouthamr) → nobody
Tom Barron (tpb)
tags: added: driver migration
Jason Grosso (jgrosso)
Changed in manila:
assignee: nobody → Jason Grosso (jgrosso)
status: New → Triaged
Changed in manila:
assignee: Jason Grosso (jgrosso) → Naresh Kumar Gunjalli (nareshkumarg)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila (master)

Fix proposed to branch: master
Review: https://review.opendev.org/718233

Changed in manila:
assignee: Naresh Kumar Gunjalli (nareshkumarg) → Daniel Tapia (danielarthurt)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to manila (master)

Reviewed: https://review.opendev.org/718233
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=0ee414082318d22b3ed19acad6a479cb105c30e5
Submitter: Zuul
Branch: master

commit 0ee414082318d22b3ed19acad6a479cb105c30e5
Author: danielarthurt <email address hidden>
Date: Mon Apr 6 13:26:13 2020 +0000

    [NetApp] Fix falsely report migration cancelation success

    NetApp ONTAP share delete operation can fail sometimes when is triggered
    immediately after migration cancelation on a overloaded NetApp backend.
    Migration cancelation invokes "abort_volume_move" which is an asynchronous
    API. If share delete operation is requested immediately after call the
    former API, it fails because the "abort_volume_move" is still in progress.
    Now NetApp cDOT driver checks, for a period of time, if the
    ``volume-move-abort`` operation has ended before report migration
    cancelation success.

    Change-Id: I76e11fef27c9723f019cfdfdc6ea86878db78776
    Closes-Bug: #1688620

Changed in manila:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/730383

OpenStack Infra (hudson-openstack) wrote : Related fix merged to manila (master)

Reviewed: https://review.opendev.org/735384
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=a0dd86a98788f7e2d1ca55be26c2a3dea4e36f57
Submitter: Zuul
Branch: master

commit a0dd86a98788f7e2d1ca55be26c2a3dea4e36f57
Author: dtapia <email address hidden>
Date: Fri Jun 12 18:52:03 2020 +0000

    [NetApp] Updating the release note for bugfix 1688620

    This patch update the release note of the bugfix for the bug 1688620
    explaining better about the added configuration option and its use.

    Related-Bug: #1688620
    Change-Id: Idf9730bfc9604f906b10e58f5b767b4030d8f0db

OpenStack Infra (hudson-openstack) wrote : Fix merged to manila (stable/ussuri)

Reviewed: https://review.opendev.org/730383
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=71c3a08ead50e8aa68cef3a26c8f8485922895b6
Submitter: Zuul
Branch: stable/ussuri

commit 71c3a08ead50e8aa68cef3a26c8f8485922895b6
Author: danielarthurt <email address hidden>
Date: Mon Apr 6 13:26:13 2020 +0000

    [NetApp] Fix falsely report migration cancelation success

    NetApp ONTAP share delete operation can fail sometimes when is triggered
    immediately after migration cancelation on a overloaded NetApp backend.
    Migration cancelation invokes "abort_volume_move" which is an asynchronous
    API. If share delete operation is requested immediately after call the
    former API, it fails because the "abort_volume_move" is still in progress.
    Now NetApp cDOT driver checks, for a period of time, if the
    ``volume-move-abort`` operation has ended before report migration
    cancelation success.

    This patch squash the following commit that improves the release note
    for this fix:
    [NetApp] Updating the release note for bugfix 1688620
    (cherry picked from commit a0dd86a98788f7e2d1ca55be26c2a3dea4e36f57)

    Change-Id: I76e11fef27c9723f019cfdfdc6ea86878db78776
    Closes-Bug: #1688620
    (cherry picked from commit 0ee414082318d22b3ed19acad6a479cb105c30e5)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/743565

OpenStack Infra (hudson-openstack) wrote : Fix merged to manila (stable/train)

Reviewed: https://review.opendev.org/743565
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=c915918f3c6ac608469263dd05e3f2b5eb61dc4e
Submitter: Zuul
Branch: stable/train

commit c915918f3c6ac608469263dd05e3f2b5eb61dc4e
Author: danielarthurt <email address hidden>
Date: Mon Apr 6 13:26:13 2020 +0000

    [NetApp] Fix falsely report migration cancelation success

    NetApp ONTAP share delete operation can fail sometimes when is triggered
    immediately after migration cancelation on a overloaded NetApp backend.
    Migration cancelation invokes "abort_volume_move" which is an asynchronous
    API. If share delete operation is requested immediately after call the
    former API, it fails because the "abort_volume_move" is still in progress.
    Now NetApp cDOT driver checks, for a period of time, if the
    ``volume-move-abort`` operation has ended before report migration
    cancelation success.

    This patch squash the following commit that improves the release note
    for this fix:
    [NetApp] Updating the release note for bugfix 1688620
    (cherry picked from commit a0dd86a98788f7e2d1ca55be26c2a3dea4e36f57)

    Change-Id: I76e11fef27c9723f019cfdfdc6ea86878db78776
    Closes-Bug: #1688620
    (cherry picked from commit 0ee414082318d22b3ed19acad6a479cb105c30e5)
    (cherry picked from commit 71c3a08ead50e8aa68cef3a26c8f8485922895b6)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/743718

OpenStack Infra (hudson-openstack) wrote : Fix merged to manila (stable/stein)

Reviewed: https://review.opendev.org/743718
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=1199489d998a729d7fcf7cf7f88ab34b6464d46b
Submitter: Zuul
Branch: stable/stein

commit 1199489d998a729d7fcf7cf7f88ab34b6464d46b
Author: danielarthurt <email address hidden>
Date: Mon Apr 6 13:26:13 2020 +0000

    [NetApp] Fix falsely report migration cancelation success

    NetApp ONTAP share delete operation can fail sometimes when is triggered
    immediately after migration cancelation on a overloaded NetApp backend.
    Migration cancelation invokes "abort_volume_move" which is an asynchronous
    API. If share delete operation is requested immediately after call the
    former API, it fails because the "abort_volume_move" is still in progress.
    Now NetApp cDOT driver checks, for a period of time, if the
    ``volume-move-abort`` operation has ended before report migration
    cancelation success.

    This patch squash the following commit that improves the release note
    for this fix:
    [NetApp] Updating the release note for bugfix 1688620
    (cherry picked from commit a0dd86a98788f7e2d1ca55be26c2a3dea4e36f57)

    Change-Id: I76e11fef27c9723f019cfdfdc6ea86878db78776
    Closes-Bug: #1688620
    (cherry picked from commit 0ee414082318d22b3ed19acad6a479cb105c30e5)
    (cherry picked from commit 71c3a08ead50e8aa68cef3a26c8f8485922895b6)
    (cherry picked from commit c915918f3c6ac608469263dd05e3f2b5eb61dc4e)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/743855

OpenStack Infra (hudson-openstack) wrote : Fix merged to manila (stable/rocky)

Reviewed: https://review.opendev.org/743855
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=b8fc8feb070a61b957500a51a95af153da53a02c
Submitter: Zuul
Branch: stable/rocky

commit b8fc8feb070a61b957500a51a95af153da53a02c
Author: danielarthurt <email address hidden>
Date: Mon Apr 6 13:26:13 2020 +0000

    [NetApp] Fix falsely report migration cancelation success

    NetApp ONTAP share delete operation can fail sometimes when is triggered
    immediately after migration cancelation on a overloaded NetApp backend.
    Migration cancelation invokes "abort_volume_move" which is an asynchronous
    API. If share delete operation is requested immediately after call the
    former API, it fails because the "abort_volume_move" is still in progress.
    Now NetApp cDOT driver checks, for a period of time, if the
    ``volume-move-abort`` operation has ended before report migration
    cancelation success.

    This patch squash the following commit that improves the release note
    for this fix:
    [NetApp] Updating the release note for bugfix 1688620
    (cherry picked from commit a0dd86a98788f7e2d1ca55be26c2a3dea4e36f57)

    Change-Id: I76e11fef27c9723f019cfdfdc6ea86878db78776
    Closes-Bug: #1688620
    (cherry picked from commit 0ee414082318d22b3ed19acad6a479cb105c30e5)
    (cherry picked from commit 71c3a08ead50e8aa68cef3a26c8f8485922895b6)
    (cherry picked from commit c915918f3c6ac608469263dd05e3f2b5eb61dc4e)
    (cherry picked from commit 1199489d998a729d7fcf7cf7f88ab34b6464d46b)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix proposed to manila (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/744724

OpenStack Infra (hudson-openstack) wrote : Fix merged to manila (stable/queens)

Reviewed: https://review.opendev.org/744724
Committed: https://git.openstack.org/cgit/openstack/manila/commit/?id=826cbfa7cee207914a149c24a3a8755ca04583e5
Submitter: Zuul
Branch: stable/queens

commit 826cbfa7cee207914a149c24a3a8755ca04583e5
Author: danielarthurt <email address hidden>
Date: Mon Apr 6 13:26:13 2020 +0000

    [NetApp] Fix falsely report migration cancelation success

    NetApp ONTAP share delete operation can fail sometimes when is triggered
    immediately after migration cancelation on a overloaded NetApp backend.
    Migration cancelation invokes "abort_volume_move" which is an asynchronous
    API. If share delete operation is requested immediately after call the
    former API, it fails because the "abort_volume_move" is still in progress.
    Now NetApp cDOT driver checks, for a period of time, if the
    ``volume-move-abort`` operation has ended before report migration
    cancelation success.

    This patch squash the following commit that improves the release note
    for this fix:
    [NetApp] Updating the release note for bugfix 1688620
    (cherry picked from commit a0dd86a98788f7e2d1ca55be26c2a3dea4e36f57)

    Change-Id: I76e11fef27c9723f019cfdfdc6ea86878db78776
    Closes-Bug: #1688620
    (cherry picked from commit 0ee414082318d22b3ed19acad6a479cb105c30e5)
    (cherry picked from commit 71c3a08ead50e8aa68cef3a26c8f8485922895b6)
    (cherry picked from commit c915918f3c6ac608469263dd05e3f2b5eb61dc4e)
    (cherry picked from commit 1199489d998a729d7fcf7cf7f88ab34b6464d46b)
    (cherry picked from commit b8fc8feb070a61b957500a51a95af153da53a02c)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/manila queens-eol

This issue was fixed in the openstack/manila queens-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/manila rocky-eol

This issue was fixed in the openstack/manila rocky-eol release.
