NetApp ONTAP: QoS policy group is deleted after migration

Bug #1906291 reported by Lucio Seki
This bug affects 1 person
Affects: Cinder
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

The NetApp ONTAP Cinder driver supports setting QoS.
When a Cinder volume is created with a volume type associated with a QoS entity, the driver creates a QoS policy group on the ONTAP back end and associates it with the entity representing the Cinder volume (either a LUN or a file within an NFS share).
After a migrate operation, the QoS policy group associated with the migrated volume is deleted.

Steps to reproduce:
- Set up 2 ONTAP back ends `ontap1` and `ontap2`
- Create a Cinder QoS `qos_test`
- Create a Cinder volume type `ontap`
- Associate the QoS `qos_test` to the volume type `ontap`
- Create a Cinder volume with the volume type `ontap`
- Migrate the volume to another ONTAP back end
- Wait for the driver to perform a host-assisted migration
- Wait for the driver to create a new QoS policy group and associate it to the new LUN/file representing the volume

Expected result:
- Have the new QoS policy group associated to the LUN/file permanently.

Actual result:
- The new QoS policy group is deleted after a few minutes.

Detailed commands and outputs are here [0].

[0] http://paste.openstack.org/show/800563/

Changed in cinder:
status: New → Triaged
importance: Undecided → Medium
Gorka Eguileor (gorka) wrote :

I believe the issue is in the NetApp driver itself and has two parts:

- When checking which QoS policy group to delete, we only check the volume's id and don't take its name into account [1].

- The "update_migrated_volume" call is not implemented in NetApp's NFS driver, so the one inherited from cinder.volume.drivers.nfs.NfsDriver is used; it does an "os.rename" [2] to rename the file, which breaks the matching between the file and the QoS policy group.

[1]: https://github.com/openstack/cinder/blob/d3ffa90baa959530eaa1cd1d4e3800fbe9148806/cinder/volume/drivers/netapp/utils.py#L269
[2]: https://github.com/openstack/cinder/blob/d3ffa90baa959530eaa1cd1d4e3800fbe9148806/cinder/volume/drivers/nfs.py#L484
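To illustrate the first point, here is a minimal, hypothetical model; it is not Cinder's real `Volume` class. It assumes, as we understand Cinder's volume object, that `name_id` falls back to `id` when `_name_id` is unset, that the back-end LUN/file name is built from `name_id` with the default `volume-%s` template, and that the driver prefixes its QoS policy group names with `openstack-`. `FakeVolume`, `qos_name_by_id`, `qos_name_by_name_id` and the short UUIDs are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed prefix for driver-created QoS policy groups (see utils.py in [1]).
OPENSTACK_PREFIX = "openstack-"
# Assumed to mirror Cinder's default volume_name_template option.
VOLUME_NAME_TEMPLATE = "volume-%s"


@dataclass
class FakeVolume:
    """Hypothetical stand-in for a Cinder volume record."""
    id: str
    _name_id: Optional[str] = None

    @property
    def name_id(self) -> str:
        # UUID actually used on the back end; falls back to the record id.
        return self._name_id or self.id

    @property
    def name(self) -> str:
        # Name of the LUN/file on the back end.
        return VOLUME_NAME_TEMPLATE % self.name_id


def qos_name_by_id(volume: FakeVolume) -> str:
    """Buggy derivation: policy group name built from the record id."""
    return OPENSTACK_PREFIX + volume.id


def qos_name_by_name_id(volume: FakeVolume) -> str:
    """Alternative derivation: policy group name built from name_id."""
    return OPENSTACK_PREFIX + volume.name_id


# While _name_id is unset, both derivations agree with the file name.
vol = FakeVolume(id="aaaa")
print(vol.name, qos_name_by_id(vol), qos_name_by_name_id(vol))
# volume-aaaa openstack-aaaa openstack-aaaa

# Once _name_id points at another UUID (hypothetical "bbbb"), the record
# keeps its id but the back-end file is named after name_id.
vol._name_id = "bbbb"
print(vol.name, qos_name_by_id(vol), qos_name_by_name_id(vol))
# volume-bbbb openstack-aaaa openstack-bbbb
```

Once a record's `_name_id` is set (for example, when a migration finishes without renaming the file), only the `name_id`-based name still matches the back-end file, which is the direction the eventual fix took.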

Lucio Seki (lseki) wrote :

The change suggested by Gorka at [0] fixed the issue:
---
geguileo lseki: https://github.com/openstack/cinder/blob/d3ffa90baa959530eaa1cd1d4e3800fbe9148806/cinder/volume/drivers/netapp/utils.py#L269 18:07
geguileo lseki: it's using the id and not the name... 18:07
geguileo lseki: I believe changing that line to something like return OPENSTACK_PREFIX + (volume.get('name') or volume['id'])
---

I created a Cinder volume on an ONTAP NFS back end with QoS and migrated it to another ONTAP NFS back end.

The operation successfully created a new QoS policy group, and associated it to the new file backing the Cinder volume.

After a while, the ONTAP driver properly deleted the old QoS policy group.

Detailed commands and outputs are available here [1].

It was not necessary to implement the update_migrated_volume method.

[0] http://eavesdrop.openstack.org/irclogs/%23openstack-cinder/%23openstack-cinder.2020-12-09.log.html#t2020-12-09T18:07:33
[1] http://paste.openstack.org/show/800913

Lucio Seki (lseki) wrote :

I'll test it again with different pools, as I was testing with 2 back ends using the same pool.

Lucio Seki (lseki) wrote :

Modifying only `cinder/volume/drivers/netapp/utils.py:get_qos_policy_group_name` was not sufficient [0].
The QoS policy group gets associated with the wrong filename and ends up being deleted after a while (as the associated file does not exist).

As Gorka suggested, I had to also implement `cinder/volume/drivers/netapp/dataontap/nfs_cmode.py:NetAppCmodeNfsDriver.update_migrated_volume` that raises a NotImplementedError in order to prevent the base class (cinder/volume/drivers/nfs.py:NfsDriver) from renaming the file.

Now the QoS policy group is associated to the correct file [1].

[0] http://paste.openstack.org/show/800918
[1] http://paste.openstack.org/show/800919
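The interaction between the rename and the QoS association can be sketched with a toy, in-memory model. Everything here is hypothetical (`FakeShare`, `GenericNfsDriver`, `NetAppLikeNfsDriver`, `finish_migration`, and the file names). It only assumes what the comments above describe: the generic NFS driver renames the migrated file, the QoS policy group keeps pointing at the path it was associated with, and, as we understand the volume manager, when the driver raises `NotImplementedError` the destination file is kept and its UUID is recorded in `_name_id` instead.

```python
from typing import Dict, Set


class FakeShare:
    """Hypothetical in-memory NFS share plus QoS bookkeeping."""

    def __init__(self) -> None:
        self.files: Set[str] = set()
        # policy group name -> path of the file it was associated with
        self.qos_groups: Dict[str, str] = {}

    def dangling_qos_groups(self) -> Set[str]:
        """Policy groups whose associated file no longer exists."""
        return {g for g, path in self.qos_groups.items() if path not in self.files}


class GenericNfsDriver:
    """Simplified stand-in for the generic NFS driver's behaviour."""

    def __init__(self, share: FakeShare) -> None:
        self.share = share

    def update_migrated_volume(self, volume_name: str, new_volume_name: str) -> dict:
        # Rename the destination file to the original volume's name
        # (the real driver does this with os.rename).
        self.share.files.remove(new_volume_name)
        self.share.files.add(volume_name)
        return {"_name_id": None}


class NetAppLikeNfsDriver(GenericNfsDriver):
    """Hypothetical subclass mirroring the fix: refuse to rename."""

    def update_migrated_volume(self, volume_name: str, new_volume_name: str) -> dict:
        raise NotImplementedError()


def finish_migration(driver, volume_name: str, new_volume_name: str,
                     new_name_id: str) -> dict:
    """Simplified model of the manager's fallback (an assumption)."""
    try:
        return driver.update_migrated_volume(volume_name, new_volume_name)
    except NotImplementedError:
        # Keep the destination file and point the record at its UUID.
        return {"_name_id": new_name_id}


def migrate(driver_cls):
    share = FakeShare()
    # Destination file and its QoS policy group, created during migration.
    share.files.add("volume-bbbb")
    share.qos_groups["openstack-bbbb"] = "volume-bbbb"
    update = finish_migration(driver_cls(share), "volume-aaaa",
                              "volume-bbbb", "bbbb")
    return update, share.dangling_qos_groups()


print(migrate(GenericNfsDriver))     # ({'_name_id': None}, {'openstack-bbbb'})
print(migrate(NetAppLikeNfsDriver))  # ({'_name_id': 'bbbb'}, set())
```

Raising `NotImplementedError` is what lets the override opt out of the base class behaviour without reimplementing the manager's bookkeeping: the file keeps the name the QoS policy group was associated with.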

Brian Rosmaita (brian-rosmaita) wrote :
Changed in cinder:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/766296
Committed: https://opendev.org/openstack/cinder/commit/da100e1fca610cefb7e77079ab3170b068e328a2
Submitter: "Zuul (22348)"
Branch: master

commit da100e1fca610cefb7e77079ab3170b068e328a2
Author: Gorka Eguileor <email address hidden>
Date: Wed Dec 9 21:03:49 2020 +0100

    NetApp ONTAP: Fix QoS lost after moving volume

    When a Cinder volume is created with a volume-type associated to a QoS
    entity, the driver creates a QoS policy group at the ONTAP back end, and
    associates it to the entity representing the Cinder volume (either a LUN
    or a file within an NFS share).

    On NetApp NFS, when a migrate operation is issued and it completes, the
    resulting volume ends up without a QoS. That happens because the file
    is being renamed while the QoS refers to the now non-existent file.

    This patch makes it so that the file is not renamed when finishing a
    migration and the driver code uses the ``name_id`` attribute instead of
    the ``id`` one to refer to the right UUID.

    Closes-Bug: #1906291
    Change-Id: Icd7a929e7cbce0c74f6b340f4e09f74a8098d752

Changed in cinder:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/cinder/+/800220

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/cinder/+/800228

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/cinder/+/800229

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/cinder/+/800231

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/800220
Committed: https://opendev.org/openstack/cinder/commit/ee49b67414ef990dd4963f3a4a18d8d3784d8626
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit ee49b67414ef990dd4963f3a4a18d8d3784d8626
Author: Gorka Eguileor <email address hidden>
Date: Wed Dec 9 21:03:49 2020 +0100

    NetApp ONTAP: Fix QoS lost after moving volume

    Closes-Bug: #1906291
    Change-Id: Icd7a929e7cbce0c74f6b340f4e09f74a8098d752
    (cherry picked from commit da100e1fca610cefb7e77079ab3170b068e328a2)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 19.0.0.0b1

This issue was fixed in the openstack/cinder 19.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/800228
Committed: https://opendev.org/openstack/cinder/commit/7eff1a4d7fae410c9c7840519b23e3f724b2d026
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 7eff1a4d7fae410c9c7840519b23e3f724b2d026
Author: Gorka Eguileor <email address hidden>
Date: Wed Dec 9 21:03:49 2020 +0100

    NetApp ONTAP: Fix QoS lost after moving volume

    Conflicts:
        cinder/tests/unit/volume/drivers/netapp/dataontap/test_nfs_base.py
        cinder/volume/drivers/netapp/dataontap/nfs_base.py
    Both conflicts are caused by the new code added by
    I507083c3e34e5a5cf1db9a3d1f6bef47bd51a9f8

    Closes-Bug: #1906291
    Change-Id: Icd7a929e7cbce0c74f6b340f4e09f74a8098d752
    (cherry picked from commit da100e1fca610cefb7e77079ab3170b068e328a2)
    (cherry picked from commit ee49b67414ef990dd4963f3a4a18d8d3784d8626)

tags: added: in-stable-victoria
tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/800229
Committed: https://opendev.org/openstack/cinder/commit/652ab0aff96a6e6e0aa83c49b20fcce2d1c1cde7
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 652ab0aff96a6e6e0aa83c49b20fcce2d1c1cde7
Author: Gorka Eguileor <email address hidden>
Date: Wed Dec 9 21:03:49 2020 +0100

    NetApp ONTAP: Fix QoS lost after moving volume

    Conflicts:
        cinder/tests/unit/volume/drivers/netapp/dataontap/test_nfs_base.py
        cinder/volume/drivers/netapp/dataontap/nfs_base.py
    Both conflicts are caused by the new code added by
    I507083c3e34e5a5cf1db9a3d1f6bef47bd51a9f8

    Closes-Bug: #1906291
    Change-Id: Icd7a929e7cbce0c74f6b340f4e09f74a8098d752
    (cherry picked from commit da100e1fca610cefb7e77079ab3170b068e328a2)
    (cherry picked from commit ee49b67414ef990dd4963f3a4a18d8d3784d8626)
    (cherry picked from commit 7eff1a4d7fae410c9c7840519b23e3f724b2d026)

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/train)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/800231
Committed: https://opendev.org/openstack/cinder/commit/bfa754c0700a5100ca0fecbfead21692c0aed710
Submitter: "Zuul (22348)"
Branch: stable/train

commit bfa754c0700a5100ca0fecbfead21692c0aed710
Author: Gorka Eguileor <email address hidden>
Date: Wed Dec 9 21:03:49 2020 +0100

    NetApp ONTAP: Fix QoS lost after moving volume

    Conflicts:
        cinder/tests/unit/volume/drivers/netapp/dataontap/test_nfs_base.py
        cinder/volume/drivers/netapp/dataontap/nfs_base.py
    Both conflicts are caused by the new code added by
    I507083c3e34e5a5cf1db9a3d1f6bef47bd51a9f8

    Closes-Bug: #1906291
    Change-Id: Icd7a929e7cbce0c74f6b340f4e09f74a8098d752
    (cherry picked from commit da100e1fca610cefb7e77079ab3170b068e328a2)
    (cherry picked from commit ee49b67414ef990dd4963f3a4a18d8d3784d8626)
    (cherry picked from commit 7eff1a4d7fae410c9c7840519b23e3f724b2d026)
    (cherry picked from commit 652ab0aff96a6e6e0aa83c49b20fcce2d1c1cde7)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 18.1.0

This issue was fixed in the openstack/cinder 18.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 16.4.1

This issue was fixed in the openstack/cinder 16.4.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 17.2.0

This issue was fixed in the openstack/cinder 17.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder train-eol

This issue was fixed in the openstack/cinder train-eol release.
