Cinder extend with nova multipath issues [netapp ontap]

Bug #1712651 reported by Adam DiBiase
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
Cinder    Fix Released   Undecided    Lucio Seki

Bug Description

Greetings,

I am having an issue with nova starting an instance whose root volume has been extended by cinder. More specifically, the volume has been extended past the maximum resize limit of our NetApp filer. I am running Liberty and upgraded the cinder packages from 7.0.0 to 7.0.3 to take advantage of this functionality. From what I can gather, the driver uses sub-LUN cloning to get past the hard limit NetApp imposes when extending past 64G (starting from a 4G volume).

Environment:
OpenStack Release: Liberty
Ubuntu Release: 14.04
Filer: NetApp
Protocol: Fibre Channel
Multipath: yes

Steps to reproduce:
1. Create a new instance.
2. Stop the instance.
3. Extend the volume by running the following commands:
   cinder reset-state --state available (volume-ID or name)
   cinder extend (volume-ID or name) 100
   cinder reset-state --state in-use (volume-ID or name)
4. Start the instance with either nova start or nova reboot --hard (same result either way).

I can see that the instance's multipath status is good before the resize...

360a98000417643556a2b496d58665473 dm-17 NETAPP ,LUN
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| |- 6:0:1:5 sdy 65:128 active undef running
| `- 7:0:0:5 sdz 65:144 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
  |- 6:0:0:5 sdx 65:112 active undef running
  `- 7:0:1:5 sdaa 65:160 active undef running

Once the volume is resized, every path to the LUN goes to a failed state and the device does not show the new size:

360a98000417643556a2b496d58665473 dm-17 NETAPP ,LUN
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=-1 status=enabled
| |- 6:0:1:5 sdy 65:128 failed undef running
| `- 7:0:0:5 sdz 65:144 failed undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
  |- 6:0:0:5 sdx 65:112 failed undef running
  `- 7:0:1:5 sdaa 65:160 failed undef running
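To compare the two listings programmatically, the path lines can be parsed for their device-mapper state. This is a minimal sketch, assuming the column order shown in the output above (H:C:T:L, device name, major:minor, dm state, checker state, online state); the function names are illustrative and not part of multipath-tools or os-brick.

```python
import re

# Matches a path line such as "| |- 6:0:1:5 sdy 65:128 failed undef running".
PATH_LINE = re.compile(
    r"(?P<hctl>\d+:\d+:\d+:\d+)\s+(?P<dev>sd[a-z]+)\s+"
    r"(?P<majmin>\d+:\d+)\s+(?P<dm_state>\w+)\s+"
    r"(?P<checker>\w+)\s+(?P<online>\w+)"
)


def path_states(multipath_output):
    """Map each SCSI device name to its device-mapper path state."""
    states = {}
    for line in multipath_output.splitlines():
        match = PATH_LINE.search(line)
        if match:
            states[match.group("dev")] = match.group("dm_state")
    return states


def all_paths_failed(multipath_output):
    """True only if at least one path was found and every path is failed."""
    states = path_states(multipath_output)
    return bool(states) and all(s == "failed" for s in states.values())


# The listing from this report, taken after the extend.
AFTER_EXTEND = """\
360a98000417643556a2b496d58665473 dm-17 NETAPP ,LUN
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=-1 status=enabled
| |- 6:0:1:5 sdy 65:128 failed undef running
| `- 7:0:0:5 sdz 65:144 failed undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
  |- 6:0:0:5 sdx 65:112 failed undef running
  `- 7:0:1:5 sdaa 65:160 failed undef running
"""
```

Calling all_paths_failed(AFTER_EXTEND) returns True for the listing above, since all four paths report "failed".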

This is what I see on the compute node as well:

Command (run from /dev/disk/by-path): for i in `ls -1 | grep lun`; do echo $i; /lib/udev/scsi_id --page 0x83 --whitelisted /dev/disk/by-path/$i; echo $?; echo; done

Results before extend:

pci-0000:05:00.0-fc-0x500a09838f535fbd-lun-5
360a98000417643556a2b496d58665477
0

pci-0000:05:00.0-fc-0x500a09838f535fbd-lun-5-part1
360a98000417643556a2b496d58665477
0

Results after extend:

pci-0000:05:00.1-fc-0x500a09839f535fbd-lun-5
1

pci-0000:05:00.1-fc-0x500a09839f535fbd-lun-5-part1
1

nova.log is showing:

ProcessExecutionError: Unexpected error while running command.
Command: sudo nova-rootwrap /etc/nova/rootwrap.conf scsi_id --page 0x83 --whitelisted /dev/disk/by-path/pci-0000:05:00.0-fc-0x500a09819f535fbd-lun-5

Exit code: 1
Stdout: u''
Stderr: u''
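The failure above is scsi_id exiting with status 1 and printing nothing. As a rough sketch of how a caller could treat that case as "no WWID available" instead of an unhandled error: the scsi_id path and flags are the ones used in this report, while read_wwid and its run parameter are hypothetical names, and this is not how nova or os-brick implement the lookup.

```python
import subprocess


def read_wwid(device_path, run=subprocess.run):
    """Return the WWID reported by scsi_id, or None if the query fails.

    `run` is injectable so the function can be exercised without a real
    device; by default it is subprocess.run.
    """
    result = run(
        ["/lib/udev/scsi_id", "--page", "0x83", "--whitelisted", device_path],
        capture_output=True,
        text=True,
    )
    wwid = result.stdout.strip()
    # Exit code 1 with empty stdout is what the log above shows once the
    # original LUN is gone.
    if result.returncode != 0 or not wwid:
        return None
    return wwid
```

A caller would then need to decide what to do with None, for example rescan the SCSI hosts or fail the attach with a clearer message.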

As I said, this only happens on volumes that have been extended past 64G; smaller sizes do not have this issue. I can only assume that the original LUN is destroyed after the clone process and that this is the cause of the failed state. Why is the new LUN not picked up and attached to the compute node in its place? Is there something I am missing? I have also upgraded python-os-brick to the latest version for Liberty.

ii os-brick-common 0.5.0-0ubuntu4~cloud0
ii python-os-brick 0.5.0-0ubuntu4~cloud0

Thanks in advance,

Adam

tags: added: brick drivers multipath netapp
Goutham Pacha Ravi (gouthamr) wrote :

Hey Adam,

As you noted, cinder did not allow extending non-available volumes in Liberty. However, you also discovered that you could reset-state, extend and reset-state again. This is considered a hack because it bypasses the checks in the cinder API. The driver performs sub-LUN cloning to get around LUN geometry restrictions on the assumption that cinder would block the API call if the volume were attached.

Extending in-use volumes would need cooperation between nova, cinder and the glue-code that is os-brick.

As of API version 3.42 (available in the Pike release of OpenStack Cinder), cinder itself no longer blocks extending in-use volumes. Even so, nova still needs to call the appropriate code in os-brick and cinder. Here are the blueprints/specs for this work:

Cinder: https://review.openstack.org/#/c/453286/ (Changes merged and available as of API version 3.42)
os-brick: https://review.openstack.org/#/c/243730/ (Changes merged and available for a long time)
Nova: http://specs.openstack.org/openstack/nova-specs/specs/pike/approved/nova-support-attached-volume-extend.html (Available since Compute 2.51 API microversion)
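A minimal sketch of checking the two microversions named above. Microversions compare numerically per component, so 3.9 is older than 3.42, which a plain string comparison would get wrong. The function and constant names are illustrative and not part of any OpenStack client library.

```python
# Minimum microversions quoted in the list above.
CINDER_MIN = (3, 42)
NOVA_MIN = (2, 51)


def parse_microversion(value):
    """Turn a string such as '3.42' into a comparable tuple (3, 42)."""
    major, minor = value.split(".")
    return int(major), int(minor)


def supports_attached_extend(cinder_max, nova_max):
    """True if both services advertise a high enough maximum microversion."""
    return (
        parse_microversion(cinder_max) >= CINDER_MIN
        and parse_microversion(nova_max) >= NOVA_MIN
    )
```

Passing this check only means the APIs allow the operation; the volume driver still has to support it.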

Even with all of this in place, we very likely need to update the driver. I've opened a bug against the NetApp driver in our internal backlog.

Goutham Pacha Ravi (gouthamr) wrote :

Circling back, just wanted to note that "sub-lun-clone" is an ONTAP driver concept.
It works around an ONTAP/SAN concept called LUN geometry, which is chosen automatically on the backend when the cinder volume (a LUN on ONTAP) is created. LUN geometry restricts the maximum resize to ~10x the size of the volume.

Every time you request to extend a volume beyond its LUN geometry, the driver:
- clones the existing LUN with a larger size,
- transfers the LUN metadata (name, path) to the new LUN, and
- deletes the old LUN.

Note, however, that the driver leaves the new LUN unmapped. This stems from the driver's assumption that the cinder volume in question is not in an attached ("in-use") state.

This assumption works fine with secondary attached volumes, where you can detach, extend and re-attach; the new LUN gets mapped correctly and your secondary volume is extended. However, since you can't detach root volumes, cinder/nova would need to ensure that the mapping is done correctly.

What would solve your use case is if the new LUN clone inherits the mapping details on the backend. That way, when the instance wakes up, it sees the new LUN in place of the old LUN, with all the data intact. This is an enhancement that would be needed in the ONTAP driver.
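The workflow and the proposed enhancement can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the Lun class, the mapped_hosts field and the inherit_mapping flag are hypothetical names invented for illustration, and none of this is the ONTAP driver's code.

```python
from dataclasses import dataclass, field


@dataclass
class Lun:
    path: str
    size_gb: int
    # Hosts the LUN is mapped to (hypothetical stand-in for backend mappings).
    mapped_hosts: set = field(default_factory=set)


def sub_lun_clone_extend(luns, path, new_size_gb, inherit_mapping=False):
    """Replace luns[path] with a larger clone and return the new LUN."""
    old = luns[path]
    # 1. Clone the existing LUN with a larger size, under a temporary path.
    clone = Lun(path=path + ".clone", size_gb=new_size_gb)
    if inherit_mapping:
        # Hypothetical enhancement: carry the old mappings over to the clone.
        clone.mapped_hosts = set(old.mapped_hosts)
    # 2. Transfer the metadata (name, path) to the new LUN.
    clone.path = old.path
    # 3. Delete the old LUN; its mappings disappear with it.
    luns[path] = clone
    return clone
```

With the default inherit_mapping=False the returned LUN has an empty mapped_hosts set, which corresponds to the failed paths seen on the compute node; with inherit_mapping=True the clone keeps the hosts of the LUN it replaces.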

As a workaround, if you anticipate that you may one day need to extend your cinder volume beyond 64 GB, you could consider creating bigger volumes to begin with. If you're worried about space consumption, you could disable space reservation (i.e., allow the driver to create sparse/thin LUNs) by setting "netapp_lun_space_reservation" to "disabled": https://docs.openstack.org/liberty/config-reference/content/ontap-cluster-iscsi.html
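For reference, a small sketch that renders what such a backend stanza might look like. The option name and value come from the comment above; the section name passed in is a placeholder, and where the stanza belongs in cinder.conf depends on how your backends are configured.

```python
import configparser
import io


def render_backend_section(backend_name):
    """Render an INI stanza that disables LUN space reservation."""
    config = configparser.ConfigParser()
    config[backend_name] = {"netapp_lun_space_reservation": "disabled"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # "ontap-backend" is a placeholder section name, not taken from the report.
    print(render_backend_section("ontap-backend"))
```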

Changed in cinder:
status: New → Confirmed
Adam DiBiase (adamdibiase) wrote : Re: [Bug 1712651] Re: Cinder extend with nova multipath issues

Thank you for the update. That makes sense now. I appreciate it.

Thanks,

Adam

Adam DiBiase
Network Operations
Digium Cloud Services
Main: 888.305.3850
Support: 877.344.4861 or http://www.digium.com/en/support

On Thu, Aug 24, 2017 at 5:34 PM, Goutham Pacha Ravi <email address hidden>
wrote:

> [...]

Rodrigo Barbieri (rodrigo-barbieri2010) wrote : Re: Cinder extend with nova multipath issues

Hello,

Based on internal investigation, we concluded there is no way to implement this feature when the extend goes beyond the LUN geometry. Even though an online extend is possible when it stays within the LUN geometry, there is no way we could support that consistently through the Cinder API as it currently is.

I suggest marking this as Won't Fix. We will soon submit a patch as a small improvement, to prevent the extend method from starting the sub-lun-clone workflow when the volume is attached.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/567868

Changed in cinder:
assignee: nobody → Lucio Seki (lseki)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/567868
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=2b60912d5667350eae7ecbc67d4dba3658518d10
Submitter: Zuul
Branch: master

commit 2b60912d5667350eae7ecbc67d4dba3658518d10
Author: Lucio Seki <email address hidden>
Date: Wed Apr 18 17:35:56 2018 -0300

    NetApp ONTAP iSCSI: Force exception on online extend

    The Netapp ONTAP iSCSI driver does not support online volume extend. It
    may work if the requested size does not exceed the LUN max geometry,
    otherwise it will require the LUN to be detached. In such case, the
    backend currently detaches and leaves the volume in an inconsistent
    state.

    This patch forces the ONTAP iSCSI driver to raise an exception whenever
    an online extend is requested and it detects it would exceed the LUN max
    geometry.

    Change-Id: Ie3dddbc05c6cd32e27168d68f4cb819364b0438c
    Closes-Bug: #1712651
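
The behaviour described in this commit message can be summarised as a simplified guard. This is a sketch only: the exception class, the function name and the returned labels are hypothetical and are not taken from the actual patch.

```python
class OnlineExtendNotSupported(Exception):
    """Raised when an attached volume would need the sub-lun-clone path."""


def plan_extend(volume_status, new_size_gb, max_geometry_gb):
    """Decide how an extend request would be handled.

    Returns a label for the chosen strategy, or raises if the volume is
    attached and the requested size exceeds the LUN max geometry.
    """
    if new_size_gb <= max_geometry_gb:
        # Within the geometry the LUN can be resized in place.
        return "resize-in-place"
    if volume_status == "in-use":
        raise OnlineExtendNotSupported(
            "requested size %d GB exceeds LUN max geometry of %d GB; "
            "detach the volume before extending" % (new_size_gb, max_geometry_gb)
        )
    # Detached volume beyond the geometry: clone to a larger LUN.
    return "sub-lun-clone"
```

With the numbers from this report (a 64 GB limit and a request for 100 GB on an in-use root volume), plan_extend raises OnlineExtendNotSupported instead of leaving the volume unmapped.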

Changed in cinder:
status: In Progress → Fix Released
Eric Harney (eharney)
summary: - Cinder extend with nova multipath issues
+ Cinder extend with nova multipath issues [netapp ontap]
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/569836

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/569839

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/queens)

Reviewed: https://review.openstack.org/569836
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=41735db868fc1de2dac313ea60742e7e1cc76289
Submitter: Zuul
Branch: stable/queens

commit 41735db868fc1de2dac313ea60742e7e1cc76289
Author: Lucio Seki <email address hidden>
Date: Wed Apr 18 17:35:56 2018 -0300

    NetApp ONTAP iSCSI: Force exception on online extend

    The Netapp ONTAP iSCSI driver does not support online volume extend. It
    may work if the requested size does not exceed the LUN max geometry,
    otherwise it will require the LUN to be detached. In such case, the
    backend currently detaches and leaves the volume in an inconsistent
    state.

    This patch forces the ONTAP iSCSI driver to raise an exception whenever
    an online extend is requested and it detects it would exceed the LUN max
    geometry.

    Change-Id: Ie3dddbc05c6cd32e27168d68f4cb819364b0438c
    Closes-Bug: #1712651
    (cherry picked from commit 2b60912d5667350eae7ecbc67d4dba3658518d10)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/pike)

Reviewed: https://review.openstack.org/569839
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=e7d8a3997938e20541c9f00d508ffaee75a7c7ba
Submitter: Zuul
Branch: stable/pike

commit e7d8a3997938e20541c9f00d508ffaee75a7c7ba
Author: Lucio Seki <email address hidden>
Date: Wed Apr 18 17:35:56 2018 -0300

    NetApp ONTAP iSCSI: Force exception on online extend

    The Netapp ONTAP iSCSI driver does not support online volume extend. It
    may work if the requested size does not exceed the LUN max geometry,
    otherwise it will require the LUN to be detached. In such case, the
    backend currently detaches and leaves the volume in an inconsistent
    state.

    This patch forces the ONTAP iSCSI driver to raise an exception whenever
    an online extend is requested and it detects it would exceed the LUN max
    geometry.

    Change-Id: Ie3dddbc05c6cd32e27168d68f4cb819364b0438c
    Closes-Bug: #1712651
    (cherry picked from commit 2b60912d5667350eae7ecbc67d4dba3658518d10)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 13.0.0.0b2

This issue was fixed in the openstack/cinder 13.0.0.0b2 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 12.0.3

This issue was fixed in the openstack/cinder 12.0.3 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 11.1.1

This issue was fixed in the openstack/cinder 11.1.1 release.
