Cinder extend with nova multipath issues [netapp ontap]
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Cinder | Fix Released | Undecided | Lucio Seki | | |
Bug Description
Greetings,
I am having an issue with nova starting an instance whose root volume has been extended by cinder. More specifically, the volume has been extended past the maximum resize limit of our NetApp filer. I am running Liberty and upgraded the cinder packages from 7.0.0 to 7.0.3 to take advantage of this functionality. From what I can gather, it uses sub-LUN cloning to get past the hard limit that NetApp imposes when resizing past 64G (starting from a 4G volume).
Environment:
Openstack Release: Liberty
Ubuntu Release: 14.04
Filer: NetApp
Protocol: Fibre Channel
Multipath: yes
Steps to reproduce:
1. Create a new instance.
2. Stop the instance.
3. Extend the volume by running the following commands:
   cinder reset-state --state available (volume-ID or name)
   cinder extend (volume-ID or name) 100
   cinder reset-state --state in-use (volume-ID or name)
4. Start the instance with either nova start or nova reboot --hard (same result either way).
I can see that the instance's multipath status is good before the resize...
360a98000417643
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| |- 6:0:1:5 sdy 65:128 active undef running
| `- 7:0:0:5 sdz 65:144 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
  |- 6:0:0:5 sdx 65:112 active undef running
  `- 7:0:1:5 sdaa 65:160 active undef running
Once the volume is resized, all paths to the LUN go to a failed state and the map does not show the new size:
360a98000417643
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=-1 status=enabled
| |- 6:0:1:5 sdy 65:128 failed undef running
| `- 7:0:0:5 sdz 65:144 failed undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
  |- 6:0:0:5 sdx 65:112 failed undef running
  `- 7:0:1:5 sdaa 65:160 failed undef running
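A quick way to check for this condition is to parse the multipath listing and look at the size and path states. The following is only a minimal sketch, assuming the text has the same layout as the listings above; the function names and the sample constant are made up for illustration and are not part of nova, cinder or os-brick.

```python
import re

# Matches a path line such as "| |- 6:0:1:5 sdy 65:128 failed undef running".
PATH_RE = re.compile(
    r"(\d+:\d+:\d+:\d+)\s+(\w+)\s+(\d+:\d+)\s+(\w+)\s+(\w+)\s+(\w+)"
)
# Matches the "size=20G" token on the map header line.
SIZE_RE = re.compile(r"size=(\S+)")

# Sample copied from the listing captured after the resize (hypothetical constant name).
AFTER_RESIZE = """\
360a98000417643
size=20G features='1 queue_if_no_path' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=-1 status=enabled
| |- 6:0:1:5 sdy 65:128 failed undef running
| `- 7:0:0:5 sdz 65:144 failed undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
  |- 6:0:0:5 sdx 65:112 failed undef running
  `- 7:0:1:5 sdaa 65:160 failed undef running
"""


def summarize_multipath(text):
    """Return the reported size and the per-device path state."""
    size = None
    paths = {}
    for line in text.splitlines():
        size_match = SIZE_RE.search(line)
        if size_match:
            size = size_match.group(1)
            continue
        path_match = PATH_RE.search(line)
        if path_match:
            # Fields: H:C:T:L, device, major:minor, dm state, checker, online state.
            _hctl, dev, _majmin, dm_state, _checker, _online = path_match.groups()
            paths[dev] = dm_state
    return {"size": size, "paths": paths}


def all_paths_failed(summary):
    """True when at least one path was found and every path is 'failed'."""
    states = summary["paths"].values()
    return bool(states) and all(state == "failed" for state in states)


if __name__ == "__main__":
    summary = summarize_multipath(AFTER_RESIZE)
    print(summary["size"], all_paths_failed(summary))
```

Run against the post-resize listing, this reports the stale 20G size with every path failed, which matches the symptom described here.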
This is what I see on the compute node as well:
Command: for i in `ls -1 | grep lun`; do echo $i; /lib/udev/scsi_id --page 0x83 --whitelisted /dev/disk/
Results before extend:
pci-0000:
360a98000417643
0
pci-0000:
360a98000417643
0
Results after extend:
pci-0000:
1
pci-0000:
1
nova.log is showing:
ProcessExecutio
Command: sudo nova-rootwrap /etc/nova/
Exit code: 1
Stdout: u''
Stderr: u''
Like I said, this only happens on volumes that have been extended past 64G. Smaller sizes do not have this issue. I can only assume that the original LUN is getting destroyed after the clone process and that this is the cause of the failed state. Why is it not picking up the new one and attaching it to the compute node in its place? Is there something I am missing? I have also upgraded python-os-brick to the latest version for Liberty.
ii os-brick-common 0.5.0-0ubuntu4~
ii python-os-brick 0.5.0-0ubuntu4~
Thanks in advance,
Adam
tags: | added: brick drivers multipath netapp |
summary: |
- Cinder extend with nova multipath issues
+ Cinder extend with nova multipath issues [netapp ontap] |
Hey Adam,
As you noted, Cinder did not allow extending non-available volumes in Liberty. However, you also discovered that you could reset-state, extend and reset-state again. This is considered a hack because it goes around the Cinder API. The driver performs sub-LUN cloning to get around LUN geometry restrictions, on the assumption that Cinder would block the API call if the volume were attached.
Extending in-use volumes needs cooperation between nova, cinder and the glue code that is os-brick.
As of API version 3.42, Cinder itself removes the restriction on extending in-use volumes (version 3.42 is available in the Pike release of OpenStack Cinder). Even with this, however, nova still needs to call the appropriate code in os-brick and cinder. Here are the blueprints/specs for this work:
Cinder: https://review.openstack.org/#/c/453286/ (Changes merged and available as of API version 3.42)
Os-brick: https://review.openstack.org/#/c/243730/ (Changes merged and available for a long time)
Nova: http://specs.openstack.org/openstack/nova-specs/specs/pike/approved/nova-support-attached-volume-extend.html (Available since Compute API microversion 2.51)
With all of this, we will very likely need to update the driver. I've opened a bug against the NetApp driver in our internal backlog.
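For reference, the supported in-use extend available from volume API 3.42 is an ordinary os-extend volume action sent with a microversion header, with no reset-state involved. The sketch below only builds such a request and does not send it; the endpoint, project ID, volume ID and token are placeholder values, and the helper name is made up for illustration.

```python
import json


def build_extend_request(endpoint, project_id, volume_id, new_size_gb, token):
    """Build (url, headers, body) for an os-extend action at microversion 3.42."""
    url = f"{endpoint.rstrip('/')}/v3/{project_id}/volumes/{volume_id}/action"
    headers = {
        "Content-Type": "application/json",
        "X-Auth-Token": token,
        # Microversion 3.42 is what permits extending an in-use volume.
        "OpenStack-API-Version": "volume 3.42",
    }
    body = json.dumps({"os-extend": {"new_size": new_size_gb}})
    return url, headers, body


if __name__ == "__main__":
    # All values below are placeholders, not taken from this bug report.
    url, headers, body = build_extend_request(
        "http://cinder.example.com:8776", "PROJECT_ID", "VOLUME_ID", 100, "TOKEN"
    )
    print(url)
    print(headers["OpenStack-API-Version"])
    print(body)
```

Whether the attached instance then sees the new size still depends on nova and os-brick handling the resize on the compute host, and on the backend driver supporting it.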