Storwize SVC: unexpected removal of FC zoning

Bug #1468913 reported by LABOUEBE Michael
Affects: Cinder
Status: Fix Released
Importance: High
Assigned to: TaoBai
Nominated for Kilo by Jay Bryant

Bug Description

I think I've run into a bug in the Storwize SVC driver that is present in the Juno release and seems to still be present in master.
When you attach the first volume for a VM on a compute node, FC autozoning takes place and creates the zoning between the initiator and the target.
All goes well for subsequent volume attachments to this compute node.
But when you detach a volume, the driver calls the zone manager and removes the zoning without checking whether there are still volumes attached to the compute node.
As a result, all other volumes attached to the compute node become unavailable because of the missing zoning.
HP seems to have had the same problem with 3PAR but corrected it with commit 5cf52914bd78fa553f853ad6ea2fba8a87d8075c.
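
To illustrate the failure mode, here is a self-contained sketch of the pattern described above; the decorator and the driver method are simplified stand-ins, not Cinder's actual implementations, and all WWPNs are made up.

    # Mimics what @fczm_utils.RemoveFCZone does: delete zones whenever
    # the driver's return value contains an initiator_target_map.
    def remove_fc_zone(terminate):
        def wrapper(*args, **kwargs):
            info = terminate(*args, **kwargs)
            if (info.get('driver_volume_type') == 'fibre_channel'
                    and 'initiator_target_map' in info.get('data', {})):
                print('zone manager: deleting zones for %s'
                      % info['data']['initiator_target_map'])
            return info
        return wrapper

    @remove_fc_zone
    def terminate_connection(volume, connector, volumes_left_on_host):
        """Buggy shape: the map is always returned, so zones are always removed."""
        info = {'driver_volume_type': 'fibre_channel', 'data': {}}
        # Bug: volumes_left_on_host is never consulted before reporting the map.
        info['data']['initiator_target_map'] = {
            wwpn: ['5005076801100000'] for wwpn in connector['wwpns']}
        return info

    # Detaching one of two volumes still tears down the shared zones:
    terminate_connection('vol-1', {'wwpns': ['21000024ff30ad1f']},
                         volumes_left_on_host=['vol-2'])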

TaoBai (baitao2020)
Changed in cinder:
assignee: nobody → TaoBai (baitao2020)
Revision history for this message
TaoBai (baitao2020) wrote :

Hi gfarmerfr,

Thanks for reporting this bug. Can you give me your email address? I may need you to verify the fix after I finish it.

I quickly went through the HP 3PAR code change; it seems the solution is to return the WWPNs only when there are no volumes attached.

tags: added: drivers storwise
Revision history for this message
LABOUEBE Michael (gfarmerfr) wrote :

TaoBai: I've fixed my Launchpad privacy settings, and I've also sent you an email.
I have no problem testing your bug fix when you're ready.
I've also come to the same conclusion based on the HP 3PAR patch: when there are still volumes attached to the compute node (that is, on any of the VMs on that compute node), the terminate_connection function should return an empty hash for the "data" key of the "info" data structure: info['data'] = {}.
In theory, if I understand the code correctly, the @fczm_utils.RemoveFCZone decorator will still run but will do nothing in this case.
See the "RemoveFCZone" function in cinder/zonemanager/utils.py.

Revision history for this message
Jay Bryant (jsbryant) wrote :

Tao,

Thanks for working on this one. We should try to get it fixed soon.

Jay

Changed in cinder:
importance: Undecided → High
status: New → Triaged
milestone: none → liberty-2
TaoBai (baitao2020)
Changed in cinder:
status: Triaged → In Progress
Revision history for this message
Jay Bryant (jsbryant) wrote :

Wanted to include this note that was sent via email.

The NPIV support was added a couple of releases back. It is quite possible that it introduced the bug listed below. Tao, let me know if you need any help with this.

I think I've found another bug in the Storwize SVC driver in Cinder, but I'm not sure whether it's intentional, so I would appreciate your input before opening a bug in Launchpad.

This bug is present from Juno up to master. It might be present in earlier releases, but I've not checked.
It seems to me that there is a chicken-and-egg problem in the initialize_connection function, which calls the zoning function AddFCZone.
When you're in Fibre Channel mode, initialize_connection calls get_conn_fc_wwpns, which in turn does an lsfabric on the SVC.
The problem is that, the first time you want to attach a volume to a compute node,
there is no zoning present yet, so get_conn_fc_wwpns(host_name) returns nothing.
When nothing is returned, you enter the 'if len(conn_wwpns) == 0' condition, and if storwize_svc_npiv_compatibility_mode is not configured, the zoning never happens.
So the only way to make zone creation work is to set storwize_svc_npiv_compatibility_mode,
even if in your architecture the compute nodes are directly attached to the same fabric as the SVC (and not behind NPIV top-of-rack switches).
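
Reconstructing the flow just described as a sketch (names simplified; this is an illustration of the described behavior, not the driver's verbatim code):

    # Reconstruction of the described attach path; not verbatim driver code.
    def pick_connection_wwpns(helpers, config, storage_nodes, host_name):
        conn_wwpns = helpers.get_conn_fc_wwpns(host_name)  # lsfabric: empty if unzoned
        if len(conn_wwpns) == 0:
            if not config.storwize_svc_npiv_compatibility_mode:
                # First attach on an unzoned host fails here: no WWPNs are
                # reported, AddFCZone never gets a map, no zone is created.
                raise RuntimeError('no FC connections found for host %s' % host_name)
            # NPIV compatibility mode falls back to the storage nodes' own
            # WWPNs, which is what allows the initial zones to be created.
            for node in storage_nodes.values():
                conn_wwpns.extend(node['WWPN'])
        return conn_wwpns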

What I think would be an appropriate decision tree is (sketched in code below):
* Call get_conn_fc_wwpns(host_name) and check the return value.
* If nothing is returned (the host is not zoned, or is in NPIV mode): do the zoning.
* If something is returned (the host is already zoned): do nothing, by not including the WWPNs in the function's return data structure (don't create the initiator_target_map key; see zonemanager/utils.py). This avoids a round trip to the zone manager redoing zoning that is already in place.
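
A hypothetical sketch of that decision tree, using the same simplified names as the snippets above (not a real patch):

    # Hypothetical sketch of the proposed decision tree; not a real patch.
    def build_connection_info(helpers, connector, storage_nodes, host_name):
        info = {'driver_volume_type': 'fibre_channel', 'data': {}}
        conn_wwpns = helpers.get_conn_fc_wwpns(host_name)
        if len(conn_wwpns) == 0:
            # Host not zoned yet (or NPIV): hand AddFCZone a map built from
            # the storage nodes' WWPNs so that it creates the zoning.
            target_wwpns = [w for node in storage_nodes.values()
                            for w in node['WWPN']]
            info['data']['initiator_target_map'] = {
                initiator: target_wwpns for initiator in connector['wwpns']}
        # Host already zoned: leave initiator_target_map out, so AddFCZone
        # skips the round trip to the zone manager.
        return info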

I've grepped the source code, and storwize_svc_npiv_compatibility_mode isn't used anywhere else.
Since it doesn't influence the decision whether or not to zone a host, shouldn't we get rid of it?

Also, shouldn't we call map_vol_to_host only after checking that the zoning succeeded or was unnecessary (see the sketch below)?
This would avoid having the volume mapped but not attached to the host.
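
A conceptual sketch of that ordering; ensure_zoning and the standalone map_vol_to_host here are illustrative stand-ins (in the real driver, zoning is performed by the AddFCZone decorator after initialize_connection returns, so this shows only the intent, not a drop-in change):

    # Conceptual only: ensure_zoning / map_vol_to_host are illustrative
    # stand-ins, not the driver's actual call sequence.
    def attach(volume, host, ensure_zoning, map_vol_to_host):
        if not ensure_zoning(host):  # create the zoning now, or confirm it exists
            raise RuntimeError('zoning failed; not mapping %s' % volume)
        # Map only once the fabric path is known to be usable, so we never
        # end up with a volume mapped on the SVC but unreachable by the host.
        map_vol_to_host(volume, host)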

Best Regards,
LABOUEBE Michael

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/196966
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=e634215a0aa20b8bb1819b3f18f1727517b5e0a7
Submitter: Jenkins
Branch: master

commit e634215a0aa20b8bb1819b3f18f1727517b5e0a7
Author: TaoBai <email address hidden>
Date: Tue Jun 30 00:03:38 2015 -0700

    Storwize Driver zone removing

    Storwize driver may remove zone even when the zone is still in use
    by an attached volume. To fix this issue, we need only report initiator
    and target map when there are no volumes attached.

    Closes-Bug: 1468913

    Change-Id: I075b8d37d4d312ae6d812b41bb3732f20987a72d

Changed in cinder:
status: In Progress → Fix Committed
Changed in cinder:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: liberty-2 → 7.0.0