Zone manager removes zoning when there are still volumes attached

Bug #1308318 reported by Xing Yang on 2014-04-16
This bug affects 4 people
Affects: Cinder
Importance: Medium
Assigned to: Walt Boring

Bug Description

I was testing FC Auto Zoning. I had volume1 attached to VM1, and zoning was created properly during the first attach volume operation. When I tried to attach a second volume, for some reason Nova couldn't discover the LUN, so terminate_connection was called and the second volume was not attached. However, during that terminate_connection call, zonemanager.delete_connection() was called and the zoning was deleted even though volume1 was still attached.

The zoning should only be deleted when there are no more volumes attached.

Xing Yang (xing-yang) on 2014-04-16
Changed in cinder:
assignee: nobody → Santhoshkumar Kolathur (skolathu)
Mike Perez (thingee) on 2014-04-16
tags: added: fibrechannel
tags: added: zonemanager
Changed in cinder:
importance: Undecided → Critical
status: New → Confirmed
Xing Yang (xing-yang) wrote :

Attach logs.

Xing Yang (xing-yang) wrote :

Hi Santhosh,

Search for "delete_connection" at the end of the log file. That was the last time this was reproduced.

2014-04-22 03:44:09.288 DEBUG cinder.zonemanager.fc_zone_manager [req-29dd55fd-dacd-4ce1-9845-eeb600202ccc 205801599ac04b4c86984bee2d62c94a 5574fbe8699b49d5ad434746ef1a15f6] Delete connection Fabric Map from SAN context: {'SW02': ['5006016736600759', '5006016f36600759']} from (pid=6965) delete_connection /opt/stack/cinder/cinder/zonemanager/fc_zone_manager.py:166

Thanks,
Xing

Xing Yang (xing-yang) wrote :

Santhosh,

Another related question. I saw the following in the logs:

oslo.messaging.rpc.dispatcher VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Unable to terminate volume connection: Fibre Channel connection control failure: Failed removing connection for fabric=None: Error:Failed to get SAN context Unsupported firmware on switch 10.108.250.93. Make sure switch is running firmware v6.4 or higher

From the switch:
L4DS-102Q-SW01:FID1:admin> firmwareshow
Appl Primary/Secondary Versions
------------------------------------------
FOS v7.1.0a
         v7.1.0a

v7.1.0a is higher than v6.4. Why did the message say "unsupported firmware"?

Thanks for providing the logs. Our current implementation does not keep track of attach/detach reference counts. To resolve the issue, we may need to maintain a reference count for each I-T (initiator-target) pair and persist it in the database. I'll start working on it and post the changes here.

Regarding the firmware version check, you should not get this error if the configured switch (in this case 10.108.250.93) is running FOS v6.4 or above. Can you please turn on debug logs and see what is printed as part of the is_supported_firmware call? Also, please post the output of the CLI command 'version' from this switch.
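
For illustration only, a version check of this kind might look like the sketch below. It is not the actual is_supported_firmware code. The names parse_fos_version and meets_minimum are hypothetical, and the parsing assumes the version appears in the "vX.Y..." form quoted from the switch above (for example "v7.1.0a").

```python
import re


def parse_fos_version(text):
    """Return (major, minor) for the first 'vX.Y' token in text, or None.

    Hypothetical helper: assumes the switch reports its firmware in a
    form like 'FOS v7.1.0a'. Patch level and letter suffix are ignored.
    """
    match = re.search(r"v(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum(text, minimum=(6, 4)):
    """Return True if the parsed version is at least `minimum`."""
    version = parse_fos_version(text)
    # Tuples of ints compare numerically, so (10, 0) > (6, 4), which a
    # plain string comparison of "10.0" and "6.4" would get wrong.
    return version is not None and version >= minimum
```

Comparing integer tuples rather than raw strings is one way to avoid a false "unsupported firmware" result when the output contains extra text or a letter suffix.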

Changed in cinder:
status: Confirmed → In Progress
Xing Yang (xing-yang) wrote :

The attached cinder-volume log already had debug turned on.

Here is the output from the CLI command "version":

L4DS-102Q-SW01:FID1:io2p> version
Kernel: 2.6.14.2
Fabric OS: v7.1.0a
Made on: Tue Jan 22 19:33:14 2013
Flash: Fri May 3 15:02:02 2013
BootProm: 1.0.15

Jeegn Chen (jeegn-chen) wrote :

So far as I know, when Nova calls initialize_connection(), it will retry 3 times by default if the first call times out (probably because Cinder is too busy).
Thus in this scenario, Nova thinks initialize_connection() was invoked once, while Cinder might see up to 3 initialize_connection() invocations.
If a terminate_connection() is later served without a retry, both Nova and Cinder may see just one terminate_connection().

Will the reference count solution take this case into consideration?
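
One way to make such bookkeeping tolerant of retried calls is to track the set of attached volume IDs per I-T pair instead of a raw counter, so that repeated attach calls for the same volume count once. The sketch below is only an illustration of that idea, not the proposed fix: AttachmentTracker and its methods are hypothetical names, the WWNs in the usage note are placeholders, and the state is kept in memory rather than persisted in the database.

```python
from collections import defaultdict


class AttachmentTracker:
    """Hypothetical in-memory record of which volumes use each I-T pair."""

    def __init__(self):
        # Maps an (initiator, target) pair to the set of attached volume IDs.
        self._volumes = defaultdict(set)

    def attach(self, it_pair, volume_id):
        # Adding to a set is idempotent, so a retried
        # initialize_connection() for the same volume is counted once.
        self._volumes[it_pair].add(volume_id)

    def detach(self, it_pair, volume_id):
        """Return True when no volumes remain, i.e. the zone may be removed."""
        volumes = self._volumes.get(it_pair, set())
        volumes.discard(volume_id)
        if volumes:
            return False
        self._volumes.pop(it_pair, None)
        return True
```

With this approach, three attach calls for volume2 followed by one detach of volume2 leave volume1 recorded, so detach returns False and the zone would be kept. A plain integer counter would instead be left at a nonzero value after the retries and the zone would never be removed.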

Fix proposed to branch: master
Review: https://review.openstack.org/97880

Changed in cinder:
assignee: Santhoshkumar Kolathur (skolathu) → Rahul Verma (rahul-verma-m)

Fix proposed to branch: master
Review: https://review.openstack.org/103266

Changed in cinder:
assignee: Rahul Verma (rahul-verma-m) → Walt Boring (walter-boring)

Reviewed: https://review.openstack.org/103266
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=5cf52914bd78fa553f853ad6ea2fba8a87d8075c
Submitter: Jenkins
Branch: master

commit 5cf52914bd78fa553f853ad6ea2fba8a87d8075c
Author: Walter A. Boring IV <email address hidden>
Date: Fri Jun 27 16:00:01 2014 -0700

    3PAR Only remove FC Zone on last volume detach

    This patch checks to make sure that we don't
    include the initiator_target_map in the return
    of terminate_connection if there are volumes
    still attached for a particular host. The
    FibreChannel ZoneManager doesn't remove zones
    if there isn't an initiator_target_map in the
    return of terminate_connection.

    Change-Id: I98db5adb6da38454933a6e4b78085193f1d37680
    Partial-Bug: #1308318

Changed in cinder:
importance: Critical → Medium
Walt Boring (walter-boring) wrote :

Every driver has to make sure that terminate_connection returns an initiator_target_map if and only if there are no more volumes attached to that host.
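
A minimal sketch of that rule, assuming the driver can list the volumes still attached to the host. The function name build_terminate_info and its parameters are hypothetical and not taken from any particular driver, and the shape of the returned dictionary is an assumption for illustration.

```python
def build_terminate_info(remaining_volumes_on_host, initiator_target_map):
    """Build the value a driver might return from terminate_connection.

    Hypothetical helper: `remaining_volumes_on_host` is whatever list of
    volumes the backend still reports as attached to the host after this
    detach.
    """
    info = {"driver_volume_type": "fibre_channel", "data": {}}
    # Only hand the map back when the last volume is gone. Without an
    # initiator_target_map in the result, the zone manager leaves the
    # existing zones in place.
    if not remaining_volumes_on_host:
        info["data"]["initiator_target_map"] = initiator_target_map
    return info
```

For the scenario in this report, a failed attach of the second volume would call this with volume1 still in remaining_volumes_on_host, so no map is returned and the zoning for volume1 is preserved.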

Changed in cinder:
status: In Progress → Fix Committed
Changed in cinder:
milestone: none → juno-2
status: Fix Committed → Fix Released

Change abandoned by Rahul Verma (<email address hidden>) on branch: master
Review: https://review.openstack.org/97880

Thierry Carrez (ttx) on 2014-10-16
Changed in cinder:
milestone: juno-2 → 2014.2