On a Brocade driver update of a zone, VM loses access to volume disk

Bug #1563060 reported by Yucong Feng
This bug affects 1 person
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: Angela Smith
Milestone: (none)

Bug Description

This mainly affects initiator zoning. During the delete/update sequence that removes target WWPNs from an initiator zone, if 'activate=True', the code first hard-deletes the zone and then recreates it. Between the deletion and the recreation of the zone, the VM loses all access to its volume disks.

brcd_fc_zone_driver.py
    @lockutils.synchronized('brcd', 'fcfabric-', True)
    def delete_connection(self, fabric, initiator_target_map, host_name=None,
    ...

            try:
                # Update zone membership.
                if zone_map:
                    conn.add_zones(
                        zone_map, zone_activate,
                        cfgmap_from_fabric)

brcd_fc_zone_client_cli.py
    def add_zones(self, zones, activate, active_zone_set=None):
        """Add zone configuration."""
        ...
        for zone in zones.keys():
            # If zone exists, its an update. Delete & insert
            # TODO(skolathur): This still need to be optimized
            # to an update call later. Now we just handled the
            # same zone name with same zone members.
            if (zone in zone_list):
                if set(zones[zone]) == set(zone_list[zone]):
                    break
                try:
                    self.delete_zones(zone, activate, active_zone_set)
                except exception.BrocadeZoningCliException:
                    with excutils.save_and_reraise_exception():
                        LOG.error(_LE("Deleting zone failed %s"), zone)
                LOG.debug("Deleted Zone before insert : %s", zone)

This can be avoided by making delete_zones a transient operation, that is, by calling it with activate=False. When the zone is then recreated, the subsequent cfgenable call applies the whole configuration in one go.
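The difference between the two sequences can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the driver code: `FakeFabric`, its methods, `update_zone`, and the zone and member names are all hypothetical stand-ins. The only idea taken from the report is that deleting with activate=False stages the change, so the fabric never activates a configuration in which the zone is missing.

```python
class FakeFabric:
    """Hypothetical stand-in for a fabric; not the Brocade client API."""

    def __init__(self, zones):
        # Defined (staged) configuration: zone name -> set of members.
        self.defined = {name: set(members) for name, members in zones.items()}
        # One snapshot of zone names per activation, in order.
        self.active_history = []

    def _enable(self):
        # Stands in for activating the staged configuration on the fabric.
        self.active_history.append(set(self.defined))

    def delete_zone(self, name, activate):
        self.defined.pop(name)
        if activate:
            self._enable()

    def create_zone(self, name, members, activate):
        self.defined[name] = set(members)
        if activate:
            self._enable()


def update_zone(fabric, name, members, activate_on_delete):
    """Delete and recreate a zone; report whether it was ever inactive."""
    fabric.delete_zone(name, activate=activate_on_delete)
    fabric.create_zone(name, members, activate=True)
    # True if any activated configuration lacked the zone (an access gap).
    return any(name not in snapshot for snapshot in fabric.active_history)


zones = {"zone_a": ["init1", "tgt1", "tgt2"]}
gap_with_hard_delete = update_zone(
    FakeFabric(zones), "zone_a", ["init1", "tgt1"], activate_on_delete=True)
gap_with_staged_delete = update_zone(
    FakeFabric(zones), "zone_a", ["init1", "tgt1"], activate_on_delete=False)
```

In this model the hard delete produces an activated configuration without `zone_a`, while the staged delete activates only once, after the zone has been recreated.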

Tags: brocade
Angela Smith (aallen-m) wrote :

This can be fixed by addressing the TODO item. Instead of doing zonedelete/zonecreate, we can modify the zone with the zoneremove/zoneadd CLI commands. Alternatively, you can use the new HTTP connector, which already handles zone updates this way. :)
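An in-place update of this kind comes down to a set difference between the current and the desired zone members. The sketch below is an illustration only and not the code from the eventual fix: `plan_zone_update` is a hypothetical helper, the member names are made up, and the exact zoneadd/zoneremove command syntax is deliberately not reproduced.

```python
def plan_zone_update(current_members, desired_members):
    """Return (to_add, to_remove) so a zone can be modified in place.

    Hypothetical helper: members in to_add would be passed to zoneadd and
    members in to_remove to zoneremove, leaving the zone itself in place.
    """
    current = set(current_members)
    desired = set(desired_members)
    to_add = sorted(desired - current)      # members missing from the zone
    to_remove = sorted(current - desired)   # members no longer wanted
    return to_add, to_remove


# Removing one target from an initiator zone touches only that member.
to_add, to_remove = plan_zone_update(
    ["init1", "tgt1", "tgt2"], ["init1", "tgt1"])
```

Because members that stay in the zone appear in neither list, their connectivity is never touched by the update.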

Changed in cinder:
assignee: nobody → Angela Smith (aallen-m)
status: New → Confirmed
Angela Smith (aallen-m) wrote :

Working on a fix, will post code review soon. Thanks.

Yucong Feng (yfeng) wrote :

Is there any update on this? I have extra time to work on it if assistance is needed.

Angela Smith (aallen-m) wrote :

I should have a code review for the SSH connector posted by Monday. Doing final testing now. Thanks.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/317061

Changed in cinder:
status: Confirmed → In Progress
Angela Smith (aallen-m) wrote :

Yucong, I have posted a fix for this issue for both the SSH and HTTP connectors. Could you please test the fix and verify that it has corrected the issue you reported? Thanks!

Yucong Feng (yfeng) wrote :

I was able to verify the patch. Volumes no longer lose connection to the VM.

OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/317061
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=03abe629a9efae92c2fae0b8632ca79e63edcb5a
Submitter: Jenkins
Branch: master

commit 03abe629a9efae92c2fae0b8632ca79e63edcb5a
Author: Angela Smith <email address hidden>
Date: Mon May 16 11:58:49 2016 -0700

    Fix Brcd zone driver initiator zone update

    Modified the logic to do zoneadd or zoneremove to
    modify the zone members when an initiator zone
    is being modified instead of current logic which
    deletes the zone and recreates it, in which time
    the existing targets lose connectivity to the
    initiator because of cfgdisable.

    Change-Id: I24bc3d56aa36a9e39cff403f7447e359c760e732
    Closes-Bug: #1563060

Changed in cinder:
status: In Progress → Fix Released
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/cinder 9.0.0.0b2

This issue was fixed in the openstack/cinder 9.0.0.0b2 development milestone.
