Comment 1 for bug 1844314

ramakrishnan (sriramasan) wrote : Re: Upgrading from SMI-S based driver to RESTAPI based driver with long hostnames

Code from Queens:
In File (cinder-stable-queens/cinder/volume/drivers/dell_emc/vmax/utils.py Line 253)
    def generate_unique_trunc_host(self, host_name):
        """Create a unique short host name under 16 characters.

        :param host_name: long host name
        :returns: truncated host name
        """
        if host_name and len(host_name) > 16:
            ...

Similar code from Ocata (SMI-S) level:
In File (cinder-stable-ocata/cinder/volume/drivers/dell_emc/vmax/utils.py Line 2547)

    def generate_unique_trunc_host(self, hostName):
        """Create a unique short host name under 40 chars

        :param sgName: long storage group name
        :returns: truncated storage group name
        """
        if hostName and len(hostName) > 38:
            ...
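
To make the mismatch concrete, here is a minimal sketch that compares only the two length checks quoted above. The truncation bodies are elided in both excerpts, so nothing here reproduces how the driver actually shortens the name; is_truncated() and the sample host name are hypothetical.

```python
QUEENS_LIMIT = 16  # from the Queens check: len(host_name) > 16
OCATA_LIMIT = 38   # from the Ocata check: len(hostName) > 38


def is_truncated(host_name, limit):
    """Return True if a driver using this limit would shorten the name."""
    return bool(host_name) and len(host_name) > limit


# Hypothetical 26-character host name (not taken from the bug report).
host = "compute-node-0123456789abc"

ocata_shortens = is_truncated(host, OCATA_LIMIT)    # name kept in full
queens_shortens = is_truncated(host, QUEENS_LIMIT)  # name gets mangled
```

Any host name between 17 and 38 characters falls in this gap: the Ocata driver stored it in full, while the Queens driver looks for a shortened form.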

It seems this is what causes the masking view not to be found. In fc->terminate_connection(), there is:
In File (cinder-stable-queens/cinder/volume/drivers/dell_emc/vmax/fc.py Line 292)

        if connector:
            zoning_mappings = self._get_zoning_mappings(volume, connector)

        if zoning_mappings:
            self.common.terminate_connection(volume, connector)
            data = self._cleanup_zones(zoning_mappings)
        return data

If no zoning mappings are found, the terminate_connection flow is skipped. The driver responds as though the detach worked, but it does nothing.
The reason is that _get_zoning_mappings() eventually reaches _get_masking_views_from_volume() to do the lookup. Because a shortened (mangled) 'host' is passed in, the host-comparison logic below runs, but the comparison never evaluates to true. The old masking view name contains the full 26-character host name, which does not match the new 16-character host name.

                if host_compare:
                    if host.lower() in mv.lower():
                        maskingview_list.append(mv)
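
The effect of that substring test can be shown in isolation. This is only a sketch: the masking view name layout, the host names, and both helper functions are hypothetical stand-ins, and the cleanup step is reduced to returning the matched views.

```python
def find_masking_views(all_views, host):
    """Mimic the quoted filter: keep views whose name contains the host."""
    return [mv for mv in all_views if host.lower() in mv.lower()]


def terminate_connection(all_views, host):
    """Mimic the quoted FC flow: only clean up if mappings were found."""
    cleaned = []
    zoning_mappings = find_masking_views(all_views, host)
    if zoning_mappings:
        cleaned = list(zoning_mappings)  # stand-in for the real cleanup
    return cleaned


long_host = "compute-node-0123456789abc"   # hypothetical 26-char name
short_host = "789abc" + "0f1e2d3c4b"       # hypothetical 16-char mangled name
old_views = ["OS-" + long_host + "-F-MV"]  # hypothetical name layout

with_long = terminate_connection(old_views, long_host)
with_short = terminate_connection(old_views, short_host)
```

With the full host name the old view is found and cleaned up. With the shortened name the list is empty, so the cleanup branch never runs and the call still returns normally.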

This suggests the workaround of passing an empty 'host' value in the connector. I tried this and the flow deadlocks, as the following call trace shows:

common._remove_members()
  -->masking.remove_and_reset_members()
    -->masking._cleanup_deletion()
      Loop over storage groups because no 'host':
      -->masking.remove_volume_from_sg(storagegroup_name=OS-no_SLO-SG)
        -->do_remove_volume_from_sg(mv-sg) [lock on OS-no_SLO-SG]
          -->masking.multiple_vols_in_sg()
            -->masking.add_volume_to_default_storage_group(src_sg=<dft-sg>) [move=true flow]
              -->masking.get_or_create_default_storage_group()
                -->_move_vol_to_default_sg() [already there, deadlocks on OS-no_SLO-SG because that lock is already held]
                  -->rest.move_volume_between_storage_groups()

It tries to operate on the default storage group in a nested fashion, which causes the deadlock.
Therefore, it appears the driver will need to be fixed for the original case of passing the host in the connector, so that the terminate flow is not skipped.
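
The nested-lock pattern in the trace can be reproduced in miniature. This sketch assumes the per-storage-group lock is not reentrant, as the trace implies. It uses a plain threading.Lock with a timeout so it reports the failure instead of hanging; the lock registry and both functions are hypothetical and are not the driver's locking code.

```python
import threading

# One non-reentrant lock per storage group name (hypothetical registry).
_locks = {}


def lock_for(name):
    """Return the lock for a storage group, creating it on first use."""
    return _locks.setdefault(name, threading.Lock())


def move_vol_to_default_sg(sg_name, timeout=0.2):
    """Inner step: tries to take the lock on the default storage group."""
    lock = lock_for(sg_name)
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()
    return acquired


def remove_volume_from_sg(sg_name, default_sg):
    """Outer step: holds the lock on sg_name while calling the inner step."""
    with lock_for(sg_name):
        return move_vol_to_default_sg(default_sg)


# Outer and inner steps target the same group: the inner acquire times out.
nested_ok = remove_volume_from_sg("OS-no_SLO-SG", "OS-no_SLO-SG")
# Different groups: the inner acquire succeeds.
distinct_ok = remove_volume_from_sg("OS-host-SG", "OS-no_SLO-SG")
```

Without the timeout, the first call would block forever, which matches the hang seen when the loop over storage groups reaches the default group itself.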