Code from Queens:
In File (cinder-stable-queens/cinder/volume/drivers/dell_emc/vmax/utils.py Line 253)
def generate_unique_trunc_host(self, host_name):
    """Create a unique short host name under 16 characters.
    :param host_name: long host name
    :returns: truncated host name
    """
    if host_name and len(host_name) > 16:
        ...
Similar code from Ocata (SMI-S) level:
In File (cinder-stable-ocata/cinder/volume/drivers/dell_emc/vmax/utils.py Line 2547)
def generate_unique_trunc_host(self, hostName):
    """Create a unique short host name under 40 chars
    :param sgName: long storage group name
    :returns: truncated storage group name
    """
    if hostName and len(hostName) > 38:
        ....
It seems this is what causes the masking view not to be found. In fc->terminate_connection(), there is:
In File (cinder-stable-queens/cinder/volume/drivers/dell_emc/vmax/fc.py Line 292)
if connector:
    zoning_mappings = self._get_zoning_mappings(volume, connector)
if zoning_mappings:
    self.common.terminate_connection(volume, connector)
    data = self._cleanup_zones(zoning_mappings)
return data
If no zoning mappings are found, the call to the terminate_connection flow is skipped. So it responds as if the detach worked, but it does not do anything.
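To make the silent skip concrete, here is a minimal sketch of that control flow. It is not the driver code: FakeCommon, the lookup callables, and the shape of the returned dict are hypothetical stand-ins chosen only to show that an empty lookup result leaves the backend untouched while the caller still gets a normal return value.

```python
class FakeCommon:
    """Hypothetical stand-in for the driver's common object."""

    def __init__(self):
        self.terminated = []

    def terminate_connection(self, volume, connector):
        # Record the call so we can see whether the detach really ran.
        self.terminated.append(volume)


def terminate_connection(common, volume, connector, get_zoning_mappings):
    """Simplified imitation of the flow quoted above (assumed shape)."""
    data = {"driver_volume_type": "fibre_channel", "data": {}}
    zoning_mappings = {}
    if connector:
        zoning_mappings = get_zoning_mappings(volume, connector)
    if zoning_mappings:
        common.terminate_connection(volume, connector)
        data = {"driver_volume_type": "fibre_channel", "data": zoning_mappings}
    return data


# Case 1: the lookup finds nothing, e.g. because the host name did not match.
skipped = FakeCommon()
skipped_result = terminate_connection(
    skipped, "vol-1", {"host": "short-host"}, lambda v, c: {})

# Case 2: the lookup finds a mapping, so the detach is actually performed.
performed = FakeCommon()
performed_result = terminate_connection(
    performed, "vol-1", {"host": "full-host"}, lambda v, c: {"port_group": "pg"})

print(skipped.terminated, skipped_result)
print(performed.terminated, performed_result)
```

In the first case no exception is raised and a well-formed result comes back, which matches the observed behavior of a detach that reports success without doing anything.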
The reason is that _get_zoning_mappings() eventually calls _get_masking_views_from_volume() to do the lookup. Because the 'host' is now shortened (mangled), the host comparison below runs but never evaluates to true. The old masking view name has the full 26-character host name in it, and that does not match the new 16-character host name.
if host_compare:
    if host.lower() in mv.lower():
        maskingview_list.append(mv)
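The mismatch can be reproduced with a small sketch. The body of generate_unique_trunc_host() is elided in this report, so truncate_host() below is a hypothetical stand-in (prefix plus a short digest), and the host name and masking view name are made up for illustration. Only the substring comparison is taken from the snippet above.

```python
import hashlib


def truncate_host(host_name, max_len=16):
    """Hypothetical stand-in for generate_unique_trunc_host().

    Assumption: keep a prefix and append a short hex digest so the result
    is exactly max_len characters. The real algorithm may differ.
    """
    if host_name and len(host_name) > max_len:
        digest = hashlib.sha256(host_name.encode("utf-8")).hexdigest()[:8]
        return host_name[:max_len - len(digest)] + digest
    return host_name


def filter_masking_views(masking_views, host):
    """Same substring test as the quoted host comparison."""
    return [mv for mv in masking_views if host.lower() in mv.lower()]


full_host = "compute-node-0123456789abc"   # 26 characters, made-up name
old_mv = "OS-" + full_host + "-F-MV"        # hypothetical masking view name
short_host = truncate_host(full_host)

print(len(full_host), len(short_host))
print(filter_masking_views([old_mv], full_host))   # old view is found
print(filter_masking_views([old_mv], short_host))  # nothing is found
```

Any truncation that changes the tail of the name has this effect: the shortened name is no longer a substring of a masking view that was created with the full name.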
This suggests the workaround of passing an empty 'host' value in the connector. I tried this, and the flow deadlocks. My analysis of the call path:
common._remove_members()
-->masking.remove_and_reset_members()
-->masking._cleanup_deletion()
Loop over storage groups because no 'host':
-->masking.remove_volume_from_sg(storagegroup_name=OS-no_SLO-SG)
-->do_remove_volume_from_sg(mv-sg) [lock on OS-no_SLO-SG]
-->masking.multiple_vols_in_sg()
-->masking.add_volume_to_default_storage_group(src_sg=<dft-sg>) [move=true flow]
-->masking.get_or_create_default_storage_group()
-->_move_vol_to_default_sg() [already there, deadlocks on OS-no_SLO-SG because that lock already held]
-->rest.move_volume_between_storage_groups()
It tries to operate on the default storage group in a nested fashion causing the deadlock.
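The nested-lock pattern can be illustrated in isolation. This sketch assumes the per-storage-group lock is not reentrant; it uses a plain threading.Lock as a stand-in for whatever locking the driver really uses, and the two function names only echo the trace above. A timeout is used on the inner acquire so the example reports the block instead of hanging.

```python
import threading

# Stand-in for the lock on storage group OS-no_SLO-SG (assumed non-reentrant).
sg_lock = threading.Lock()


def move_vol_to_default_sg(timeout=0.2):
    """Inner step: tries to take the same storage group lock again."""
    acquired = sg_lock.acquire(timeout=timeout)
    if not acquired:
        # Without the timeout this call would wait forever: a deadlock.
        return "blocked"
    try:
        return "moved"
    finally:
        sg_lock.release()


def remove_volume_from_sg():
    """Outer step: holds the storage group lock while calling the inner step."""
    with sg_lock:
        return move_vol_to_default_sg()


result = remove_volume_from_sg()
print(result)
```

Because the outer call still holds the lock when the inner call asks for it, the inner acquire can never succeed on the same thread. That is the shape of the hang seen when the source group and the default storage group are the same.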
Therefore, it appears the driver will need to be fixed for the original case of passing the host on the connector so that the terminate flow is not skipped.