We have a customer using PowerMax cinder driver of stable/yoga branch and PowerMax storage [0] with array model 'PowerMax_8000'.
[0] PowerMax details
2023-04-20 07:47:05.666 1618765 INFO cinder.volume.drivers.dell_emc.powermax.rest [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] Unisphere version V9.2.3.22 meets minimum requirement of version 9.2.0.0.
In customer executions, they are repeating below 2 sequence of operations.
1. Concurrent deploy of 3 VMs (dappt0068, dappt0069, dwebt004) each with 2 disks( 1 boot volume and 1 data volume) each.
volume-dappt0068_alt_disk
volume-dappt0068-boot
volume-dappt0069_alt_disk
volume-dappt0069-boot
volume-dwebt0044_alt_disk
volume-dwebt0044-boot
2. Delete the above deployed 3 VMs concurrently once they are deployed.
With OS-no_SLO-SG as default storage group, after some repeated runs of above 2 sequence of operations, they were seeing volumes stuck in attaching during deploys and detaching during delete VMs.
While analyzing the logs, we found couple of issues with locking on default storage group[2.c], [2.d], [3]
Issue 1:
After some above cycles of execution, during VM deploy, one of the volumes attach operation failed to create initiator group [1] because of exists initiator group(which was created from previous runs and not cleaned up). Due to this, it is going through rollback flow [2] where it is trying to take lock on default storage group twice [2.c], [2.d], causing it to hang, then it is not allowing further attach/detach of volumes.
[1] create initiator group failed.
2023-04-20 10:57:36.862 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Found initiator group OS-t_v3v47b1240def5-F-IG, but could not find initiator_names ['c050760c299702de', 'c050760c299702df', 'c050760c299702e2', 'c050760c299702e3', 'c050760c299702dc', 'c050760c299702dd', 'c050760c299702e0', 'c050760c299702e1'] in the login table. The contained initiators ['c050760c299702cc', 'c050760c299702cd', 'c050760c299702ce', 'c050760c299702cf', 'c050760c299702d0', 'c050760c299702d1', 'c050760c299702d2', 'c050760c299702d3'] do match up with those in the connector object. Delete initiator group OS-t_v3v47b1240def5-F-IG and retry.
2023-04-20 10:57:36.875 1646354 INFO cinder.volume.drivers.dell_emc.powermax.rest [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] GET request to httpspvc://172.30.115.229:8443/univmax/restapi/92/sloprovisioning/symmetrix/000197602020/maskingview/OS-t_v3v47b1240def5-F-PG4-MV has returned with a status code of: 404.
2023-04-20 10:57:36.876 1646354 INFO cinder.volume.drivers.dell_emc.powermax.rest [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Masking view details in get_element_from_masking_view are None
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.rest [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Error retrieving masking group.
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Masking View creation or retrieval was not successful for masking view OS-t_v3v47b1240def5-F-PG4-MV. Attempting rollback.: cinder.exception.VolumeBackendAPIException: Error retrieving masking group.
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking Traceback (most recent call last):
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/masking.py", line 108, in get_or_create_masking_view_and_map_lun
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking self.rest.get_element_from_masking_view(
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/rest.py", line 1999, in get_element_from_masking_view
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking raise exception.VolumeBackendAPIException(
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking cinder.exception.VolumeBackendAPIException: Error retrieving masking group.
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking
We have a customer using PowerMax cinder driver of stable/yoga branch and PowerMax storage [0] with array model 'PowerMax_8000'.
[0] PowerMax details
2023-04-20 07:47:05.666 1618765 INFO cinder. volume. drivers. dell_emc. powermax. rest [req-b73fc9b4- 39ab-435e- 837b-d01b9feb05 84 - - - - -] Unisphere version V9.2.3.22 meets minimum requirement of version 9.2.0.0.
2023-04-20 07:47:05.678 1618765 INFO powervc_ cinder. volume. drivers. emc.vmax_ product_ rest [req-b73fc9b4- 39ab-435e- 837b-d01b9feb05 84 - - - - -] PowerVC common extensions loaded. Array_info: {'RestServerIp': '*********, 'RestServerPort': 8443, 'RestUserName': 'powervc', 'SerialNumber': '*********', 'srpName': 'SRP_1', 'SSLVerify': '**********', 'PortGroups': ['PG1', 'PG2', 'PG3', 'PG4', 'SRDF_RM_Test'], 'is_V3': True, 'generation': 'V4', 'firmware': '5978.711.711', 'array_model': 'PowerMax_8000'}
2023-04-20 07:47:09.555 1618791 INFO cinder.service [req-b73fc9b4- 39ab-435e- 837b-d01b9feb05 84 - - - - -] Starting cinder-volume node (version 18.0.0) volume. manager [req-b73fc9b4- 39ab-435e- 837b-d01b9feb05 84 - - - - -] Starting volume driver VMaxRestProduct Driver (4.4.0)
2023-04-20 07:47:09.674 1618791 INFO cinder.
In customer executions, they are repeating below 2 sequence of operations.
1. Concurrent deploy of 3 VMs (dappt0068, dappt0069, dwebt004) each with 2 disks( 1 boot volume and 1 data volume) each.
volume- dappt0068_ alt_disk dappt0068- boot
volume-
volume- dappt0069_ alt_disk dappt0069- boot
volume-
volume- dwebt0044_ alt_disk dwebt0044- boot
volume-
2. Delete the above deployed 3 VMs concurrently once they are deployed.
With OS-no_SLO-SG as default storage group, after some repeated runs of above 2 sequence of operations, they were seeing volumes stuck in attaching during deploys and detaching during delete VMs.
While analyzing the logs, we found couple of issues with locking on default storage group[2.c], [2.d], [3]
Issue 1:
After some above cycles of execution, during VM deploy, one of the volumes attach operation failed to create initiator group [1] because of exists initiator group(which was created from previous runs and not cleaned up). Due to this, it is going through rollback flow [2] where it is trying to take lock on default storage group twice [2.c], [2.d], causing it to hang, then it is not allowing further attach/detach of volumes.
[1] create initiator group failed.
2023-04-20 10:57:36.862 1646354 ERROR cinder. volume. drivers. dell_emc. powermax. masking [req-249b24fd- 0388-4dd2- 9178-78855e6d5d 7c d4ca8b69730ffe7 ee8b6f229874e4b 9e4e33ec957df14 18d87f9287d692b 3e28 f65345a5ee994f4 8a6571c43180f85 a1 - - -] Found initiator group OS-t_v3v47b1240 def5-F- IG, but could not find initiator_names ['c050760c29970 2de', 'c050760c299702df', 'c050760c299702e2', 'c050760c299702e3', 'c050760c299702dc', 'c050760c299702dd', 'c050760c299702e0', 'c050760c299702e1'] in the login table. The contained initiators ['c050760c29970 2cc', 'c050760c299702cd', 'c050760c299702ce', 'c050760c299702cf', 'c050760c299702d0', 'c050760c299702d1', 'c050760c299702d2', 'c050760c299702d3'] do match up with those in the connector object. Delete initiator group OS-t_v3v47b1240 def5-F- IG and retry. volume. drivers. dell_emc. powermax. rest [req-249b24fd- 0388-4dd2- 9178-78855e6d5d 7c d4ca8b69730ffe7 ee8b6f229874e4b 9e4e33ec957df14 18d87f9287d692b 3e28 f65345a5ee994f4 8a6571c43180f85 a1 - - -] GET request to httpspvc: //172.30. 115.229: 8443/univmax/ restapi/ 92/sloprovision ing/symmetrix/ 000197602020/ maskingview/ OS-t_v3v47b1240 def5-F- PG4-MV has returned with a status code of: 404. volume. drivers. dell_emc. powermax. rest [req-249b24fd- 0388-4dd2- 9178-78855e6d5d 7c d4ca8b69730ffe7 ee8b6f229874e4b 9e4e33ec957df14 18d87f9287d692b 3e28 f65345a5ee994f4 8a6571c43180f85 a1 - - -] Masking view details in get_element_ from_masking_ view are None volume. drivers. dell_emc. powermax. rest [req-249b24fd- 0388-4dd2- 9178-78855e6d5d 7c d4ca8b69730ffe7 ee8b6f229874e4b 9e4e33ec957df14 18d87f9287d692b 3e28 f65345a5ee994f4 8a6571c43180f85 a1 - - -] Error retrieving masking group. volume. drivers. dell_emc. powermax. masking [req-249b24fd- 0388-4dd2- 9178-78855e6d5d 7c d4ca8b69730ffe7 ee8b6f229874e4b 9e4e33ec957df14 18d87f9287d692b 3e28 f65345a5ee994f4 8a6571c43180f85 a1 - - -] Masking View creation or retrieval was not successful for masking view OS-t_v3v47b1240 def5-F- PG4-MV. Attempting rollback.: cinder. exception. VolumeBackendAP IException: Error retrieving masking group. volume. drivers. dell_emc. powermax. masking Traceback (most recent call last): volume. drivers. dell_emc. powermax. masking File "/usr/lib/ python3. 9/site- packages/ cinder/ volume/ drivers/ dell_emc/ powermax/ masking. py", line 108, in get_or_ create_ masking_ view_and_ map_lun volume. drivers. dell_emc. powermax. masking self.rest. get_element_ from_masking_ view( volume. drivers. dell_emc. powermax. masking File "/usr/lib/ python3. 9/site- packages/ cinder/ volume/ drivers/ dell_emc/ powermax/ rest.py" , line 1999, in get_element_ from_masking_ view volume. drivers. dell_emc. powermax. masking raise exception. VolumeBackendAP IException( volume. drivers. dell_emc. powermax. masking cinder. exception. VolumeBackendAP IException: Error retrieving masking group. volume. drivers. dell_emc. powermax. masking
2023-04-20 10:57:36.875 1646354 INFO cinder.
2023-04-20 10:57:36.876 1646354 INFO cinder.
2023-04-20 10:57:36.876 1646354 ERROR cinder.
2023-04-20 10:57:36.876 1646354 ERROR cinder.
2023-04-20 10:57:36.876 1646354 ERROR cinder.
2023-04-20 10:57:36.876 1646354 ERROR cinder.
2023-04-20 10:57:36.876 1646354 ERROR cinder.
2023-04-20 10:57:36.876 1646354 ERROR cinder.
2023-04-20 10:57:36.876 1646354 ERROR cinder.
2023-04-20 10:57:36.876 1646354 ERROR cinder.
2023-04-20 10:57:36.876 1646354 ERROR cinder.
[2] rollback flow:
[2.a] /github. com/openstack/ cinder/ blob/stable/ yoga/cinder/ volume/ drivers/ dell_emc/ powermax/ masking. py#L118- L119 [github.com] [github.com] [github.com] [github.com]
à https:/
self.check_ if_rollback_ action_ for_masking_ required(
serial_number, volume, device_id, rollback_dict)
[2.b] /github. com/openstack/ cinder/ blob/stable/ yoga/cinder/ volume/ drivers/ dell_emc/ powermax/ masking. py#L1028- L1032 [github.com] [github.com] [github.com] [github.com]
self. remove_ and_reset_ members(
serial_ number, volume, device_id,
rollback_ dict[utils. VOL_NAME] ,
rollback_ dict[utils. EXTRA_SPECS] , reset,
rollback_ dict[utils. CONNECTOR] )
à https:/
[2.c] /github. com/openstack/ cinder/ blob/stable/ yoga/cinder/ volume/ drivers/ dell_emc/ powermax/ masking. py#L1324- L1325 [github.com] [github.com] [github.com] [github.com]
@coordinat ion.synchronize d("emc- sg-{sg_ name}-{ serial_ number} ") volume_ from_sg( sg_name, serial_number):
à https:/
def do_remove_
2023-04-20 10:57:37.121 1646354 INFO cinder.coordination [req-249b24fd- 0388-4dd2- 9178-78855e6d5d 7c d4ca8b69730ffe7 ee8b6f229874e4b 9e4e33ec957df14 18d87f9287d692b 3e28 f65345a5ee994f4 8a6571c43180f85 a1 - - -] Lock "/var/lib/ cinder/ cinder- emc-sg- OS-no_SLO- SG-000197602020 " acquired by "do_remove_ volume_ from_sg" :: waited 0.000s
[2.d] /github. com/openstack/ cinder/ blob/stable/ yoga/cinder/ volume/ drivers/ dell_emc/ powermax/ masking. py#L1663- L1667 [github.com] [github.com] [github.com] [github.com]
à https:/
def _move_vol_