Activity log for bug #2021999

Date Who What changed Old value New value Message
2023-05-31 18:50:28 prashkre bug added bug
2023-05-31 18:53:24 prashkre description We have a customer using PowerMax cinder driver of stable/yoga branch and PowerMax storage [0] with array model 'PowerMax_8000'. [0] PowerMax details 2023-04-20 07:47:05.666 1618765 INFO cinder.volume.drivers.dell_emc.powermax.rest [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] Unisphere version V9.2.3.22 meets minimum requirement of version 9.2.0.0. 2023-04-20 07:47:05.678 1618765 INFO powervc_cinder.volume.drivers.emc.vmax_product_rest [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] PowerVC common extensions loaded. Array_info: {'RestServerIp': '*********, 'RestServerPort': 8443, 'RestUserName': 'powervc', 'SerialNumber': '*********', 'srpName': 'SRP_1', 'SSLVerify': '**********', 'PortGroups': ['PG1', 'PG2', 'PG3', 'PG4', 'SRDF_RM_Test'], 'is_V3': True, 'generation': 'V4', 'firmware': '5978.711.711', 'array_model': 'PowerMax_8000'} 2023-04-20 07:47:09.555 1618791 INFO cinder.service [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] Starting cinder-volume node (version 18.0.0) 2023-04-20 07:47:09.674 1618791 INFO cinder.volume.manager [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] Starting volume driver VMaxRestProductDriver (4.4.0) In customer executions, they are repeating below 2 sequence of operations. 1. Concurrent deploy of 3 VMs (dappt0068, dappt0069, dwebt004) each with 2 disks( 1 boot volume and 1 data volume) each. volume-dappt0068_alt_disk volume-dappt0068-boot volume-dappt0069_alt_disk volume-dappt0069-boot volume-dwebt0044_alt_disk volume-dwebt0044-boot 2. Delete the above deployed 3 VMs concurrently once they are deployed. With OS-no_SLO-SG as default storage group, after some repeated runs of above 2 sequence of operations, they were seeing volumes stuck in attaching during deploys and detaching during delete VMs. While analyzing the logs, we found couple of issues with locking on default storage group[2.c], [2.d], [3] Issue 1: After some above cycles of execution, during VM deploy, one of the volumes attach operation failed to create initiator group [1] because of exists initiator group(which was created from previous runs and not cleaned up). Due to this, it is going through rollback flow [2] where it is trying to take lock on default storage group twice [2.c], [2.d], causing it to hang, then it is not allowing further attach/detach of volumes. [1] create initiator group failed. 2023-04-20 10:57:36.862 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Found initiator group OS-t_v3v47b1240def5-F-IG, but could not find initiator_names ['c050760c299702de', 'c050760c299702df', 'c050760c299702e2', 'c050760c299702e3', 'c050760c299702dc', 'c050760c299702dd', 'c050760c299702e0', 'c050760c299702e1'] in the login table. The contained initiators ['c050760c299702cc', 'c050760c299702cd', 'c050760c299702ce', 'c050760c299702cf', 'c050760c299702d0', 'c050760c299702d1', 'c050760c299702d2', 'c050760c299702d3'] do match up with those in the connector object. Delete initiator group OS-t_v3v47b1240def5-F-IG and retry. 2023-04-20 10:57:36.875 1646354 INFO cinder.volume.drivers.dell_emc.powermax.rest [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] GET request to httpspvc://172.30.115.229:8443/univmax/restapi/92/sloprovisioning/symmetrix/000197602020/maskingview/OS-t_v3v47b1240def5-F-PG4-MV has returned with a status code of: 404. 2023-04-20 10:57:36.876 1646354 INFO cinder.volume.drivers.dell_emc.powermax.rest [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Masking view details in get_element_from_masking_view are None 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.rest [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Error retrieving masking group. 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Masking View creation or retrieval was not successful for masking view OS-t_v3v47b1240def5-F-PG4-MV. Attempting rollback.: cinder.exception.VolumeBackendAPIException: Error retrieving masking group. 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking Traceback (most recent call last): 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/masking.py", line 108, in get_or_create_masking_view_and_map_lun 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking self.rest.get_element_from_masking_view( 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/rest.py", line 1999, in get_element_from_masking_view 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking raise exception.VolumeBackendAPIException( 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking cinder.exception.VolumeBackendAPIException: Error retrieving masking group. 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking [2] rollback flow: [2.a] à https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L118-L119 [github.com] [github.com] [github.com] [github.com] self.check_if_rollback_action_for_masking_required( serial_number, volume, device_id, rollback_dict) [2.b] à https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L1028-L1032 [github.com] [github.com] [github.com] [github.com] self.remove_and_reset_members( serial_number, volume, device_id, rollback_dict[utils.VOL_NAME], rollback_dict[utils.EXTRA_SPECS], reset, rollback_dict[utils.CONNECTOR]) [2.c] à https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L1324-L1325 [github.com] [github.com] [github.com] [github.com] @coordination.synchronized("emc-sg-{sg_name}-{serial_number}") def do_remove_volume_from_sg(sg_name, serial_number): 2023-04-20 10:57:37.121 1646354 INFO cinder.coordination [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Lock "/var/lib/cinder/cinder-emc-sg-OS-no_SLO-SG-000197602020" acquired by "do_remove_volume_from_sg" :: waited 0.000s [2.d] à https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L1663-L1667 [github.com] [github.com] [github.com] [github.com] @coordination.synchronized( "emc-sg-{default_sg_name}-{serial_number}") def _move_vol_to_default_sg(default_sg_name, serial_number, device_id): We have a customer using PowerMax cinder driver of stable/yoga branch and PowerMax storage [0] with array model 'PowerMax_8000'. [0] PowerMax details 2023-04-20 07:47:05.666 1618765 INFO cinder.volume.drivers.dell_emc.powermax.rest [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] Unisphere version V9.2.3.22 meets minimum requirement of version 9.2.0.0. 2023-04-20 07:47:05.678 1618765 INFO powervc_cinder.volume.drivers.emc.vmax_product_rest [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] PowerVC common extensions loaded. Array_info: {'RestServerIp': '*********, 'RestServerPort': 8443, 'RestUserName': 'powervc', 'SerialNumber': '*********', 'srpName': 'SRP_1', 'SSLVerify': '**********', 'PortGroups': ['PG1', 'PG2', 'PG3', 'PG4', 'SRDF_RM_Test'], 'is_V3': True, 'generation': 'V4', 'firmware': '5978.711.711', 'array_model': 'PowerMax_8000'} 2023-04-20 07:47:09.555 1618791 INFO cinder.service [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] Starting cinder-volume node (version 18.0.0) 2023-04-20 07:47:09.674 1618791 INFO cinder.volume.manager [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] Starting volume driver VMaxRestProductDriver (4.4.0) In customer executions, they are repeating below 2 sequence of operations. 1. Concurrent deploy of 3 VMs (dappt0068, dappt0069, dwebt004) each with 2 disks( 1 boot volume and 1 data volume) each. volume-dappt0068_alt_disk volume-dappt0068-boot volume-dappt0069_alt_disk volume-dappt0069-boot volume-dwebt0044_alt_disk volume-dwebt0044-boot 2. Delete the above deployed 3 VMs concurrently once they are deployed. With OS-no_SLO-SG as default storage group, after some repeated runs of above 2 sequence of operations, they were seeing volumes stuck in attaching during deploys and detaching during delete VMs. While analyzing the logs, we found couple of issues with locking on default storage group[2.c], [2.d], [3] Issue 1: After some above cycles of execution, during VM deploy, one of the volumes attach operation failed to create initiator group [1] because of exists initiator group(which was created from previous runs and not cleaned up). Due to this, it is going through rollback flow [2] where it is trying to take lock on default storage group twice [2.c], [2.d], causing it to hang, then it is not allowing further attach/detach of volumes. [1] create initiator group failed. 2023-04-20 10:57:36.862 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Found initiator group OS-t_v3v47b1240def5-F-IG, but could not find initiator_names ['c050760c299702de', 'c050760c299702df', 'c050760c299702e2', 'c050760c299702e3', 'c050760c299702dc', 'c050760c299702dd', 'c050760c299702e0', 'c050760c299702e1'] in the login table. The contained initiators ['c050760c299702cc', 'c050760c299702cd', 'c050760c299702ce', 'c050760c299702cf', 'c050760c299702d0', 'c050760c299702d1', 'c050760c299702d2', 'c050760c299702d3'] do match up with those in the connector object. Delete initiator group OS-t_v3v47b1240def5-F-IG and retry. 2023-04-20 10:57:36.875 1646354 INFO cinder.volume.drivers.dell_emc.powermax.rest [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] GET request to httpspvc://172.30.115.229:8443/univmax/restapi/92/sloprovisioning/symmetrix/000197602020/maskingview/OS-t_v3v47b1240def5-F-PG4-MV has returned with a status code of: 404. 2023-04-20 10:57:36.876 1646354 INFO cinder.volume.drivers.dell_emc.powermax.rest [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Masking view details in get_element_from_masking_view are None 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.rest [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Error retrieving masking group. 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Masking View creation or retrieval was not successful for masking view OS-t_v3v47b1240def5-F-PG4-MV. Attempting rollback.: cinder.exception.VolumeBackendAPIException: Error retrieving masking group. 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking Traceback (most recent call last): 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/masking.py", line 108, in get_or_create_masking_view_and_map_lun 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking self.rest.get_element_from_masking_view( 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/rest.py", line 1999, in get_element_from_masking_view 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking raise exception.VolumeBackendAPIException( 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking cinder.exception.VolumeBackendAPIException: Error retrieving masking group. 2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking [2] rollback flow: [2.a] à https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L118-L119 [github.com] [github.com] [github.com] [github.com] self.check_if_rollback_action_for_masking_required( serial_number, volume, device_id, rollback_dict) [2.b] à https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L1028-L1032 [github.com] [github.com] [github.com] [github.com]             self.remove_and_reset_members(                 serial_number, volume, device_id,                 rollback_dict[utils.VOL_NAME],                 rollback_dict[utils.EXTRA_SPECS], reset,                 rollback_dict[utils.CONNECTOR]) [2.c] à https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L1324-L1325 [github.com] [github.com] [github.com] [github.com]             @coordination.synchronized("emc-sg-{sg_name}-{serial_number}")             def do_remove_volume_from_sg(sg_name, serial_number): 2023-04-20 10:57:37.121 1646354 INFO cinder.coordination [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Lock "/var/lib/cinder/cinder-emc-sg-OS-no_SLO-SG-000197602020" acquired by "do_remove_volume_from_sg" :: waited 0.000s [2.d] à https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L1663-L1667 [github.com] [github.com] [github.com] [github.com]             @coordination.synchronized(                 "emc-sg-{default_sg_name}-{serial_number}")             def _move_vol_to_default_sg(default_sg_name, serial_number,                                         device_id): ________________________________________________________________________________________ Issue 2: -------------- In concurrent delete VMs, we found that delete of volume with device_id 00208 acquired a lock on "/var/lib/cinder/cinder-emc-sg-OS-no_SLO-SG-000197602020” [3] and never released, we didn’t see any logs related to response for below API. Due to this further detach operations on other volumes are hung as it is the common lock. GET request to httpspvc://***********:8443/univmax/restapi/92/sloprovisioning/symmetrix/<<array_serial_number>>/volume/00208 [3] https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L1360-L1363 @coordination.synchronized("emc-mv-{parent_name}-{serial_number}") @coordination.synchronized("emc-mv-{mv_name}-{serial_number}") @coordination.synchronized("emc-sg-{sg_name}-{serial_number}") def do_remove_volume_from_sg( mv_name, sg_name, parent_name, serial_number): 2023-04-29 13:00:21.765 3785904 INFO cinder.coordination [req-d0fe3a3f-6fe3-47c8-ace3-25e13b0f2de9 0688b01e6439ca32d698d20789d52169126fb41fb1a4ddafcebb97d854e836c9 f65345a5ee994f48a6571c43180f85a1 - - -] Lock "/var/lib/cinder/cinder-emc-sg-OS-no_SLO-SG-000197602020" acquired by "do_remove_volume_from_sg" :: waited 36.105s
2023-06-01 00:09:46 OpenStack Infra cinder: status New In Progress
2023-06-07 12:06:06 Sofia Enriquez tags drivers powermax
2023-06-07 12:07:04 Sofia Enriquez cinder: importance Undecided Low
2023-06-12 16:56:11 prashkre cinder: assignee prashkre (prashkre)