Comment 0 for bug 2021999

Revision history for this message
prashkre (prashkre) wrote :

We have a customer using PowerMax cinder driver of stable/yoga branch and PowerMax storage [0] with array model 'PowerMax_8000'.

[0] PowerMax details

2023-04-20 07:47:05.666 1618765 INFO cinder.volume.drivers.dell_emc.powermax.rest [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] Unisphere version V9.2.3.22 meets minimum requirement of version 9.2.0.0.

2023-04-20 07:47:05.678 1618765 INFO powervc_cinder.volume.drivers.emc.vmax_product_rest [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] PowerVC common extensions loaded. Array_info: {'RestServerIp': '*********, 'RestServerPort': 8443, 'RestUserName': 'powervc', 'SerialNumber': '*********', 'srpName': 'SRP_1', 'SSLVerify': '**********', 'PortGroups': ['PG1', 'PG2', 'PG3', 'PG4', 'SRDF_RM_Test'], 'is_V3': True, 'generation': 'V4', 'firmware': '5978.711.711', 'array_model': 'PowerMax_8000'}

2023-04-20 07:47:09.555 1618791 INFO cinder.service [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] Starting cinder-volume node (version 18.0.0)
2023-04-20 07:47:09.674 1618791 INFO cinder.volume.manager [req-b73fc9b4-39ab-435e-837b-d01b9feb0584 - - - - -] Starting volume driver VMaxRestProductDriver (4.4.0)

In customer executions, they are repeating below 2 sequence of operations.

1. Concurrent deploy of 3 VMs (dappt0068, dappt0069, dwebt004) each with 2 disks( 1 boot volume and 1 data volume) each.

volume-dappt0068_alt_disk
volume-dappt0068-boot

volume-dappt0069_alt_disk
volume-dappt0069-boot

volume-dwebt0044_alt_disk
volume-dwebt0044-boot

2. Delete the above deployed 3 VMs concurrently once they are deployed.

With OS-no_SLO-SG as default storage group, after some repeated runs of above 2 sequence of operations, they were seeing volumes stuck in attaching during deploys and detaching during delete VMs.
While analyzing the logs, we found couple of issues with locking on default storage group[2.c], [2.d], [3]

Issue 1:

After some above cycles of execution, during VM deploy, one of the volumes attach operation failed to create initiator group [1] because of exists initiator group(which was created from previous runs and not cleaned up). Due to this, it is going through rollback flow [2] where it is trying to take lock on default storage group twice [2.c], [2.d], causing it to hang, then it is not allowing further attach/detach of volumes.

[1] create initiator group failed.

2023-04-20 10:57:36.862 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Found initiator group OS-t_v3v47b1240def5-F-IG, but could not find initiator_names ['c050760c299702de', 'c050760c299702df', 'c050760c299702e2', 'c050760c299702e3', 'c050760c299702dc', 'c050760c299702dd', 'c050760c299702e0', 'c050760c299702e1'] in the login table. The contained initiators ['c050760c299702cc', 'c050760c299702cd', 'c050760c299702ce', 'c050760c299702cf', 'c050760c299702d0', 'c050760c299702d1', 'c050760c299702d2', 'c050760c299702d3'] do match up with those in the connector object. Delete initiator group OS-t_v3v47b1240def5-F-IG and retry.
2023-04-20 10:57:36.875 1646354 INFO cinder.volume.drivers.dell_emc.powermax.rest [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] GET request to httpspvc://172.30.115.229:8443/univmax/restapi/92/sloprovisioning/symmetrix/000197602020/maskingview/OS-t_v3v47b1240def5-F-PG4-MV has returned with a status code of: 404.
2023-04-20 10:57:36.876 1646354 INFO cinder.volume.drivers.dell_emc.powermax.rest [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Masking view details in get_element_from_masking_view are None
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.rest [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Error retrieving masking group.
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Masking View creation or retrieval was not successful for masking view OS-t_v3v47b1240def5-F-PG4-MV. Attempting rollback.: cinder.exception.VolumeBackendAPIException: Error retrieving masking group.
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking Traceback (most recent call last):
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/masking.py", line 108, in get_or_create_masking_view_and_map_lun
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking self.rest.get_element_from_masking_view(
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking File "/usr/lib/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/rest.py", line 1999, in get_element_from_masking_view
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking raise exception.VolumeBackendAPIException(
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking cinder.exception.VolumeBackendAPIException: Error retrieving masking group.
2023-04-20 10:57:36.876 1646354 ERROR cinder.volume.drivers.dell_emc.powermax.masking

[2] rollback flow:

[2.a]
à https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L118-L119 [github.com] [github.com] [github.com] [github.com]

self.check_if_rollback_action_for_masking_required(
serial_number, volume, device_id, rollback_dict)

[2.b]
à https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L1028-L1032 [github.com] [github.com] [github.com] [github.com]
            self.remove_and_reset_members(
                serial_number, volume, device_id,
                rollback_dict[utils.VOL_NAME],
                rollback_dict[utils.EXTRA_SPECS], reset,
                rollback_dict[utils.CONNECTOR])

[2.c]
à https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L1324-L1325 [github.com] [github.com] [github.com] [github.com]
            @coordination.synchronized("emc-sg-{sg_name}-{serial_number}")
            def do_remove_volume_from_sg(sg_name, serial_number):

2023-04-20 10:57:37.121 1646354 INFO cinder.coordination [req-249b24fd-0388-4dd2-9178-78855e6d5d7c d4ca8b69730ffe7ee8b6f229874e4b9e4e33ec957df1418d87f9287d692b3e28 f65345a5ee994f48a6571c43180f85a1 - - -] Lock "/var/lib/cinder/cinder-emc-sg-OS-no_SLO-SG-000197602020" acquired by "do_remove_volume_from_sg" :: waited 0.000s

[2.d]
à https://github.com/openstack/cinder/blob/stable/yoga/cinder/volume/drivers/dell_emc/powermax/masking.py#L1663-L1667 [github.com] [github.com] [github.com] [github.com]

            @coordination.synchronized(
                "emc-sg-{default_sg_name}-{serial_number}")
            def _move_vol_to_default_sg(default_sg_name, serial_number,
                                        device_id):