Dell PowerMax Live Migration Fails Without a Pool Name

Bug #2034937 reported by Jay Jahns
This bug affects 2 people
Affects: Cinder · Status: Fix Released · Importance: Undecided · Assigned to: dell openstack engineering

Bug Description

Observed on 2023.1

In Cinder, we have a volume type that specifies only the volume_backend_name. We use this type across multiple AZs that talk to different PowerMax arrays.

Because we don't use a pool_name in the extra specs, we need to configure the service level and workload in cinder.conf. Our config uses the following options:

powermax_array
powermax_srp
powermax_service_level
vmax_workload
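
For reference, the backend stanza looks roughly like the following; the backend name, array serial, SRP name, and service level here are placeholders, not our actual values:

```ini
[powermax_backend]
volume_backend_name = powermax_backend
volume_driver = cinder.volume.drivers.dell_emc.powermax.fc.PowerMaxFCDriver
powermax_array = 000197800123
powermax_srp = SRP_1
powermax_service_level = Diamond
vmax_workload = NONE
```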

We understand that Dell recommends using pool_name, but we need to be able to validate the workflow of creating an instance that creates a volume; in that workflow there is no option to specify a volume type, so Nova uses the default volume type.

We also have cross_az_attach set to false.

In this configuration, we can create instances that create volumes, and create standalone volumes, without issue.

When we try to live migrate within the AZ, we experience failures. The traceback is below.

2023-09-07 23:27:09.964 33 ERROR cinder.volume.manager [req-e1f1da4a-653d-4001-bcaf-6555d3be5ff6 req-52ee0d71-2a2c-419b-9a6c-bbe1f56c307a bf5e54827fb63a5f1517efaaf4d9ed3729b7db2cac96ab84f20b01b8e37e37c6 486de587f28541b0be62a90ac6abb357 - - - -] Driver initialize connection failed (error: 'pool_name').: KeyError: 'pool_name'
2023-09-07 23:27:09.964 33 ERROR cinder.volume.manager Traceback (most recent call last):
2023-09-07 23:27:09.964 33 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/manager.py", line 4854, in _connection_create
2023-09-07 23:27:09.964 33 ERROR cinder.volume.manager conn_info = self.driver.initialize_connection(volume, connector)
2023-09-07 23:27:09.964 33 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/fc.py", line 288, in initialize_connection
2023-09-07 23:27:09.964 33 ERROR cinder.volume.manager device_info = self.common.initialize_connection(
2023-09-07 23:27:09.964 33 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/common.py", line 988, in initialize_connection
2023-09-07 23:27:09.964 33 ERROR cinder.volume.manager masking_view_dict = self.masking.pre_multiattach(
2023-09-07 23:27:09.964 33 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/masking.py", line 2044, in pre_multiattach
2023-09-07 23:27:09.964 33 ERROR cinder.volume.manager split_pool = extra_specs['pool_name'].split('+')
2023-09-07 23:27:09.964 33 ERROR cinder.volume.manager KeyError: 'pool_name'
2023-09-07 23:27:09.964 33 ERROR cinder.volume.manager
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server [req-e1f1da4a-653d-4001-bcaf-6555d3be5ff6 req-52ee0d71-2a2c-419b-9a6c-bbe1f56c307a bf5e54827fb63a5f1517efaaf4d9ed3729b7db2cac96ab84f20b01b8e37e37c6 486de587f28541b0be62a90ac6abb357 - - - -] Exception during message handling: cinder.exception.VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Driver initialize connection failed (error: 'pool_name').
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/manager.py", line 4854, in _connection_create
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server conn_info = self.driver.initialize_connection(volume, connector)
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/fc.py", line 288, in initialize_connection
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server device_info = self.common.initialize_connection(
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/common.py", line 988, in initialize_connection
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server masking_view_dict = self.masking.pre_multiattach(
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/masking.py", line 2044, in pre_multiattach
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server split_pool = extra_specs['pool_name'].split('+')
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server KeyError: 'pool_name'
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/manager.py", line 4911, in attachment_update
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server connection_info = self._connection_create(context,
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/manager.py", line 4860, in _connection_create
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server raise exception.VolumeBackendAPIException(data=err_msg)
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server cinder.exception.VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Driver initialize connection failed (error: 'pool_name').
2023-09-07 23:27:09.965 33 ERROR oslo_messaging.rpc.server

It appears that the line of code causing this problem is here:
https://github.com/openstack/cinder/blob/bdf0a3d52681fcfd6aa85bb01491f3a3f5557127/cinder/volume/drivers/dell_emc/powermax/masking.py#L2044

To correct this in our environment, I made an ad hoc change that is probably not ideal but restores the functionality:

        # Prefer the pool_name from the volume type's extra specs, if present.
        if 'pool_name' in extra_specs:
            split_pool = extra_specs['pool_name'].split('+')
        else:
            # Fall back to the equivalent values derived from cinder.conf.
            split_pool = [
                extra_specs[utils.SLO],
                extra_specs[utils.WORKLOAD],
                extra_specs[utils.SRP],
                extra_specs[utils.ARRAY]
            ]

With this change, the driver falls back to the cinder.conf configuration when pool_name is not present in the volume type, while preserving the existing pool_name behavior.
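
The fallback works because a pool_name is just the four values joined with '+'. A standalone sketch (the dict keys here are simplified stand-ins for the driver's utils constants, not the real extra-spec keys):

```python
# Standalone sketch of the fallback logic above; the real driver reads these
# values from the volume type's extra specs via constants in its utils module.
def resolve_pool(extra_specs):
    """Return [service_level, workload, srp, array] for masking-view naming."""
    if 'pool_name' in extra_specs:
        # e.g. 'Diamond+NONE+SRP_1+000197800123'
        return extra_specs['pool_name'].split('+')
    # Fall back to the individual specs derived from cinder.conf.
    return [extra_specs[k] for k in ('slo', 'workload', 'srp', 'array')]

with_pool = resolve_pool({'pool_name': 'Diamond+NONE+SRP_1+000197800123'})
without_pool = resolve_pool({'slo': 'Diamond', 'workload': 'NONE',
                             'srp': 'SRP_1', 'array': '000197800123'})
assert with_pool == without_pool  # both paths yield the same four values
```

Either path yields the same four-element list, which is why the workaround is behavior-preserving when pool_name is present.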

We would like this fixed ASAP, as it impacts live migration in environments using PowerMax, and it is not documented that this function fails without pool_name specified. Live migration is a core function required for our environment.

Revision history for this message
Jay Jahns (jjahns) wrote :

I found another area where this is broken:

2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager [None req-5bd7be95-6eed-4828-ab16-be3d99de3c54 bf5e54827fb63a5f1517efaaf4d9ed3729b7db2cac96ab84f20b01b8e37e37c6 486de587f28541b0be62a90ac6abb357 - - - -] Volume b77e3484-5acb-4151-bc6f-6200c28bdd97: driver error when trying to retype, falling back to generic mechanism.: AttributeError: 'NoneType' object has no attribute 'split'
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager Traceback (most recent call last):
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/manager.py", line 3095, in retype
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager ret = self.driver.retype(context,
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/fc.py", line 652, in retype
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager return self.common.retype(volume, new_type, host)
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/common.py", line 3984, in retype
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager return self._slo_workload_migration(device_id, volume, host,
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/common.py", line 4007, in _slo_workload_migration
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager do_change_compression = (self.utils.change_compression_type(
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/utils.py", line 561, in change_compression_type
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager is_target_compr_disabled = self.is_compression_disabled(extra_specs)
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/utils.py", line 545, in is_compression_disabled
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager __, __, service_level, __ = self.parse_specs_from_pool_name(
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib64/python3.9/site-packages/cinder/volume/drivers/dell_emc/powermax/utils.py", line 2095, in parse_specs_from_pool_name
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager pool_details = pool_name.split('+')
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager AttributeError: 'NoneType' object has no attribute 'split'
2023-09-08 21:37:55.164 33 ERROR cinder.volume.manager

This was during a volume retype action, where we were changing the type to one without a pool_name. It appears we are not consulting the conf values during these actions; nevertheless, the retype continued and was successful.
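
A defensive version of the parse helper would tolerate a missing pool name instead of raising AttributeError. This is a hypothetical sketch (the function name and return order mirror the driver's utils helper, but this is not the merged fix); the three-part format assumes arrays that take no workload:

```python
# Hypothetical defensive sketch; the real parse_specs_from_pool_name lives in
# cinder/volume/drivers/dell_emc/powermax/utils.py and assumes pool_name is set.
def parse_specs_from_pool_name(pool_name):
    """Split 'slo+workload+srp+array' into parts, tolerating a missing name.

    Returns (array, srp, service_level, workload); all None if no pool name.
    """
    if not pool_name:
        # No pool name: let the caller fall back to cinder.conf values.
        return None, None, None, None
    parts = pool_name.split('+')
    if len(parts) == 4:
        service_level, workload, srp, array = parts
    elif len(parts) == 3:
        # Assumed 'slo+srp+array' format for arrays without a workload.
        service_level, srp, array = parts
        workload = None
    else:
        raise ValueError("Invalid pool name %r" % pool_name)
    return array, srp, service_level, workload
```

With a guard like this, a retype to a type without pool_name would fall through cleanly rather than crashing on `None.split('+')`.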

Revision history for this message
Jean Pierre Roquesalane (jproque15130) wrote :

Can you add to the description the command sequence and steps you run through when facing the issue?

Revision history for this message
Cuiye Liu (cuiyeliu) wrote :

Hi Jay Jahns, can you add to the description the command sequence and steps you run through when facing the issue?

Revision history for this message
Jay Jahns (jjahns) wrote :

Hi - the steps I conducted are as follows. All actions were performed through the UI, but the equivalent CLI commands behave identically in this regard.

* Create a volume from an image using a volume type without a pool name (uses the defaults from the backend config)

* Mark volume bootable

* Attach volume to new instance

* Conduct a live migration of instance to another host

Observation:

All of the volume creation and attachment steps complete without issue. Logs indicate that no pool name was specified, so the default is set to Diamond/NONE.

Once a live migration occurs, the trace mentioned above is generated. When mapping the volume to the other compute node, a masking view needs to be created, and that path relies solely on the existence of extra_specs['pool_name']. Since that key does not exist, the live migration fails.

Further analysis indicates that when we set the extra specs, we check whether pool_name is present. If it is not, we establish the service level and workload from the conf file, but we do not create a pool_name at that point.

Artificially setting a pool_name from the defaults prevents this behavior, because the key then exists.
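
That last point can be sketched as synthesizing the missing key up front. This is a hypothetical helper, not driver code; the function and the conf-dict keys are made up for illustration:

```python
# Hypothetical sketch: inject a synthetic pool_name built from the
# cinder.conf-derived defaults when the volume type does not supply one.
def ensure_pool_name(extra_specs, conf_defaults):
    """Add a pool_name assembled from conf defaults if it is absent."""
    if 'pool_name' not in extra_specs:
        extra_specs['pool_name'] = '+'.join(
            conf_defaults[k]
            for k in ('service_level', 'workload', 'srp', 'array'))
    return extra_specs

specs = ensure_pool_name({}, {'service_level': 'Diamond', 'workload': 'NONE',
                              'srp': 'SRP_1', 'array': '000197800123'})
assert specs['pool_name'] == 'Diamond+NONE+SRP_1+000197800123'
```

Once the key exists, every code path that does `extra_specs['pool_name'].split('+')` works unchanged.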

Revision history for this message
Jean Pierre Roquesalane (jproque15130) wrote :

Thank you. Taking a closer look now, it's confirmed.

Changed in cinder:
status: New → Confirmed
Changed in cinder:
assignee: nobody → dell openstack engineering (dell-openstack)
Changed in cinder:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/898188
Committed: https://opendev.org/openstack/cinder/commit/9a470d41f4d2cd389a0f85e461d87ed5a3a664e3
Submitter: "Zuul (22348)"
Branch: master

commit 9a470d41f4d2cd389a0f85e461d87ed5a3a664e3
Author: cuiyeliu <email address hidden>
Date: Mon Oct 16 04:21:11 2023 +0000

    PowerMax: Allow live migration without pool name

    This change is to update the live migration ability in environments
    using PowerMax. In previous 2023.1 version, the live migration fails
    without a pool name.
    The update adds the ability of live migration without a pool name.

    Change-Id: Iad767cd516c8527136508470629236f68e0c7cc2
    Closes-Bug: #2034937

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/2024.1)

Fix proposed to branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/cinder/+/914582

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/2024.1)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/914582
Committed: https://opendev.org/openstack/cinder/commit/1aa541d02445031945f06a58c77c6c3fb23909e6
Submitter: "Zuul (22348)"
Branch: stable/2024.1

commit 1aa541d02445031945f06a58c77c6c3fb23909e6
Author: cuiyeliu <email address hidden>
Date: Mon Oct 16 04:21:11 2023 +0000

    PowerMax: Allow live migration without pool name

    This change is to update the live migration ability in environments
    using PowerMax. In previous 2023.1 version, the live migration fails
    without a pool name.
    The update adds the ability of live migration without a pool name.

    Change-Id: Iad767cd516c8527136508470629236f68e0c7cc2
    Closes-Bug: #2034937
    (cherry picked from commit 9a470d41f4d2cd389a0f85e461d87ed5a3a664e3)
