[Pure Storage] Missing replication pod can cause driver failure on restart

Bug #2035404 reported by Simon Dodsley
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: Simon Dodsley

Bug Description

If synchronous replication is configured and the replication pod defined by the parameter `pure_replication_pod_name` does not exist on the backend at driver restart (for example, because someone accidentally deleted the pod), the restart of the driver fails.

The driver should be capable of correctly recreating the pod.
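
As a rough illustration of what that self-healing could look like (this is a sketch, not the merged fix), the snippet below uses the purestorage 1.x REST client seen in the traceback; the helper name `_ensure_replication_pod` and the error-matching logic are assumptions, and `create_pod()` is hedged in the comments:

    # Illustrative sketch only, not the actual driver patch: recreate
    # the replication pod if it was deleted out-of-band, so do_setup()
    # does not abort the driver restart.
    import purestorage

    def _ensure_replication_pod(array, pod_name):
        """Return pod info, recreating the pod if it no longer exists."""
        try:
            return array.get_pod(pod_name)
        except purestorage.PureHTTPError as err:
            # The traceback shows a 400 "Pod does not exist." response
            # for a deleted pod; any other error should still propagate.
            if err.code == 400 and "does not exist" in err.text.lower():
                array.create_pod(pod_name)  # assumes create_pod() in purestorage 1.x
                return array.get_pod(pod_name)
            raise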

Error traceback example:

Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager Traceback (most recent call last):
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/drivers/pure.py", line 2864, in _get_current_array
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager pod_info = target_array.get_pod(self._replication_pod_name)
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager File "/opt/stack/.local/lib/python3.10/site-packages/purestorage/purestorage.py", line 3291, in get_pod
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager return self._request("GET", "pod/{0}".format(pod), kwargs)
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/drivers/pure.py", line 1884, in wrapper
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager ret = fn(*args, **kwargs)
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager File "/opt/stack/.local/lib/python3.10/site-packages/purestorage/purestorage.py", line 202, in _request
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager raise PureHTTPError(self._target, str(self._rest_version), response)
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager purestorage.purestorage.PureHTTPError: PureHTTPError status code 400 returned by REST version 1.19 at 10.21.228.28: BAD REQUEST
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager [{"msg": "Pod does not exist.", "ctx": "simon-cinder"}]
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager During handling of the above exception, another exception occurred:
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager Traceback (most recent call last):
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/manager.py", line 524, in _init_host
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager self.driver.do_setup(ctxt)
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/drivers/pure.py", line 432, in do_setup
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager self.do_setup_replication()
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/drivers/pure.py", line 525, in do_setup_replication
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager self._setup_replicated_pods(
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/drivers/pure.py", line 200, in wrapper
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager backend_name = driver._get_current_array().backend_id
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/drivers/pure.py", line 2877, in _get_current_array
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager raise purestorage.PureError('No functional arrays '
Sep 13 20:36:13 sn1-pool-c07-03 cinder-volume[2654547]: ERROR cinder.volume.manager purestorage.purestorage.PureError: PureError: No functional arrays available

summary: - [Pure Storage] Replication Failover with sync-repl cvolumes faiure
+ [Pure Storage] Missing replication pod can cause driver failure on
+ restart
Changed in cinder:
assignee: nobody → Simon Dodsley (simon-dodsley)
description: updated
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/895008

Changed in cinder:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by "Simon Dodsley <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/895008
Reason: This code is going to be included in https://review.opendev.org/c/openstack/cinder/+/895731

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/895731
Committed: https://opendev.org/openstack/cinder/commit/e1d93531b94dfcfecb3cac08cfd1d849f29db467
Submitter: "Zuul (22348)"
Branch: master

commit e1d93531b94dfcfecb3cac08cfd1d849f29db467
Author: Simon Dodsley <email address hidden>
Date: Mon Sep 18 15:14:20 2023 -0400

    [Pure Storage] Enable sync repl volume creation during failover

    Currently, when cinder failover is invoked because the primary
    storage backend is down, it is not possible to create a new
    volume with sync replication functionality through the driver.
    Non-replicated and async-replicated volumes can be created in
    this scenario, although this is not recommended due to
    potential issues after failback.

    A synchronously replicated volume could be safely created
    during failover as the Pure Storage architecture can allow
    this to happen. When the failed array is available again, any
    new sync replication volumes created during the outage will be
    automatically recovered by the backend's own internal systems.

    This patch updates the driver to check, during volume creation,
    if the backend is in failover mode and then allow sync volumes
    to be correctly created, even though the primary array could be
    inaccessible. Sync volume attachment will also be allowed to
    continue should one of the backend replica pair arrays be down.

    Creation of the different replication volume types has been
    tested in both failover and failback scenarios in Pure's labs,
    and this patch has proved to work as expected.

    Additionally included is work from abandoned
    change I7ed3ebd7fec389870edad0c1cc07ac553854dd8a, which
    resolves replication issues in A/A deployments.

    Also, fixes bug where a deleted replication pod can cause the
    driver to fail on restart.

    Closes-Bug: #2035404
    Change-Id: I58f0f10b63431896e7532b16b561683cd242e9ee
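
For readers skimming the commit, a minimal sketch of the failover-aware array selection it describes follows; the function and parameter names are illustrative placeholders, not the real pure.py internals:

    # Minimal sketch, not the actual pure.py code: pick a surviving
    # secondary array for sync-replicated creates while failed over.
    def choose_array(primary, secondaries, failed_over, is_sync_replicated):
        """Return the array that should service a volume create request."""
        if failed_over and is_sync_replicated and secondaries:
            # The ActiveCluster pod stays writable on the surviving
            # array; new sync volumes resync once the primary returns.
            return secondaries[0]
        return primary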

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 24.0.0.0rc1

This issue was fixed in the openstack/cinder 24.0.0.0rc1 release candidate.

