It has been observed that, even with synchronous and uniform replication enabled, losing one of the arrays - whether primary or secondary, and whether through a network issue (e.g. the array becoming unreachable) or a storage issue (e.g. a pod going down) - causes the Cinder driver to stop working in some or even all respects, the details depending on the particular failure being simulated.
This applied both to instance-facing functionality, such as attaching and detaching volumes, and to general management operations, such as CRUD on volumes, or even the basic ability to restart the cinder-volume service - the driver would fail repeatedly.
The ugly workaround is to reconfigure cinder-volume and restart it each time there is a problem with a storage array, which is inflexible and prevents automatic failure recovery.
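In practice the workaround amounts to editing cinder.conf to drop the replication target pointing at the failed array and then restarting the service by hand. A minimal sketch, assuming a systemd deployment and the Pure Storage driver's `replication_device` option; the section name, addresses, and token below are illustrative, not taken from this report:

```ini
# /etc/cinder/cinder.conf (hypothetical backend section)
[puredriver-1]
volume_backend_name = puredriver-1
volume_driver = cinder.volume.drivers.pure.PureISCSIDriver
# Synchronous, uniform replication to the secondary array:
replication_device = backend_id:secondary,san_ip:192.0.2.10,api_token:SECRET,type:sync,uniform:true

# Workaround after losing the secondary array: comment out the
# replication_device line above, then restart the service, e.g.:
#   systemctl restart cinder-volume
```

Because this is a manual, per-failure edit followed by a restart, it cannot provide the automatic recovery one would expect from a replicated setup.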
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/855060