Dell PowerFlex (ScaleIO) connector doesn't handle volume disconnection/unmapping properly

Bug #2034685 reported by Mateusz Janowicz
This bug affects 2 people
Affects: os-brick
Status: Fix Released
Importance: Undecided
Assigned to: dell openstack

Bug Description

When Dell PowerFlex (ScaleIO) is used as the storage backend for Cinder volumes and multiple instances booted from PowerFlex volumes are hard-rebooted simultaneously, not all of the VMs start.
With 60 instances, a few of them sometimes end up in ERROR state.

for vm in $(openstack server list -f value -c ID); do (openstack server reboot --hard ${vm} &); done

Error in the nova-compute log for some of the VMs:
2023-08-11 11:00:57.593 7 ERROR nova.virt.libvirt.driver [req-0a02cc37-6310-46ab-bc61-0b36e552ee32 f8c08487e84e4d3da0fa29cbe72c5dff a6195d0db62844dc9e58c635f0ae42de - default default] [instance: 67d4f1bc-1899-44da-a384-dd1923146438] Failed to start libvirt guest: libvirt.libvirtError: Cannot access storage file '/dev/disk/by-id/emc-vol-000c86e01b13e30f-363722d500000046': No such file or directory

The root cause is that the os-brick ScaleIO connector does not handle volume unmapping properly:
after disconnecting a volume, it does not wait long enough for the old symlinks to disappear.

A potential fix should, during volume disconnection/unmapping, wait until the old symlink has completely disappeared before proceeding to create a new one.
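The waiting step described above could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the os-brick implementation: the helper name `wait_for_path_removal` and its `timeout`/`interval` defaults are hypothetical. It polls the old `/dev/disk/by-id` path until it is gone and reports whether that happened before the deadline.

```python
import os
import time


def wait_for_path_removal(path: str, timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll until ``path`` no longer exists (hypothetical helper, not os-brick API).

    Returns True once the path is gone, or False if it is still present
    when ``timeout`` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        # os.path.lexists() is also True for a dangling symlink, so a stale
        # /dev/disk/by-id link still counts as "present".
        if not os.path.lexists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would invoke it after unmapping and before reconnecting, for example `wait_for_path_removal("/dev/disk/by-id/emc-vol-000c86e01b13e30f-363722d500000046", timeout=30)`, and treat a False result as a failed disconnect instead of reusing the stale path.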

Jean Pierre Roquesalane (jproque15130) wrote :

Hello,

Can you provide more details on your versions and environment?

Jon Bernard (jbernard) wrote :

Can you provide the versions of the OS, os-brick, Nova, etc., and describe your environment? It would help us reproduce this efficiently.

Mateusz Janowicz (matjan) wrote:

Hi,

Some details below:

OS:

NAME="SLES"
VERSION="15-SP4"
VERSION_ID="15.4"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"

OpenStack release: Victoria

Dell PowerFlex 3.6.1000 is used as the Cinder volume backend.
PowerFlex is deployed in a hyperconverged configuration as a 5-node cluster.

Let me know what else you need.

Mateusz Janowicz (matjan) wrote :

Hi, do we have any updates?

Mateusz Janowicz (matjan) wrote :

Hi, have you been able to reproduce the issue?

Changed in os-brick:
assignee: nobody → dell openstack engineering (dell-openstack)
Nilesh Thathagar (nileshthathagar) wrote :

Hello,

I will try to reproduce the issue.

I have a question: do you need this issue fixed in the Yoga release or in the latest release?

Mateusz Janowicz (matjan) wrote :

Hi,

I think it needs to be included in the latest release anyway.

Nilesh Thathagar (nileshthathagar) wrote :

Hello,

I was able to reproduce the issue on the Antelope release. I will start working on a fix.

Changed in os-brick:
status: New → Confirmed
status: Confirmed → In Progress
Nilesh Thathagar (nileshthathagar) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to os-brick (master)

Reviewed: https://review.opendev.org/c/openstack/os-brick/+/916062
Committed: https://opendev.org/openstack/os-brick/commit/267895ce071abc246f9d5b484734eb3b7b01abfb
Submitter: "Zuul (22348)"
Branch: master

commit 267895ce071abc246f9d5b484734eb3b7b01abfb
Author: Nilesh Thathagar <email address hidden>
Date: Thu Apr 18 11:32:20 2024 +0000

    Dell PowerFlex: Added retry after disconnect volume

    Implemented a retry mechanism post-volume
    disconnection to ensure reliability.
    Integrated a function execution to validate
    the existence of the volume, safeguarding
    against connectivity issues arising from the
    removal of the old path during a hard reboot.

    Closes-Bug: #2034685
    Change-Id: Iae1fd01f85bc429c7976e71fb9eeb8dbda795789
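The commit describes retrying after disconnection and validating whether the volume still exists. One common way to express that pattern in Python is a retry decorator around an existence check that raises while the old path is still present. The sketch below is hypothetical: `retry_on`, `VolumePathStillPresent`, and `check_volume_removed` are illustrative names rather than os-brick's API, and the attempt count and exponential backoff values are assumptions.

```python
import functools
import os
import time


class VolumePathStillPresent(Exception):
    """Raised while the old device path has not yet been removed (hypothetical)."""


def retry_on(exc_type, attempts=5, base_delay=0.5, backoff=2.0):
    """Retry the wrapped function on ``exc_type`` with exponential backoff."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)
                    delay *= backoff
        return wrapper
    return decorator


@retry_on(VolumePathStillPresent, attempts=5, base_delay=0.5)
def check_volume_removed(path):
    """Succeed only once the old by-id path is gone after a disconnect."""
    # lexists() is True even for a dangling symlink left behind by the unmap.
    if os.path.lexists(path):
        raise VolumePathStillPresent(path)
```

Raising an exception for each failed check keeps the retry policy separate from the check itself, so the attempt count and backoff can be tuned without touching the existence test.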

Changed in os-brick:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to os-brick (stable/2024.1)

Fix proposed to branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/os-brick/+/926929

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/os-brick 6.9.0

This issue was fixed in the openstack/os-brick 6.9.0 release.
