Dell PowerFlex (Scaleio) connector doesn't handle volume disconnection/unmapping properly

Bug #2034685 reported by Mateusz Janowicz
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
os-brick | In Progress | Undecided | dell openstack engineering |

Bug Description

When Dell PowerFlex (ScaleIO) is used as the storage backend for Cinder volumes, hard-rebooting multiple instances (booted from PowerFlex volumes) simultaneously leaves some VMs unable to start.
With 60 instances, a few of them sometimes end up in ERROR state.

for vm in $(openstack server list -f value -c ID); do (openstack server reboot --hard ${vm} &); done

Error in nova compute log for some VMs:
2023-08-11 11:00:57.593 7 ERROR nova.virt.libvirt.driver [req-0a02cc37-6310-46ab-bc61-0b36e552ee32 f8c08487e84e4d3da0fa29cbe72c5dff a6195d0db62844dc9e58c635f0ae42de - default default] [instance: 67d4f1bc-1899-44da-a384-dd1923146438] Failed to start libvirt guest: libvirt.libvirtError: Cannot access storage file '/dev/disk/by-id/emc-vol-000c86e01b13e30f-363722d500000046': No such file or directory

The problem is that the os-brick scaleio connector currently doesn't handle volume unmapping properly: it doesn't wait long enough for the old links to disappear.

A potential fix should wait, on volume disconnection / unmap, until the old symlink has completely disappeared before proceeding to create a new one.
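
To illustrate the kind of wait described above, here is a minimal sketch. It is not the actual os-brick change: the function name, the timeout, and the polling interval are hypothetical and chosen only for illustration. It polls until a /dev/disk/by-id link is gone, using os.path.lexists so that a dangling symlink still counts as present.

```python
import os
import time


def wait_for_path_removal(path, timeout=30.0, interval=0.5):
    """Poll until `path` no longer exists.

    Hypothetical helper, not part of os-brick. Returns True if the
    path disappeared before `timeout` seconds elapsed, False otherwise.
    """
    deadline = time.monotonic() + timeout
    # lexists() is True even for a broken symlink, so we wait for the
    # link itself to go away, not just for its target to vanish.
    while os.path.lexists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A connector could call such a helper after unmapping the volume and only continue (or raise an error) once it returns, so that a later connect does not race with the removal of the stale link.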

Jean Pierre Roquesalane (jproque15130) wrote :

Hello,

Can you provide more details on your versions and environment?

Jon Bernard (jbernard) wrote :

Can you provide the versions of OS / brick / nova, etc and describe your environment? It would help to be able to reproduce this efficiently.

Mateusz Janowicz (matjan) wrote :

Hi,

Some details below:

OS:

NAME="SLES"
VERSION="15-SP4"
VERSION_ID="15.4"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP4"

OpenStack release: Victoria

Dell PowerFlex 3.6.1000 is used as the Cinder volume backend.
PowerFlex is deployed in hyperconverged form as a 5-node cluster.

Let me know what else you need.

Mateusz Janowicz (matjan) wrote :

Hi, do we have any updates?

Mateusz Janowicz (matjan) wrote :

Hi, have you been able to reproduce the issue?

Changed in os-brick:
assignee: nobody → dell openstack engineering (dell-openstack)
Nilesh Thathagar (nileshthathagar) wrote :

Hello,

I will try to reproduce the issue.

I have a question:

Do you want this issue fixed in the Yoga release or in the latest release?

Mateusz Janowicz (matjan) wrote :

Hi,

I think it needs to be included in the latest release anyway.

Nilesh Thathagar (nileshthathagar) wrote :

Hello,

I was able to reproduce the issue in the Antelope release. I will start working on a fix.

Changed in os-brick:
status: New → Confirmed
status: Confirmed → In Progress