IBM SVf : Retype of iogrp property of an 'in-use' volume causes VM to crash

Bug #2055149 reported by Jagdish Choudhary
This bug affects 1 person
Affects   Status        Importance   Assigned to    Milestone
Cinder    In Progress   Undecided    Harsh Ailani

Bug Description

Steps to reproduce:

1- Deploy an instance (VM) on a volume whose volume type uses I/O group 0.
2- Run retype on that volume to a volume type that uses I/O group 1.

The retype fails, and there are two issues:

i) Because the host is not a member of both I/O groups, the retype fails with the following error:

2024-02-20 03:58:35.954 1082062 ERROR cinder.volume.drivers.ibm.storwize_svc.storwize_svc_common [req-706511be-c3ab-4daf-b0e8-af7f4051f8fe 1b521bc6260b4d78a0c136bcc3f67b6d f319756b42f342c8a31aea4516f34e69 - - -] Error has occurred: Unexpected error while running command.
Command: svctask addvdiskaccess -iogrp 1 "volume-ABC-4d54854a-00000008-boot-0-44c19820-ec48"
Exit code: 1
Stdout: ''
Stderr: 'CMMVC8997E The host does not belong to one or more of the IO groups specified or inferred.\n'

On the Storwize system:

IBM_Storwize:c387f15u41_V7000:superuser>lshostiogrp ABC-15ca9ed5-00000014-98949120
id name
0 io_grp0
IBM_Storwize:c387f15u41_V7000:superuser>
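
The membership check above could be automated before a retype is attempted. The following is a minimal sketch, assuming the plain-text output format shown above; `parse_host_iogrps` is a hypothetical helper for illustration, not a function of the Cinder driver.

```python
def parse_host_iogrps(output):
    """Return the set of I/O group ids listed in lshostiogrp output.

    Hypothetical helper: assumes whitespace-separated "id name" rows,
    as in the console capture above. Header and prompt lines are
    skipped because their first field is not a number.
    """
    ids = set()
    for line in output.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():
            ids.add(int(fields[0]))
    return ids


sample = "id name\n0 io_grp0\n"
host_iogrps = parse_host_iogrps(sample)
# The target I/O group of the retype in this report is 1.
target_missing = 1 not in host_iogrps
```

With the output from this report, `target_missing` is true, which matches the CMMVC8997E error.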

ii) To work around this problem, attach a new disk to the VM from another volume type that uses I/O group 1.

Now run the volume retype again. It removes the volume's access to the old iogrp, so the VM loses its path to the disk and crashes.

Harsh Ailani (harshailani) wrote :

Analysis:
Currently, the IBM SVf driver's retype operation changes the iogrp property of a volume with the following procedure:

1- addvdiskaccess: adds access to the volume through the new I/O group, iogrp1.
2- movevdisk: moves the volume to iogrp1; the volume still has access through iogrp0.
3- rmvdiskaccess: removes access through the old I/O group, iogrp0.
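
As a rough illustration of that ordering, the sketch below only builds the three CLI calls as argument lists. The function name `current_retype_commands` is hypothetical, and the `-iogrp` form for movevdisk and rmvdiskaccess is assumed to mirror the addvdiskaccess call seen in the log above; it is not copied from the driver.

```python
def current_retype_commands(volume, old_iogrp, new_iogrp):
    """Return the CLI calls in the order the analysis describes.

    Hypothetical sketch: nothing is executed, and no pause for host
    discovery exists between the second and third call.
    """
    return [
        # Grant access to the volume through the new I/O group.
        ["svctask", "addvdiskaccess", "-iogrp", str(new_iogrp), volume],
        # Move the volume to the new I/O group.
        ["svctask", "movevdisk", "-iogrp", str(new_iogrp), volume],
        # Remove access through the old I/O group.
        ["svctask", "rmvdiskaccess", "-iogrp", str(old_iogrp), volume],
    ]


for cmd in current_retype_commands("volume-example", 0, 1):
    print(" ".join(cmd))
```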

In this procedure, as soon as rmvdiskaccess is called, the VM loses access to its boot disk. The VM then goes into read-only mode because there is no path left to the OS disk.

RCA:
This happens because, when the volume is moved to the new iogrp1, the VM needs user intervention to rerun discovery of the volume so that it creates and stores the new path to the volume through iogrp1.
Only after discovery has completed, and the VM knows that the OS volume is accessible through paths in both iogrp0 and iogrp1, should rmvdiskaccess be executed to remove the old path through iogrp0.

Currently, our retype operation doesn't give the user a chance to run discovery for the volume's new iogrp1; it runs rmvdiskaccess directly.

Resolution:
Part-1: Check whether the VM has access to iogrp1. If not, first give the VM access to iogrp1.

Part-2: Pause before rmvdiskaccess so that the user can intervene and run discovery for the volume's new iogrp1.

Part-3: Once discovery has completed, run rmvdiskaccess for the volume's old iogrp0.
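
The three parts could fit together roughly as follows. This is only a sketch under stated assumptions, not the proposed fix: `retype_iogrp`, `run`, and `discovery_done` are hypothetical names, `run` stands in for whatever executes a CLI call, and using addhostiogrp to give the host access to the new I/O group is an assumption about how Part-1 might be done.

```python
def retype_iogrp(run, host, volume, old_iogrp, new_iogrp,
                 host_iogrps, discovery_done):
    """Sketch of the three-part flow; returns True once fully complete.

    run:            callable that executes one CLI argument list (hypothetical)
    host_iogrps:    set of I/O group ids the host already belongs to
    discovery_done: callable returning True after the user has rerun
                    discovery on the VM (hypothetical gate)
    """
    # Part-1: make sure the host can reach the new I/O group.
    if new_iogrp not in host_iogrps:
        run(["svctask", "addhostiogrp", "-iogrp", str(new_iogrp), host])

    run(["svctask", "addvdiskaccess", "-iogrp", str(new_iogrp), volume])
    run(["svctask", "movevdisk", "-iogrp", str(new_iogrp), volume])

    # Part-2: stop here until discovery has been run for the new paths.
    if not discovery_done():
        return False

    # Part-3: only now remove access through the old I/O group.
    run(["svctask", "rmvdiskaccess", "-iogrp", str(old_iogrp), volume])
    return True


issued = []
finished = retype_iogrp(issued.append, "host-example", "volume-example",
                        0, 1, {0}, lambda: False)
# finished is False here: rmvdiskaccess was not issued, so the old path stays.
```

Returning early instead of blocking keeps the old iogrp0 path intact until the caller confirms discovery; how the real driver would pause and resume is left open here.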

Changed in cinder:
assignee: nobody → Harsh Ailani (harshailani)
summary: - Retype of in-use storage causes OS to crash
+ IBM SVf : Retype of iogrp property of an 'in-use' volume causes VM to
+ crash
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/912060

Changed in cinder:
status: New → In Progress