Tiramisu: After failover, active_backend_id is not set to secondary backend id

Bug #1773069 reported by Vivek Soni
Affects: Cinder
Status: Fix Released
Importance: Undecided
Assigned to: Vivek Soni
Milestone:

Bug Description

3PAR Tiramisu: After performing a failover, attaching a volume points to the primary array and creates a VLUN on the primary array instead of the secondary.

Steps:
----------------------
1) group create
    cinder group-create --name GROUP GVGT 3pariscsirep
2) create volumes
3) group update - add volumes to the group
    cinder --os-volume-api-version=3.38 group-update --add-volumes <vol1,vol2> <group-name>
4) attach volume to a nova instance
5) perform group failover
    cinder --os-volume-api-version=3.38 group-failover-replication <group-id> --allow-attached-volume --secondary-backend-id <target_array>
6) detach and re-attach

Re-attaching the volume creates a VLUN on the primary array instead of the secondary.

Vivek Soni (viveksoni)
Changed in cinder:
assignee: nobody → Vivek Soni (viveksoni)
status: New → In Progress
Revision history for this message
Vivek Soni (viveksoni) wrote :

"active_backend_id" is not set to secondary array after fail over, therefore after attachment vluns are created in primary array only instead of secondary array

Revision history for this message
TommyLike (hu-husheng) wrote :

I am not sure whether this is by design, but at the moment we don't have any attribute that stands for the group replication's **active_backend_id** (we have one in the service for the failover-host mechanism).
We can either add this to the group object or ask the driver to maintain this relationship. Either way, this needs feedback from Xing Yang or anyone more familiar with the group replication feature.
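
To illustrate the gap described above (a sketch based on the discussion; treat the field names as assumptions rather than verified API):

    # Sketch only; field names are assumptions from the discussion above,
    # not verified against a particular Cinder release.
    def lookup_active_backend(service, group):
        # Whole-backend (Cheesecake) failover: failover_host records its
        # target on the per-host service record.
        host_level = getattr(service, 'active_backend_id', None)

        # Group (Tiramisu) failover: the group object tracks
        # replication_status, but there is no group-level active_backend_id,
        # so the failover target is not persisted anywhere for the group.
        group_level = getattr(group, 'active_backend_id', None)  # None today

        return host_level, group_level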

Vivek Soni (viveksoni)
summary: - 3PAR Tiramisu: After failover, attaching to primary instead of secondary
+ Tiramisu: After failover, active_backend_id is not set to secondary
+ backend id
tags: added: cinder
removed: 3par
Revision history for this message
Xing Yang (xing-yang) wrote :

This works as designed. To change the "active_backend_id", you need to use the promote feature described in the following spec: https://specs.openstack.org/openstack/cinder-specs/specs/newton/cheesecake-promote-backend.html. I'm not sure if anyone is working on the promote feature, though.

Revision history for this message
Vivek Soni (viveksoni) wrote :

Hi Xing,

I am facing an issue with Tiramisu replication:
'active_backend_id' is not set to the failed-over backend, i.e. the secondary array.

For example, if array A is primary and array B is secondary, then after failover I expect the 'active_backend_id' value to be 'array B'.
This is not happening.

Revision history for this message
Vivek Soni (viveksoni) wrote :

With Cheesecake, after failing over to the secondary array B, 'active_backend_id' is set properly to array B.

Revision history for this message
TommyLike (hu-husheng) wrote :

After a conversation with Xing Yang, she mentioned that active_backend_id changes only when the whole backend fails over, so it seems the driver needs to keep track of which array is primary and which is secondary for each specific replication group in this case :).
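
A rough sketch of that "driver keeps track" option (hypothetical code, not any shipped driver); note the map lives only in driver memory, which is exactly the persistence concern raised in the next comment:

    # Hypothetical sketch of a driver-maintained mapping; not a real driver.
    class GroupAwareDriver(object):
        def __init__(self, primary_client, secondary_client):
            self._primary = primary_client
            self._secondary = secondary_client
            # group_id -> 'primary' | 'secondary'; in-memory only, so it is
            # lost on a service restart (see the following comment).
            self._group_active_side = {}

        def failover_replication(self, context, group, volumes,
                                 secondary_backend_id=None):
            # Remember which side this group now runs on.
            self._group_active_side[group.id] = 'secondary'
            # ... per-volume failover work would go here ...
            return None, []

        def _client_for_group(self, group_id):
            side = self._group_active_side.get(group_id, 'primary')
            return self._secondary if side == 'secondary' else self._primary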

Revision history for this message
Imran Ansari (imran.ansari) wrote :

The driver can hold this information in memory, but the problem is how to persist it across service restarts or reboots. It looks like the Cinder framework would need to maintain this information for each GVG.

Revision history for this message
Vivek Soni (viveksoni) wrote :

I think this info can be fetched from the volume or group object. I am working out the code changes required for this and will update you.

Strange observation seen with Cheesecake replication:
----------------------------------------------------------
On a Queens setup, an attached volume could, after failover, be DETACHED successfully from the primary array, and reattaching it pointed to the secondary backend, as expected.

Whereas on a master (Rocky) setup, an attached volume FAILED to DETACH from the primary array after failover.

Looking at the HPE driver code from both releases, there is hardly any difference around that flow. I suspect some changes in the Cinder framework are responsible for the different behaviour between the two releases.

Revision history for this message
Vivek Soni (viveksoni) wrote :

Please ignore my comments w.r.t. Cheesecake; there was an issue with my setup.
----------------------------------------------

In group replication (Tiramisu), the decision on whether to point to the primary or the secondary backend can be based on the volume attribute replication_driver_data.
This works fine when performing an attachment after failover, but it does not work for a detachment after failover. Please check the scenario below:

1) volume1 is in group1, which has group replication enabled
2) volume1 is attached to a nova instance ### here, replication_driver_data is 'None'
3) perform a group failover
4) detach ### here in terminate_connection, the volume object's 'replication_driver_data' holds the secondary array ID, so my configuration points to the secondary array, but the VLUNs are present on the primary array, and this fails

One way to handle this is to first try deleting the VLUN with the secondary array configuration; if that fails, try the primary array configuration.
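
A hedged sketch of that fallback (hypothetical client API, with the volume treated as a plain dict; this is not the code that was eventually merged):

    # Hypothetical sketch of the secondary-then-primary fallback; the
    # delete_vlun() client call is a placeholder, not a real API.
    def terminate_connection(volume, connector, primary_client, secondary_client):
        # After a group failover, replication_driver_data carries the
        # secondary array id; it is None for volumes attached before failover.
        failed_over = bool(volume.get('replication_driver_data'))
        order = ([secondary_client, primary_client] if failed_over
                 else [primary_client, secondary_client])

        last_error = None
        for client in order:
            try:
                client.delete_vlun(volume, connector)
                return
            except Exception as exc:
                # The VLUN is not on this array (e.g. it was created before
                # the failover); fall back to the other array.
                last_error = exc
        raise last_error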

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/575702

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/575702
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=a433c19ef09c2f3e44dd3dd23fa1283c401aaf30
Submitter: Zuul
Branch: master

commit a433c19ef09c2f3e44dd3dd23fa1283c401aaf30
Author: Vivek Soni <email address hidden>
Date: Fri Jun 15 01:31:17 2018 -0700

    HPE3PAR: Fix pointing to backend in group failover

    Issue: After group failover, subsequent operations
    like attach & detach volume, which are part of that
    failed over group points to primary backend instead
    of secondary.

    This patch fixes the above issue by setting up the
    appropriate backend in case of subsequent operation
    on volume, which are part of group which is failed
    over.

    Change-Id: I679b11317c91ad28cefdf995a8d6849dc71bc1c5
    Closes-Bug: #1773069

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/577044

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/queens)

Reviewed: https://review.openstack.org/577044
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=e6f24570c3b49ffbc7775add8a13223da685baf8
Submitter: Zuul
Branch: stable/queens

commit e6f24570c3b49ffbc7775add8a13223da685baf8
Author: Vivek Soni <email address hidden>
Date: Thu Jun 21 00:52:48 2018 -0400

    HPE3PAR: Fix pointing to backend in group failover

    Issue: After group failover, subsequent operations
    like attach & detach volume, which are part of that
    failed over group points to primary backend instead
    of secondary.

    This patch fixes the above issue by setting up the
    appropriate backend in case of subsequent operation
    on volume, which are part of group which is failed
    over.

    Change-Id: I679b11317c91ad28cefdf995a8d6849dc71bc1c5
    Closes-Bug: #1773069
    (cherry picked from commit a433c19ef09c2f3e44dd3dd23fa1283c401aaf30)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 13.0.0.0b3

This issue was fixed in the openstack/cinder 13.0.0.0b3 development milestone.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 12.0.4

This issue was fixed in the openstack/cinder 12.0.4 release.
