Cinder

Replication freeze doesn't work as expected

Bug #1616974 reported by Gorka Eguileor on 2016-08-25

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
Cinder	Fix Released	Medium	Gorka Eguileor	

Bug Description

Freeze functionality in the replication feature doesn't work as expected.

According to the specs [1]: "Freeze option provided as an argument to Failover: The failover command includes a “freeze” option. This option indicates that a volume may still be read or written to, HOWEVER that we will not allow any additional resource create or delete options until an admin issues a “thaw” command."

This is confirmed by our devref documentation [2]: "freeze_backend: Puts a backend host/service into a R/O state for the control plane. For example if a failover is issued, it is likely desirable that while data access to existing volumes is maintained, it likely would not be wise to continue doing things like creates, deletes, extends etc."

But we are not freezing half of the operations we say we freeze. We can still do deletes, create snapshots, delete snapshots...

The only mechanism that has been implemented is to disable the service in the scheduler, which means that only operations that go through the scheduler are frozen.

[1]: https://specs.openstack.org/openstack/cinder-specs/specs/mitaka/cheesecake.html
[2]: http://docs.openstack.org/developer/cinder/devref/replication.html
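
The gap described above can be illustrated with a small model. This is only a sketch and not Cinder code: the `Backend` class and the `schedule` and `call_backend_directly` functions are hypothetical names, and the model assumes that freezing only marks the backend as disabled for the scheduler.

```python
# Hypothetical model of the reported behaviour; not Cinder code.
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    disabled: bool = False  # assumed to be set when the backend is frozen


def schedule(backends, operation):
    """Operations routed through the scheduler skip disabled backends."""
    candidates = [b for b in backends if not b.disabled]
    if not candidates:
        raise RuntimeError(f"no valid backend for {operation}")
    return candidates[0]


def call_backend_directly(backend, operation):
    """API-to-volume operations never consult the scheduler in this model."""
    return f"{operation} ran on {backend.name}"


frozen = Backend("host@lvm", disabled=True)

# A create goes through the scheduler, so it is rejected.
try:
    schedule([frozen], "create_volume")
    create_blocked = False
except RuntimeError:
    create_blocked = True

# A delete bypasses the scheduler, so nothing stops it.
delete_result = call_backend_directly(frozen, "delete_volume")
```

In this model the create is blocked while the delete still runs, which is the mismatch with the spec that this bug reports.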

Sean McGinnis (sean-mcginnis) on 2016-08-25

Changed in cinder:
status:	New → Confirmed
importance:	Undecided → Medium

Gorka Eguileor (gorka) on 2016-11-25

Changed in cinder:
assignee:	nobody → Gorka Eguileor (gorka)

OpenStack Infra (hudson-openstack) wrote on 2016-11-25: Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/402922

Changed in cinder:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2016-11-25: Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/402923

OpenStack Infra (hudson-openstack) wrote on 2017-01-19: Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/402922
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=73603d524814d5e417dad4c40d3667b703c58105
Submitter: Jenkins
Branch: master

commit 73603d524814d5e417dad4c40d3667b703c58105
Author: Gorka Eguileor <email address hidden>
Date: Fri Nov 25 14:26:23 2016 +0100

Move service and cluster creation in test to utils

    In this patch we move service and cluster creation methods from the
    tests to the test utils file so they can be easily reused by other tests
    that need to create them.

    This change is required by the patch that fixes the replication freeze
    mechanism but wasn't included in that patch to facilitate the review by
    splitting the 2 different concepts: moving these convenience methods to
    test utils and fixing the freeze mechanism.

Related-Bug: #1616974
Change-Id: I7d8552f38e9495f72a5c1af61f4f57b3b4683157

OpenStack Infra (hudson-openstack) wrote on 2017-01-19: Fix merged to cinder (master)

Reviewed: https://review.openstack.org/402923
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=2195885e7710adfc6c4be944f22e5cc01a2361e8
Submitter: Jenkins
Branch: master

commit 2195885e7710adfc6c4be944f22e5cc01a2361e8
Author: Gorka Eguileor <email address hidden>
Date: Fri Nov 25 15:56:51 2016 +0100

Fix replication freeze mechanism

    Freeze functionality in the replication feature doesn't work as
    expected: it is not used in the scheduler to exclude backends, nor on
    the API or volume nodes, so API-to-Vol operations like delete and
    create snapshot still work.

    This patch fixes the freeze mechanism by excluding frozen backends in
    the scheduler and checking whether the service is frozen on all other
    modifying operations.

    Since the extend operation now goes through the scheduler, it will be
    frozen there.

Closes-Bug: #1616974
Change-Id: I4561500746c95b96136878ddfde8ca88e96b28c6
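
The two checks named in the commit message could look roughly like the sketch below. It is not taken from the patch: `FROZEN`, `filter_frozen`, `reject_if_frozen`, and `BackendFrozen` are hypothetical names, and a plain dictionary stands in for whatever the real code reads from the services table.

```python
# Hypothetical sketch of the two checks; names are illustrative, not Cinder's.
import functools


class BackendFrozen(Exception):
    """Raised when a modifying operation targets a frozen backend."""


# Stand-in for the frozen flag stored per backend service.
FROZEN = {"host@lvm": True, "host@ceph": False}


def filter_frozen(backends):
    """Scheduler side: drop frozen backends from the candidate list."""
    return [b for b in backends if not FROZEN.get(b, False)]


def reject_if_frozen(func):
    """API/volume side: refuse modifying operations on a frozen backend."""
    @functools.wraps(func)
    def wrapper(backend, *args, **kwargs):
        if FROZEN.get(backend, False):
            raise BackendFrozen(f"{backend} is frozen")
        return func(backend, *args, **kwargs)
    return wrapper


@reject_if_frozen
def delete_volume(backend, volume_id):
    # Placeholder for an operation that does not go through the scheduler.
    return f"deleted {volume_id} on {backend}"
```

With this shape, `delete_volume("host@lvm", "vol-1")` raises `BackendFrozen`, while the same call against `host@ceph` proceeds. A decorator is only one way to attach the guard; an explicit check at the start of each modifying operation would behave the same.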

Changed in cinder:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2017-01-27: Fix included in openstack/cinder 10.0.0.0b3

This issue was fixed in the openstack/cinder 10.0.0.0b3 development milestone.

Gorka Eguileor (gorka) wrote on 2017-03-21:

The fix for this issue has a bug, so it needs to be corrected and the status cannot be considered closed anymore.

Changed in cinder:
status:	Fix Released → Confirmed

OpenStack Infra (hudson-openstack) wrote on 2017-03-21: Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/448147

Changed in cinder:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2017-05-03: Fix merged to cinder (master)

Reviewed: https://review.openstack.org/448147
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=66efc6c8d93cc6fc9f533ec442f238dfa5d0b132
Submitter: Jenkins
Branch: master

commit 66efc6c8d93cc6fc9f533ec442f238dfa5d0b132
Author: Gorka Eguileor <email address hidden>
Date: Tue Mar 21 16:08:09 2017 +0100

Fix host check in is_backend_frozen

    When we fixed the freeze mechanism we introduced a DB function called
    ``is_backend_frozen`` that is used to check in the DB whether a resource
    (volume, group, etc.) is frozen or not.

    The issue is that this function does not take into account that in
    some cases host/cluster_name may come with the pool information,
    so the check will fail.

    This patch fixes this by removing the pool information from the host
    and cluster_name parameters before doing the check in the DB.

TrivialFix

Change-Id: Ie776adf9e746cf4cb7a2856d64d0b94423149b8d
Closes-Bug: #1616974
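
A minimal sketch of the idea behind this follow-up fix, assuming the usual `host@backend#pool` string layout. `strip_pool`, `is_backend_frozen_sketch`, and the `FROZEN_BACKENDS` set are hypothetical stand-ins for the real DB function and its query, not the code from the patch.

```python
# Hypothetical sketch; assumes host strings look like "host@backend#pool".

def strip_pool(host):
    """Return the part of the string before '#', i.e. without the pool."""
    if host is None:
        return None
    return host.partition("#")[0]


# Stand-in for the DB lookup: names of frozen backends, stored without pool.
FROZEN_BACKENDS = {"node1@lvm"}


def is_backend_frozen_sketch(host, cluster_name=None):
    """Check both identifiers after removing any pool suffix."""
    names = {strip_pool(host), strip_pool(cluster_name)}
    return bool(names & FROZEN_BACKENDS)
```

Without the stripping step, a lookup for `node1@lvm#pool0` would not match the stored `node1@lvm` entry, so a frozen backend would be reported as not frozen.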