Resource failure causes nested stacks to be rolled back

Bug #1475057 reported by Zane Bitter
This bug affects 1 person
Affects          Status        Importance  Assigned to  Milestone
OpenStack Heat   Fix Released  High        Rico Lin
Kilo             Won't Fix     High        Unassigned

Bug Description

The fix for bug 1446252 was to issue an update_cancel RPC call for nested stacks whenever an update operation was cancelled in the parent stack. Unfortunately, this always triggers a rollback. Previously (in Juno), if an update of a nested stack was cancelled, the disable_rollback flag was respected, which meant never rolling back, because disable_rollback is always True for nested stacks (the parent stack will do an update with the previous template if it wants to roll back a change). The main downside of this is that it leaves the stack in a half-finished state, but that is, after all, what the user requested.

The patch was accidentally (and perhaps fortunately) left out of Kilo, so in Kilo the update is neither cancelled nor rolled back (that is, bug 1446252 still exists). This sucks because once a resource fails, we have to wait for every stack in the tree to finish on its own or time out before we can issue another top-level stack update.

In Liberty the patch has landed, so nested stacks will always be rolled back unless we fix it. This could be a big problem for, e.g., TripleO, where it is not uncommon for an individual resource to fail and we really don't want to roll back any sibling stacks as a result.

The ideal solution, as usual, is phase 1 of convergence, since in that case there is no need to do anything to nested stacks except when the user requests a rollback: if the user issues a subsequent update, it will be accepted with no danger of a locking error.

In the meantime, I suspect the best thing is a change to cancel any in-progress update but not roll back.
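The difference between the behaviour after the bug 1446252 patch and the proposed behaviour can be illustrated with a toy Python model. This is a sketch, not Heat code: the Stack dataclass and the functions cancel_nested, on_resource_failure and make_tree are hypothetical, and the status strings are used only as labels. When one resource in the parent fails, each sibling nested stack either rolls back or is merely cancelled, depending on a single flag.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stack:
    """Toy stand-in for a stack in a nested tree (not Heat's Stack class)."""
    name: str
    status: str = "UPDATE_IN_PROGRESS"
    children: List["Stack"] = field(default_factory=list)


def cancel_nested(stack: Stack, rollback: bool) -> None:
    """Cancel the in-progress update of `stack` and of every stack below it."""
    for child in stack.children:
        cancel_nested(child, rollback)
    if stack.status == "UPDATE_IN_PROGRESS":
        # Rolling back discards the partial update; cancelling alone leaves the
        # stack half-finished so the operator can retry with a new update.
        stack.status = "ROLLBACK_COMPLETE" if rollback else "UPDATE_FAILED"


def on_resource_failure(parent: Stack, rollback_nested: bool) -> None:
    """Hypothetical hook: a resource in `parent` failed in the middle of an update."""
    parent.status = "UPDATE_FAILED"
    for nested in parent.children:
        cancel_nested(nested, rollback=rollback_nested)


def make_tree() -> Stack:
    """A parent stack with two sibling nested stacks, one of them nested again."""
    return Stack("parent", children=[
        Stack("nested-a"),
        Stack("nested-b", children=[Stack("nested-b1")]),
    ])


# Behaviour with the bug 1446252 patch: every sibling nested stack rolls back.
always_rollback = make_tree()
on_resource_failure(always_rollback, rollback_nested=True)
print([s.status for s in always_rollback.children])
# ['ROLLBACK_COMPLETE', 'ROLLBACK_COMPLETE']

# Proposed behaviour: cancel the in-progress updates, but do not roll back.
cancel_only = make_tree()
on_resource_failure(cancel_only, rollback_nested=False)
print([s.status for s in cancel_only.children])
# ['UPDATE_FAILED', 'UPDATE_FAILED']
```

With rollback_nested=False the sibling stacks stay half-finished, which is what lets the operator issue a new top-level update instead of waiting for the whole tree to finish or time out.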

Steven Hardy (shardy) wrote :

+1. I agree that the desired behaviour is just to cancel any in-progress update (as happens when a stack times out) and let the operator retry, with bonus points for allowing the top-level rollback flag to be respected (but not rolling back by default).

Rico Lin (rico-lin)
Changed in heat:
assignee: nobody → Rico Lin (rico-lin)
Changed in heat:
status: New → Triaged
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/206506

Changed in heat:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/207744

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/207745

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/207744
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=6536588272d33a119f81ad724de9149e1ccca3cc
Submitter: Jenkins
Branch: master

commit 6536588272d33a119f81ad724de9149e1ccca3cc
Author: ricolin <email address hidden>
Date: Fri Jul 31 10:38:54 2015 +0800

    Add cancel_with_rollback flag to stack cancel update

    Add a cancel_with_rollback flag to decide whether or not to roll back
    when a stack update is cancelled.

    Change-Id: I351a46c2e9add3f26fc42f09678c6c3a3c97475d
    Partial-Bug: #1475057

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/207745
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=9c65cd14bb78b5611ce07f823a4f0bd09355261a
Submitter: Jenkins
Branch: master

commit 9c65cd14bb78b5611ce07f823a4f0bd09355261a
Author: ricolin <email address hidden>
Date: Sat Aug 1 00:25:18 2015 +0800

    refactoring update_task

    The update_task function would become too complex if another exception
    handler were added inside it, but bug 1475057 requires one, so a small
    refactoring is needed.

    Change-Id: Idec60343b2b859b466514400de9bb1ce22879d96
    Partial-Bug: #1475057

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/206506
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=b76f6ec0e97a9c3aaf5b5140e7a449c9636b8f95
Submitter: Jenkins
Branch: master

commit b76f6ec0e97a9c3aaf5b5140e7a449c9636b8f95
Author: ricolin <email address hidden>
Date: Sat Aug 1 00:52:57 2015 +0800

    resource failure causes nested stacks to be rolled back

    The fix for bug 1446252 issues an update_cancel RPC call for nested
    stacks whenever an update operation is cancelled in the parent stack.
    Unfortunately, this always triggers a rollback.
    With this change, a resource failure in a stack with nested stacks
    cancels any in-progress nested update but does not roll back by
    default (it rolls back only when a rollback cancel exception is raised).
    This way the operator can retry without waiting for a timeout or for
    an overall rollback to occur.

    Change-Id: I94d75a21367f39c17a9b40b5d23405a28873cd1a
    Closes-Bug: #1475057
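The commit above describes the new behaviour only in prose: an in-progress nested update is cancelled without rollback by default, and rolled back only when a rollback-type cancel exception is raised. A minimal Python sketch of that idea follows, under stated assumptions. It models the update as a generator-driven cooperative task and delivers the cancel as an exception whose with_rollback attribute stands in for the cancel_with_rollback flag. UpdateCancelled, toy_update_task, _handle_cancel and run_update are hypothetical names, not Heat's actual classes or functions. Handling the cancel in a separate helper echoes the reason given for the update_task refactoring.

```python
from typing import Iterator, List, Optional, Tuple


class UpdateCancelled(Exception):
    """Hypothetical exception delivered to a running update when it is cancelled."""

    def __init__(self, with_rollback: bool = False) -> None:
        super().__init__("Stack UPDATE cancelled")
        # Mirrors the cancel_with_rollback flag: False means "just stop".
        self.with_rollback = with_rollback


def toy_update_task(resources: List[str], done: List[str]) -> Iterator[None]:
    """Cooperative task: updates one resource, then yields control."""
    for name in resources:
        done.append(name)
        yield


def _handle_cancel(exc: UpdateCancelled, done: List[str]) -> str:
    """Cancel handling kept out of the task body to keep the task simple."""
    if exc.with_rollback:
        done.clear()  # toy rollback: undo everything updated so far
        return "ROLLBACK_COMPLETE"
    return "UPDATE_FAILED"  # leave the partial update in place for a retry


def run_update(resources: List[str], cancel_after: Optional[int] = None,
               cancel_with_rollback: bool = False) -> Tuple[str, List[str]]:
    """Drive the task, injecting a cancel after `cancel_after` steps if given."""
    done: List[str] = []
    task = toy_update_task(resources, done)
    try:
        for step, _ in enumerate(task, start=1):
            if step == cancel_after:
                # Raise the cancel inside the task at its current yield point.
                task.throw(UpdateCancelled(with_rollback=cancel_with_rollback))
    except UpdateCancelled as exc:
        return _handle_cancel(exc, done), done
    return "UPDATE_COMPLETE", done


print(run_update(["a", "b", "c"], cancel_after=2))
# ('UPDATE_FAILED', ['a', 'b'])
print(run_update(["a", "b", "c"], cancel_after=2, cancel_with_rollback=True))
# ('ROLLBACK_COMPLETE', [])
```

The default of cancel_with_rollback=False in the sketch follows the commit's "not roll back by default"; a caller that really wants the old behaviour has to ask for it explicitly.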

Changed in heat:
status: In Progress → Fix Committed
Zane Bitter (zaneb)
tags: added: kilo-backport-potential
Changed in heat:
status: Fix Committed → Fix Released
Angus Salkeld (asalkeld)
tags: removed: kilo-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/225525

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/225526

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/225528

Angus Salkeld (asalkeld) wrote :

This has RPC API changes, so not backporting.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (stable/kilo)

Change abandoned by Angus Salkeld (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/225525

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Angus Salkeld (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/225526

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Angus Salkeld (<email address hidden>) on branch: stable/kilo
Review: https://review.openstack.org/225528

Thierry Carrez (ttx)
Changed in heat:
milestone: liberty-3 → 5.0.0