Update/delete hangs if previous update times out

Bug #1721654 reported by Zane Bitter
This bug affects 1 person
Affects: OpenStack Heat
Status: Fix Released
Importance: High
Assigned to: Zane Bitter
Milestone: (none listed)

Bug Description

If convergence is enabled and the following sequence of events occurs:

1) The user initiates a stack update (or create), and one or more resources take a long time to complete.
2) The user initiates a second stack update before those resources have completed.
3) Any of those resources eventually times out because it was still IN_PROGRESS when the stack timeout from the original update was reached.

then the *second* update never completes and instead hangs IN_PROGRESS forever.

When the initial update releases the lock on the resource, it should retrigger the latest traversal if that resource is ready, but it does not do so when the resource times out.
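
To make the failure mode concrete, here is a toy model of the lock hand-over. It is only a sketch: `Resource`, `check_resource`, `slow_work` and the `retrigger_on_timeout` flag are hypothetical names invented for illustration and are not Heat's actual code or API. It shows that if the timeout path skips the retrigger step, the newer traversal is never woken.

```python
# Toy model only: every name here is hypothetical, not Heat's real API.
class Timeout(Exception):
    """Stands in for the stack timeout hit while a resource is IN_PROGRESS."""


class Resource:
    def __init__(self):
        self.locked_by = None      # traversal currently holding the lock
        self.retriggered = []      # traversals woken after a lock release


def check_resource(resource, traversal, latest_traversal, work,
                   retrigger_on_timeout):
    """Run `work` under the resource lock, then hand over to a newer traversal."""
    resource.locked_by = traversal
    timed_out = False
    try:
        work()
    except Timeout:
        timed_out = True
    finally:
        resource.locked_by = None  # the lock is released on every path

    # A newer update may have started while this one held the lock.
    stale = latest_traversal != traversal
    if stale and (not timed_out or retrigger_on_timeout):
        resource.retriggered.append(latest_traversal)
    return timed_out


def slow_work():
    # Simulates a resource still in progress when the stack timeout fires.
    raise Timeout()


# Behaviour described in this report: the timeout path skips the retrigger.
buggy = Resource()
check_resource(buggy, "update-1", "update-2", slow_work,
               retrigger_on_timeout=False)

# Expected behaviour: the newer traversal is retriggered after the timeout.
fixed = Resource()
check_resource(fixed, "update-1", "update-2", slow_work,
               retrigger_on_timeout=True)

print(buggy.retriggered, fixed.retriggered)
```

In the first case nothing ever wakes "update-2", which corresponds to the second update staying IN_PROGRESS forever.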

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/509921

Changed in heat:
status: Triaged → In Progress
Zane Bitter (zaneb)
summary: - Update/delete hangs if previous update times out or is cancelled
+ Update/delete hangs if previous update times out
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to heat (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/513181

OpenStack Infra (hudson-openstack) wrote : Related fix merged to heat (master)

Reviewed: https://review.openstack.org/513181
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=6a9672a26443c901f4a465c86992ecece3f73bbd
Submitter: Zuul
Branch: master

commit 6a9672a26443c901f4a465c86992ecece3f73bbd
Author: Zane Bitter <email address hidden>
Date: Wed Oct 18 16:46:39 2017 -0400

    Make scheduler.Timeout exception hashable

    The python standard library in Python 3.6.3 and earlier has a bug with
    handling unhashable exceptions: https://bugs.python.org/issue28603

    Although oslo_log will catch the error, make scheduler.Timeout hashable so
    that all exceptions will be printable.

    Prior to 640abe0c12e63c207fcf67592838f112a29f5b43 we just used __cmp__(),
    but that isn't used in Python 3. Defining __eq__(), which is required for
    the total_ordering decorator, makes the class unhashable in Python 3.

    Change-Id: Idde65b2d41490ab8318b5a8b95ea74e9b96b4e5c
    Related-Bug: #1724366
    Related-Bug: #1721654
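
The hashability point in the commit message above can be reproduced with plain Python 3. This is a minimal sketch, not the actual `scheduler.Timeout` class: the class names and the `deadline` attribute are made up for illustration, and hashing on the compared field is just one reasonable choice, not necessarily what the Heat patch does.

```python
import functools


@functools.total_ordering
class UnhashableTimeout(Exception):
    """Defines __eq__ without __hash__, so Python 3 sets __hash__ to None."""

    def __init__(self, deadline):
        super().__init__(deadline)
        self.deadline = deadline   # hypothetical attribute used for ordering

    def __eq__(self, other):
        if not isinstance(other, UnhashableTimeout):
            return NotImplemented
        return self.deadline == other.deadline

    def __lt__(self, other):
        if not isinstance(other, UnhashableTimeout):
            return NotImplemented
        return self.deadline < other.deadline


@functools.total_ordering
class HashableTimeout(Exception):
    """Same comparisons, plus an explicit __hash__ consistent with __eq__."""

    def __init__(self, deadline):
        super().__init__(deadline)
        self.deadline = deadline

    def __eq__(self, other):
        if not isinstance(other, HashableTimeout):
            return NotImplemented
        return self.deadline == other.deadline

    def __lt__(self, other):
        if not isinstance(other, HashableTimeout):
            return NotImplemented
        return self.deadline < other.deadline

    def __hash__(self):
        # Assumption for this sketch: hash on the field that __eq__ compares.
        return hash(self.deadline)


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable(UnhashableTimeout(5)), is_hashable(HashableTimeout(5)))
```

Code that stores exception instances in a set or uses them as dict keys fails with `TypeError` for the first class and works for the second.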

Changed in heat:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/509921
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=bb330ae1a6c907ec2a8b8c198b7268d0abec3b43
Submitter: Zuul
Branch: master

commit bb330ae1a6c907ec2a8b8c198b7268d0abec3b43
Author: Zane Bitter <email address hidden>
Date: Wed Oct 18 16:46:39 2017 -0400

    Retrigger new traversals after resource timeout

    If a resource times out, we still need to check whether there is a new
    traversal underway that we need to retrigger, otherwise the new traversal
    will never complete.

    Change-Id: I4ac7ac88797b7fb14046b5668649b2277ee55517
    Closes-Bug: #1721654

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 10.0.0.0b1

This issue was fixed in the openstack/heat 10.0.0.0b1 development milestone.
