OpenStack Heat

After updating a stack stuck IN_PROGRESS, resources will be permanently stuck IN_PROGRESS

Bug #1570576 reported by Zane Bitter on 2016-04-14

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Heat	Fix Released	High	Tanvir Talukder	OpenStack Heat ocata-3

Bug Description

If an engine dies in the middle of an update, often both the stack and one or more resources within it will be left in an UPDATE_IN_PROGRESS state. If the user then attempts another update, Heat will be able to steal the lock from the dead engine, but the update itself will fail because the resource is in an IN_PROGRESS state. So far, so good. The problem is that this also sets the state of the stack to UPDATE_FAILED, but *without* resetting the state of any resources it contains (unlike the reset_stack_status task that runs at startup to reset any zombie stacks).

This means that from this point on, the user can attempt to update the stack all they like but it will never succeed because of resources inside that are stuck IN_PROGRESS. Also, there is no way to resolve the situation: restarting heat-engine won't help because reset_stack_status looks only at *stacks* that are IN_PROGRESS, which they may no longer be.
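The sequence described above can be sketched as a toy model. This is not Heat's code: the `Stack` and `Resource` classes, the simplified status strings, and the bodies of `update` and `reset_stack_status` are assumptions made purely to illustrate why the resources stay stuck.

```python
from dataclasses import dataclass, field

# Simplified statuses; real Heat combines an action and a status
# (e.g. UPDATE_IN_PROGRESS, UPDATE_FAILED).
IN_PROGRESS, FAILED, COMPLETE = "IN_PROGRESS", "FAILED", "COMPLETE"


@dataclass
class Resource:
    name: str
    status: str = COMPLETE


@dataclass
class Stack:
    name: str
    status: str = COMPLETE
    resources: list = field(default_factory=list)


def update(stack):
    """Hypothetical update: fails if any resource is still IN_PROGRESS."""
    if any(r.status == IN_PROGRESS for r in stack.resources):
        # The stack is marked failed, but the resources are left untouched.
        stack.status = FAILED
        return False
    stack.status = COMPLETE
    return True


def reset_stack_status(stacks):
    """Hypothetical startup cleanup: only visits stacks that are IN_PROGRESS."""
    for stack in stacks:
        if stack.status != IN_PROGRESS:
            continue
        stack.status = FAILED
        for res in stack.resources:
            if res.status == IN_PROGRESS:
                res.status = FAILED


# An engine died mid-update, leaving stack and resource IN_PROGRESS.
stack = Stack("demo", IN_PROGRESS, [Resource("server", IN_PROGRESS)])

first = update(stack)        # fails and flips only the stack to FAILED
reset_stack_status([stack])  # skips the stack: it is no longer IN_PROGRESS
second = update(stack)       # fails again: the resource is still IN_PROGRESS
```

In this model, once the first retry has moved the stack out of IN_PROGRESS, the startup cleanup no longer sees it, so the resource never leaves IN_PROGRESS.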

Zane Bitter (zaneb) wrote on 2016-04-14:

Two notable circumstances where this would come up:

1. An engine dies and isn't restarted
2. bug 1570569

Changed in heat:
status:	New → Triaged
importance:	Undecided → High

Bathri Ajay Raj (bathri-s) on 2016-04-18

Changed in heat:
assignee:	nobody → Bathri Ajay Raj (bathri-s)

Zane Bitter (zaneb) on 2016-04-18

Changed in heat:
milestone:	none → newton-1

Zane Bitter (zaneb) on 2016-05-19

Changed in heat:
assignee:	Bathri Ajay Raj (bathri-s) → nobody

Rabi Mishra (rabi) on 2016-06-01

Changed in heat:
milestone:	newton-1 → newton-2

Thomas Herve (therve) on 2016-07-12

Changed in heat:
milestone:	newton-2 → newton-3

Thomas Herve (therve) on 2016-09-01

Changed in heat:
milestone:	newton-3 → next

Zane Bitter (zaneb) wrote on 2016-09-01:

Moving this to rc1 because I've heard a bunch of reports of what I think is this bug (or something closely related) happening to people, particularly when they run out of file descriptors on the database.

Changed in heat:
milestone:	next → newton-rc1

Tanvir Talukder (tanvirt16) on 2016-09-03

Changed in heat:
assignee:	nobody → Tanvir Talukder (tanvirt16)

OpenStack Infra (hudson-openstack) wrote on 2016-09-07: Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/366979

Changed in heat:
status:	Triaged → In Progress

Zane Bitter (zaneb) on 2016-09-07

Changed in heat:
assignee:	Tanvir Talukder (tanvirt16) → nobody
status:	In Progress → Triaged

Thomas Herve (therve) on 2016-09-16

Changed in heat:
milestone:	newton-rc1 → ocata-1

Tanvir Talukder (tanvirt16) on 2016-09-19

Changed in heat:
assignee:	nobody → Tanvir Talukder (tanvirt16)

OpenStack Infra (hudson-openstack) wrote on 2016-09-22: Change abandoned on heat (master)

Change abandoned by Tanvir Talukder (<email address hidden>) on branch: master
Review: https://review.openstack.org/366979
Reason: Workaround already in place

OpenStack Infra (hudson-openstack) wrote on 2016-09-28: Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/378987

Changed in heat:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2016-10-14:

Fix proposed to branch: master
Review: https://review.openstack.org/386741

OpenStack Infra (hudson-openstack) wrote on 2016-10-14: Change abandoned on heat (master)

Change abandoned by Tanvir Talukder (<email address hidden>) on branch: master
Review: https://review.openstack.org/378987
Reason: Abandoning due to problems resolving merge conflict. New patch set is located here: https://review.openstack.org/#/c/386741/

Rabi Mishra (rabi) on 2016-11-17

Changed in heat:
milestone:	ocata-1 → ocata-2

huangtianhua (huangtianhua) wrote on 2016-12-15:

Now we provide two ways to reset the status:
1. Restart the engine service, which resets the stack status to *_FAILED along with any resources it contains that are in progress.
2. Use the heat-manage command to reset the stack status to *_FAILED along with any resources it contains that are in progress.

So I'm not sure what problem this bug is tracing?

Rabi Mishra (rabi) on 2016-12-15

Changed in heat:
milestone:	ocata-2 → ocata-3

Zane Bitter (zaneb) wrote on 2016-12-15:

I corrected the typo I made in the description which rendered it nonsensical. Sorry about that!

The problem that this bug is tracing is that the stack can get set to UPDATE_FAILED (by a subsequent update) while the resources are still *_IN_PROGRESS. After that, restarting heat-engine won't help because it only looks at stacks that are *_IN_PROGRESS.

I believe it's correct that the heat-manage command will now be able to reset the resource statuses. That still requires an admin to intervene though.

description:	updated

OpenStack Infra (hudson-openstack) wrote on 2017-01-25: Fix merged to heat (master)

Reviewed: https://review.openstack.org/386741
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=d6a90cc6ac1f49286b1c6a53f934d60a579da9bf
Submitter: Jenkins
Branch: master

commit d6a90cc6ac1f49286b1c6a53f934d60a579da9bf
Author: Tanvir Talukder <email address hidden>
Date: Wed Jan 4 11:27:04 2017 -0600

Fix for resources stuck in progress after engine crash

    When a stack is IN_PROGRESS and an UPDATE or RESTORE is called
    after an engine crash, we set status of the stack and all of its
    IN_PROGRESS resources to FAILED

Change-Id: Ia3adbfeff16c69719f9e5365657ab46a0932ec9b
Closes-Bug: #1570576
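
The behaviour in the commit message can be illustrated with a toy model. This is a sketch under stated assumptions, not the merged change: the classes, the `previous_engine_dead` flag, and the choice to end the triggering call after the reset are all hypothetical, since the commit message does not say whether that same call then continues.

```python
from dataclasses import dataclass, field

# Simplified statuses; real Heat combines an action and a status.
IN_PROGRESS, FAILED, COMPLETE = "IN_PROGRESS", "FAILED", "COMPLETE"


@dataclass
class Resource:
    name: str
    status: str = COMPLETE


@dataclass
class Stack:
    name: str
    status: str = COMPLETE
    resources: list = field(default_factory=list)


def fail_stack_and_resources(stack):
    """Mark the stack and every IN_PROGRESS resource in it as FAILED."""
    stack.status = FAILED
    for res in stack.resources:
        if res.status == IN_PROGRESS:
            res.status = FAILED


def update(stack, previous_engine_dead=False):
    """Hypothetical update entry point for this model."""
    if stack.status == IN_PROGRESS and previous_engine_dead:
        # Reset the stack and its in-progress resources together.
        fail_stack_and_resources(stack)
        return False  # assumption: this call ends here
    if any(r.status == IN_PROGRESS for r in stack.resources):
        stack.status = FAILED
        return False
    for res in stack.resources:
        res.status = COMPLETE
    stack.status = COMPLETE
    return True


# An engine died mid-update; one resource was in flight, one was untouched.
stack = Stack("demo", IN_PROGRESS,
              [Resource("server", IN_PROGRESS), Resource("volume", COMPLETE)])

first = update(stack, previous_engine_dead=True)  # resets stale state
after_reset = [r.status for r in stack.resources]
second = update(stack)                            # no longer blocked
```

Because the stack and its resources are reset in the same step, a later update in this model is no longer blocked by stale IN_PROGRESS resources.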

Changed in heat:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2017-01-26: Fix included in openstack/heat 8.0.0.0b3

This issue was fixed in the openstack/heat 8.0.0.0b3 development milestone.
