OpenStack Heat

Convergence: cancel update doesn't immediately cancel the operation

Bug #1533176 reported by Anant Patil on 2016-01-12

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
OpenStack Heat	Triaged	Wishlist	Unassigned	OpenStack Heat no-milestone-taged-bugs

Bug Description

With the convergence engine, a cancel stack-update request takes time to finish. Resources already in progress are not interrupted, and their threads keep running until the resource either completes or fails. If a user issues a stack update, provisioning of the updated resource has to wait for the old in-progress resource to complete or for the stack to time out. This hurts the usability of Heat to some extent, since the user has to wait and cannot cancel the running threads from old requests.


Anant Patil (ananta) on 2016-01-12

Changed in heat:
assignee:	nobody → Anant Patil (ananta)


Anant Patil (ananta) wrote on 2016-01-13:

I am planning to fix this by making the worker's check_resource a scheduler.wrappertask. check_resource will also register an event before starting, and after every step it will check the state of that event. If the event is ready with a cancel message, check_resource should stop. This is similar to how the existing stack-cancel-update works.

When the stack-cancel-update request is received, the cancel message is broadcast to each engine worker, and the registered events are sent the cancel message. The next step() of check_resource should see this message and bail out.
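The step-and-check idea above can be sketched roughly as follows. This is not Heat code: it uses a plain Python generator and a threading.Event in place of scheduler.wrappertask and the engine's event objects, and every name in it (CancelRequested, run_until_cancelled, cancel_after, and this stand-in check_resource) is made up for illustration.

```python
import threading


class CancelRequested(Exception):
    """Hypothetical signal that a cancel message arrived between steps."""


def check_resource(resource_name, n_steps, progress):
    """Hypothetical stand-in for the worker's check_resource, as a generator."""
    for step in range(n_steps):
        progress.append((resource_name, step))  # one unit of work
        yield  # hand control back so the runner can look for a cancel message


def run_until_cancelled(task, cancel_event):
    """Step the task; bail out before the next step if cancel_event is set."""
    try:
        for _ in task:
            if cancel_event.is_set():
                raise CancelRequested()
    finally:
        task.close()  # make sure no further steps run


def cancel_after(task, cancel_event, steps_before_cancel):
    """Demo helper: set the event once the wrapped task has run N steps."""
    for count, _ in enumerate(task, start=1):
        if count == steps_before_cancel:
            cancel_event.set()  # simulates the cancel message being delivered
        yield
```

Wrapping a five-step task with cancel_after(..., 2) and passing it to run_until_cancelled stops it after the second step. The step already in flight still runs to its next yield point, so a cancel of this kind is only as prompt as the longest single step.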


Anant Patil (ananta) wrote on 2016-01-13:

Effectively, every check_resource running in any engine worker for a stack should stop when it receives a cancel message over RPC. This also depends on whether a broadcast message can be sent using oslo.messaging.


OpenStack Infra (hudson-openstack) wrote on 2016-02-12: Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/279406

Anant Patil (ananta) on 2016-02-15

Changed in heat:
status:	New → In Progress


OpenStack Infra (hudson-openstack) wrote on 2016-03-08: Change abandoned on heat (master)

Change abandoned by Anant Patil (<email address hidden>) on branch: master
Review: https://review.openstack.org/279406
Reason: Broadcasting messages to all Heat engines to cancel workers is considered a bad idea. It would be better to have a thread poll the DB to learn whether a cancel was requested (signalled by an update to the current traversal) instead of broadcasting. Will upload another patch to address the bug.
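The polling alternative described in the reason above might look something like this minimal sketch. It assumes that a cancel shows up as a change to the stack's current traversal ID. The in-memory FakeStackStore stands in for the real database, and all names here (FakeStackStore, watch_for_cancel, the traversal strings) are hypothetical and not taken from Heat.

```python
import threading


class FakeStackStore:
    """In-memory stand-in for the stack's DB row (hypothetical)."""

    def __init__(self, traversal_id):
        self._lock = threading.Lock()
        self._traversal_id = traversal_id

    def current_traversal(self):
        with self._lock:
            return self._traversal_id

    def set_traversal(self, traversal_id):
        with self._lock:
            self._traversal_id = traversal_id


def watch_for_cancel(store, my_traversal, cancel_event, stop_event, interval=0.01):
    """Poll the store; set cancel_event once the traversal is no longer ours."""
    while not stop_event.is_set():
        if store.current_traversal() != my_traversal:
            cancel_event.set()  # the worker checks this between steps
            return
        stop_event.wait(interval)  # sleep, but wake early when asked to stop


def start_watcher(store, my_traversal, cancel_event, stop_event):
    """Run the poller in a daemon thread alongside the resource work."""
    thread = threading.Thread(
        target=watch_for_cancel,
        args=(store, my_traversal, cancel_event, stop_event),
        daemon=True,
    )
    thread.start()
    return thread
```

The trade-off against broadcasting is latency versus load: a cancel is noticed at most one polling interval late, and each in-progress resource adds one periodic DB read.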


OpenStack Infra (hudson-openstack) wrote on 2016-04-05: Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/301483

OpenStack Infra (hudson-openstack) on 2016-04-12

Changed in heat:
assignee:	Anant Patil (ananta) → Rakesh H S (rh-s)

OpenStack Infra (hudson-openstack) on 2016-05-19

Changed in heat:
assignee:	Rakesh H S (rh-s) → Anant Patil (ananta)


OpenStack Infra (hudson-openstack) wrote on 2016-07-06: Fix merged to heat (master)

Reviewed: https://review.openstack.org/279406
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=459086f984140210d6350136bd1c4eb5e44c96b1
Submitter: Jenkins
Branch: master

commit 459086f984140210d6350136bd1c4eb5e44c96b1
Author: Anant Patil <email address hidden>
Date: Fri Feb 12 13:16:27 2016 +0530

Convergence: Cancel message

Implements a cancel message sending mechanism.

A cancel message is sent to heat engines working on the stack.

Change-Id: I3b529addbd02a79364f7f2a041fc87d5019dd5d9
Patial-Bug: #1533176

Zane Bitter (zaneb) on 2017-07-31

Changed in heat:
assignee:	Anant Patil (ananta) → nobody
status:	In Progress → New
importance:	Undecided → Wishlist


Zane Bitter (zaneb) wrote on 2017-07-31:

If we stop threads in the middle of a resource update, the resource will be left in an indeterminate state. (That's true regardless of whether we kill the thread at an arbitrary time or send it a message that it checks for between steps of a heat.engine.scheduler coroutine, although the latter is much better.) This isn't usually what we want, because the resource goes into a FAILED state and will get replaced later.

The exception is nested stacks - we should start the rollback of any of those that are IN_PROGRESS immediately, rather than waiting for them to complete and then rolling them back, as we currently do.

What would also be nice is to have a command to allow the user to cancel a particular resource. We discussed re-using the mark-unhealthy command for that purpose in this thread: http://lists.openstack.org/pipermail/openstack-dev/2016-August/102016.html

Changed in heat:
status:	New → Triaged

Rico Lin (rico-lin) on 2018-05-07

Changed in heat:
milestone:	none → no-priority-tag-bugs
