Convergence: cancel update doesn't immediately cancel the operation

Bug #1533176 reported by Anant Patil
This bug affects 2 people
Affects: OpenStack Heat
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned

Bug Description

With the convergence engine, a cancel stack-update request takes time to finish. Resources already in progress are not interrupted, and their threads keep running until the resource either completes or fails. If a user issues a new stack update, provisioning of the updated resource has to wait for the old in-progress resource to complete or for the stack to time out. This could affect the usability of Heat to some extent, since the user now has to wait and cannot cancel the running threads from old requests.

Anant Patil (ananta)
Changed in heat:
assignee: nobody → Anant Patil (ananta)
Anant Patil (ananta) wrote :

I am planning to fix this by making check_resource in the worker a scheduler.wrappertask. check_resource will also register an event before starting, and after every step it will check the state of the event. If the event is ready with a cancel message, check_resource should stop. This is similar to how the existing stack-cancel-update works.

When the stack-cancel-update request is received, the cancel message is broadcast to each engine worker, and the registered events are sent the cancel message. The next step() of check_resource should see this event message and bail out.
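The idea of checking for a cancel message between steps can be illustrated with a minimal sketch. This is not Heat's code: check_resource_steps, run_until_cancelled and the "cancel" string are hypothetical stand-ins, and a plain Python generator plus queue.Queue take the place of the scheduler task and the registered event.

```python
import queue


def check_resource_steps(resource_name, n_steps):
    """Hypothetical stand-in for check_resource: yields after each step."""
    for i in range(n_steps):
        # One unit of work on the resource would happen here.
        yield f"{resource_name}: step {i + 1}"


def run_until_cancelled(task, messages):
    """Drive `task` one step at a time, checking `messages` before each step.

    Returns (completed_steps, cancelled).
    """
    completed = []
    while True:
        try:
            if messages.get_nowait() == "cancel":
                task.close()  # stop the generator without running more steps
                return completed, True
        except queue.Empty:
            pass  # no message pending, carry on with the next step
        try:
            completed.append(next(task))
        except StopIteration:
            return completed, False
```

Note that a step already under way always runs to its next yield; the cancel only takes effect at a step boundary.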

Anant Patil (ananta) wrote :

Effectively, all check_resource tasks running in all engine workers for a stack should stop when they receive a cancel message over RPC. This also depends on whether a broadcast message can be sent using oslo.messaging.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/279406

Anant Patil (ananta)
Changed in heat:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (master)

Change abandoned by Anant Patil (<email address hidden>) on branch: master
Review: https://review.openstack.org/279406
Reason: It is considered a bad idea to broadcast messages to all the heat engines to cancel workers. It would be better to have a thread poll the DB to find out whether a cancel was requested (by updating the current traversal) instead of broadcasting. Will upload another patch to address the bug.
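The polling alternative described in this reason can be sketched roughly as follows. This is only an illustration under stated assumptions: watch_traversal and get_current_traversal are hypothetical names, and the callable stands in for whatever DB query returns the stack's current traversal.

```python
import threading


def watch_traversal(get_current_traversal, my_traversal, cancelled, stop,
                    interval=0.01):
    """Poll until the stored traversal differs from ours, then flag a cancel.

    get_current_traversal: hypothetical callable standing in for the DB read.
    cancelled: threading.Event set when the traversal has changed.
    stop: threading.Event the worker sets once its own work is finished.
    """
    while not stop.is_set():
        if get_current_traversal() != my_traversal:
            cancelled.set()
            return
        # Sleep for `interval`, waking early if the worker asks us to stop.
        stop.wait(interval)
```

A worker would start this in a thread alongside its resource task and check cancelled between steps; setting stop ends the polling once the task completes normally.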

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/301483

Changed in heat:
assignee: Anant Patil (ananta) → Rakesh H S (rh-s)
Changed in heat:
assignee: Rakesh H S (rh-s) → Anant Patil (ananta)
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/279406
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=459086f984140210d6350136bd1c4eb5e44c96b1
Submitter: Jenkins
Branch: master

commit 459086f984140210d6350136bd1c4eb5e44c96b1
Author: Anant Patil <email address hidden>
Date: Fri Feb 12 13:16:27 2016 +0530

    Convergence: Cancel message

    Implements a cancel message sending mechanism.

    A cancel message is sent to heat engines working on the stack.

    Change-Id: I3b529addbd02a79364f7f2a041fc87d5019dd5d9
    Patial-Bug: #1533176

Zane Bitter (zaneb)
Changed in heat:
assignee: Anant Patil (ananta) → nobody
status: In Progress → New
importance: Undecided → Wishlist
Zane Bitter (zaneb) wrote :

If we stop threads in the middle of a resource update then the resource will be left in an indeterminate state. (That's true regardless of whether we kill the thread at an arbitrary time or send it a message that it checks for between steps of a heat.engine.scheduler co-routine - although the latter is much better.) This isn't usually what we want, because the resource goes into a FAILED state and will get replaced later.

The exception is nested stacks - we should start the rollback of any of those that are IN_PROGRESS immediately, rather than waiting for them to complete and then rolling them back, as we currently do.

What would also be nice is to have a command to allow the user to cancel a particular resource. We discussed re-using the mark-unhealthy command for that purpose in this thread: http://lists.openstack.org/pipermail/openstack-dev/2016-August/102016.html

Changed in heat:
status: New → Triaged
Rico Lin (rico-lin)
Changed in heat:
milestone: none → no-priority-tag-bugs