Deleting a stack while it is being created may leave some physical resources undeleted

Bug #1536451 reported by Yingzhe Zeng
This bug affects 1 person
Affects: OpenStack Heat
Status: Fix Released
Importance: Medium
Assigned to: Steve Baker

Bug Description

In an environment with Heat and Cinder, where Heat runs only one engine service:
1. Create a stack that contains a number of volumes.
2. While the stack is still being created, delete it.

After the stack has been deleted, run the cinder list command to check the physical volume resources.
Some volumes may be left behind in Cinder instead of being deleted as expected.

huangtianhua (huangtianhua) wrote :

See bug #1328983: Heat sleeps 0.2 s before stopping threads for a delete. Perhaps 0.2 s is not enough?

Zane Bitter (zaneb) wrote :

Discussed on the mailing list here: http://lists.openstack.org/pipermail/openstack-dev/2016-January/084467.html

Right now, upon deletion of an IN_PROGRESS stack, we cancel the greenthread that is currently working on it using thread.cancel(). This means that the thread can stop at any random point (at least, any point at which eventlet switches context), including the point after we have created a volume but before we have stored its ID in the database.
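To make the race concrete, here is a minimal sketch that uses Python's asyncio as a stand-in for eventlet greenthreads (an assumption: Heat itself uses eventlet, and create_volume, create_stack and the two sets standing in for Cinder and Heat's database are hypothetical). Cancelling the task while it is suspended between creating a volume and recording its ID leaves an orphan that a later delete cannot find.

```python
import asyncio

async def create_volume(n, cinder, db):
    vol_id = f"vol-{n}"
    cinder.add(vol_id)          # the physical volume now exists in "Cinder"
    await asyncio.sleep(0.05)   # a context switch, e.g. while polling the API
    db.add(vol_id)              # only now is its ID stored in the "database"

async def create_stack(cinder, db):
    for n in range(3):
        await create_volume(n, cinder, db)

async def main():
    cinder, db = set(), set()   # hypothetical stand-ins for Cinder and Heat's DB
    task = asyncio.create_task(create_stack(cinder, db))
    await asyncio.sleep(0.075)  # let the create get part-way through vol-1
    task.cancel()               # like thread.cancel(): stops at whatever await it is on
    try:
        await task
    except asyncio.CancelledError:
        pass
    cinder -= db                # a delete can only remove what the DB knows about
    return sorted(cinder)       # volumes left behind

print(asyncio.run(main()))      # prints ['vol-1']
```

vol-0 is created and recorded before the cancel, so it would be cleaned up; vol-1 exists in "Cinder" but its ID was never stored, which is exactly the leak reported above.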

The solution should probably be to build the ability to cancel a thread by raising an exception only at the point where we are sleeping between scheduler tasks. This will mean that anything between explicit yields (e.g. the handle_create() method) becomes atomic, and this kind of problem is averted.

We have something similar to this already in the form of the ForcedCancel exception. We need to modify this to give us the ability to cancel threads this way.
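A minimal sketch of the proposed approach, assuming generator-based tasks in which each yield marks a point where the scheduler sleeps between steps. CancelRequested, handle_create and run_task are hypothetical names for illustration, not Heat's actual scheduler API; the exception plays the role the comment assigns to ForcedCancel. Because cancellation is raised only at a yield, creating a volume and storing its ID can no longer be split apart.

```python
class CancelRequested(Exception):
    """Hypothetical stand-in for Heat's ForcedCancel exception."""

def handle_create(cinder, db, count):
    """Hypothetical resource action; the code between yields runs atomically."""
    for i in range(count):
        vol_id = f"vol-{i}"
        cinder.add(vol_id)   # create the physical resource...
        db.add(vol_id)       # ...and store its ID before the next yield
        yield                # safe point: the scheduler sleeps here

def run_task(task, should_cancel):
    """Step a generator task, delivering cancellation only at its yields."""
    step = 0
    try:
        for _ in task:
            step += 1
            if should_cancel(step):
                # Raise the exception inside the generator, at the paused
                # yield, so it can run any cleanup before it stops.
                task.throw(CancelRequested())
    except CancelRequested:
        return "FAILED"
    return "COMPLETE"

cinder, db = set(), set()
state = run_task(handle_create(cinder, db, 5), lambda step: step == 2)
print(state, sorted(cinder - db))  # prints FAILED []
```

The task stops after its second step, but every volume it created is also in the database, so a subsequent delete can find and remove all of them.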

There would need to be some sort of timeout after which, if the thread hasn't exited by itself, we kill it anyway.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/273253

Changed in heat:
assignee: nobody → Zane Bitter (zaneb)
status: New → In Progress
Zane Bitter (zaneb) wrote :

Whoops, wrong bug number in the commit message.

Changed in heat:
assignee: Zane Bitter (zaneb) → nobody
status: In Progress → Triaged
importance: Undecided → Medium
Yingzhe Zeng (zengyingzhe) wrote :

Hi Zane,
Thanks for your reply.
So, is there a plan to implement the solution you mentioned above, to solve this problem fundamentally?

Changed in heat:
assignee: nobody → Jason Dunsmore (jasondunsmore)
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/291931

Changed in heat:
status: Triaged → In Progress
Changed in heat:
milestone: none → newton-rc1
Changed in heat:
assignee: Jason Dunsmore (jasondunsmore) → Steve Baker (steve-stevebaker)
Thomas Herve (therve)
Changed in heat:
milestone: newton-rc1 → ocata-1
OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (master)

Change abandoned by Jason Dunsmore (<email address hidden>) on branch: master
Review: https://review.openstack.org/291931
Reason: Abandoning in favor of https://review.openstack.org/#/c/369827/

Changed in heat:
milestone: ocata-1 → newton-rc2
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/369827
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=3000f904080d8dcd841d913dcd2ae658fb526c1a
Submitter: Jenkins
Branch: master

commit 3000f904080d8dcd841d913dcd2ae658fb526c1a
Author: Steve Baker <email address hidden>
Date: Fri Sep 16 03:29:59 2016 +0000

    Legacy delete attempt thread cancel before stop

    The error messages 'Command Out of Sync' are due to the threads being
    stopped in the middle of the database operations. This happens in the
    legacy action when delete is requested during a stack create.

    We have the thread cancel message but that was not being used in this
    case. Thread cancel should provide a more graceful way of ensuring the
    stack is in a FAILED state before the delete is attempted.

    This change does the following in the delete_stack service method for
    the legacy engine:
    - if the stack is still locked, send thread cancel message
    - in a subthread wait for the lock to be released, or until a
      timeout based on the 4 minute cancel grace period
    - if the stack is still locked, do a thread stop as before

    Closes-Bug: #1499669
    Closes-Bug: #1546431
    Closes-Bug: #1536451
    Change-Id: I4cd613681f07d295955c4d8a06505d72d83728a0
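
The three steps in the commit message above can be sketched with Python's threading module. This is a minimal sketch under stated assumptions: a threading.Lock stands in for Heat's stack lock, a threading.Event for the thread cancel message, and OS threads for eventlet greenthreads; create_stack, delete_stack and grace_period are hypothetical names. Heat waits in a subthread and uses a 4-minute grace period, whereas this sketch waits inline with a short timeout. Python cannot forcibly stop a thread, so the final fallback only reports that a stop would be needed.

```python
import threading
import time

def create_stack(stack_lock, cancel, steps, step_time):
    """Hypothetical create worker that holds the stack lock while running."""
    with stack_lock:
        for _ in range(steps):
            time.sleep(step_time)    # one scheduler step
            if cancel.is_set():      # cancel message honoured between steps
                return               # stack would be marked FAILED; lock released

def delete_stack(stack_lock, cancel, grace_period):
    """Cancel first, wait up to grace_period for the lock, then fall back to stop."""
    if stack_lock.acquire(blocking=False):
        stack_lock.release()
        return "not locked: delete"
    cancel.set()                                  # 1. send the thread cancel message
    if stack_lock.acquire(timeout=grace_period):  # 2. wait for the lock to be released
        stack_lock.release()
        return "cancelled: delete"
    # 3. Still locked. Python cannot kill a thread, so this sketch only
    #    reports that a thread stop would be needed here.
    return "still locked: stop thread, then delete"

lock, cancel = threading.Lock(), threading.Event()
worker = threading.Thread(target=create_stack, args=(lock, cancel, 100, 0.01))
worker.start()
time.sleep(0.05)                                      # let the create get under way
print(delete_stack(lock, cancel, grace_period=2.0))   # prints cancelled: delete
worker.join()
```

The forced stop is kept only as a last resort: a worker that reaches its next safe point within the grace period releases the lock itself, so the delete starts from a consistent FAILED state rather than from the middle of a database operation.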

Changed in heat:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/373518

OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (stable/newton)

Reviewed: https://review.openstack.org/373518
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=2dd44db1b9cf4b789d8a083df6f97ae1fb5e22d5
Submitter: Jenkins
Branch: stable/newton

commit 2dd44db1b9cf4b789d8a083df6f97ae1fb5e22d5
Author: Steve Baker <email address hidden>
Date: Fri Sep 16 03:29:59 2016 +0000

    Legacy delete attempt thread cancel before stop

    Closes-Bug: #1499669
    Closes-Bug: #1546431
    Closes-Bug: #1536451
    Change-Id: I4cd613681f07d295955c4d8a06505d72d83728a0
    (cherry picked from commit 3000f904080d8dcd841d913dcd2ae658fb526c1a)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 7.0.0.0rc2

This issue was fixed in the openstack/heat 7.0.0.0rc2 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 7.0.0

This issue was fixed in the openstack/heat 7.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/heat 8.0.0.0b1

This issue was fixed in the openstack/heat 8.0.0.0b1 development milestone.
