Bug #1536451 “Delete a creating stack, some physical resource ma...” : Bugs : OpenStack Heat

Revision history for this message

huangtianhua (huangtianhua) wrote on 2016-01-26:

#1

See bug #1328983: Heat sleeps for 0.2s before stopping threads on delete. Maybe 0.2s is not enough?

Zane Bitter (zaneb) wrote on 2016-01-27:

#2

Discussed on the mailing list here: http://lists.openstack.org/pipermail/openstack-dev/2016-January/084467.html

Right now upon delete of an IN_PROGRESS stack we cancel the greenthread that is currently working on it using thread.cancel(). This means that the thread can stop at any random point (at least, any point that eventlet does context switches), including the point after we have created a volume but before we have stored its id in the database.

The solution should probably be to build the ability to cancel a thread by raising an exception only at the point where we are sleeping between scheduler tasks. This will mean that anything between explicit yields (e.g. the handle_create() method) becomes atomic, and this kind of problem is averted.

We have something similar to this already in the form of the ForcedCancel exception. We need to modify this to give us the ability to cancel threads this way.

There would need to be some sort of timeout after which if the thread hasn't exited by itself, we kill it anyway.
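
A minimal sketch of the idea described above, assuming tasks are written as Python generators that yield between steps. The `ForcedCancel` class here is a hypothetical stand-in for Heat's exception of the same name, and `create_volume` and `run` are illustrative names, not Heat's scheduler API. The cancellation is only ever raised at a `yield`, so the work between two yields (creating the volume and recording its id) cannot be split. The kill-after-timeout fallback is not shown.

```python
class ForcedCancel(Exception):
    """Hypothetical stand-in for Heat's ForcedCancel exception."""


def create_volume(log):
    """Illustrative task: everything between two yields runs uninterrupted."""
    try:
        log.append("volume created")    # external side effect
        log.append("volume id stored")  # recorded before control is given up
        yield                           # the only place a cancel can land
        log.append("volume attached")
        yield
    except ForcedCancel:
        log.append("cancelled cleanly")
        raise


def run(task, cancel_requested):
    """Step a generator task, delivering cancellation only at yield points."""
    try:
        next(task)  # run up to the first yield
        while True:
            if cancel_requested():
                # Raise inside the task at the yield where it is suspended.
                task.throw(ForcedCancel())
            else:
                next(task)
    except StopIteration:
        return "complete"
    except ForcedCancel:
        return "cancelled"
```

With `log = []`, calling `run(create_volume(log), lambda: True)` cancels the task at its first yield, after the volume id has already been stored.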

OpenStack Infra (hudson-openstack) wrote on 2016-01-27: Fix proposed to heat (master)

#3

Fix proposed to branch: master
Review: https://review.openstack.org/273253

Changed in heat:
assignee:	nobody → Zane Bitter (zaneb)
status:	New → In Progress

Zane Bitter (zaneb) wrote on 2016-01-27:

#4

Whoops, wrong bug number in the commit message.

Changed in heat:
assignee:	Zane Bitter (zaneb) → nobody
status:	In Progress → Triaged
importance:	Undecided → Medium

Yingzhe Zeng (zengyingzhe) wrote on 2016-01-28:

#5

Hi Zane,
Thanks for your reply.
So, is there a plan to implement the solution you mentioned above, to solve this problem fundamentally?

Jason Dunsmore (jasondunsmore) on 2016-03-07

Changed in heat:
assignee:	nobody → Jason Dunsmore (jasondunsmore)

OpenStack Infra (hudson-openstack) wrote on 2016-03-11:

#6

Fix proposed to branch: master
Review: https://review.openstack.org/291931

Changed in heat:
status:	Triaged → In Progress

Steve Baker (steve-stevebaker) on 2016-09-14

Changed in heat:
milestone:	none → newton-rc1

OpenStack Infra (hudson-openstack) on 2016-09-15

Changed in heat:
assignee:	Jason Dunsmore (jasondunsmore) → Steve Baker (steve-stevebaker)

Thomas Herve (therve) on 2016-09-16

Changed in heat:
milestone:	newton-rc1 → ocata-1

OpenStack Infra (hudson-openstack) wrote on 2016-09-16: Change abandoned on heat (master)

#7

Change abandoned by Jason Dunsmore (<email address hidden>) on branch: master
Review: https://review.openstack.org/291931
Reason: Abandoning in favor of https://review.openstack.org/#/c/369827/

Steve Baker (steve-stevebaker) on 2016-09-19

Changed in heat:
milestone:	ocata-1 → newton-rc2

OpenStack Infra (hudson-openstack) wrote on 2016-09-20: Fix merged to heat (master)

#8

Reviewed: https://review.openstack.org/369827
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=3000f904080d8dcd841d913dcd2ae658fb526c1a
Submitter: Jenkins
Branch: master

commit 3000f904080d8dcd841d913dcd2ae658fb526c1a
Author: Steve Baker <email address hidden>
Date: Fri Sep 16 03:29:59 2016 +0000

Legacy delete attempt thread cancel before stop

    The error messages 'Command Out of Sync' are due to the threads being
    stopped in the middle of the database operations. This happens in the
    legacy action when delete is requested during a stack create.

    We have the thread cancel message but that was not being used in this
    case. Thread cancel should provide a more graceful way of ensuring the
    stack is in a FAILED state before the delete is attempted.

    This change does the following in the delete_stack service method for
    the legacy engine:
    - if the stack is still locked, send thread cancel message
    - in a subthread wait for the lock to be released, or until a
      timeout based on the 4 minute cancel grace period
    - if the stack is still locked, do a thread stop as before

    Closes-Bug: #1499669
    Closes-Bug: #1546431
    Closes-Bug: #1536451
    Change-Id: I4cd613681f07d295955c4d8a06505d72d83728a0
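
The three steps in the commit message can be sketched roughly as follows. This is not the code from the review: `delete_when_unlocked`, `send_cancel`, `force_stop`, and the use of a `threading.Event` to represent "the stack lock has been released" are all assumptions made for illustration, and the wait is done inline here rather than in a subthread as the commit describes.

```python
import threading


def delete_when_unlocked(lock_released, send_cancel, force_stop, grace_period):
    """Hypothetical helper: cancel gracefully first, hard-stop only on timeout.

    lock_released: threading.Event set when the stack lock is released.
    send_cancel:   callable that sends the thread cancel message.
    force_stop:    callable that stops the thread, as was done before.
    grace_period:  seconds to wait for the lock to be released.
    Returns which path was taken: 'unlocked', 'cancelled' or 'stopped'.
    """
    if lock_released.is_set():
        return "unlocked"  # nothing is working on the stack

    send_cancel()  # step 1: ask the running thread to cancel

    # Step 2: wait for the lock to be released, up to the grace period.
    if lock_released.wait(timeout=grace_period):
        return "cancelled"

    force_stop()  # step 3: still locked, so stop the thread as before
    return "stopped"
```

For the 4 minute cancel grace period mentioned in the commit message, a caller would pass `grace_period=240`.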

Changed in heat:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2016-09-20: Fix proposed to heat (stable/newton)

#9

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/373518

OpenStack Infra (hudson-openstack) wrote on 2016-09-21: Fix merged to heat (stable/newton)

#10

Reviewed: https://review.openstack.org/373518
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=2dd44db1b9cf4b789d8a083df6f97ae1fb5e22d5
Submitter: Jenkins
Branch: stable/newton

commit 2dd44db1b9cf4b789d8a083df6f97ae1fb5e22d5
Author: Steve Baker <email address hidden>
Date: Fri Sep 16 03:29:59 2016 +0000

Legacy delete attempt thread cancel before stop

    The error messages 'Command Out of Sync' are due to the threads being
    stopped in the middle of the database operations. This happens in the
    legacy action when delete is requested during a stack create.

    We have the thread cancel message but that was not being used in this
    case. Thread cancel should provide a more graceful way of ensuring the
    stack is in a FAILED state before the delete is attempted.

    This change does the following in the delete_stack service method for
    the legacy engine:
    - if the stack is still locked, send thread cancel message
    - in a subthread wait for the lock to be released, or until a
      timeout based on the 4 minute cancel grace period
    - if the stack is still locked, do a thread stop as before

    Closes-Bug: #1499669
    Closes-Bug: #1546431
    Closes-Bug: #1536451
    Change-Id: I4cd613681f07d295955c4d8a06505d72d83728a0
    (cherry picked from commit 3000f904080d8dcd841d913dcd2ae658fb526c1a)

tags:

added: in-stable-newton

OpenStack Infra (hudson-openstack) wrote on 2016-09-27: Fix included in openstack/heat 7.0.0.0rc2

#11

This issue was fixed in the openstack/heat 7.0.0.0rc2 release candidate.

OpenStack Infra (hudson-openstack) wrote on 2016-11-10: Fix included in openstack/heat 7.0.0

#12

This issue was fixed in the openstack/heat 7.0.0 release.

OpenStack Infra (hudson-openstack) wrote on 2016-11-17: Fix included in openstack/heat 8.0.0.0b1

#13

This issue was fixed in the openstack/heat 8.0.0.0b1 development milestone.

Delete a creating stack, some physical resource may not be deleted
