OpenStack Heat

resource group fail to rollback when `to be deleted` resources taken quota

Bug #1713900 reported by Rico Lin on 2017-08-30

This bug affects 1 person

Affects         Status       Importance  Assigned to  Milestone
OpenStack Heat  In Progress  High        Rico Lin     OpenStack Heat next

Bug Description

Normally, updating a cluster of resources requires enough quota for the update to succeed, but we have no way to control (or reserve) quota for all the resources in a stack. So an update can hit the resource limit and fail, which is fine, because there was not enough quota to complete it anyway. The problem shows up when we ask that update to roll back (for a Magnum cluster this is always the case): for a complex resource group the rollback fails almost every time, because quota is still held by other resources. For example, say we update a cluster from 20 nodes to 50 nodes and get stuck while updating node number 40 because we run out of resources. We then might have around 20 nodes that need to be rolled back with an update-replace (for a Magnum cluster this is always true), and another 20 nodes (numbers 21-40) that need to be deleted.
In most cases, though, the rollback of the first 20 nodes will fail, since the other 20 nodes still hold resource quota.

What we can do is prioritize deletion, making sure we delete resources before we start to update or create other resources.
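
To illustrate why ordering matters, here is a toy quota model of the 20-to-50-node scenario. It is only a sketch: `QuotaPool`, `QuotaExceeded`, and `rollback` are hypothetical names, not Heat or Nova APIs, and the quota limit of 40 is an assumption based on the update getting stuck at node 40. An update-replace is modelled as "create the replacement, then delete the old node", so each replacement needs one free quota slot.

```python
class QuotaExceeded(Exception):
    """Raised when a create would push usage past the quota limit."""


class QuotaPool:
    """Hypothetical stand-in for a per-project instance quota."""

    def __init__(self, limit, used=0):
        self.limit = limit
        self.used = used

    def create(self, count=1):
        if self.used + count > self.limit:
            raise QuotaExceeded(
                "need %d, only %d free" % (count, self.limit - self.used))
        self.used += count

    def delete(self, count=1):
        self.used = max(0, self.used - count)


def rollback(pool, to_replace, to_delete, delete_first):
    """Roll back a failed scale-up and return the final quota usage.

    to_replace: nodes rolled back via update-replace (create new, delete old).
    to_delete:  extra nodes from the failed update that only need deleting.
    """
    if delete_first:
        pool.delete(to_delete)
    for _ in range(to_replace):
        pool.create()   # replacement node needs a free slot first
        pool.delete()   # old node is removed afterwards
    if not delete_first:
        pool.delete(to_delete)
    return pool.used


if __name__ == "__main__":
    # Deleting nodes 21-40 first frees quota, so the replacements succeed.
    print(rollback(QuotaPool(limit=40, used=40), 20, 20, delete_first=True))
    # Replacing first fails immediately: the quota is already exhausted.
    try:
        rollback(QuotaPool(limit=40, used=40), 20, 20, delete_first=False)
    except QuotaExceeded as exc:
        print("rollback failed:", exc)
```

In this model the only difference between the two runs is the order of operations; the resources themselves are independent of one another.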

OpenStack Infra (hudson-openstack) wrote on 2017-08-30: Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/499020

Changed in heat:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2017-09-20:

Fix proposed to branch: master
Review: https://review.openstack.org/505811

Changed in heat:
assignee:	Rico Lin (rico-lin) → Zane Bitter (zaneb)

Zane Bitter (zaneb) on 2017-09-20

Changed in heat:
assignee:	Zane Bitter (zaneb) → Rico Lin (rico-lin)

OpenStack Infra (hudson-openstack) on 2017-09-26

Changed in heat:
assignee:	Rico Lin (rico-lin) → Zane Bitter (zaneb)

Zane Bitter (zaneb) on 2017-09-26

Changed in heat:
assignee:	Zane Bitter (zaneb) → Rico Lin (rico-lin)

OpenStack Infra (hudson-openstack) wrote on 2017-10-07:

Fix proposed to branch: master
Review: https://review.openstack.org/510290

OpenStack Infra (hudson-openstack) wrote on 2017-10-07: Change abandoned on heat (master)

Change abandoned by Rico Lin (<email address hidden>) on branch: master
Review: https://review.openstack.org/510290

Rico Lin (rico-lin) on 2017-10-25

Changed in heat:
milestone:	queens-1 → queens-2

Rico Lin (rico-lin) on 2017-12-06

Changed in heat:
milestone:	queens-2 → queens-3

Rico Lin (rico-lin) on 2018-01-29

Changed in heat:
milestone:	queens-3 → queens-rc1

OpenStack Infra (hudson-openstack) on 2018-02-08

Changed in heat:
assignee:	Rico Lin (rico-lin) → Thomas Herve (therve)

OpenStack Infra (hudson-openstack) wrote on 2018-02-08: Fix merged to heat (master)

Reviewed: https://review.openstack.org/505811
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=09d74ffa3cb55e62318e0cd9eac10a9cb0c1a70a
Submitter: Zuul
Branch: master

commit 09d74ffa3cb55e62318e0cd9eac10a9cb0c1a70a
Author: Zane Bitter <email address hidden>
Date: Wed Sep 20 14:24:46 2017 -0400

Prioritise resource deletion over creation

    Because of quotas, there are times when creating a resource and then
    deleting another resource may fail where doing it in the reverse order
    would work, even though the resources are independent of one another.

    When enqueueing 'check_resource' messages, send those for cleanup nodes
    prior to those for update nodes. This means that all things being equal
    (i.e. no dependency relationship), deletions will be started first. It
    doesn't guarantee success when quotas allow, since only a dependency
    relationship will cause Heat to wait for the deletion to complete before
    starting creation, but it is a risk-free way to give us a better chance of
    succeeding.

Change-Id: I9727d906cd0ad8c4bf9c5e632a47af6d7aad0c72
Partial-Bug: #1713900
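
The ordering described in this commit message could be sketched roughly as follows. This is not the code from the merged change: `GraphNode`, `enqueue_order`, `enqueue_all`, and the `send` callback are hypothetical names used for illustration, and the sketch assumes each leaf node carries a flag saying whether it is an update node or a cleanup node.

```python
from collections import namedtuple

# Hypothetical stand-in for a graph node: is_update is True for an update
# (create/update) node and False for a cleanup (delete) node.
GraphNode = namedtuple("GraphNode", ["rsrc_id", "is_update"])


def enqueue_order(leaves):
    """Return the leaf nodes with cleanup nodes ahead of update nodes.

    False sorts before True, and sorted() is stable, so nodes of the same
    kind keep their original relative order.
    """
    return sorted(leaves, key=lambda node: node.is_update)


def enqueue_all(leaves, send):
    """Call send(rsrc_id, is_update) for each leaf, cleanups first."""
    for node in enqueue_order(leaves):
        send(node.rsrc_id, node.is_update)


if __name__ == "__main__":
    leaves = [
        GraphNode(1, True),
        GraphNode(21, False),
        GraphNode(2, True),
        GraphNode(22, False),
    ]
    enqueue_all(leaves, lambda rsrc_id, is_update: print(rsrc_id, is_update))
```

As the commit message notes, this only changes the order in which work is started. Nothing here makes a creation wait for a deletion to finish, so it improves the odds without guaranteeing success.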

Zane Bitter (zaneb) wrote on 2018-02-09:

We merged a possible improvement in Queens. Whether to go all the way and alter the dependency graph is something there isn't consensus on. Moving this bug to Rocky.

Changed in heat:
milestone:	queens-rc1 → next
assignee:	Thomas Herve (therve) → Rico Lin (rico-lin)
