Check update complete fails with magnum cluster

Bug #1702433 reported by Rico Lin
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
OpenStack Heat  Triaged  High        Unassigned

Bug Description

While checking for update completion on a magnum cluster resource, the update may fail with the following error:

Resource UPDATE failed: ResourceUnknownStatus: resources.coe_cluster: Resource failed - Unknown status CREATE_COMPLETE due to "Unknown status updating Cluster 'coe_cluster' - Stack CREATE completed successfully"

It appears that the update check goes through the APIs before the cluster status has changed.
This leaves the stack in update failed, while the real cluster in magnum actually ends up in status `update complete` (and is still operable). If we then try another update on that cluster in heat, it causes an update replace, which generates another cluster. That replacement has a high chance of failing, since a cluster is a big resource group and the replacement doubles what it takes from the quota.
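
The race can be illustrated with a small simulation. This is only a sketch, not Heat's actual code: `check_update_complete`, `poll` and the `allow_stale_create` flag are hypothetical names, and treating a stale CREATE_COMPLETE as "keep polling" is an assumption about how the check could tolerate the early response.

```python
class ResourceUnknownStatus(Exception):
    """Stand-in for the Heat exception named in the error above."""


def check_update_complete(status, allow_stale_create=False):
    """Return True when the update is done, False to keep polling."""
    if status == "UPDATE_COMPLETE":
        return True
    if status == "UPDATE_IN_PROGRESS":
        return False
    if allow_stale_create and status == "CREATE_COMPLETE":
        # Assumption: magnum has accepted the update but has not yet
        # changed the status, so poll again instead of failing.
        return False
    raise ResourceUnknownStatus("Unknown status %s" % status)


def poll(statuses, allow_stale_create=False):
    """Feed successive API responses to the check; return polls used."""
    for count, status in enumerate(statuses, start=1):
        if check_update_complete(status, allow_stale_create):
            return count
    return None


# The first response arrives before magnum has changed the status.
observed = ["CREATE_COMPLETE", "UPDATE_IN_PROGRESS", "UPDATE_COMPLETE"]

try:
    poll(observed)
    strict_outcome = "ok"
except ResourceUnknownStatus:
    strict_outcome = "failed"

tolerant_polls = poll(observed, allow_stale_create=True)
```

With the strict check the very first poll raises, which matches the failure reported here; with the tolerant check the same sequence of responses completes on the third poll.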

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to heat (master)

Fix proposed to branch: master
Review: https://review.openstack.org/480450

Changed in heat:
status: Triaged → In Progress
Rico Lin (rico-lin)
tags: added: race-condition
Rico Lin (rico-lin)
Changed in heat:
milestone: pike-3 → pike-rc1
Revision history for this message
Zane Bitter (zaneb) wrote :

It can also exit early if it sees UPDATE_COMPLETE from a previous update. We need to check the timestamps as we do for nested stacks: http://git.openstack.org/cgit/openstack/heat/tree/heat/engine/resources/stack_resource.py?h=stable%2Focata#n384
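
A rough sketch of what such a timestamp check might look like follows. It is not the code from stack_resource.py: `update_finished` and `requested_at` are hypothetical names, and it assumes the cluster exposes a timestamp of its last update that can be compared with the time this update was requested.

```python
from datetime import datetime, timedelta, timezone


def update_finished(status, updated_at, requested_at):
    """True only if UPDATE_COMPLETE was reached after this request."""
    if status != "UPDATE_COMPLETE":
        return False
    # A missing or older timestamp means the status is left over from
    # an earlier update, so keep polling.
    return updated_at is not None and updated_at > requested_at


requested_at = datetime(2017, 7, 5, 12, 0, tzinfo=timezone.utc)
stale = requested_at - timedelta(hours=1)
fresh = requested_at + timedelta(minutes=5)

stale_result = update_finished("UPDATE_COMPLETE", stale, requested_at)
fresh_result = update_finished("UPDATE_COMPLETE", fresh, requested_at)
```

Here `stale_result` is False because the UPDATE_COMPLETE status predates the request, while `fresh_result` is True.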

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to heat (master)

Reviewed: https://review.openstack.org/480450
Committed: https://git.openstack.org/cgit/openstack/heat/commit/?id=601c5ba1e07c920bf382c1e0c10e9169b35c95b1
Submitter: Jenkins
Branch: master

commit 601c5ba1e07c920bf382c1e0c10e9169b35c95b1
Author: ricolin <email address hidden>
Date: Wed Jul 5 16:47:16 2017 +0800

    Allow CREATE_COMPLETE status when cluster/bay update check

    This patch will prevent when check update complete request got earlier
    replied than the status change to update in progress in magnum.
    This will cause the cluster failed in stack but completed in magnum.
    The stack will not be able to take that cluster/bay back at this point.
    Partial-Bug: #1702433

    Change-Id: I0074b76e6ec925f80bb50a7e67a81cda9438941b

Rabi Mishra (rabi)
Changed in heat:
milestone: pike-rc1 → pike-rc2
Rico Lin (rico-lin)
Changed in heat:
milestone: pike-rc2 → queens-1
Rico Lin (rico-lin)
Changed in heat:
milestone: queens-1 → queens-2
Rico Lin (rico-lin)
Changed in heat:
milestone: queens-2 → queens-3
Rico Lin (rico-lin)
Changed in heat:
milestone: queens-3 → queens-rc1
Revision history for this message
Rabi Mishra (rabi) wrote :

@Rico, What's in-progress for this bug?

Revision history for this message
Zane Bitter (zaneb) wrote :

A partial fix was merged. I'm bumping this to Rocky, because it doesn't look like a total fix is feasible in the Queens timeframe.

Changed in heat:
milestone: queens-rc1 → rocky-1
assignee: Rico Lin (rico-lin) → nobody
status: In Progress → Triaged
Revision history for this message
Rico Lin (rico-lin) wrote :

Agreed, will target this in Rocky.

Rico Lin (rico-lin)
Changed in heat:
milestone: rocky-1 → rocky-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on heat (master)

Change abandoned by Rico Lin (<email address hidden>) on branch: master
Review: https://review.openstack.org/490810
Reason: After discussion at the previous PTG, we already provide another way to do this
