Heat stack update fails when a server is lost

Bug #1676784 reported by Ryan Chen
This bug affects 2 people
Affects          Status   Importance   Assigned to    Milestone
OpenStack Heat   New      High         huangtianhua

Bug Description

When a server in a stack goes missing, the missing server can't be replaced by stack-update.

Create a stack with an OS::Heat::ResourceGroup of N instances.

If one of the instances is lost (deleted unexpectedly), updating the stack with the same parameters used at creation succeeds, and the stack status is 'UPDATE_COMPLETE'.

If the stack is updated with different parameters (in my test case, only the flavor changed), the remaining instances are updated successfully, but the stack status is 'UPDATE_FAILED'.
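
The steps above could be sketched roughly as follows. This is only an illustration, not the exact commands from the operation records linked below: the stack name, template file name, server ID and the `flavor` parameter name are hypothetical placeholders, and the `run` helper only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical placeholders; replace with real values before running for real.
STACK="${STACK:-demo_stack}"
TEMPLATE="${TEMPLATE:-group.yaml}"
LOST_SERVER="${LOST_SERVER:-server-id-placeholder}"

# Print the command instead of executing it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reproduce() {
    # 1. Create the stack containing the ResourceGroup.
    run openstack stack create -t "$TEMPLATE" --parameter flavor=m1.small --wait "$STACK"
    # 2. Delete one group member behind Heat's back.
    run openstack server delete "$LOST_SERVER"
    # 3. Update with unchanged parameters (reported result: UPDATE_COMPLETE).
    run openstack stack update -t "$TEMPLATE" --parameter flavor=m1.small --wait "$STACK"
    # 4. Update with a different flavor (reported result: UPDATE_FAILED).
    run openstack stack update -t "$TEMPLATE" --parameter flavor=m1.medium --wait "$STACK"
}

reproduce
```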

Here are my operation records and template file:
https://gist.github.com/yiheqilin/0984b448c023fecc5735469763a7db03

The OpenStack release is Newton.

Heat version:
# heat-manage --version
2017-03-28 09:05:03.412 27500 WARNING oslo_config.cfg [-] Option "verbose" from group "DEFAULT" is deprecated for removal. Its value may be silently ignored in the future.
7.0.3

convergence_engine is set to True:

convergence_engine = True

description: updated
Changed in heat:
assignee: nobody → huangtianhua (huangtianhua)
huangtianhua (huangtianhua) wrote :

This problem occurs whether or not convergence is enabled; the legacy engine has it too.
The "instance not found" error is raised during the resize, so let's think of a way to fix this case.

Changed in heat:
importance: Undecided → High
huangtianhua (huangtianhua) wrote :

@Ryan Chen (yiheqilin):
Hi, there is an option 'observe_on_update'. I think the problem won't happen if we set observe_on_update=True.
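
A minimal sketch of how that option might be set, assuming it goes in the [DEFAULT] section of heat.conf (next to convergence_engine) and that heat-engine is restarted afterwards. Whether it avoids the failure in this particular case has not been confirmed in this report.

```ini
[DEFAULT]
# Ask Heat to read the real properties of existing resources during an update
observe_on_update = True
```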

Zane Bitter (zaneb) wrote :

You can do "stack resource mark unhealthy" to tell Heat that the resource needs to be removed, or you can do "stack check" to get Heat to check for this itself. Also, even if you hit this problem a subsequent update will succeed (i.e. the FAILED resource will be replaced).
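
The two workarounds above might look like this on the command line. This is a sketch under assumptions: the stack and resource names are hypothetical placeholders, and for a ResourceGroup member the server probably belongs to a nested stack, so STACK may need to be the nested stack rather than the top-level one. The `run` helper only prints each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical placeholders; replace with real names or IDs.
STACK="${STACK:-demo_stack}"
RESOURCE="${RESOURCE:-demo_server}"

# Print the command instead of executing it unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Option 1: tell Heat explicitly that the resource must be replaced.
mark_unhealthy() {
    run openstack stack resource mark unhealthy "$STACK" "$RESOURCE" "server deleted out of band"
}

# Option 2: let Heat compare its records with reality and find the loss itself.
check_stack() {
    run openstack stack check "$STACK"
}

check_stack
```

After either step, the next stack update should replace the missing server instead of trying to resize it.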

Rico Lin (rico-lin)
Changed in heat:
milestone: none → no-priority-tag-bugs