Deleting with an in-progress stack update can fail
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Heat | Fix Released | High | Pavlo Shchelokovskyy | |
Juno | Fix Released | High | Zane Bitter | |
Bug Description
If you interrupt a long-running stack-update by deleting the stack, it's possible to end up in an undeletable state. It seems the update is cancelled before the new template is persisted, so on delete we refer to the old template, which doesn't match the current resource-list output.
I hit this when doing a TripleO stack update (the two templates are attached; the update was actually a mistake, as it would be destructive if you tried it on a real overcloud, but it shouldn't break heat). Basically I did:
devtest.sh --trash-my-machine
<all OK, overcloud stack launched>
devtest_
(testing https:/
I'm working on a more minimal reproducer, but here's the info I have at the moment:
heat stack-delete gives us this in the engine log:
Traceback (most recent call last):
File "/opt/stack/
timer()
File "/opt/stack/
cb(*args, **kw)
File "/opt/stack/
result = function(*args, **kwargs)
File "/opt/stack/
return func(*args, **kwargs)
File "/opt/stack/
return f(*args, **kwargs)
File "/opt/stack/
current_
KeyError: u'NovaCompute1P
but heat resource-list gives us a list which doesn't contain NovaCompute1Pas
$ heat resource-list overcloud
+------
| resource_name | physical_
+------
| NovaCompute0 | 09ac6d1a-
| controller0 | 64ca2b98-
| MysqlClusterUni
| MysqlRootPassword | jlxYxXiN75 | OS::Heat:
| NovaCompute1 | 554971c1-
| RabbitCookie | JRyUtJmBBmNJ5BI
| allNodesConfig | 34f6d9f5-
| PublicVirtualIP | d3c66308-
| ControlVirtualIP | 281bd0cc-
| Compute | 32bbcfb0-
| Controller | ecc3b574-
+------
From this point, you're stuck, as the stack can't be deleted :(
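To illustrate the mismatch described above, here is a minimal sketch (not Heat code) that compares the resource names a stale template would refer to against the names the live listing reports. The function name and the `StaleResource` entry are hypothetical placeholders; the other names are taken from the resource-list output above.

```python
def names_missing_from_listing(template_names, listed_names):
    """Return names the template refers to that the live listing lacks."""
    return sorted(set(template_names) - set(listed_names))


# A subset of the names visible in the resource-list output above.
listed = ["NovaCompute0", "NovaCompute1", "controller0"]

# Hypothetical stale template: "StaleResource" is a placeholder,
# not a name taken from the report.
stale_template = ["NovaCompute0", "NovaCompute1", "controller0", "StaleResource"]

print(names_missing_from_listing(stale_template, listed))
```

Any name returned here is one that a lookup keyed on the stale template would fail to find among the current resources.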
summary: |
- Deleting in-progress stack update can fail
+ Deleting with an in-progress stack update can fail
Changed in heat: | |
importance: | Undecided → High |
tags: | added: tripleo |
Changed in heat: | |
status: | New → Triaged |
tags: | removed: tripleo |
tags: | added: tripleo |
Changed in heat: | |
milestone: | none → kilo-2 |
Changed in heat: | |
status: | Fix Committed → Fix Released |
Changed in heat: | |
milestone: | kilo-2 → 2015.1.0 |
tags: | added: juno-backport-potential |
no longer affects: | heat/kilo |
tags: | removed: juno-backport-potential |
I think I have a "minimal" reproducer.
For that I use a custom resource plugin that takes forever to update:
https://github.com/pshchelo/stackdev/tree/9525f896cd93cf3a0b0ae6f1321245fed7201eba/heat_plugins/stuck
and register it with Heat. Then I use these templates:
https://github.com/pshchelo/stackdev/tree/9525f896cd93cf3a0b0ae6f1321245fed7201eba/templates/stuck
Create a stack with two resources:
$ heat stack-create stuck -f stuck2.yaml
Update the stack, updating the first resource and deleting the second:
$ heat stack-update stuck -f stuck1.yaml
The stack now hangs in UPDATE_IN_PROGRESS by design. Try to delete the stack:
$ heat stack-delete stuck
Now the stack is stuck in DELETE_IN_PROGRESS, and nothing more can be done with it.
heat-engine log has the following traceback:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 455, in fire_timers
    timer()
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 58, in __call__
    cb(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 212, in main
    result = function(*args, **kwargs)
  File "/opt/stack/heat/heat/engine/service.py", line 113, in _start_with_trace
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/osprofiler/profiler.py", line 105, in wrapper
    return f(*args, **kwargs)
  File "/opt/stack/heat/heat/engine/stack.py", line 972, in delete
    self._delete_backup_stack(backup_stack)
  File "/opt/stack/heat/heat/engine/stack.py", line 851, in _delete_backup_stack
    curr_res = self.resources[key]
KeyError: u'second'
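The failing statement in the traceback is a plain dictionary lookup, `self.resources[key]`. The toy model below is only a sketch of that failure mode, assuming the loop walks resource names that the current stack no longer holds. The function names and data are hypothetical, and the tolerant variant is an illustration, not the fix that was released in Heat.

```python
def delete_strict(current_resources, backup_names):
    """Look up every backup name directly, as the traceback's lookup does."""
    deleted = []
    for key in backup_names:
        curr_res = current_resources[key]  # raises KeyError if key is absent
        deleted.append(curr_res)
    return deleted


def delete_tolerant(current_resources, backup_names):
    """Skip names that the current stack no longer knows about."""
    deleted = []
    for key in backup_names:
        curr_res = current_resources.get(key)
        if curr_res is None:
            continue  # nothing to clean up for this name
        deleted.append(curr_res)
    return deleted


# Hypothetical state after the interrupted update: only "first" remains.
current = {"first": "res-first"}
backup = ["first", "second"]

try:
    delete_strict(current, backup)
except KeyError as exc:
    print("strict delete failed on", exc)

print(delete_tolerant(current, backup))
```

The strict version stops at the first missing name, which in this model leaves the delete unfinished; the tolerant version completes by skipping it.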