Need a way to retry failed operations
Bug #1160052 reported by Clint Byrum
This bug affects 7 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Heat | Fix Released | High | Zane Bitter | |
Bug Description
Consider this scenario:
* Start creating a large, expensive stack
* The FINAL resource in the graph, a WaitCondition, fails due to a timeout caused by temporary downtime of some external resource
Currently in Heat, you have to *delete* the entire stack and try the create again.
I believe that a stack update should be possible in this scenario, and that resources which were not created should have their create retried on update, rather than the update being refused.
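The proposed behaviour can be sketched in a toy model. This is not Heat code: `Stack`, `Resource`, and the INIT/COMPLETE/FAILED states are hypothetical, resources are assumed to be pre-sorted in dependency order, and a failed create is simply retried in place (the sketch does not address whether a failed resource is left in a usable state). The usage at the bottom simulates the WaitCondition timing out on the first attempt and succeeding once the external dependency recovers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical resource states; Heat's real model pairs an action
# (CREATE, UPDATE, ...) with a status (IN_PROGRESS, COMPLETE, FAILED).
INIT, COMPLETE, FAILED = "INIT", "COMPLETE", "FAILED"


@dataclass
class Resource:
    name: str
    create: Callable[[], None]  # hypothetical create hook; raises on failure
    state: str = INIT


@dataclass
class Stack:
    # Assumed to be already sorted in dependency order.
    resources: List[Resource] = field(default_factory=list)

    def create(self) -> None:
        """Create resources in order, stopping at the first failure."""
        for res in self.resources:
            self._create_one(res)
            if res.state == FAILED:
                return

    def update(self) -> None:
        """Retry creation of anything not COMPLETE instead of refusing to update."""
        for res in self.resources:
            if res.state == COMPLETE:
                continue  # keep the expensive, already-created resources
            self._create_one(res)
            if res.state == FAILED:
                return

    @staticmethod
    def _create_one(res: Resource) -> None:
        try:
            res.create()
            res.state = COMPLETE
        except Exception:
            res.state = FAILED

    def states(self) -> Dict[str, str]:
        return {r.name: r.state for r in self.resources}


# Usage: the external dependency is down during the first create.
calls = {"server": 0, "wait": 0}
external_up = False


def create_server() -> None:
    calls["server"] += 1


def create_wait_condition() -> None:
    calls["wait"] += 1
    if not external_up:
        raise TimeoutError("wait condition timed out")


stack = Stack([Resource("server", create_server), Resource("wait", create_wait_condition)])
stack.create()
print(stack.states())  # {'server': 'COMPLETE', 'wait': 'FAILED'}
external_up = True  # the external resource comes back
stack.update()
print(stack.states())  # {'server': 'COMPLETE', 'wait': 'COMPLETE'}
```

Skipping resources that are already COMPLETE is what avoids re-creating the expensive part of the stack; only the WaitCondition is attempted a second time.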
Changed in heat:

| Field | Change |
|---|---|
| milestone | havana-1 → havana-2 |
| milestone | havana-2 → havana-3 |
| milestone | havana-3 → havana-rc1 |
| importance | Low → Medium |
| milestone | havana-rc1 → icehouse-1 |
| summary | "Need a way to retry creation" → "Need a way to retry failed operations" |
| importance | Medium → High |
| milestone | icehouse-1 → icehouse-2 |
| milestone | icehouse-2 → icehouse-3 |
| milestone | icehouse-3 → icehouse-rc1 |
| milestone | icehouse-rc1 → next |
| milestone | next → juno-1 |
| status | In Progress → Confirmed |
| status | Confirmed → In Progress |
| information type | Public → Public Security |
| information type | Public Security → Public |
| milestone | juno-1 → juno-2 |
| assignee | Steve Baker (steve-stevebaker) → Jason Dunsmore (jasondunsmore) |
| milestone | juno-2 → juno-3 |
| assignee | Jason Dunsmore (jasondunsmore) → Steve Baker (steve-stevebaker) |
| status | Fix Committed → Fix Released |
| milestone | juno-3 → 2014.2 |
I can understand why this could be useful, but I have the following concerns:
- Seems like a corner case, in which case the current behaviour is fine?
- When a resource is in CREATE_FAILED state, its actual state is unknown, so the only thing we can do is delete it and re-create it (unless we add logic to every resource's handle_update that figures out whether the failure is recoverable, which seems potentially complex). This is equivalent to mapping the resource CREATE_FAILED state to UPDATE_REPLACE in parser.Stack.update(), so we'd need to check whether that works with rollback.
- I see the argument for not deleting everything, and I guess it may be fairly simple with our current serialized resource creation strategy. But what happens when we move to parallel resource creation: will being able to restart stack creation from a partially created state make things much more difficult?
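The two ideas in this comment can be combined in a toy model: a resource left in CREATE_FAILED is treated like UPDATE_REPLACE (delete, then create), COMPLETE resources are kept, and independent resources run in parallel batches. This is an illustrative sketch only. The function names, state strings, and dependency map are hypothetical; it uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) rather than Heat's own dependency code; and it ignores rollback, in-progress resources, and failures during the resumed run.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical resource states, for illustration only.
COMPLETE = "COMPLETE"
CREATE_FAILED = "CREATE_FAILED"
NOT_CREATED = "NOT_CREATED"


def plan_action(state: str) -> str:
    """Decide what a stack update does with a resource left by a failed create."""
    if state == COMPLETE:
        return "keep"
    if state == CREATE_FAILED:
        # Its real state is unknown, so handle it like UPDATE_REPLACE:
        # delete whatever exists, then create it again.
        return "replace"
    return "create"  # never attempted before the failure


def resume(deps: Dict[str, Set[str]], states: Dict[str, str]) -> List[str]:
    """Resume a partially created stack, running independent resources in parallel.

    deps maps each resource name to the names it depends on. Returns a log of
    "action:name" entries, batch by batch; order within a batch is not guaranteed.
    """
    log: List[str] = []

    def run(name: str) -> str:
        action = plan_action(states[name])
        if action != "keep":
            # A real engine would call the resource's delete/create handlers here.
            states[name] = COMPLETE
        return f"{action}:{name}"

    sorter = TopologicalSorter(deps)
    sorter.prepare()
    with ThreadPoolExecutor() as pool:
        while sorter.is_active():
            batch = sorter.get_ready()  # every dependency of these is done
            log.extend(pool.map(run, batch))
            sorter.done(*batch)
    return log


# Usage: "volume" failed to create and "wait" was never attempted.
deps = {"net": set(), "volume": set(), "server": {"net"}, "wait": {"server", "volume"}}
states = {"net": COMPLETE, "volume": CREATE_FAILED, "server": COMPLETE, "wait": NOT_CREATED}
log = resume(deps, states)
print(log)
```

In this toy model, already-created resources are just nodes that finish immediately, so the scheduler needs no separate resume path; whether that holds up once rollback and concurrent failures are involved is the open question raised above.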