Comment 2 for bug 1160052

Clint Byrum (clint-fewbar) wrote : Re: [Bug 1160052] Re: Need a way to retry creation

Excerpts from Steven Hardy's message of 2013-03-26 16:44:59 UTC:
> I can understand why this could be useful, but have the following
> concerns:
>
> - Seems like a corner case, in which case the current behaviour is fine?
>

Corner cases are where you are pushing Heat to do something rare. I
don't think a package mirror being slow for a brief period is all that
rare, and that situation would break a stack which has a WaitCondition
at the end. So I reject the notion that this is a corner case. Things
fail, and that should not cause a whole stack to be invalidated.

> - When a resource is in CREATE_FAILED state, the state is unknown, so
> the only thing we can do is delete it and re-create it (unless we add
> logic to all resource handle_update which figures out if the failure is
> recoverable, which seems potentially complex). This is equivalent to
> mapping resource CREATE_FAILED state to UPDATE_REPLACE in
> parser.Stack::update(), so we'd need to see if that will work with
> rollback.
>

I am fine with deleting the failed resource, but not the failed stack.
The failure is, in theory, isolated to those resources that failed to
create, so delete those and try again from there.
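
To make "delete those, and try again from there" concrete, here is a
minimal sketch. It is not Heat code: the function name prepare_retry
and the plain dict of resource states are assumptions made up for
illustration, and only the state names follow the ones used in this
thread.

```python
# Hypothetical state names, modelled on the ones mentioned in this thread.
CREATE_COMPLETE = "CREATE_COMPLETE"
CREATE_FAILED = "CREATE_FAILED"


def prepare_retry(resource_states):
    """Split a partially created stack into (to_delete, kept).

    resource_states maps resource name -> state string. Only resources
    in CREATE_FAILED are scheduled for deletion and re-creation; every
    other resource is left untouched.
    """
    to_delete = sorted(
        name for name, state in resource_states.items()
        if state == CREATE_FAILED
    )
    kept = sorted(name for name in resource_states if name not in to_delete)
    return to_delete, kept


states = {
    "server": CREATE_COMPLETE,
    "volume": CREATE_COMPLETE,
    "wait_condition": CREATE_FAILED,
}
to_delete, kept = prepare_retry(states)
print(to_delete)  # ['wait_condition']
print(kept)       # ['server', 'volume']
```

This only picks out what failed. It does not yet address the case
where the real cause sits in a resource that reported success.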

We would have to think through the problem though, as the WaitCondition
that fails is not really the problem; the problem is further up the
stack. This needs further thought, but I think there is an answer that
isn't "start over from 0".

> - I see the argument for not deleting everything, and I guess it may be
> fairly simple with our current serialized resource creation strategy,
> but what happens when we move to parallel resource creation, is being
> able to re-start stack creation from a partially created state going to
> make things much more difficult?
>

I don't think it makes things difficult at all. We will simply be running
through the exact same graph, but the create and active steps will be
instant for resources whose desired state is already reached. When we
get to a resource that is missing, we create it and carry on. We have to
do that in updates anyway since resources may be added as part of the
update.
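
A rough sketch of that idea, under stated assumptions: the converge
function, the dependency dict, and the create callback are all
hypothetical names and are not part of Heat. It uses
graphlib.TopologicalSorter from the Python standard library (3.9+) to
walk the dependency graph in order, skipping anything that already
exists.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def converge(dependencies, existing, create):
    """Walk the graph in dependency order, creating only what is missing.

    dependencies maps resource name -> set of names it depends on.
    existing is the set of resources already in the desired state; it is
    updated in place as resources are created.
    create is a callable invoked once per missing resource.
    Returns the list of resources that were actually created.
    """
    created = []
    for name in TopologicalSorter(dependencies).static_order():
        if name in existing:
            continue  # desired state already reached, nothing to do
        create(name)
        existing.add(name)
        created.append(name)
    return created


deps = {
    "network": set(),
    "server": {"network"},
    "wait_condition": {"server"},
}
done = {"network", "server"}
print(converge(deps, done, lambda name: None))  # ['wait_condition']
```

Running it a second time with the same set creates nothing, which is
the property that lets a retry reuse the normal creation path. The
sketch is serial, but the skip check is per resource and does not
depend on the order in which independent resources are visited.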