Comment 1 for bug 1497788

William Reade (fwereade) wrote :

As discussed live, I have a few concerns about this. The most important is that a hook error does *not* imply a workload error -- a properly-written charm will be setting workload statuses as they apply, and the rude interruption of a hook should make no difference to the *status of the workload*.

For example, if a hook is doing some difficult work and has taken down the workload to do so, it will have set "maintenance" -- and this remains literally and strictly true; it's just that the maintenance state is arbitrarily extended until someone does something about the agent (which is currently unable to progress).

Conversely, if the workload is running happily but the hook suddenly dies, the workload is still going to be running just as well as it was before; we've encountered a *management* problem, because (again) the agent is unable to progress. I admit that the workload will be at a gradually increasing risk of losing synchronisation with its environment, but we do clearly surface the age of the status, so I don't think that risk is very significant.

Either way, the user-supplied workload status is the most accurate description of the, uh, status of the workload; and the error status accurately describes the status of the agent; so, we should not overwrite workload statuses with "error" on hook failure.
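
To make the distinction concrete, here is a minimal sketch of what "don't overwrite the workload status" could look like. It is written in Python purely for illustration, and every name in it (`Status`, `UnitStatus`, `record_hook_failure`, the message format, the timestamps) is hypothetical rather than taken from the actual codebase. The only point is that a hook failure touches the agent's status and leaves the charm-supplied workload status, including its age, exactly as it was.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Status:
    state: str        # e.g. "active", "maintenance", "error"
    message: str
    since: datetime   # when the status was set; its age is what the user sees


@dataclass(frozen=True)
class UnitStatus:
    workload: Status  # set by the charm, describes the workload
    agent: Status     # set by the agent, describes the agent


def record_hook_failure(unit: UnitStatus, hook: str, now: datetime) -> UnitStatus:
    """Return a copy of `unit` with only the agent status changed to "error"."""
    agent = Status("error", f'hook failed: "{hook}"', now)
    return replace(unit, agent=agent)  # workload status is deliberately untouched


# Arbitrary example values: a hook took the workload down, then died.
t0 = datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc)
unit = UnitStatus(
    workload=Status("maintenance", "migrating data", t0),
    agent=Status("executing", "running config-changed hook", t0),
)
failed = record_hook_failure(unit, "config-changed", t0 + timedelta(minutes=5))
print(failed.workload.state, failed.agent.state)  # maintenance error
```

The workload status still says "maintenance" and still carries its original timestamp, so its growing age remains visible; the agent status is the only place the error shows up.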

Separately, there's the issue of us *pretending* that the workload has encountered an error when it's really the agent. I don't personally believe this is optimal, but it is as specced; what bothers me is that the pretending has already leaked too far into the model. I think your first approach was the right one -- store the truth in the model, and massage it on the way out of the apiserver -- because if/when we decide we should expose untweaked status data we'll be able to address that with a new facade version.
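
As a rough sketch of "store the truth in the model, and massage it on the way out", something along these lines is what I have in mind. Again this is illustrative Python, not the real apiserver code: `StoredUnit`, `present_workload_status` and the facade version numbers 1 and 2 are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Status:
    state: str
    message: str


@dataclass(frozen=True)
class StoredUnit:
    """What the model holds: both statuses, exactly as they were set."""
    workload: Status
    agent: Status


def present_workload_status(unit: StoredUnit, facade_version: int) -> Status:
    """Compute the workload status to report to a client; never mutates `unit`.

    Hypothetical versions: 1 keeps the specced "pretend the workload failed"
    behaviour, 2 and later expose the untweaked data.
    """
    if facade_version >= 2:
        return unit.workload
    if unit.agent.state == "error":
        # Massage on the way out only; the stored workload status is unchanged.
        return Status("error", unit.agent.message)
    return unit.workload


stored = StoredUnit(
    workload=Status("active", "ready"),
    agent=Status("error", "hook failed"),
)
print(present_workload_status(stored, 1).state)  # error
print(present_workload_status(stored, 2).state)  # active
print(stored.workload.state)                     # active
```

Because the tweak lives entirely in the presentation function, changing or dropping it later is a matter of adding a facade version, with no need to repair data already written to the model.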