stack and baremetal workbooks are misusing retry and continue-on
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
tripleo | Fix Released | High | Dmitry Tantsur | |
Bug Description
The following snippet from the baremetal.yaml file demonstrates the issue. At the time of reporting, I count 6 places where this won't work as expected.
wait_
action: ironic.node_get node_id=<% $.node_uuid %>
timeout: 1200 #20 minutes
retry:
delay: 3
count: 400
We are using the Mistral retry policy to wait for a node to reach a target state. However, this is a misuse and it won't work as we expect. "retry" and everything under it (delay, count, continue-on, break-on etc.) will only ever be triggered if the action fails. In this case, we don't expect ironic.node_get to fail; we just want to check different properties of the node. Since the action never fails, the retry policy is never triggered.
So, unless the API call fails for a different reason, the action will only be called once and the task will complete.
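To make the claim concrete, here is a toy Python model of the behaviour as this report describes it. It is not Mistral's retry engine: `run_with_retry`, `node_get` and the `provision_state` value are hypothetical stand-ins chosen for illustration. Under the assumption that only a failure triggers another attempt, a successful call returns after a single attempt even though the node is still in transition.

```python
def run_with_retry(action, count, delay=0.0, sleep=lambda seconds: None):
    """Toy model: re-run `action` only when it raises, at most `count` retries."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return action(), attempts
        except Exception:
            if attempts > count:
                raise
            sleep(delay)  # no-op by default so the sketch runs instantly


def node_get():
    # Hypothetical stand-in for ironic.node_get: the call succeeds,
    # but the node has not reached the target state yet.
    return {"provision_state": "deploying"}


result, attempts = run_with_retry(node_get, count=400, delay=3)
print(result["provision_state"], attempts)
```

In this model the loop never gets a reason to run again, which is the situation the report describes for the wait task.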
The easiest way I can see to get the behaviour we want is with a small utility workflow. Essentially, we need to wrap the action and make the wrapper fail whenever we want a retry to be triggered. So if the node hasn't reached one of our end states, we consider the action to have failed.
verify_
input:
- node_uuid
- end_states # both the target state and the error states. Then continue-on and break-on can decide what to do
tasks:
action: ironic.node_get node_id=<% $.node_uuid %>
- fail: <% task().
So the above example would become something like this...
wait_
workflow: verify_
timeout: 1200 #20 minutes
retry:
delay: 3
count: 400
break-on: <% task().
Note: error_states isn't in master yet, but is based on a WIP patch: https:/
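The wrapper idea proposed above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the workflow from the patch: `NotInEndState`, `make_verify`, `run_with_retry` and the sequence of states are all hypothetical, and the retry loop is the same toy model of "retry only on failure" rather than Mistral's implementation.

```python
class NotInEndState(Exception):
    """Raised by the wrapper while the node has not reached any end state."""


def make_verify(node_get, end_states):
    """Wrap `node_get` so that "not finished yet" counts as a failure."""
    def verify():
        node = node_get()
        if node["provision_state"] not in end_states:
            raise NotInEndState(node["provision_state"])
        return node
    return verify


def run_with_retry(action, count):
    """Toy model: retry only when the wrapped action fails."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return action(), attempts
        except NotInEndState:
            if attempts > count:
                raise


# Hypothetical sequence of states returned by successive calls.
states = iter(["deploying", "deploying", "active"])


def node_get():
    return {"provision_state": next(states)}


# end_states holds both the target state and the error states.
verify = make_verify(node_get, end_states={"active", "deploy failed"})
node, attempts = run_with_retry(verify, count=400)
print(node["provision_state"], attempts)
```

Because the wrapper turns an unfinished node into a failure, the loop keeps polling until an end state is seen; the caller can then inspect the returned state to tell the target state from an error state.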
Changed in tripleo: | |
assignee: | nobody → Dmitry Tantsur (divius) |
status: | Confirmed → In Progress |
Changed in tripleo: | |
status: | Invalid → In Progress |
Changed in tripleo: | |
status: | In Progress → Invalid |
Changed in tripleo: | |
status: | Invalid → In Progress |
Changed in tripleo: | |
milestone: | rocky-1 → rocky-2 |
Changed in tripleo: | |
status: | In Progress → Fix Released |
Marked as invalid. It turned out I *still* didn't understand this. We need to document this better in Mistral, perhaps with examples.