Comment 4 for bug 1502295

Andrew Woodward (xarses) wrote:

I don't think this properly addresses the problem. The root issue is that in a multi-node task, if one node fails, every node in the task is marked as failed; the same happens when the task itself fails. When such a task is run on a production cloud, this sets the entire cloud to failed, and the orchestrator then wants to re-run all tasks on all nodes to resolve it.

This is further compounded by the fact that the pending state is cleared when a task starts, not when it completes.
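
To illustrate the timing problem (made-up names, not the actual Nailgun code): clearing the flag on start means an interrupted run silently loses it, while clearing on completion keeps the task flagged for the next deploy.

    # Hypothetical sketch only, not the real orchestrator code.
    def start_task(task):
        task.pending = False   # cleared too early: a crash after this loses the flag

    def complete_task(task):
        task.pending = False   # cleared only once the task has actually finished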

Bottom line: only the node(s) that actually failed in a task should be marked as error, and only the tasks that did not complete should be identified to run the next time changes are deployed.
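
Something like the following is what I'd expect the semantics to be. This is an illustrative sketch with invented names (Task, record_task_result, tasks_to_rerun), not the actual Nailgun code:

    from dataclasses import dataclass, field

    @dataclass
    class Task:
        name: str
        pending: bool = True
        node_status: dict = field(default_factory=dict)  # node uid -> 'ready'/'error'

    def record_task_result(task, results):
        """results maps node uid -> 'ready' or 'error'."""
        for uid, status in results.items():
            # Only the node(s) that actually failed go to 'error';
            # nodes that succeeded keep 'ready' instead of inheriting
            # a failure from some other node in the same task.
            task.node_status[uid] = status
        # Pending is cleared on completion, not on start (see above).
        task.pending = False

    def tasks_to_rerun(tasks):
        # The next deployment picks up only tasks that never completed
        # or that had a node fail, not every task on every node.
        return [t for t in tasks
                if t.pending or 'error' in t.node_status.values()]

With that, a single failed node leaves the rest of the cloud's status alone, and the re-run set stays proportional to what actually broke.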