REVERT_ALL strategy is not applied for all tasks

Bug #2043808 reported by Anton Kurbatov
This bug affects 1 person
Affects: taskflow
Status: In Progress
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

While investigating an issue in OpenStack Octavia, I found some strange behavior in the taskflow engine.
It seems that REVERT_ALL is not applied to all of the tasks that had already completed successfully.

The task hierarchy looks like this:

   Hierarchy:
     "linear_flow.Flow: main-flow(len=3)"
     |__"__main__.StartTask==1.0"
     |__"__main__.StartTask2==1.0"
     |__"unordered_flow.Flow: unordered-subflow(len=2)"
        |__"linear_flow.Flow: subflow1(len=1)"
        | |__"subflow1_retry==1.0"
        | |__"subflow1-jobtask==1.0"
        |__"linear_flow.Flow: subflow2(len=1)"
           |__"subflow2_retry==1.0"
           |__"subflow2-jobtask==1.0"

Both StartTask and StartTask2 have a "revert" method, and subflow1/subflow2 are similar and look like this:

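import time

from taskflow import retry
from taskflow import task

# Work items shared by both subflows; pop() takes from the end (9, 8, 7, ...).
job_items = list(range(10))


class RetryException(Exception):
    """Signals that the job should be retried (definition assumed; not shown in the report)."""


class JobTask(task.Task):
    def execute(self):
        item = job_items.pop()
        if item == 5:
            # Fatal error: the retry controller should answer with REVERT_ALL.
            raise Exception('JobTask: test error')
        time.sleep(1)
        # Any other item asks the retry controller to run the task again.
        raise RetryException


class JobRetry(retry.Times):
    def on_failure(self, history, *args, **kwargs):
        # Failures from the most recent attempt, keyed by task name.
        last_errors = history[-1][1]
        for task_name, ex_info in last_errors.items():
            # Note: reads a private attribute of the failure object.
            excp = ex_info._exc_info[1]
            if isinstance(excp, RetryException):
                return retry.RETRY

        return retry.REVERT_ALL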

The logs show the REVERT_ALL strategy being applied, but the revert method of neither StartTask nor StartTask2 is called.

I have attached a simple script that demonstrates this behavior.
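
The attached script is not reproduced here. As a rough, engine-free sketch of the decision logic in JobRetry.on_failure above, the following stand-alone snippet maps the failures of the last attempt to a strategy. The decide() helper, the string constants, and the RetryException definition are hypothetical stand-ins introduced for illustration; they are not part of taskflow, and the real on_failure receives failure objects rather than bare exceptions.

```python
class RetryException(Exception):
    """Stand-in for the reproducer's RetryException (definition assumed)."""


# Stand-ins for taskflow's retry.RETRY / retry.REVERT_ALL constants.
RETRY = "RETRY"
REVERT_ALL = "REVERT_ALL"


def decide(last_errors):
    """Mirror JobRetry.on_failure for a dict of task name -> raised exception."""
    for task_name, excp in last_errors.items():
        if isinstance(excp, RetryException):
            return RETRY
    # No retryable error found: ask for the whole flow to be reverted.
    return REVERT_ALL


print(decide({"subflow1-jobtask": RetryException()}))
print(decide({"subflow2-jobtask": Exception("JobTask: test error")}))
```

Under these assumptions, a RetryException leads to a retry and any other exception leads to REVERT_ALL, which is the point at which StartTask and StartTask2 would be expected to be reverted as well.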

In Octavia, the result is that LB (load balancer) creation simply gets stuck: the code expects the entire flow to be reverted, but it is not.

I'm not sure whether this is a taskflow issue or whether it can be worked around in the calling code.
At the very least, this behavior does not match the docs, which say:
"This strategy will revert every atom that has executed thus far, regardless of whether the parent flow has a separate retry strategy associated with it"

https://docs.openstack.org/taskflow/ocata/atoms.html

Anton Kurbatov (akurbatov) wrote :
Gregory Thiemonge (gthiemonge) wrote :

Hi Anton,

I was working on the same issue this week. I proposed a WIP patch with a reproducer in the tests and a potential workaround in taskflow: https://review.opendev.org/c/openstack/taskflow/+/900746

Note that there is also a Launchpad bug for Octavia: https://bugs.launchpad.net/octavia/+bug/2043360

If we don't find a good fix in taskflow, we may stop using retries in Octavia as a workaround.

Gregory Thiemonge (gthiemonge) wrote :

Basically, there seems to be a race condition when using unordered flows: one of the subflows triggers REVERT_ALL, and just after that, the retry of the other subflow overwrites the REVERT_ALL with a new RETRY.
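
The ordering described above can be illustrated with a deterministic toy model. This is only a sketch under a loud assumption: that both retries write their decision into one shared slot, which may not match how taskflow actually stores decisions. ToyEngine, record(), run(), and the sticky_revert_all guard are all hypothetical names, and the guard is not the fix proposed in the WIP patch.

```python
from enum import Enum


class Decision(Enum):
    RETRY = "RETRY"
    REVERT_ALL = "REVERT_ALL"


class ToyEngine:
    """Hypothetical stand-in for an engine; NOT taskflow's implementation."""

    def __init__(self, sticky_revert_all):
        self.sticky_revert_all = sticky_revert_all
        self.flow_decision = None  # assumed single slot shared by both retries

    def record(self, decision):
        # With the guard on, a recorded REVERT_ALL cannot be downgraded.
        if self.sticky_revert_all and self.flow_decision is Decision.REVERT_ALL:
            return
        self.flow_decision = decision


def run(sticky_revert_all):
    engine = ToyEngine(sticky_revert_all)
    # subflow1's retry sees the fatal error and asks for a full revert ...
    engine.record(Decision.REVERT_ALL)
    # ... then subflow2's retry, handling a RetryException, asks to retry.
    engine.record(Decision.RETRY)
    return engine.flow_decision


print(run(sticky_revert_all=False))  # the later RETRY wins
print(run(sticky_revert_all=True))   # REVERT_ALL is preserved
```

In this model, the last writer wins unless REVERT_ALL is treated as final, which matches the symptom in the report: the flow keeps retrying and StartTask/StartTask2 are never reverted.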

Changed in taskflow:
status: New → Confirmed
Changed in taskflow:
status: Confirmed → In Progress