Improve workflow completion logic by removing periodic jobs
* The workflow completion algorithm uses periodic scheduled jobs
to poll the DB and determine when a workflow is finished. The
problem with this approach is that if Mistral runs the next
iteration of such a job too soon, these jobs create a heavy
load on the system. If it runs too late, a workflow may stay in
the RUNNING state for too long after all its tasks are completed.
The current implementation tries to predict the delay before
the next job should run, based on the number of incomplete tasks.
This approach was initially taken because we switched to a
non-blocking transactional model (previously we locked the entire
workflow execution graph in order to change the state of anything).
In this architecture, when we have parallel branches, i.e.
parallel DB transactions, we can't make a consistent read from
the DB in either of these transactions to make a reliable decision
about whether the workflow is completed or not. Using periodic
jobs was a solution. However, this approach has proven
unreliable because predicting the delay before the next job
iteration doesn't work well across the variety of use cases
that we have.
This patch replaces periodic jobs with the "two transactions"
approach: in the first transaction we handle the action
completion event (and task completion if the action causes it),
and in the second transaction, if a task is completed, we
check if the workflow is completed. This approach guarantees
that at least one of the "second" transactions in parallel
branches will make the needed consistent read from the DB (i.e.
will see the actual state of all needed objects) to make the
right decision.
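The reasoning behind the "two transactions" approach can be
illustrated with a toy model. This is only a sketch, not Mistral
code: the Store and Tx classes, the key names and the
snapshot-at-begin behaviour are all assumptions made for the
illustration. It shows that when two parallel branches each
complete a task and check the workflow in the same transaction,
neither sees the other's update. When the check moves to a second
transaction, the branch whose first transaction commits last
starts its check after both commits and sees the full state.

```python
class Store:
    """Toy database (hypothetical, for illustration only)."""

    def __init__(self, state):
        self.committed = dict(state)

    def begin(self):
        return Tx(self)


class Tx:
    """Toy transaction that reads from a snapshot taken at begin()."""

    def __init__(self, store):
        self.store = store
        self.view = dict(store.committed)  # snapshot of committed state
        self.writes = {}

    def read_all(self):
        # A transaction sees its snapshot plus its own uncommitted writes.
        return {**self.view, **self.writes}

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        self.store.committed.update(self.writes)


def initial_state():
    return {"task_a": "RUNNING", "task_b": "RUNNING", "wf": "RUNNING"}


def all_tasks_done(tx):
    state = tx.read_all()
    return all(v == "SUCCESS" for k, v in state.items() if k.startswith("task"))


def check_workflow(tx):
    if all_tasks_done(tx):
        tx.write("wf", "SUCCESS")


def single_tx_interleaved():
    """Each branch completes its task and checks the workflow in one tx."""
    store = Store(initial_state())
    tx_a, tx_b = store.begin(), store.begin()  # parallel branches overlap
    tx_a.write("task_a", "SUCCESS")
    check_workflow(tx_a)  # still sees task_b as RUNNING
    tx_b.write("task_b", "SUCCESS")
    check_workflow(tx_b)  # still sees task_a as RUNNING
    tx_a.commit()
    tx_b.commit()
    return store.committed["wf"]


def two_tx_interleaved():
    """Each branch completes its task in one tx, then checks in a second."""
    store = Store(initial_state())
    tx_a, tx_b = store.begin(), store.begin()  # first transactions overlap
    tx_a.write("task_a", "SUCCESS")
    tx_b.write("task_b", "SUCCESS")
    tx_a.commit()

    check_a = store.begin()  # starts before tx_b commits
    check_workflow(check_a)  # task_b not visible yet, no decision made
    check_a.commit()

    tx_b.commit()
    check_b = store.begin()  # starts after both first transactions committed
    check_workflow(check_b)  # sees both tasks in SUCCESS
    check_b.commit()
    return store.committed["wf"]


if __name__ == "__main__":
    print("single transaction:", single_tx_interleaved())
    print("two transactions:  ", two_tx_interleaved())
```

In this model the single-transaction variant leaves the workflow
in RUNNING even though both tasks finished, which is the gap the
periodic jobs were meant to cover. The two-transaction variant
marks it completed without any polling.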
Reviewed: https://review.openstack.org/607807
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=3d7acd3957a75457da4ca87ae9ebd5cc61d28149
Submitter: Zuul
Branch: master
commit 3d7acd3957a75457da4ca87ae9ebd5cc61d28149
Author: Renat Akhmerov <email address hidden>
Date: Thu Oct 4 11:50:03 2018 +0700
Closes-Bug: #1799382
Change-Id: I2333507503b3b8226c184beb0bd783e1dcfa397f