Workflow output is sometimes calculated incorrectly

Bug #1792090 reported by Renat Akhmerov
This bug affects 1 person
Affects: Mistral
Status: Fix Released
Importance: High
Assigned to: Renat Akhmerov

Bug Description

Sometimes, especially under load, workflow output is calculated incorrectly.

Workflow example:

---
version: '2.0'

output_expression:
  output:
    continue_flag: <% global(continue_flag) %>

  task-defaults:
    on-error:
      - task2

  tasks:
    task1:
      action: std.fail
      on-success: task3

    task2:
      action: std.noop
      on-success:
        publish:
          global:
            continue_flag: false

    task3:
      action: std.noop

Once in a while "continue_flag" in the workflow output is "null", i.e. the expression "global(continue_flag)" returns null.
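To see why null is the wrong result, here is a hypothetical, simplified replay of the example workflow in Python. It is not Mistral's engine or YAQL evaluator; the function name and the state strings are invented for illustration. It only follows the transitions the YAML declares: task1 (std.fail) fails, the on-error clause inherited from task-defaults routes to task2, and task2's on-success publishes continue_flag: false into the global scope, so task3 is never reached.

```python
def replay_example_workflow():
    """Replay the example workflow's declared transitions (hypothetical model)."""
    global_ctx = {}  # values published with the 'global' scope
    executed = []

    # task1 runs std.fail, which always ends in ERROR.
    executed.append("task1")
    task1_state = "ERROR"

    if task1_state == "SUCCESS":
        executed.append("task3")  # on-success: task3 (not reached here)
    else:
        # on-error is inherited from task-defaults and routes to task2.
        executed.append("task2")
        task2_state = "SUCCESS"  # std.noop always succeeds
        if task2_state == "SUCCESS":
            # on-success: publish: global: continue_flag: false
            global_ctx["continue_flag"] = False

    # Rough analogue of <% global(continue_flag) %>: a read that misses
    # task2's publish finds no key and yields None (null).
    output = {"continue_flag": global_ctx.get("continue_flag")}
    return output, executed


output, executed = replay_example_workflow()
print(output)    # {'continue_flag': False}
print(executed)  # ['task1', 'task2']
```

Every correct run should therefore report "continue_flag": false; a null value means the output was computed from a view of the workflow state that did not yet include task2's publish.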

Changed in mistral:
assignee: nobody → Renat Akhmerov (rakhmerov)
milestone: none → stein-1
importance: Undecided → High
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to mistral (master)

Fix proposed to branch: master
Review: https://review.openstack.org/601949

Changed in mistral:
status: Confirmed → In Progress
tags: added: backport pike queens rocky
OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (master)

Reviewed: https://review.openstack.org/601949
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=dfdff78315f72999dbd269e5fc4c4065a1b13013
Submitter: Zuul
Branch: master

commit dfdff78315f72999dbd269e5fc4c4065a1b13013
Author: Renat Akhmerov <email address hidden>
Date: Wed Sep 12 15:05:54 2018 +0700

    Fix how Mistral calculates workflow output

    * Workflow output was sometimes calculated incorrectly due to a
      race condition between two transactions: the one that checks
      workflow completion (i.e. calls "check_and_complete") and the one
      that processes action execution completion (i.e. calls
      "on_action_complete"). The output calculation was sometimes based
      on stale data cached by the SQLAlchemy session. The fix expires
      all objects in the session so that they are refreshed
      automatically when their state is read for the required
      calculations. See the bug description for details on how the
      problem was observed.
    * Added another test for direct workflows that formally checks the
      calculation of workflow output. It doesn't attempt to test the
      race condition described above (which can be reproduced only
      with a large number of attempts and/or under load); it was added
      for the completeness of the test module.

    Change-Id: I4a7e7fd9a4bbb6e93df169b4b40bc2d83ccfce89
    Closes-Bug: #1792090
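The stale-read problem described in this commit message can be sketched in Python. This is a hypothetical model, not Mistral's code: CachingSession, the wf_ex table, evaluate_output and run_demo are invented names. CachingSession only imitates the identity-map caching of an ORM session, and its expire_all() mirrors the idea behind SQLAlchemy's Session.expire_all() rather than its implementation. Two sqlite3 connections stand in for the "check_and_complete" and "on_action_complete" transactions.

```python
import json
import os
import sqlite3
import tempfile


class CachingSession:
    """Hypothetical session that caches loaded rows until expire_all()."""

    def __init__(self, conn):
        self.conn = conn
        self._cache = {}

    def global_context(self, wf_id):
        if wf_id not in self._cache:
            # fetchall() steps the statement to completion, so no read lock lingers.
            rows = self.conn.execute(
                "SELECT ctx FROM wf_ex WHERE id = ?", (wf_id,)
            ).fetchall()
            self._cache[wf_id] = json.loads(rows[0][0])
        return self._cache[wf_id]

    def expire_all(self):
        # Forget all cached state; the next access reloads from the database.
        self._cache.clear()


def evaluate_output(session, wf_id):
    # Rough analogue of <% global(continue_flag) %>: a missing key gives None.
    return {"continue_flag": session.global_context(wf_id).get("continue_flag")}


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wf.db")
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE wf_ex (id TEXT PRIMARY KEY, ctx TEXT)")
        setup.execute("INSERT INTO wf_ex VALUES ('wf-1', '{}')")
        setup.commit()
        setup.close()

        completer = CachingSession(sqlite3.connect(path))  # check_and_complete side
        writer = sqlite3.connect(path)                      # on_action_complete side

        # 1. The completion check loads the context before task2 publishes.
        completer.global_context("wf-1")

        # 2. task2 completes in another transaction and publishes the flag.
        writer.execute(
            "UPDATE wf_ex SET ctx = ? WHERE id = ?",
            (json.dumps({"continue_flag": False}), "wf-1"),
        )
        writer.commit()

        # Output computed from the cached (stale) context.
        stale = evaluate_output(completer, "wf-1")

        # 3. The fix: expire cached state before computing the output.
        completer.expire_all()
        fresh = evaluate_output(completer, "wf-1")

        completer.conn.close()
        writer.close()
        return stale, fresh


if __name__ == "__main__":
    stale, fresh = run_demo()
    print("without expiring:", stale)     # {'continue_flag': None}
    print("after expire_all():", fresh)   # {'continue_flag': False}
```

Expiring everything is the simplest way to make the read see committed data; the trade-off is that every expired object is reloaded on its next access.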

Changed in mistral:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to mistral (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/602280

OpenStack Infra (hudson-openstack) wrote : Fix proposed to mistral (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/602281

OpenStack Infra (hudson-openstack) wrote : Fix proposed to mistral (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/602282

OpenStack Infra (hudson-openstack) wrote : Change abandoned on mistral (stable/queens)

Change abandoned by Dougal Matthews (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/602281

OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (stable/pike)

Reviewed: https://review.openstack.org/602282
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=df944dfb6299baef62710ee321f0b1c21c319d1d
Submitter: Zuul
Branch: stable/pike

commit df944dfb6299baef62710ee321f0b1c21c319d1d
Author: Renat Akhmerov <email address hidden>
Date: Wed Sep 12 15:05:54 2018 +0700

    Fix how Mistral calculates workflow output

    Change-Id: I4a7e7fd9a4bbb6e93df169b4b40bc2d83ccfce89
    Closes-Bug: #1792090
    (cherry picked from commit dfdff78315f72999dbd269e5fc4c4065a1b13013)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (stable/queens)

Reviewed: https://review.openstack.org/602281
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=24cb2daea626b66f06b5899ccaeefacd3d68928f
Submitter: Zuul
Branch: stable/queens

commit 24cb2daea626b66f06b5899ccaeefacd3d68928f
Author: Renat Akhmerov <email address hidden>
Date: Wed Sep 12 15:05:54 2018 +0700

    Fix how Mistral calculates workflow output

    Change-Id: I4a7e7fd9a4bbb6e93df169b4b40bc2d83ccfce89
    Closes-Bug: #1792090
    (cherry picked from commit dfdff78315f72999dbd269e5fc4c4065a1b13013)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (stable/rocky)

Reviewed: https://review.openstack.org/602280
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=e50fbfbdd903340402716dc306f5c5354b9060ab
Submitter: Zuul
Branch: stable/rocky

commit e50fbfbdd903340402716dc306f5c5354b9060ab
Author: Renat Akhmerov <email address hidden>
Date: Wed Sep 12 15:05:54 2018 +0700

    Fix how Mistral calculates workflow output

    Change-Id: I4a7e7fd9a4bbb6e93df169b4b40bc2d83ccfce89
    Closes-Bug: #1792090
    (cherry picked from commit dfdff78315f72999dbd269e5fc4c4065a1b13013)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/mistral 7.0.2

This issue was fixed in the openstack/mistral 7.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/mistral 6.0.4

This issue was fixed in the openstack/mistral 6.0.4 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/mistral 5.2.5

This issue was fixed in the openstack/mistral 5.2.5 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/mistral 8.0.0.0b1

This issue was fixed in the openstack/mistral 8.0.0.0b1 development milestone.
