Remove DB polling to determine "join" readiness

Bug #1799356 reported by Renat Akhmerov
This bug affects 1 person
Affects: Mistral
Status: Fix Released
Importance: High
Assigned to: Renat Akhmerov
Milestone: stein-1

Bug Description

Currently, Mistral polls the DB to determine whether a "join" task is ready to run. This approach doesn't work when a workflow has many joins, each with many dependencies: the CPU ends up mostly occupied by the scheduler, which runs the periodic checking jobs again and again and doesn't let the workflow progress.
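To make the cost concrete, here is a toy Python model of polling-based join readiness. It is a hypothetical sketch, not Mistral's code: `Join`, `simulate_polling` and the tick-based clock are invented for illustration, each readiness check stands in for one DB query run by a periodic job, and a join is assumed to wait for all of its inbound tasks. What it shows is that the number of checks grows with (number of joins) × (number of polling ticks), no matter how rarely upstream tasks actually complete.

```python
# Hypothetical toy model of polling-based "join" readiness (not Mistral code).
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Join:
    name: str
    inbound: set[str]  # upstream tasks this join waits for (all of them, assumed)
    ready: bool = False


def simulate_polling(joins: list[Join], completion_tick: dict[str, int]) -> int:
    """Run one periodic check per not-yet-ready join on every tick.

    completion_tick maps each upstream task to the tick at which it completes.
    Returns the total number of readiness checks; in a real engine each check
    would be a DB query executed by a periodic scheduler job.
    """
    checks = 0
    for tick in range(max(completion_tick.values()) + 1):
        done = {t for t, ct in completion_tick.items() if ct <= tick}
        for join in joins:
            if join.ready:
                continue  # a ready join no longer needs polling
            checks += 1
            if join.inbound <= done:  # all inbound tasks have completed
                join.ready = True
    return checks


if __name__ == "__main__":
    # Three joins, each waiting on the same five tasks, which finish every
    # 10 ticks (i.e. the tasks run longer than the polling interval).
    joins = [Join(f"join_{j}", {f"t{i}" for i in range(5)}) for j in range(3)]
    completion_tick = {f"t{i}": 10 * i for i in range(5)}
    print(simulate_polling(joins, completion_tick))  # prints 123
```

Even this tiny case costs 123 checks, 120 of which return "not ready"; with hundreds of joins and long-running tasks, the periodic jobs dominate the scheduler.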

Changed in mistral:
assignee: nobody → Renat Akhmerov (rakhmerov)
importance: Undecided → High
status: New → Confirmed
milestone: none → stein-1
Changed in mistral:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (master)

Reviewed: https://review.openstack.org/610461
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=1a4c599a4d1ab9522d47a017ce1b98eac8b290df
Submitter: Zuul
Branch: master

commit 1a4c599a4d1ab9522d47a017ce1b98eac8b290df
Author: Renat Akhmerov <email address hidden>
Date: Wed Oct 10 14:37:08 2018 +0700

    Improve join by removing periodic jobs

    * This patch removes the DB polling previously needed to determine
      whether a "join" task is ready to run. Instead of running a
      periodic scheduled job, each task completion now runs an
      algorithm that finds all potentially affected join tasks and
      schedules just one non-periodic job to check their readiness.
      This solves the problem of cascading system overload when there
      are many very large joins (i.e. when a workflow has many joins,
      each with many dependencies). Previously, in such cases Mistral
      created so many periodic jobs that the workflow could barely
      progress: most of the CPU was spent by the scheduler running
      those periodic jobs, which only rarely switched "join" tasks to
      the RUNNING state.

    Change-Id: I5ebc44c7a3f95c868d653689dc5cea689c788cd0
    Closes-Bug: #1799356
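The approach described in the commit message can be sketched as follows. This is a hypothetical in-memory model, not the actual patch: `JoinTracker`, `on_task_complete` and `run_scheduled_jobs` are invented names, a `deque` stands in for Mistral's scheduler, state lives in memory rather than the DB, and only joins that wait for all inbound tasks are modelled. A reverse index from each task to the joins that depend on it lets a completion find the potentially affected joins directly, and a single non-periodic job is queued to check them.

```python
# Hypothetical sketch of event-driven "join" readiness (not Mistral code).
from __future__ import annotations

from collections import defaultdict, deque


class JoinTracker:
    def __init__(self, joins: dict[str, set[str]]) -> None:
        self.joins = joins  # join name -> upstream tasks it waits for
        # Reverse index: upstream task -> joins that depend on it.
        self.affected_by: dict[str, set[str]] = defaultdict(set)
        for join, inbound in joins.items():
            for task in inbound:
                self.affected_by[task].add(join)
        self.completed: set[str] = set()
        self.running: set[str] = set()  # joins switched to RUNNING
        self.jobs: deque[frozenset[str]] = deque()  # stand-in for the scheduler
        self.jobs_scheduled = 0
        self.checks = 0

    def on_task_complete(self, task: str) -> None:
        """Called once per task completion instead of polling periodically."""
        self.completed.add(task)
        candidates = frozenset(
            j for j in self.affected_by.get(task, ()) if j not in self.running
        )
        if candidates:
            # One non-periodic job checks all potentially affected joins.
            self.jobs.append(candidates)
            self.jobs_scheduled += 1

    def run_scheduled_jobs(self) -> None:
        """Drain the queued one-shot jobs, checking each candidate join once."""
        while self.jobs:
            for join in self.jobs.popleft():
                if join in self.running:
                    continue
                self.checks += 1
                if self.joins[join] <= self.completed:
                    self.running.add(join)


if __name__ == "__main__":
    tracker = JoinTracker({f"join_{j}": {f"t{i}" for i in range(5)} for j in range(3)})
    for i in range(5):
        tracker.on_task_complete(f"t{i}")
        tracker.run_scheduled_jobs()
    print(tracker.checks, tracker.jobs_scheduled)  # prints 15 5
```

For three joins that each wait on the same five tasks, this performs 15 checks spread over 5 scheduled jobs (one per task completion), so the work is bounded by the number of completions rather than by how long the tasks take to run.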

Changed in mistral:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to mistral (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/612982

OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (stable/rocky)

Reviewed: https://review.openstack.org/612982
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=8306fa17eb7ae7e4a0f33f5f362be46daff4f0a9
Submitter: Zuul
Branch: stable/rocky

commit 8306fa17eb7ae7e4a0f33f5f362be46daff4f0a9
Author: Renat Akhmerov <email address hidden>
Date: Wed Oct 10 14:37:08 2018 +0700

    Change-Id: I5ebc44c7a3f95c868d653689dc5cea689c788cd0
    Closes-Bug: #1799356
    (cherry picked from commit 1a4c599a4d1ab9522d47a017ce1b98eac8b290df)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/mistral 7.0.4

This issue was fixed in the openstack/mistral 7.0.4 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/mistral 8.0.0.0b1

This issue was fixed in the openstack/mistral 8.0.0.0b1 development milestone.
