Local executors don't send action heartbeats

Bug #1852722 reported by Renat Akhmerov
This bug affects 1 person
Affects   Status         Importance   Assigned to      Milestone
Mistral   Fix Released   High         Renat Akhmerov

Bug Description

Local executors never send action heartbeats. This was an architectural flaw in the initial heartbeat solution. As a result, long-running actions are failed automatically after the configured timeout (60 minutes by default), whether or not they are still being processed normally.
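
The failure mode can be illustrated with a minimal sketch. It is not Mistral's code: `ActionExecution`, `expire_stale_actions`, and the field names are hypothetical, and the only detail taken from the report is the 60-minute limit. It assumes a checker that fails any running action whose last heartbeat is older than the timeout.

```python
from dataclasses import dataclass

TIMEOUT_SECONDS = 60 * 60  # the 60-minute limit mentioned in the report


@dataclass
class ActionExecution:
    """Hypothetical stand-in for a running action record."""
    id: str
    last_heartbeat: float  # seconds; set at start, refreshed by heartbeats
    state: str = "RUNNING"


def expire_stale_actions(actions, now, timeout=TIMEOUT_SECONDS):
    """Fail every running action whose last heartbeat is older than timeout."""
    expired = []
    for action in actions:
        if action.state == "RUNNING" and now - action.last_heartbeat > timeout:
            action.state = "ERROR"
            expired.append(action.id)
    return expired


# Both actions start at t=0 and are still being processed normally.
remote = ActionExecution("remote-action", last_heartbeat=0.0)
local = ActionExecution("local-action", last_heartbeat=0.0)

now = 61 * 60                      # 61 minutes later
remote.last_heartbeat = now - 30   # the remote executor sent a heartbeat 30 s ago
# The local executor sent nothing, so its timestamp is still the start time.

expired_ids = expire_stale_actions([remote, local], now)
print(expired_ids)  # ['local-action']
```

In this sketch the local action is failed even though it is healthy, simply because nothing ever refreshed its timestamp.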

Changed in mistral:
assignee: nobody → Renat Akhmerov (rakhmerov)
importance: Undecided → High
status: New → Confirmed
milestone: none → ussuri-1
Changed in mistral:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (master)

Reviewed: https://review.opendev.org/694023
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=7ec4f26744ac17151adae4c06cd8a17b71f409a7
Submitter: Zuul
Branch: master

commit 7ec4f26744ac17151adae4c06cd8a17b71f409a7
Author: Renat Akhmerov <email address hidden>
Date: Wed Nov 13 16:34:44 2019 +0700

    Make action heartbeats work for all executor types

    * Previously action heartbeats didn't work in case of using local
      executors because the component responsible for sending heartbeats
      was started by the executor RPC server which doesn't make sense to
      initialize for a local executor. This patch refactors the code
      so that now heartbeats get sent for any type of executors. For
      local executors it is also useful because a cluster node that
      runs an engine and a local executor may also crash. With this
      change, remaining cluster nodes will be able to understand that
      the action will never complete and one of them will time it out.
      If all is fine with the node where the local executor is running
      then heartbeats will be sent normally and the action won't time
      out. Before this change, in case of local executors a long running
      action would always time out after a configured amount of time
      (by default, 60 mins) just because local executors never sent
      heartbeats.
    * Renamed many things to make it clear what each component is
      responsible for.
    * Wrote the tests that check the heartbeat sender, both positive
      and negative scenarios for local and remote executor types.

    Closes-Bug: #1852722

    Change-Id: I4d0fdff54de9bee70aeaf10a4ef483ad7000840b
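
The idea behind the refactoring described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the patch itself: `HeartbeatSender`, `Executor`, `send_once`, and the `report` callback are hypothetical names. The point it shows is that the sender is a standalone component that any executor registers actions with, so it does not depend on an RPC server being started.

```python
import threading


class HeartbeatSender:
    """Hypothetical sender: reports heartbeats for every registered action."""

    def __init__(self, report, interval=1.0):
        self._report = report          # callable taking a list of action ids
        self._interval = interval
        self._running = set()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = None

    def add_action(self, action_id):
        with self._lock:
            self._running.add(action_id)

    def remove_action(self, action_id):
        with self._lock:
            self._running.discard(action_id)

    def send_once(self):
        """Report all currently running actions, if there are any."""
        with self._lock:
            ids = sorted(self._running)
        if ids:
            self._report(ids)

    def start(self):
        """Start the background loop; meant to be called once per process."""
        self._stop.clear()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def _loop(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self._interval):
            self.send_once()


class Executor:
    """Hypothetical executor; the same code path serves local and remote use."""

    def __init__(self, sender):
        self._sender = sender

    def run_action(self, action_id, func):
        self._sender.add_action(action_id)
        try:
            return func()
        finally:
            self._sender.remove_action(action_id)


reported = []
sender = HeartbeatSender(report=reported.append)
executor = Executor(sender)


def long_action():
    sender.send_once()   # stands in for one tick of the background loop
    return "done"


result = executor.run_action("action-1", long_action)
sender.send_once()       # nothing is registered any more, so nothing is sent
print(result, reported)  # done [['action-1']]
```

The example calls `send_once()` directly instead of `start()` to keep the output deterministic; in a real process the background loop would be started once at startup, whichever executor type is configured.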

Changed in mistral:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (stable/train)

Reviewed: https://review.opendev.org/694712
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=24254685778c028792793774c0f8072a02e09614
Submitter: Zuul
Branch: stable/train

commit 24254685778c028792793774c0f8072a02e09614
Author: Renat Akhmerov <email address hidden>
Date: Mon Nov 11 11:36:49 2019 +0700

    Refactor action execution reporter

    * Moved away from using Oslo periodic tasks in the action execution
      reporter since in this case they don't make the code more readable.
      Also, now it is symmetric with other similar components like action
      execution checker.
    * Refactored action execution checker w/o using classes since having
      many instances of it doesn't make sense.
    * Small style changes

    Partial-Bug: #1852722
    Change-Id: I9a97c40222e8dc4870c9b6a7c5f5e3c14f37bdd6
    (cherry picked from commit 0e758e16e1abdb2440e28d270457f7329b876708)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix proposed to mistral (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/694744

OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (stable/train)

Reviewed: https://review.opendev.org/694744
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=d878d5409f3bd74f6137490d7fac555d4975fee9
Submitter: Zuul
Branch: stable/train

commit d878d5409f3bd74f6137490d7fac555d4975fee9
Author: Renat Akhmerov <email address hidden>
Date: Wed Nov 13 16:34:44 2019 +0700

    Make action heartbeats work for all executor types

    * Previously action heartbeats didn't work in case of using local
      executors because the component responsible for sending heartbeats
      was started by the executor RPC server which doesn't make sense to
      initialize for a local executor. This patch refactors the code
      so that now heartbeats get sent for any type of executors. For
      local executors it is also useful because a cluster node that
      runs an engine and a local executor may also crash. With this
      change, remaining cluster nodes will be able to understand that
      the action will never complete and one of them will time it out.
      If all is fine with the node where the local executor is running
      then heartbeats will be sent normally and the action won't time
      out. Before this change, in case of local executors a long running
      action would always time out after a configured amount of time
      (by default, 60 mins) just because local executors never sent
      heartbeats.
    * Renamed many things to make it clear what each component is
      responsible for.
    * Wrote the tests that check the heartbeat sender, both positive
      and negative scenarios for local and remote executor types.

    Closes-Bug: #1852722

    Change-Id: I4d0fdff54de9bee70aeaf10a4ef483ad7000840b
    (cherry picked from commit 7ec4f26744ac17151adae4c06cd8a17b71f409a7)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/mistral 9.0.1

This issue was fixed in the openstack/mistral 9.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/mistral 10.0.0.0b1

This issue was fixed in the openstack/mistral 10.0.0.0b1 development milestone.
