Task stalls in RUNNING state if the service dies

Bug #1502120 reported by Roman Dobosz
This bug affects 1 person
Affects    Status         Importance   Assigned to         Milestone
Mistral    Fix Released   Undecided    Nikolay Makhotkin
Mitaka     Fix Released   High         Nikolay Makhotkin

Bug Description

I tested Mistral on devstack (two Mistral services on two controllers) to see what happens if an executor dies while executing an action, i.e. whether tasks are highly available, and found that the task gets stuck in the RUNNING state.

Steps to reproduce:

1. Create a workflow that runs for some time (in my case, I created a custom action that runs for 20 seconds)
2. Execute the workflow and check which service is executing it
3. Kill that service
4. Run the "mistral execution-list" command and observe that the task stays in the RUNNING state forever.

Renat Akhmerov (rakhmerov) wrote:

Roman,

Thanks for letting us know. We've been aware of this issue since we started using oslo.messaging. Because oslo.messaging doesn't yet support acknowledging a message only after it has been processed, an executor that dies in the middle of an action takes the message with it and the task is never completed. That's why we have a number of patches on review with "RPC" in their commit messages. We're currently working with the oslo team to fix this. Here's a WIP patch that addresses the problem: https://review.openstack.org/#/c/229186/
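A toy model may help show why the acknowledgement point matters. The sketch below assumes AMQP-like broker semantics (a delivered but unacknowledged message is returned to the queue when the consumer's connection drops, as RabbitMQ does); `FakeQueue`, `consume`, `run_action` and the `ack_after_processing` parameter are hypothetical names for illustration, not oslo.messaging or Mistral APIs.

```python
from collections import deque


class FakeQueue:
    """Toy AMQP-like queue (hypothetical, for illustration only).

    Delivered messages stay 'unacked' until acknowledged. If the consumer's
    connection drops, unacked messages are put back on the queue.
    """

    def __init__(self):
        self.ready = deque()   # messages waiting to be delivered
        self.unacked = {}      # delivery tag -> message body
        self._next_tag = 0

    def publish(self, body):
        self.ready.append(body)

    def deliver(self):
        body = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        del self.unacked[tag]

    def connection_lost(self):
        # The broker returns every unacknowledged message to the queue.
        for tag in sorted(self.unacked):
            self.ready.append(self.unacked[tag])
        self.unacked.clear()


class ExecutorCrashed(Exception):
    pass


def run_action(body):
    # Simulates the executor process dying in the middle of the action.
    raise ExecutorCrashed(body)


def consume(queue, ack_after_processing):
    tag, body = queue.deliver()
    if not ack_after_processing:
        queue.ack(tag)           # ack on receipt: the broker forgets the message now
    try:
        run_action(body)
    except ExecutorCrashed:
        queue.connection_lost()  # the dead process's connection goes away
        return
    if ack_after_processing:
        queue.ack(tag)           # ack only once the action has finished


# Prints "False []" then "True ['run task-1']".
for mode in (False, True):
    q = FakeQueue()
    q.publish("run task-1")
    consume(q, ack_after_processing=mode)
    print(mode, list(q.ready))
```

With acknowledgement on receipt, the broker has already dropped the message by the time the executor dies, so nothing is left for another executor and the task stays RUNNING. Acknowledging after processing leaves the message on the queue for a surviving executor.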

Changed in mistral:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix merged to mistral (master)

Reviewed: https://review.openstack.org/279018
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=b562989e78d8833cfcf15900da63e21d216b96d6
Submitter: Jenkins
Branch: master

commit b562989e78d8833cfcf15900da63e21d216b96d6
Author: Nikolay Mahotkin <email address hidden>
Date: Thu Feb 11 15:26:38 2016 +0300

    Ack message after processing (oslo.messaging)

    This patch closes the HA gap that appears when an executor dies.
    Now, if an executor dies, the next executor picks up the previous executor's task.

     * It is currently almost impossible to write a unit test for
     this bug. For now, I created a new config option that selects
     whether to use this fix or the original oslo.messaging behaviour.
     It defaults to False.

     * For tests, we need to wait until the HA gate is created.

    Closes-Bug: #1502120

    Change-Id: Ia6d25d039b1e8210b7e544540e4b527d28f6d394
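The takeover described in the commit message can be sketched with two simulated executors sharing one queue. This is an illustration under assumptions: the broker is assumed to requeue unacknowledged messages when a consumer's connection drops, and `ACK_AFTER_PROCESSING` and `simulate` are hypothetical names, since the commit message does not name the new option.

```python
from collections import deque

# Hypothetical stand-in for the new config option from the commit message
# (its real name is not given there); like the real option, it defaults to False.
ACK_AFTER_PROCESSING = False


def simulate(ack_after_processing):
    """Return the final state of 'task-1' when executor-1 dies mid-action."""
    ready = deque(["task-1"])   # messages waiting in the broker queue
    unacked = []                # delivered but not yet acknowledged
    task_state = {"task-1": "RUNNING"}

    # executor-1 receives the message.
    msg = ready.popleft()
    if ack_after_processing:
        unacked.append(msg)     # broker keeps tracking it as in flight
    # else: acked on receipt, so the broker has already dropped it.

    # executor-1 is killed; the broker requeues whatever was still unacked.
    ready.extend(unacked)
    unacked.clear()

    # executor-2 (the surviving service) drains the queue and finishes the work.
    while ready:
        task_state[ready.popleft()] = "SUCCESS"

    return task_state["task-1"]


print(simulate(ACK_AFTER_PROCESSING))  # → RUNNING (default, original behaviour)
print(simulate(True))                  # → SUCCESS (fix enabled)
```

Because the option defaults to False, a deployment keeps the original behaviour (the task stuck in RUNNING) unless the fix is explicitly enabled.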

Changed in mistral:
status: In Progress → Fix Released
Thierry Carrez (ttx) wrote: Fix included in openstack/mistral 2.0.0.0rc1

This issue was fixed in the openstack/mistral 2.0.0.0rc1 release candidate.
