WF execution stays in RUNNING although task and action executions are in SUCCESS

Bug #1518012 reported by Moshe Elisha
Affects: Mistral
Status: Fix Released
Importance: High
Assigned to: Moshe Elisha
Milestone: 2.0.0

Bug Description

Hi,

I have created a workflow and executed it.
The action executions are all in SUCCESS.
The task executions are all in SUCCESS.
The workflow execution is still in RUNNING.

I see one ERROR line in the logs that might be related (os-mistral-01 mistral-engine.log):

2015-11-19 19:52:23.654 2034 ERROR oslo_messaging._drivers.impl_qpid [-] Unable to connect to AMQP server on {'username': 'guest', 'host': '10.1.20.151:5671', 'password': 'guest'}: [Errno 8] _ssl.c:504: EOF occurred in violation of protocol. Sleeping 1 seconds
2015-11-19 19:52:23.655 2034 DEBUG oslo_messaging._drivers.amqp [-] Pool creating new connection create /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqp.py:103
2015-11-19 19:52:23.668 2034 INFO oslo_messaging._drivers.impl_qpid [-] Connected to AMQP server on 10.1.20.151:5671
2015-11-19 19:52:23.675 2034 DEBUG oslo_messaging._drivers.amqpdriver [-] sending reply msg_id: 11bb94319e374fd792d2fbcdf698e39d reply queue: reply_852b887d8ad54eb39f92b7cc9a669bbb _send_reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:79
2015-11-19 19:52:24.673 2034 INFO oslo_messaging._drivers.impl_qpid [-] Connected to AMQP server on 10.1.20.151:5671
2015-11-19 19:52:24.681 2034 DEBUG oslo_messaging._drivers.amqpdriver [-] sending reply msg_id: 125489f3fc3b498d9c828f0d8fa4db36 reply queue: reply_47481d93ae9d46be91133c8632d783d4 _send_reply /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:79

In the attached ZIP you will find:
* The workflow used
* The cmd output showing the scenario
* The Mistral logs (from all three servers)

Revision history for this message
Moshe Elisha (melisha) wrote :

I am able to reproduce this bug. In other occurrences of the same failed scenario I don't see the "ERROR oslo_messaging._drivers.impl_qpid" line, so that ERROR might not be related to the bug.

Revision history for this message
Moshe Elisha (melisha) wrote :

I have added a logging line to default_engine.py:_check_workflow_completion

        if wf_utils.find_incomplete_task_executions(wf_ex):
+           LOG.debug("WF execution %s has incomplete task executions: %s"
+                     % (wf_ex, wf_ex.task_executions))
            return

I can see that all four task executions print that statement, each unaware of the updated status of the other task executions.
Can you confirm? I have attached the relevant log segments.
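
For anyone who wants to reproduce the isolation behavior outside Mistral, here is a minimal standalone sketch (assuming SQLAlchemy 2.x against a MySQL/InnoDB database; the connection URL, table and column names are illustrative, not Mistral's actual schema):

    # Two connections on the same engine; InnoDB defaults to REPEATABLE READ.
    import sqlalchemy as sa

    engine = sa.create_engine(
        "mysql+pymysql://user:pass@localhost/demo",  # placeholder URL
        isolation_level="REPEATABLE READ",
    )

    with engine.connect() as conn_a, engine.connect() as conn_b:
        # conn_a's snapshot is established by its first read statement.
        conn_a.execute(sa.text("SELECT state FROM tasks")).fetchall()

        # conn_b marks a task as finished and commits.
        conn_b.execute(sa.text("UPDATE tasks SET state = 'SUCCESS' WHERE id = 1"))
        conn_b.commit()

        # conn_a still sees the pre-update snapshot until its own transaction
        # ends, so a completion check running here reads a stale state.
        stale = conn_a.execute(
            sa.text("SELECT state FROM tasks WHERE id = 1")).scalar()

This is consistent with the observation above: each task execution checks the others inside its own transaction and never sees their freshly committed SUCCESS states.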

Revision history for this message
Renat Akhmerov (rakhmerov) wrote :
Changed in mistral:
importance: Undecided → High
status: New → Triaged
milestone: none → mitaka-1
tags: added: liberty-backport-potential
Moshe Elisha (melisha)
Changed in mistral:
assignee: nobody → Moshe Elisha (melisha)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to mistral (master)

Fix proposed to branch: master
Review: https://review.openstack.org/253819

Changed in mistral:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (master)

Reviewed: https://review.openstack.org/253819
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=976667c0a22f604559e19135f30dea803e3080f6
Submitter: Jenkins
Branch: master

commit 976667c0a22f604559e19135f30dea803e3080f6
Author: Moshe Elisha <email address hidden>
Date: Sat Dec 5 15:31:24 2015 +0000

    Fix concurrency issues by using READ_COMMITTED

    The state of a workflow execution was not updated, even when all task
    executions were completed, if some tasks finished at the same time as
    other tasks.
    Because we were using connections with transaction isolation level
    REPEATABLE_READ, each process was working on a snapshot of the DB
    created at the first read statement in that transaction.
    When a task finished and evaluated the state of all the other tasks,
    it did not see their up-to-date state, and so, because not all tasks
    appeared to be completed, it did not change the workflow execution
    state.

    Similar behavior happened with multiple action executions under the
    same task. On completion, each action execution checked the status of
    the other action executions and did not see their up-to-date state,
    causing the task execution to stay in the RUNNING state.

    Change-Id: I12f66134d92b8ed39df9d6128d7de5ee49aa8623
    Closes-Bug: #1518012
    Closes-Bug: #1513456
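
The fix switches the engine's transaction isolation level. A minimal sketch of the idea with plain SQLAlchemy (the actual Mistral change goes through its oslo.db session layer, so the configuration details differ):

    import sqlalchemy as sa

    # Under READ COMMITTED each statement sees the latest committed data, so a
    # completion check that runs after another task commits SUCCESS observes
    # that SUCCESS instead of a stale REPEATABLE READ snapshot.
    engine = sa.create_engine(
        "mysql+pymysql://user:pass@localhost/demo",  # placeholder URL
        isolation_level="READ COMMITTED",
    )

The trade-off is that reads within one transaction are no longer repeatable, which is acceptable here because the completion check only needs the most recent committed task and action states.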

Changed in mistral:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to mistral (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/255718

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to mistral (stable/liberty)

Reviewed: https://review.openstack.org/255718
Committed: https://git.openstack.org/cgit/openstack/mistral/commit/?id=09f206f863396591c8a0b0d1805a72ae8b3966cd
Submitter: Jenkins
Branch: stable/liberty

commit 09f206f863396591c8a0b0d1805a72ae8b3966cd
Author: Moshe Elisha <email address hidden>
Date: Sat Dec 5 15:31:24 2015 +0000

    Fix concurrency issues by using READ_COMMITTED

    The state of a workflow execution was not updated, even when all task
    executions were completed, if some tasks finished at the same time as
    other tasks.
    Because we were using connections with transaction isolation level
    REPEATABLE_READ, each process was working on a snapshot of the DB
    created at the first read statement in that transaction.
    When a task finished and evaluated the state of all the other tasks,
    it did not see their up-to-date state, and so, because not all tasks
    appeared to be completed, it did not change the workflow execution
    state.

    Similar behavior happened with multiple action executions under the
    same task. On completion, each action execution checked the status of
    the other action executions and did not see their up-to-date state,
    causing the task execution to stay in the RUNNING state.

    Change-Id: I12f66134d92b8ed39df9d6128d7de5ee49aa8623
    Closes-Bug: #1518012
    Closes-Bug: #1513456
    (cherry picked from commit 976667c0a22f604559e19135f30dea803e3080f6)

tags: added: in-stable-liberty
Revision history for this message
Thierry Carrez (ttx) wrote : Fix included in openstack/mistral 2.0.0.0b2

This issue was fixed in the openstack/mistral 2.0.0.0b2 development milestone.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/mistral 1.0.1

This issue was fixed in the openstack/mistral 1.0.1 release.

Changed in mistral:
milestone: mitaka-1 → 2.0.0
