Add a timeout in tripleoclient when waiting for Zaqar messages

Bug #1618445 reported by Dougal Matthews on 2016-08-30
This bug affects 2 people
Affects: tripleo
Importance: High
Assigned to: Juan Antonio Osorio Robles

Bug Description

In tripleoclient after a workflow is started we wait for messages on the Zaqar queue. This is fine unless there is a problem in Zaqar and we never receive the messages.

We need to add a timeout that can be specified by the caller, since different workflows will need different sensible defaults. This should be added here: https://github.com/openstack/python-tripleoclient/blob/master/tripleoclient/plugin.py#L125-L126
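A minimal sketch of the requested behaviour, assuming a plain `queue.Queue` as a stand-in for the Zaqar websocket. `wait_for_messages` and `WorkflowTimeout` are hypothetical names used for illustration, not the tripleoclient API. The caller passes `timeout` in seconds, and `None` keeps the current wait-forever behaviour:

```python
import queue
import time
from typing import Any, Iterator, Optional


class WorkflowTimeout(Exception):
    """Hypothetical error: no message arrived before the deadline."""


def wait_for_messages(messages: "queue.Queue[Any]",
                      timeout: Optional[float] = None) -> Iterator[Any]:
    """Yield messages as they arrive; the consumer decides when to stop.

    `timeout` bounds the total wait in seconds; None blocks forever.
    """
    # The deadline is fixed once, so it caps the wait across all messages.
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        remaining = None
        if deadline is not None:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise WorkflowTimeout(f"no message within {timeout} seconds")
        try:
            # Blocks for at most `remaining` seconds (forever if None).
            message = messages.get(timeout=remaining)
        except queue.Empty:
            raise WorkflowTimeout(f"no message within {timeout} seconds") from None
        yield message
```

Because the deadline is computed once, `timeout` limits the total time spent waiting rather than the gap between two messages. A workflow that keeps sending progress messages will still be cut off once the deadline passes.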

If we do reach the timeout, we should check the status of the workflow in Mistral and report it to the user. In theory, hitting the timeout means the workflow has hit an error and never sent a message. (However, it is also possible that a workflow finished successfully but never sent a message because of an error in the workflow itself.)
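A sketch of how the timeout handler could report the Mistral state. `describe_timeout` and `ExecutionInfo` are hypothetical, and `get_execution` stands in for a Mistral lookup such as python-mistralclient's `executions.get`, assumed here to return an object with `state` and `state_info` attributes. The branches follow the two cases above: `ERROR` points to a workflow failure, while `SUCCESS` suggests the workflow finished but its message was never sent:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExecutionInfo:
    """Hypothetical stand-in for a Mistral execution record."""
    state: str
    state_info: Optional[str] = None


def describe_timeout(execution_id: str,
                     get_execution: Callable[[str], ExecutionInfo]) -> str:
    """Turn the Mistral state of a timed-out workflow into a user message."""
    execution = get_execution(execution_id)
    if execution.state == "ERROR":
        # The workflow failed, most likely before it could send a message.
        return f"Workflow {execution_id} failed: {execution.state_info}"
    if execution.state == "SUCCESS":
        # The workflow finished, but its notification never arrived.
        return (f"Workflow {execution_id} succeeded, but no message was "
                "received; the notification step may have failed.")
    if execution.state == "RUNNING":
        # Still running: Zaqar may be down, or the timeout may be too short.
        return (f"Workflow {execution_id} is still running; no message was "
                "received before the timeout.")
    return f"Workflow {execution_id} ended in state {execution.state}."
```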

Brad P. Crochet (brad-9) wrote :

When I saw this, it appeared that the workflow was successful but the notification failed. It probably depends on where in the workflow the failure happened, since some workflows use Zaqar as a gate.

Brad P. Crochet (brad-9) wrote :

This was the execution failure: http://paste.openstack.org/show/564821/

Brad P. Crochet (brad-9) wrote :

I take that back. It looks like the error was a duplicate plan name. So, hopefully, that's an easy way to reproduce this.

Brad P. Crochet (brad-9) wrote :

I tried that again, and it just worked. It looks like reproducing this would require Zaqar to be either down or in an undefined state.

Fix proposed to branch: master
Review: https://review.openstack.org/364252

Changed in tripleo:
status: Confirmed → In Progress
Changed in tripleo:
assignee: Dougal Matthews (d0ugal) → Julie Pichon (jpichon)
Julie Pichon (jpichon) on 2016-09-12
Changed in tripleo:
assignee: Julie Pichon (jpichon) → Dougal Matthews (d0ugal)
Changed in tripleo:
milestone: newton-rc1 → newton-rc2
Changed in tripleo:
milestone: newton-rc2 → ocata-1
tags: added: newton-backport-potential
Changed in tripleo:
milestone: ocata-1 → newton-rc3
Emilien Macchi (emilienm) wrote :

It sounds more like a feature than a bug. Moving it to Ocata 1.

Changed in tripleo:
milestone: newton-rc3 → ocata-1
Dougal Matthews (d0ugal) on 2016-11-03
tags: removed: newton-backport-potential
Changed in tripleo:
assignee: Dougal Matthews (d0ugal) → nobody
Steven Hardy (shardy) on 2016-11-14
Changed in tripleo:
milestone: ocata-1 → ocata-2
Changed in tripleo:
assignee: nobody → Dougal Matthews (d0ugal)
Changed in tripleo:
assignee: Dougal Matthews (d0ugal) → Juan Antonio Osorio Robles (juan-osorio-robles)

Reviewed: https://review.openstack.org/364252
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=579d1b1318d86c96db6a4363b2b5753fc114dc91
Submitter: Jenkins
Branch: master

commit 579d1b1318d86c96db6a4363b2b5753fc114dc91
Author: Dougal Matthews <email address hidden>
Date: Wed Aug 31 09:33:07 2016 +0000

    Add an optional timeout when waiting for websocket messages

    This patch adds a mechanism for setting a timeout when waiting for websocket
    messages. It then adds it to workflow executions which are fairly predictable.
    This means that they always take roughly the same length of time. Other
    workflows like baremetal introspection can be much slower or quicker
    depending on the user's environment.

    Closes-Bug: #1618445
    Change-Id: I656735d58b1b676148e6ceacfc9861b3c5f44e5d
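The commit message does not include the code. As a hedged sketch of one way to put the timeout on the socket itself, the block below uses a plain stdlib socket in place of the websocket client. `recv_messages` and `ReceiveTimeout` are hypothetical names, and each yielded item is a raw chunk rather than a decoded websocket frame:

```python
import socket
from typing import Iterator, Optional


class ReceiveTimeout(Exception):
    """Hypothetical error: the socket read timed out."""


def recv_messages(sock: socket.socket, timeout: Optional[float] = None,
                  bufsize: int = 4096) -> Iterator[bytes]:
    """Yield raw chunks from `sock`, applying `timeout` to each read.

    timeout=None keeps the socket in blocking mode (wait forever).
    """
    sock.settimeout(timeout)
    while True:
        try:
            data = sock.recv(bufsize)
        except socket.timeout:
            raise ReceiveTimeout(f"no data within {timeout} seconds") from None
        if not data:  # peer closed the connection
            return
        yield data
```

Because `settimeout` applies to each `recv` call, this timeout restarts after every message. That suits workflows that send regular progress updates, whereas an overall deadline would cap the total run time. Passing `None` suits slow, unpredictable workflows such as baremetal introspection, at the cost of waiting forever if Zaqar never responds.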

Changed in tripleo:
status: In Progress → Fix Released

This issue was fixed in the openstack/python-tripleoclient 5.6.0 release.
