Periodically able to cause WBE test to timeout

Bug #1431097 reported by Joshua Harlow on 2015-03-12
This bug affects 1 person
Affects: taskflow
Importance: Low
Assigned to: Unassigned

Bug Description

I am seeing an intermittent (not very frequent) test timeout in the WBE (worker-based engine) tests.

The output looks similar to the following:

taskflow.exceptions.RequestTimeout: Request '<REQUEST> {'task_cls': 'taskflow.tests.utils.ProgressingTask', 'arguments': {}, 'action': 'execute', 'task_name': 'task2', 'task_version': (1, 0)}' has expired after waiting for 60.01 seconds for it to transition out of (WAITING, PENDING) states

An example can be seen at http://logs.openstack.org/59/163159/9/check/gate-taskflow-python26/2e20f09/console.html
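The error above says a request is treated as expired once it has waited longer than its timeout (60 seconds here) without leaving the WAITING or PENDING states. A minimal sketch of that check, assuming a request records its creation time and current state; the class, attribute names and the RUNNING state are hypothetical and not taskflow's actual implementation. The clock is injectable so the behaviour can be tested without sleeping.

```python
import time

# WAITING/PENDING come from the log message; RUNNING is a hypothetical
# stand-in for "a worker has picked the request up".
WAITING = "WAITING"
PENDING = "PENDING"
RUNNING = "RUNNING"


class SketchRequest:
    """Hypothetical request that expires if it lingers in WAITING/PENDING."""

    def __init__(self, task_name, timeout, clock=time.monotonic):
        self.task_name = task_name
        self.timeout = timeout
        self._clock = clock
        self._created = clock()
        self.state = WAITING

    def elapsed(self):
        return self._clock() - self._created

    def expired(self):
        # Only requests that never left WAITING/PENDING can expire; once the
        # request is RUNNING the timeout no longer applies.
        return self.state in (WAITING, PENDING) and self.elapsed() > self.timeout
```

On a starved CI node, a request whose worker never gets around to picking it up crosses that threshold and surfaces as the `RequestTimeout` shown above.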

Joshua Harlow (harlowja) on 2015-03-12
Changed in taskflow:
importance: Undecided → Low
Joshua Harlow (harlowja) wrote:

I'm pretty sure this is just the Jenkins slaves being slow or unresponsive, since it's infrequent...

Maybe noisy-neighbor issues are starving the test processes? Or something else flaky?

Reviewed: https://review.openstack.org/276536
Committed: https://git.openstack.org/cgit/openstack/taskflow/commit/?id=cea71f27998cfa911044103fcb8fca79b6989717
Submitter: Jenkins
Branch: master

commit cea71f27998cfa911044103fcb8fca79b6989717
Author: Joshua Harlow <email address hidden>
Date: Thu Feb 4 18:09:24 2016 -0800

    Fix for WBE sporadic timeout of tasks

    This fixes the sporadic timeout of tasks that would
    happen under certain circumstances. What happened was
    that a new worker notification would be sent to a
    callback at the same time as a task submission came
    in, leaving a small race window in which the task
    would insert itself into the requests cache while the
    callback was processing.

    To work around this, the whole concept of a requests
    cache was revamped: the WBE executor now just maintains
    its own local dictionary of ongoing requests and
    accesses it safely.
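One way to read "its own local dictionary of ongoing requests" that is "accessed safely" is a plain dict guarded by a single lock, so a submission and a worker-notification callback can never interleave halfway through an update. A minimal sketch under that assumption; `SketchExecutor` and its methods are hypothetical names, not taskflow's executor API.

```python
import threading


class SketchExecutor:
    """Hypothetical executor that owns its dict of ongoing requests.

    Every read or write goes through one lock, so concurrent callers
    (submissions, notification callbacks) see a consistent dict.
    """

    def __init__(self):
        self._ongoing = {}  # request id -> request object
        self._lock = threading.Lock()

    def submit(self, request_id, request):
        with self._lock:
            self._ongoing[request_id] = request

    def finish(self, request_id):
        # Returns the removed request, or None if it was already gone.
        with self._lock:
            return self._ongoing.pop(request_id, None)

    def snapshot(self):
        # Copy under the lock so callers can iterate without holding it.
        with self._lock:
            return dict(self._ongoing)
```

Keeping the dict private to the executor (rather than a shared cache that other components also mutate) is what lets one lock cover every access path.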

    In the on_wait function, which kombu calls
    periodically, expired work is still cleaned up as
    before, but now any pending requests are also matched
    to any new workers that may have appeared.

    This avoids the race, and ensures that even if a new
    worker is found while a submission is in progress, that
    submission is delayed at most until the next on_wait
    call.
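The periodic step described in the commit (expire stale requests, then match still-waiting requests to any workers that have since appeared) can be sketched as follows. This is a hypothetical model, not taskflow's code: `SketchWbeExecutor`, `_Req`, `worker_appeared`, and the `sent`/`timed_out` lists are illustrative names, and "publishing" a request is reduced to appending to a list. Note that `worker_appeared` only records the worker; it never touches requests, so a submission that misses a new worker waits at most until the next `on_wait` call.

```python
import threading
import time

WAITING, PENDING = "WAITING", "PENDING"


class _Req:
    """Hypothetical in-flight request (not taskflow's real Request class)."""

    def __init__(self, task_cls, timeout, created):
        self.task_cls = task_cls
        self.timeout = timeout
        self.created = created
        self.state = WAITING
        self.worker = None


class SketchWbeExecutor:
    """Hypothetical executor: expire stale requests, then match waiting ones."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._lock = threading.Lock()
        self._ongoing = {}   # request id -> _Req
        self._workers = {}   # task class name -> worker name
        self.sent = []       # (worker, request id) pairs "published" so far
        self.timed_out = []  # request ids that expired

    def _try_match(self, rid, req):
        # Caller must hold self._lock.
        worker = self._workers.get(req.task_cls)
        if worker is not None:
            req.state, req.worker = PENDING, worker
            self.sent.append((worker, rid))

    def submit(self, rid, task_cls, timeout=60.0):
        with self._lock:
            req = _Req(task_cls, timeout, self._clock())
            self._ongoing[rid] = req
            self._try_match(rid, req)  # may find no worker yet; on_wait retries

    def worker_appeared(self, task_cls, worker):
        # Only records the worker; matching is left to the next on_wait call.
        with self._lock:
            self._workers[task_cls] = worker

    def on_wait(self):
        """Periodic hook (kombu drives this in the commit's description)."""
        now = self._clock()
        with self._lock:
            # 1. Expire requests stuck in WAITING/PENDING past their timeout.
            for rid, req in list(self._ongoing.items()):
                if req.state in (WAITING, PENDING) and now - req.created > req.timeout:
                    del self._ongoing[rid]
                    self.timed_out.append(rid)
            # 2. Match remaining WAITING requests to any known worker.
            for rid, req in self._ongoing.items():
                if req.state == WAITING:
                    self._try_match(rid, req)
```

For example, a request submitted before its worker is known stays WAITING after `worker_appeared(...)` and is only sent once `on_wait()` runs.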

    Related-Bug: #1431097

    Change-Id: I98b0caeedc77ab2f7214847763ae1eb0433d4a78

Ben Nemec (bnemec) on 2018-03-23
Changed in taskflow:
status: New → In Progress