In python3 test_fixtures.TestOSAPIFixture.test_responds_to_version stalls in epoll()

Bug #1558105 reported by Chris Dent on 2016-03-16
This bug affects 1 person
Affects: OpenStack Compute (nova)

Bug Description

When running the py34 unit tests, nova.tests.unit.test_fixtures.TestOSAPIFixture.test_responds_to_version will block in epoll() for up to 900 seconds. This sometimes causes the gate job to time out, because the combined time of building the environment, running the tests, and gathering the information can exceed the limit even though all the tests pass.

The problem appears to be a deadlock/race in eventlet itself when working with the same file from different greenthreads.

See email for more discussion:

A fix is proposed and will be linked here momentarily.

Changed in nova:
assignee: nobody → Chris Dent (cdent)
status: New → In Progress
tags: added: testing
Chris Dent (cdent) wrote :

Here's what the threads look like when it's stuck:
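
For anyone trying to capture a similar dump from a stuck test, the following is a minimal sketch using the standard-library faulthandler module. It is not the method used in this report. The helper name run_with_stack_dump and the stand-in slow_call are hypothetical, and faulthandler only reports OS threads, so eventlet greenthreads would show up only as whatever frame the hub's OS thread is currently executing.

```python
import faulthandler
import tempfile
import time


def run_with_stack_dump(func, timeout, log_file):
    """Run func(); if it is still running after `timeout` seconds,
    write the traceback of every OS thread to log_file.

    Hypothetical helper for illustration; log_file must have a fileno().
    """
    faulthandler.dump_traceback_later(timeout, file=log_file)
    try:
        return func()
    finally:
        # Disarm the watchdog so a fast call leaves no dump behind.
        faulthandler.cancel_dump_traceback_later()


def slow_call():
    # Stand-in for a test that stalls; a real hang would block in epoll().
    time.sleep(0.5)
    return "done"


with tempfile.TemporaryFile() as log_file:
    result = run_with_stack_dump(slow_call, timeout=0.1, log_file=log_file)
    log_file.seek(0)
    dump = log_file.read().decode()

print(dump)
```

With a real hang, the timeout would be set somewhat below the job's own limit so that the dump is written before the runner kills the process.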

Change abandoned by Chris Dent (<email address hidden>) on branch: master
Reason: this fix is considered too weird, better to just block the test in py3, done in: I47e90bb613bfba76bb504a5bd0955206120b5556

Submitter: Jenkins
Branch: master

commit f6b11c59641d55020a5ada25cba6ec4de58431a9
Author: Chris Dent <email address hidden>
Date: Fri Mar 18 12:36:32 2016 +0000

    Blacklist TestOSAPIFixture.test_responds_to_version in python3

    In python3 the test blocks in epoll() for 10-15 minutes. This can
    lead to gate job timeouts (as a result of the cumulative time being
    extended by this one test). The root cause has been tracked to eventlet

    Change-Id: I47e90bb613bfba76bb504a5bd0955206120b5556
    Related-Bug: #1558105
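
As an illustration only: one common way to keep a test from running under Python 3 is a skip decorator, sketched below with the standard unittest module. The commit title says the test was blacklisted, so the actual change may well have used a different mechanism; the class name TestOSAPIFixtureSketch and the test body here are hypothetical stand-ins, not nova code.

```python
import sys
import unittest


class TestOSAPIFixtureSketch(unittest.TestCase):
    """Hypothetical stand-in for the real test class."""

    @unittest.skipIf(sys.version_info[0] >= 3,
                     "blocks in epoll() under Python 3; see bug #1558105")
    def test_responds_to_version(self):
        # Placeholder body; the real test exercises the OS API fixture.
        self.assertTrue(True)


# Run the sketch and collect the result without exiting the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestOSAPIFixtureSketch)
result = unittest.TestResult()
suite.run(result)
print("skipped:", [reason for _, reason in result.skipped])
```

A skip recorded this way still appears in the run summary, which makes it easier to notice and re-enable later than a test that is silently excluded.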

Augustina Ragwitz (auggy) wrote :

Does this fix actually close the bug or just partially close it? If it closes, the tag "closes-bug" should be used. Since this fix is already merged, if it does actually fix the bug then this bug should be marked as Fix Committed.

ChangBo Guo(gcb) (glongwave) wrote :

It takes about 2.5 seconds to run this test method. I'm not sure whether the stall only occurs when running all the unit tests.

tox -e py34 nova.tests.unit.test_fixtures.TestOSAPIFixture.test_responds_to_version
Ran: 1 tests in 29.2162 sec.
 - Passed: 1
 - Skipped: 0
 - Expected Fail: 0
 - Unexpected Success: 0
 - Failed: 0
Sum of execute time for each test: 2.5056 sec.

Worker Balance
 - Worker 0 (1 tests) => 0:00:02.505606

Slowest Tests:

Test id Runtime (s)
----------------------------------------------------------------------- -----------
nova.tests.unit.test_fixtures.TestOSAPIFixture.test_responds_to_version 2.506

Submitter: Jenkins
Branch: master

commit 4173e4192142d9ba0d02e92082522b7fb651abe0
Author: ChangBo Guo(gcb) <email address hidden>
Date: Thu Dec 29 14:20:29 2016 +0800

    Enable TestOSAPIFixture.test_responds_to_version on Python 3

    It takes about 2.5 seconds to run this test, seems the issue
    has gone. Not sure this is fixed by eventlet now. Let's enable
    it now and figure out root cause and fix the issue.

    Related-Bug: #1558105
    Partially-Implements: blueprint goal-python35

    Change-Id: I6826a4ba3ea5656471cdffbaa2883b522b4b22f0

Sean Dague (sdague) wrote :

There are no currently open reviews on this bug, changing
the status back to the previous state and unassigning. If
there are active reviews related to this bug, please include
links in comments.

Changed in nova:
status: In Progress → New
assignee: Chris Dent (cdent) → nobody
Sean Dague (sdague) wrote :

This appears fixed; please reopen if it is still an issue.

Changed in nova:
status: New → Fix Released