Random timeout on gate-trove-python34-db

Bug #1582257 reported by Victor Stinner
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack DBaaS (Trove)
In Progress
Undecided
Unassigned

Bug Description

Running the unit tests on Python 3 sometimes takes 37 minutes, whereas the usual duration is closer to 15 minutes.

Example with https://review.openstack.org/#/c/313941/:

(a) May 9 1:14 AM: "gate-trove-python27-db SUCCESS in 9m 52s" and "gate-trove-python34-db SUCCESS in 37m 14s"
(b) May 10 1:48 AM: "gate-trove-python27-db SUCCESS in 5m 59s" and "gate-trove-python34-db SUCCESS in 10m 10s"

Duration of Python 2 unit tests:

(a) "Ran 398 tests in 34.702s"
(b) "Ran 398 tests in 52.965s"

Duration of Python 3 unit tests:

(a) "Ran 686 tests in 15.971s"
(b) "Ran 686 tests in 1581.090s"

Building .tox/py27 took:

(a) ~46 seconds
(b) ~77 seconds

Building .tox/py34 took:

(a) ~7 min 51 sec
(b) ~8 min 46 sec

Notes:

* the number of Python 3 unit tests is supposed to be much lower, since we only run a subset of unit tests known to pass on Python 3
* even on Python 2, the timings change a lot. Maybe the CI uses slaves with variable performance?
* building the tox env takes 7x longer on Python 3: it looks like Python 2 gets binary wheel packages for many packages, whereas Python 3 gets tarball and has to rebuild locally wheel packages: see all "Running setup.py bdist_wheel for <package name>: started" lines in tox/py34-1.log.txt.gz and compare to the few similar lines in tox/py27-1.log.txt.gz
* the real bug is that running unit tests on Python 3 sometimes takes 1581 seconds instead of 15 seconds, 100x slower

Revision history for this message
Victor Stinner (vstinner) wrote :

> building the tox env takes 7x longer on Python 3: (...)

I proposed the fix https://review.openstack.org/#/c/316890/ to precompile also wheel packages for Python 3 on the OpenStack infra.

Revision history for this message
Victor Stinner (vstinner) wrote :

> Duration of Python 2 unit tests:
>
> (a) "Ran 398 tests in 34.702s"
> (b) "Ran 398 tests in 52.965s"

Ooops, sorry, I read the wrong line. In fact, the correct timings are:

Duration of Python 2 unit tests:

(a) Ran 1495 tests in 22.595s
(b) Ran 1495 tests in 393.364s

So on this example, Python 2 unit tests are also 17x slower between the two runs. The huge performance issue is not specific to Python 3.

Revision history for this message
Victor Stinner (vstinner) wrote :

> (...) the real bug is that running unit tests on Python 3 sometimes takes 1581 seconds instead of 15 seconds, 100x slower
> (...)
> (...) So on this example, Python 2 unit tests are also 17x slower between the two runs.

The difference between Python 2 and Python 3, 17x slower versus 100x slower, can be explained by the fact that currently unit tests are run in parallel on Python 2 (the CI uses 8 processes) but are sequentially on Python 3 (1 process).

So the real bug is that sometimes running tests is like *much* slower, *even on Python 2*.

IMHO the slowdown comes from the code detecting dangling mocks in the base TestCase in trove/tests/unittests/trove_testtools.py.

Revision history for this message
Victor Stinner (vstinner) wrote :

I wrote a short patch to measure the time spent in searching all "dangling mocks": 161.8 sec whereas running 686 tests took 175.466 sec (I ran tests on Python 3).

=> searching mocks take 92% of time to run unit tests...

Revision history for this message
Victor Stinner (vstinner) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to trove (master)

Fix proposed to branch: master
Review: https://review.openstack.org/316955

Changed in trove:
assignee: nobody → Victor Stinner (victor-stinner)
status: New → In Progress
Revision history for this message
Victor Stinner (vstinner) wrote :

My change https://review.openstack.org/#/c/316890/ "Build also wheel packages for Python 3" was merged, but I see that gate-trove-python34-db still has to compile a lot of wheel packages. It looks like the "wheel mirror" job did not run since my change was merged. The Python 3 job still takes much longer than the Python 2 because of that:

* gate-trove-python27-db: SUCCESS
* gate-trove-python34-db: SUCCESS

Revision history for this message
Victor Stinner (vstinner) wrote :

pip is not installed on the server building wheel packages, I reported the bug to the opentack infra team: https://storyboard.openstack.org/#!/story/2000593

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on trove (master)

Change abandoned by Victor Stinner (<email address hidden>) on branch: master
Review: https://review.openstack.org/316955
Reason: It looks like this approach is strongly rejected by Trove developers, so I hope that my fix on the openstack infra (building wheel packages on Python 3) will reduce the risk of timeout on the Python 3 timeout.

Changed in trove:
assignee: Victor Stinner (victor-stinner) → nobody
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.