FTBFS on arm64+ppc64el - out of memory

Bug #1769672 reported by Rebecca Palmer
Affects          Status        Importance  Assigned to  Milestone
theano (Debian)  Fix Released  Unknown
theano (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

The build logs show 16 tests running out of memory on arm64 [0] and 57 on ppc64el [1].

How much memory is too much, and is the limit a fixed maximum or does it depend on what else the buildd is doing? A first attempt at running the same tests locally (Debian amd64, python3) showed ~300MB of memory usage.

The failed tests are towards the end of the run, which suggests a cumulative memory leak, but the PPA build [2] and the original build fail on exactly the same number of tests.

[0] https://launchpad.net/ubuntu/+source/theano/1.0.1+dfsg-2/+build/14845703/+files/buildlog_ubuntu-cosmic-arm64.theano_1.0.1+dfsg-2_BUILDING.txt.gz
[1] https://launchpad.net/ubuntu/+source/theano/1.0.1+dfsg-2/+build/14845705/+files/buildlog_ubuntu-cosmic-ppc64el.theano_1.0.1+dfsg-2_BUILDING.txt.gz
[2] https://launchpadlibrarian.net/369115909/buildlog_ubuntu-cosmic-ppc64el.theano_1.0.1+dfsg-2ubuntu1~ppa1_BUILDING.txt.gz (Debian doesn't run the tests in parallel anyway, but I don't know whether the default is different in Ubuntu)

Tags: ftbfs
Graham Inggs (ginggs) wrote :

Hi, you wrote that Debian doesn't run the tests in parallel. Do you mean for theano in particular, or in general?

codesearch.debian.net shows quite a few instances of:
dh_auto_test --max-parallel=1
and
dh_auto_test --no-parallel

...but those may have been added for Ubuntu's sake.

Rebecca Palmer (rebecca-palmer) wrote :

Looks like it's a cumulative memory leak: over the python2 tests, memory usage increases by ~6GB. (Measured by occasionally looking at System Monitor, so I know it's more than one step, but not whether it's a few bad tests or a smaller amount on every test.) This is then freed when the python2 tests finish; I didn't wait to see whether the python3 half would repeat it.
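
For a number rather than an occasional glance at System Monitor, the peak memory of a child process can be read back after it exits. This is only a minimal sketch, assuming Linux (where `ru_maxrss` is reported in kilobytes); the helper name `peak_child_rss_kb` and the stand-in command are hypothetical, and the real test command would need to be substituted. It reports the peak only, so it cannot tell a few bad tests apart from a small leak on every test.

```python
import resource
import subprocess
import sys


def peak_child_rss_kb(cmd):
    """Run cmd to completion and return the peak resident set size
    recorded for this process's waited-for children.

    On Linux the value is in kilobytes; other platforms may use other units.
    """
    subprocess.run(cmd, check=False)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss


if __name__ == "__main__":
    # Hypothetical stand-in for the real test command: allocate ~50 MB and exit.
    cmd = [sys.executable, "-c", "x = bytearray(50 * 1024 * 1024)"]
    print(peak_child_rss_kb(cmd), "kB peak RSS")
```

Because `RUSAGE_CHILDREN` records the largest value seen across all children waited for so far, the helper is best called once per measuring process.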

This suggests that a quick workaround would be to split the test suite into several batches: upstream actually provide a tool to do that (theano/tests/run_tests_in_batch.py), though they seem to think the need for it is Windows-specific. However, this would be hiding a real bug if the memory leak can also happen in normal use of Theano.
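
The reason batching helps can be sketched as follows: if each batch runs in a fresh interpreter, whatever it leaked is returned to the OS when that process exits, so peak usage is bounded by the worst batch rather than by the whole suite. This is an illustration of the idea only, not upstream's run_tests_in_batch.py; the names `batches` and `run_in_batches`, and the way test IDs are passed as trailing arguments, are assumptions.

```python
import subprocess
import sys


def batches(items, size):
    """Yield consecutive slices of items with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(test_ids, size, runner):
    """Run each batch of test IDs in its own process.

    `runner` is the command prefix (a list of strings); the batch's test IDs
    are appended to it. Returns the number of batches that exited non-zero.
    """
    failed = 0
    for batch in batches(test_ids, size):
        # A new process per batch: its memory is released when it exits.
        result = subprocess.run(list(runner) + list(batch))
        if result.returncode != 0:
            failed += 1
    return failed
```

With a nose-style runner this might be called as `run_in_batches(ids, 100, [sys.executable, "bin/theano-nose", "-v"])`, where `ids` is a list of test identifiers collected beforehand and the batch size of 100 is an arbitrary choice.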

Debian defaults to parallel builds in dh compat>=10 (theano sets this to 11), but System Monitor shows theano's tests using ~1 core of CPU. I suspect this is because Theano uses a custom test runner script (with nearly the same interface as nosetests, but dh doesn't know that). I also vaguely remember considering parallel testing in a previous release and deciding/finding it wouldn't be a good idea, but can't remember why.

Changed in theano (Debian):
status: Unknown → New
Rebecca Palmer (rebecca-palmer) wrote :

The underlying issue is not new; I don't know whether it's got worse per test or the test suite has just got longer. See the linked Debian bug for further analysis.

The following patch will probably make it build by splitting up the test suite (i.e. it works around the underlying problem rather than solving it), but I haven't had time to test it.

--- a/debian/rules
+++ b/debian/rules
@@ -39,7 +39,7 @@ override_dh_auto_install:

 override_dh_auto_test:
- PYBUILD_SYSTEM=custom PYBUILD_TEST_ARGS='PYTHONPATH=. {interpreter} bin/theano-nose -v' dh_auto_test
+ PYBUILD_SYSTEM=custom PYBUILD_TEST_ARGS='PYTHONPATH=. {interpreter} theano/tests/run_tests_in_batch.py -v' dh_auto_test

 override_dh_installdocs:
        dh_installdocs -A README.html

Graham Inggs (ginggs) wrote :

I'll test and upload if successful.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package theano - 1.0.1+dfsg-2ubuntu1

---------------
theano (1.0.1+dfsg-2ubuntu1) cosmic; urgency=medium

  * Split the test suite to avoid OOM on arm64 and ppc64el,
    thanks Rebecca N. Palmer (LP: #1769672)

 -- Graham Inggs <email address hidden> Sat, 12 May 2018 07:48:47 +0000

Changed in theano (Ubuntu):
status: New → Fix Released
Changed in theano (Debian):
status: New → Fix Released