zope.testing --subunit allows bad output on stdout, which can break subunit processing and tests

Bug #996729 reported by Gary Poster
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
High
Graham Binns

Bug Description

The parallel testing had a test run that only reported about 3000 tests even though the full 16000+ ran. This was because the stream it processed included the following test: http://pastebin.ubuntu.com/976408/

Obviously, """No handlers could be found for logger "lazr.smtptest"\n""" has been inserted into the stream. Without that, the stream can be processed correctly.

This bug's title partially directs a solution by attributing responsibility. This comes from a discussion with Robert in which he said that subunit shouldn't be expected to handle malformed output of this sort. Similarly, we could look for all sources of these complaints in the Launchpad tests (probably from a __del__ method when something is being garbage collected) but that will be error-prone, difficult, and not future-proof.

Instead, the test runner itself ought to isolate the subunit output. We could do that either by making the subunit stream not go to stdout, or make everything else not go to stdout (e.g., "sys.stdout = sys.stderr"). Robert notes that the former solution has the disadvantage that it does not allow piping of subunit output, which is intended to be one of the primary user interfaces for the stream. To pursue the latter solution, we may have test breakage, now and in the future, but it preserves the subunit interface and he prefers it. He points out that diagnosis of future test breakage might be helped if we can turn the sys.stdout redirection to stderr off and on.

I'd prefer that the --subunit flag to the testrunner imply that the stdout isolation should occur; that is, by default when you ask for a subunit stream, you should get a clean subunit stream, with the stdout->stderr redirection for everything else writing to stdout from the tests. For debugging, you can simply not run with --subunit, which will not perform the redirection of stdout. Further, if you really want to use --subunit but disable the stdout/stderr switch, maybe we could have another switch (--disable-stdout-switch?) but I think this is a YAGNI for now: not using --subunit seems sufficient at this stage. Robert's not sure he agrees, but is choosing not to weigh in.

This change should be made in our zope.testing fork.

Related branches

Revision history for this message
Gary Poster (gary) wrote :

Another idea from Robert, which would be cool: "make the replacement sys.stdout grab a traceback and report the object triggering the write--e.g. a wrapper around sys.stderr with a custom write() + writelines() methods"

Graham Binns (gmb)
Changed in launchpad:
assignee: nobody → Graham Binns (gmb)
status: Triaged → In Progress
Gary Poster (gary)
summary: zope.testing --subunit allows bad output on stdout, which can break
- subunit processing
+ subunit processing and tests
Graham Binns (gmb)
Changed in launchpad:
status: In Progress → Fix Committed
status: Fix Committed → In Progress
Graham Binns (gmb)
tags: added: qa-untestable
Changed in launchpad:
status: In Progress → Fix Committed
Revision history for this message
William Grant (wgrant) wrote :

Reverted in r15295 for breaking some subunit streams somehow.

Changed in launchpad:
status: Fix Committed → Triaged
Revision history for this message
Gary Poster (gary) wrote :

Thanks William, and sorry you had to do that. I was about to revert it myself.

More details on the breakage I saw are as follows.

The zope.testing p9 egg is causing the subunit output to miss tests. These vary from run to run. Based on one previous analysis of a test run, I had hoped that this difference was innocuous, or even a turn for the better: removing fake tests and only leaving good ones. Unfortunately that's not what I see now, and I suspect I know the mistake I made in the previous analysis (I forgot that subunit-filter filters out successful tests by default, and lets random other non-subunit output through by default).

I analyzed four runs. For each below, I list the number of tests reported, a link to the subunit output, and a link to the tests run for that subunit output.

16392
http://ubuntuone.com/4LP7EstXHWeVWa8r2eU1Ja
http://ubuntuone.com/3j6dFX4UpL59ttn3oKsnBt

16419
http://ubuntuone.com/2cdnWhvL7hB0ixV2HpxWxE
http://ubuntuone.com/72tYL5omcXEFn6W8kuTHti

16931
http://ubuntuone.com/6nc5UMMjpS60U626Y1i7rj
http://ubuntuone.com/1tV1LHc522EF90VAjXILzK

16943
http://ubuntuone.com/0GBFzS5Q84qVaapcXXDO8h
http://ubuntuone.com/7clScEtUPYZdrlFuAp3B8W

I created the test lists with a command like this, using the testing-cabal PPA's version of subunit:

cat SUBUNIT_FILE | subunit-filter --without-tag='zope:layer' --without-tag='zope:info_suboptimal' -s --no-passthrough | subunit-ls > TEST_NAME_FILE

A line count of the resulting file equaled exactly the number of tests that our buildbot configuration had reported.

After generating the test lists, I got the aggregated difference of all--or most--of the combinations. It is here.

http://ubuntuone.com/4mxQbnzetLxgB88hNnb1eS

Looking at that file, you can see that it is comprised of real tests in the main if not the totality--not mere blather that had been inadvertently converted to tests, as had been seen before.

I saw no pattern in the differences. Each of the four ran different tests, with even the one with the least tests running some tests that the one with the most tests did not run. The missed tests are split across many layers, ranging from lp.testing.layers.BaseLayer through lp.testing.layers.YUITestLayer to zope.testing.testrunner.layer.UnitTests .

The only other observation I had of note was that "lp.codehosting.codeimport.tests.test_worker.CodeImportBranchOpenPolicyTests.test_checkOneURL" could be found in the 16392 subunit, but not the test listing. However, other missing tests in that test run, such as "lp.soyuz.tests.test_packageupload.TestPackageUploadWithPackageCopyJob.test_acceptFromQueue_with_copy_job," could not be found in the associated subunit.

Any other observations of how the subunit files are broken would be welcome.

Revision history for this message
Gary Poster (gary) wrote :

One last note: all four of those test runs were of devel r15400

Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
tags: added: qa-needstesting
removed: qa-untestable
Changed in launchpad:
status: Triaged → Fix Committed
Gary Poster (gary)
Changed in launchpad:
status: Fix Committed → In Progress
Revision history for this message
Gary Poster (gary) wrote :

Notes from investigation:

The four runs have zope:layer tags inappropriately applied to about 50 tests, not layers.

This accounts for all of the discrepancies for the bigger three test runs.

The smaller test run also turns out to be another incident of bug 994752 (which is concerning, given that we have a ten minute time out for starting an ephemeral container at this point, but pretty clearly unrelated to the zope.testing p9 changes). worker-15 is absent from the results.

So, the incorrect zope:layer tags are the immediate source of the problem. We could switch to filtering on the test name not ending with ":setUp" and ":tearDown" as a workaround, but that's not a good resolution. We're trying to figure out how the zope.testing changes could have triggered this kind of change. It's far from obvious at the moment.

I am curious as to what subunit issues William saw, and I've asked him on IRC to clarify when he is back online. If he saw something else then perhaps we haven't fully identified the problem yet.

Revision history for this message
Gary Poster (gary) wrote :

I should say that zope:layer tags are inappropriately applied to about 50 runs in the two biggest test runs and in the smallest test run; the remaining test run (16419 tests) has hundreds of tests like this.

Revision history for this message
Gary Poster (gary) wrote :

It turned out to be a new variation of bug 986429. Looking at the MP for that bug, we needed to switch flushing sys.stdout and sys.stderr to flushing sys.__stdout__ and sys.__stderr__. Benji and/or Brad will be landing soon.

William Grant (wgrant)
tags: added: qa-untestable
removed: qa-needstesting
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :

Fixed in stable r15409 (http://bazaar.launchpad.net/~launchpad-pqm/launchpad/stable/revision/15409) by a commit, but not testable.

Changed in launchpad:
status: In Progress → Fix Committed
William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.