subunit stream corruption when python test processes segfault

Bug #1060616 reported by Robert Collins on 2012-10-03
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Critical
Unassigned

Bug Description

Running these tests http://paste.ubuntu.com/1257313/ on stevenk's jenkins environment fails and corrupts the stream at the same time. The corrupt stream makes analysis of the failure hard; this bug is about the corrupt stream.

To reproduce:
setup an instance the way Steve has, and run
:~/launchpad/lp-branches/workspace/devel$ sudo /usr/local/bin/lp-setup-lxc-test lptests /home/jenkins/.ssh/id_rsa --load-list /home/jenkins/runs/tmpzqc6_R > ~/runs/stream 2>~/runs/stream_err

Where the load list is from the pastebin.

The stream will then be corrupt - running it through subunit2pyunit, for instance, will spew a lot of test protocol data to stdout, and show an additional failed test - the disconnected during test.

Changed in launchpad:
status: New → Triaged
importance: Undecided → Critical
Robert Collins (lifeless) wrote :

lp.registry.tests.test_distributionmirror_prober.TestProberProtocolAndFactory.test_connect_to_proxy_when_http_proxy_exists
 is the last actual test before stream data gets exposed, and lp.testing.layers.YUIAppServerLayer:setUp is the last virtual test:
"
lp.registry.tests.test_distributionmirror_prober.TestProberProtocolAndFactory.test_connect_to_proxy_when_http_proxy_exists
lp.testing.layers.TwistedLayer:tearDown
lp.testing.layers.BaseLayer:tearDown
Running in a subprocess.
lp.testing.layers.BaseLayer:setUp
lp.testing.layers.MemcachedLayer:setUp
lp.testing.layers.YUIAppServerLayer:setUp
test: Could not communicate with subprocess
successful: Could not communicate with subprocess
"

Note that 'Running in a subprocess.\n'

should be 'test: Running in a subprocess.\n' -s the corruption is ~ there.

looking at the raw stream we see:"
test: lp.testing.layers.YUIAppServerLayer:setUp
tags: zope:layer
time: 2012-10-03 04:01:37.335434Z
successful: lp.testing.layers.YUIAppServerLayer:setUp
tags: zope:layer:lp.testing.layers.YUIAppServerLayer
time: 2012-10-03 04:01:37.775637Z
test: lp/testing/tests/test_yuixhr_fixture_facet
test: Could not communicate with subprocess
tags: zope:error_with_banner
successful: Could not communicate with subprocess
"

Which suggests that lp/testing/tests/test_yuixhr_fixture_facet is failing sufficiently hard that the finally: block which should be reporting on the test failure isn't executing at all - not even enough to report that it didn't complete the test.

Robert Collins (lifeless) wrote :

In fact, its worse, it manages to skip tests entirely when this error happens:
with a list to load containing:"
lp/testing/tests/test_yuixhr_fixture_facet
lp.translations.browser.tests.test_productserieslanguage_views.TestProductSeriesView.test_single_potemplate_no_template
lp.translations.tests.test_potmsgset.TestCheckForConflict.test_quiet_if_no_timestamp
"
The latter two being tests that ran after the ones that dies -hard-

The following stream is generated:"
Setting up ephemeral container...
Starting up the container...
time: 2012-10-03 04:44:01.921674Z
test: lp.testing.layers.BaseLayer:setUp
tags: zope:layer
time: 2012-10-03 04:44:01.992877Z
successful: lp.testing.layers.BaseLayer:setUp
tags: zope:layer:lp.testing.layers.BaseLayer
time: 2012-10-03 04:44:01.993050Z
test: lp.testing.layers.MemcachedLayer:setUp
tags: zope:layer
time: 2012-10-03 04:44:02.166581Z
successful: lp.testing.layers.MemcachedLayer:setUp
tags: zope:layer:lp.testing.layers.MemcachedLayer
time: 2012-10-03 04:44:02.166850Z
test: lp.testing.layers.YUIAppServerLayer:setUp
tags: zope:layer
time: 2012-10-03 04:44:21.877628Z
successful: lp.testing.layers.YUIAppServerLayer:setUp
tags: zope:layer:lp.testing.layers.YUIAppServerLayer
time: 2012-10-03 04:44:22.322311Z
test: lp/testing/tests/test_yuixhr_fixture_facet
"

Note that the additional tests didn't run at all. If the first line of the load-list is removed the additional tests run correctly, demonstrating that the ids are valid.

Robert Collins (lifeless) wrote :

Another test from the same partition that show the error:
lib/lp/code/javascript/tests/test_productseries-setbranch.html

Robert Collins (lifeless) wrote :

For a non-crashing system, making a test that calls sys._exit(0) would likely be enough to reproduce the stream corruption.

summary: - subunit stream corruption with jenkins test runs
+ subunit stream corruption when python test processses segfault
Curtis Hovey (sinzui) on 2012-10-04
tags: added: test-system
Colin Watson (cjwatson) on 2014-11-24
summary: - subunit stream corruption when python test processses segfault
+ subunit stream corruption when python test processes segfault
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers