subunit stream corruption when python test processes segfault

Bug #1060616 reported by Robert Collins
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Launchpad itself
Triaged
Critical
Unassigned

Bug Description

Running these tests http://paste.ubuntu.com/1257313/ on stevenk's jenkins environment fails and corrupts the stream at the same time. The corrupt stream makes analysis of the failure hard; this bug is about the corrupt stream.

To reproduce:
setup an instance the way Steve has, and run
:~/launchpad/lp-branches/workspace/devel$ sudo /usr/local/bin/lp-setup-lxc-test lptests /home/jenkins/.ssh/id_rsa --load-list /home/jenkins/runs/tmpzqc6_R > ~/runs/stream 2>~/runs/stream_err

Where the load list is from the pastebin.

The stream will then be corrupt - running it through subunit2pyunit, for instance, will spew a lot of test protocol data to stdout, and show an additional failed test - the disconnected during test.

Tags: test-system
Changed in launchpad:
status: New → Triaged
importance: Undecided → Critical
Revision history for this message
Robert Collins (lifeless) wrote :

lp.registry.tests.test_distributionmirror_prober.TestProberProtocolAndFactory.test_connect_to_proxy_when_http_proxy_exists
 is the last actual test before stream data gets exposed, and lp.testing.layers.YUIAppServerLayer:setUp is the last virtual test:
"
lp.registry.tests.test_distributionmirror_prober.TestProberProtocolAndFactory.test_connect_to_proxy_when_http_proxy_exists
lp.testing.layers.TwistedLayer:tearDown
lp.testing.layers.BaseLayer:tearDown
Running in a subprocess.
lp.testing.layers.BaseLayer:setUp
lp.testing.layers.MemcachedLayer:setUp
lp.testing.layers.YUIAppServerLayer:setUp
test: Could not communicate with subprocess
successful: Could not communicate with subprocess
"

Note that 'Running in a subprocess.\n'

should be 'test: Running in a subprocess.\n' -s the corruption is ~ there.

looking at the raw stream we see:"
test: lp.testing.layers.YUIAppServerLayer:setUp
tags: zope:layer
time: 2012-10-03 04:01:37.335434Z
successful: lp.testing.layers.YUIAppServerLayer:setUp
tags: zope:layer:lp.testing.layers.YUIAppServerLayer
time: 2012-10-03 04:01:37.775637Z
test: lp/testing/tests/test_yuixhr_fixture_facet
test: Could not communicate with subprocess
tags: zope:error_with_banner
successful: Could not communicate with subprocess
"

Which suggests that lp/testing/tests/test_yuixhr_fixture_facet is failing sufficiently hard that the finally: block which should be reporting on the test failure isn't executing at all - not even enough to report that it didn't complete the test.

Revision history for this message
Robert Collins (lifeless) wrote :

In fact, its worse, it manages to skip tests entirely when this error happens:
with a list to load containing:"
lp/testing/tests/test_yuixhr_fixture_facet
lp.translations.browser.tests.test_productserieslanguage_views.TestProductSeriesView.test_single_potemplate_no_template
lp.translations.tests.test_potmsgset.TestCheckForConflict.test_quiet_if_no_timestamp
"
The latter two being tests that ran after the ones that dies -hard-

The following stream is generated:"
Setting up ephemeral container...
Starting up the container...
time: 2012-10-03 04:44:01.921674Z
test: lp.testing.layers.BaseLayer:setUp
tags: zope:layer
time: 2012-10-03 04:44:01.992877Z
successful: lp.testing.layers.BaseLayer:setUp
tags: zope:layer:lp.testing.layers.BaseLayer
time: 2012-10-03 04:44:01.993050Z
test: lp.testing.layers.MemcachedLayer:setUp
tags: zope:layer
time: 2012-10-03 04:44:02.166581Z
successful: lp.testing.layers.MemcachedLayer:setUp
tags: zope:layer:lp.testing.layers.MemcachedLayer
time: 2012-10-03 04:44:02.166850Z
test: lp.testing.layers.YUIAppServerLayer:setUp
tags: zope:layer
time: 2012-10-03 04:44:21.877628Z
successful: lp.testing.layers.YUIAppServerLayer:setUp
tags: zope:layer:lp.testing.layers.YUIAppServerLayer
time: 2012-10-03 04:44:22.322311Z
test: lp/testing/tests/test_yuixhr_fixture_facet
"

Note that the additional tests didn't run at all. If the first line of the load-list is removed the additional tests run correctly, demonstrating that the ids are valid.

Revision history for this message
Robert Collins (lifeless) wrote :

Another test from the same partition that show the error:
lib/lp/code/javascript/tests/test_productseries-setbranch.html

Revision history for this message
Robert Collins (lifeless) wrote :

For a non-crashing system, making a test that calls sys._exit(0) would likely be enough to reproduce the stream corruption.

summary: - subunit stream corruption with jenkins test runs
+ subunit stream corruption when python test processses segfault
Curtis Hovey (sinzui)
tags: added: test-system
Colin Watson (cjwatson)
summary: - subunit stream corruption when python test processses segfault
+ subunit stream corruption when python test processes segfault
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.