Launchpad itself

subunit stream corruption when python test processes segfault

Bug #1060616 reported by Robert Collins on 2012-10-03

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Launchpad itself	Triaged	Critical	Unassigned

Bug Description

Running these tests http://paste.ubuntu.com/1257313/ on stevenk's jenkins environment fails and corrupts the stream at the same time. The corrupt stream makes analysis of the failure hard; this bug is about the corrupt stream.

To reproduce:
setup an instance the way Steve has, and run
:~/launchpad/lp-branches/workspace/devel$ sudo /usr/local/bin/lp-setup-lxc-test lptests /home/jenkins/.ssh/id_rsa --load-list /home/jenkins/runs/tmpzqc6_R > ~/runs/stream 2>~/runs/stream_err

Where the load list is from the pastebin.

The stream will then be corrupt - running it through subunit2pyunit, for instance, will spew a lot of test protocol data to stdout, and show an additional failed test - the disconnected during test.

Tags:

Robert Collins (lifeless) on 2012-10-03

Changed in launchpad:
status:	New → Triaged
importance:	Undecided → Critical

Revision history for this message

Robert Collins (lifeless) wrote on 2012-10-03:

lp.registry.tests.test_distributionmirror_prober.TestProberProtocolAndFactory.test_connect_to_proxy_when_http_proxy_exists
is the last actual test before stream data gets exposed, and lp.testing.layers.YUIAppServerLayer:setUp is the last virtual test:
"
lp.registry.tests.test_distributionmirror_prober.TestProberProtocolAndFactory.test_connect_to_proxy_when_http_proxy_exists
lp.testing.layers.TwistedLayer:tearDown
lp.testing.layers.BaseLayer:tearDown
Running in a subprocess.
lp.testing.layers.BaseLayer:setUp
lp.testing.layers.MemcachedLayer:setUp
lp.testing.layers.YUIAppServerLayer:setUp
test: Could not communicate with subprocess
successful: Could not communicate with subprocess
"

Note that 'Running in a subprocess.\n'

should be 'test: Running in a subprocess.\n' -s the corruption is ~ there.

looking at the raw stream we see:"
test: lp.testing.layers.YUIAppServerLayer:setUp
tags: zope:layer
time: 2012-10-03 04:01:37.335434Z
successful: lp.testing.layers.YUIAppServerLayer:setUp
tags: zope:layer:lp.testing.layers.YUIAppServerLayer
time: 2012-10-03 04:01:37.775637Z
test: lp/testing/tests/test_yuixhr_fixture_facet
test: Could not communicate with subprocess
tags: zope:error_with_banner
successful: Could not communicate with subprocess
"

Which suggests that lp/testing/tests/test_yuixhr_fixture_facet is failing sufficiently hard that the finally: block which should be reporting on the test failure isn't executing at all - not even enough to report that it didn't complete the test.

Revision history for this message

Robert Collins (lifeless) wrote on 2012-10-03:

In fact, its worse, it manages to skip tests entirely when this error happens:
with a list to load containing:"
lp/testing/tests/test_yuixhr_fixture_facet
lp.translations.browser.tests.test_productserieslanguage_views.TestProductSeriesView.test_single_potemplate_no_template
lp.translations.tests.test_potmsgset.TestCheckForConflict.test_quiet_if_no_timestamp
"
The latter two being tests that ran after the ones that dies -hard-

The following stream is generated:"
Setting up ephemeral container...
Starting up the container...
time: 2012-10-03 04:44:01.921674Z
test: lp.testing.layers.BaseLayer:setUp
tags: zope:layer
time: 2012-10-03 04:44:01.992877Z
successful: lp.testing.layers.BaseLayer:setUp
tags: zope:layer:lp.testing.layers.BaseLayer
time: 2012-10-03 04:44:01.993050Z
test: lp.testing.layers.MemcachedLayer:setUp
tags: zope:layer
time: 2012-10-03 04:44:02.166581Z
successful: lp.testing.layers.MemcachedLayer:setUp
tags: zope:layer:lp.testing.layers.MemcachedLayer
time: 2012-10-03 04:44:02.166850Z
test: lp.testing.layers.YUIAppServerLayer:setUp
tags: zope:layer
time: 2012-10-03 04:44:21.877628Z
successful: lp.testing.layers.YUIAppServerLayer:setUp
tags: zope:layer:lp.testing.layers.YUIAppServerLayer
time: 2012-10-03 04:44:22.322311Z
test: lp/testing/tests/test_yuixhr_fixture_facet
"

Note that the additional tests didn't run at all. If the first line of the load-list is removed the additional tests run correctly, demonstrating that the ids are valid.

Revision history for this message

Robert Collins (lifeless) wrote on 2012-10-03:

Another test from the same partition that show the error:
lib/lp/code/javascript/tests/test_productseries-setbranch.html

Revision history for this message

Robert Collins (lifeless) wrote on 2012-10-03:

For a non-crashing system, making a test that calls sys._exit(0) would likely be enough to reproduce the stream corruption.

Robert Collins (lifeless) on 2012-10-04

summary:

- subunit stream corruption with jenkins test runs
+ subunit stream corruption when python test processses segfault

Curtis Hovey (sinzui) on 2012-10-04

tags:

added: test-system

Colin Watson (cjwatson) on 2014-11-24

summary:

- subunit stream corruption when python test processses segfault
+ subunit stream corruption when python test processes segfault

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.