subunit-ls is very slow on reasonably sized streams

Bug #1388586 reported by Jelmer Vernooij
Affects: subunit
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

subunit-ls takes a very long time to run on a Samba-generated subunit2 file.

gwenhwyvar:~/src/samba% wc -l st/subunit2
247593 st/subunit2
gwenhwyvar:~/src/samba% du -sch st/subunit2
17M st/subunit2
gwenhwyvar:~/src/samba% time PYTHONPATH=lib/testtools:lib/subunit/python:lib/extras:lib/mimeparse ./lib/subunit/filters/subunit-ls < st/subunit2 > /dev/null
PYTHONPATH=lib/testtools:lib/subunit/python:lib/extras:lib/mimeparse < > 100.19s user 9.87s system 98% cpu 1:51.45 total

Revision history for this message
Robert Collins (lifeless) wrote :

Could you attach the sample file?

Changed in subunit:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Here is another file that is slow, though it is not quite as bad as the file mentioned in the original report.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

subunit-ls --no-passthrough < failed-subunit2 89.40s user 9.32s system 99% cpu 1:39.64 total

Revision history for this message
Robert Collins (lifeless) wrote :

Ok, so a quick examination shows that this file has lots of unencapsulated stdout. Encapsulating it via subunit-filter -s to a new file gets me:

time ../../filters/subunit-ls < ~/Downloads/failed-subunit2.2 > /dev/null

real 0m3.875s
user 0m3.775s
sys 0m0.087s

or as time ../../filters/subunit-ls --no-passthrough < ~/Downloads/failed-subunit2.2

real 0m4.843s
user 0m4.444s
sys 0m0.149s

vs

time ../../filters/subunit-ls < ~/Downloads/failed-subunit2 > /dev/null

real 1m14.881s
user 1m3.278s
sys 0m11.570s

So about 1 second of that is terminal-emulator blocking when not rendering to /dev/null, and I suspect the rest is in UTF-8 decoding. Will profile shortly to confirm or deny this. It's still kind of slow either way, of course: there are only 6.5K tests, which works out to nearly a millisecond of processing per test.
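To illustrate why unencapsulated stdout hurts, here is a deliberately simplified model (NOT the real subunit v2 wire format, which also carries flags, an optional test id, and a CRC32): a parser facing raw bytes between packets must examine every byte to find the next packet signature, whereas a fully encapsulated stream lets it read a length header and skip each payload wholesale. The 8-byte fixed packet size below is a hypothetical stand-in.

```python
SIGNATURE = 0xB3  # subunit v2 packets do begin with this byte

def bytes_examined_unencapsulated(stream):
    """Model of parsing a stream with raw (unencapsulated) stdout mixed in:
    every non-packet byte is examined individually while hunting for the
    next signature, so cost grows with the volume of raw output."""
    i = 0
    examined = 0
    while i < len(stream):
        examined += 1
        if stream[i] == SIGNATURE:
            i += 8  # hypothetical fixed-size packet, for illustration only
        else:
            i += 1
    return examined

def bytes_examined_encapsulated(packets):
    """Model of parsing a fully encapsulated stream: each packet carries
    its own length, so the parser reads a small header per packet and
    skips the payload without touching every byte."""
    return len(packets)  # one header read per packet

# 1 MiB of raw stdout between two packets forces over a million
# byte examinations...
raw = bytes([SIGNATURE]) + b"x" * (1 << 20) + bytes([SIGNATURE])
print(bytes_examined_unencapsulated(raw) > 1_000_000)  # True

# ...while the same stdout wrapped in one packet costs a single header read.
print(bytes_examined_encapsulated([b"stdout-payload-packet", b"status-packet"]))  # 2
```

This matches the numbers above: the cost of the slow file scales with the raw stdout it contains, not with its 6.5K tests.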

Revision history for this message
Robert Collins (lifeless) wrote :

On python 3.5
 time PYTHONPATH=$(pwd)/../../python ../../filters/subunit-ls --no-passthrough < ~/Downloads/failed-subunit2.2 > /dev/null

real 0m1.061s
user 0m1.014s
sys 0m0.049s

which is better; so interpreter issues perhaps. I wonder when the pypy JIT kicks in? :)

Revision history for this message
Jelmer Vernooij (jelmer) wrote : Re: [Bug 1388586] Re: subunit-ls is very slow on reasonably sized streams

On Wed, Dec 10, 2014 at 07:34:28AM -0000, Robert Collins wrote:
> On python 3.5
> time PYTHONPATH=$(pwd)/../../python ../../filters/subunit-ls --no-passthrough < ~/Downloads/failed-subunit2.2 > /dev/null
>
> real 0m1.061s
> user 0m1.014s
> sys 0m0.049s
>
> which is better; so interpreter issues perhaps. I wonder when the pypy
> JIT kicks in ? :)
FWIW I'm running it on Python 2.7. Can you reproduce the issue on 2.x?

Jelmer

Revision history for this message
Robert Collins (lifeless) wrote :

Ah, I wasn't clear. Yes, it's very slow there with unencapsulated stdout, and nice and fast without. The file failed-subunit2 has unencapsulated stdout, and failed-subunit2.2 has encapsulated stdout.

unencapsulated:
 time PYTHONPATH=../../python ../../filters/subunit-ls --no-passthrough < ~/Downloads/failed-subunit2 | wc -l
6846

real 2m25.130s
user 2m14.084s
sys 0m11.006s

encapsulated:
 time PYTHONPATH=../../python ../../filters/subunit-ls --no-passthrough < ~/Downloads/failed-subunit2.2 | wc -l
6426

real 0m1.063s
user 0m1.014s
sys 0m0.049s

We should obviously fix this.

A workaround would be to generate encapsulated stdout from the get-go. How was the stream generated?
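As a sketch of what "encapsulated from the get-go" means at the byte level, the writer side of the simplification looks like this. Again, this is NOT the real subunit v2 packet layout (the real format adds flags, an optional test id, and a CRC32 checksum); the signature byte and a big-endian length prefix are the only pieces borrowed from the real protocol, and everything else is an illustrative assumption.

```python
import struct

SIGNATURE = b"\xb3"  # subunit v2 packets do begin with this byte

def encapsulate(stdout_bytes):
    """Wrap a run of raw stdout in a (simplified) length-prefixed packet,
    instead of dumping it between packets where the parser must scan it."""
    return SIGNATURE + struct.pack(">I", len(stdout_bytes)) + stdout_bytes

def skip_packet(stream, offset):
    """A reader can jump over the payload in O(1) using the length field:
    1 signature byte + 4 length bytes + payload."""
    (length,) = struct.unpack_from(">I", stream, offset + 1)
    return offset + 1 + 4 + length

# A kilobyte of test chatter becomes a single skippable packet.
packet = encapsulate(b"x" * 1000)
print(skip_packet(packet, 0) == len(packet))  # True
```

In practice the re-encapsulation was done above with subunit-filter -s; this sketch only shows why the resulting stream is cheap to list.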

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

This was actually generated with subunit-1to2 IIRC.

Revision history for this message
Robert Collins (lifeless) wrote :

hmm, will need to check that behaviour.
