Comment 4 for bug 922804

Yavor Nikolov (yavor-nikolov) wrote :

Thanks for your comments, Andrew.
I found a wiki dump (dewiki-20120116-pages-articles.xml.bz2, 2.4G compressed) and tested pbzip2 decompression on it. Only one of the CPUs was heavily loaded while the other stayed mostly idle, and the real and user times were close to each other.

bz2StreamScanner.getNextStream() splits a stream into pieces if the stream is big. There is a size limit (something like 1MB); if the stream is larger, it is emitted as many pieces instead of one, with a sequence number attached to each piece, and the last one has isLastInSequence=true.
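
To illustrate the idea, here is a rough Python sketch of that kind of chunking. It is not the pbzip2 code: the names Piece, split_stream and CHUNK_LIMIT are made up for the example, and the 1MB limit is only the approximate figure mentioned above.

```python
from dataclasses import dataclass
from typing import Iterator

# Assumed limit; the real value in pbzip2 may differ ("something like 1MB").
CHUNK_LIMIT = 1024 * 1024


@dataclass
class Piece:
    """One piece of a (possibly split) stream. Hypothetical structure."""
    stream_no: int             # which stream this piece belongs to
    seq_no: int                # position of the piece within its stream
    data: bytes                # raw bytes of the piece
    is_last_in_sequence: bool  # True only for the final piece of the stream


def split_stream(stream: bytes, stream_no: int,
                 limit: int = CHUNK_LIMIT) -> Iterator[Piece]:
    """Yield the stream as one piece, or as several if it exceeds `limit`."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Ceiling division; an empty stream still produces one (empty) piece.
    total = max(1, (len(stream) + limit - 1) // limit)
    for seq_no in range(total):
        start = seq_no * limit
        yield Piece(
            stream_no=stream_no,
            seq_no=seq_no,
            data=stream[start:start + limit],
            is_last_in_sequence=(seq_no == total - 1),
        )


if __name__ == "__main__":
    for piece in split_stream(b"x" * 2500, stream_no=0, limit=1000):
        print(piece.seq_no, len(piece.data), piece.is_last_in_sequence)
```

Note that the pieces of one stream still have to be handled in sequence order by whoever consumes them, so splitting like this does not by itself spread the decompression of a single stream across threads.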

Sharing the work between threads when decompressing a single stream is implemented in another utility, lbzip2 (but it has more complicated logic for splitting streams).
Also, I think lbzip2 is able to compress in parallel while producing a single stream.