pbzip2 decompress does not appear to use multiple cores in some cases

Bug #1236194 reported by haikuty
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
pbzip2    Invalid   Undecided    Unassigned

Bug Description

I have a .tbz file that pbzip2 does not decompress on multiple cores.

I haven't dug into the code, but when I run pbzip2 -d to decompress any of these tar files, I see it using 8 threads but only a single core (CPU usage peaks at 102.9%).

When I compress with pbzip2, I see it using 8 threads and 350-370% CPU (all 4 cores on this Core i5 iMac). This is what I expect to see.

If I re-compress the uncompressed tar file with pbzip2 and then run pbzip2 -d on that pbzip2-compressed version, pbzip2 uses all the cores to decompress it, as I'd expect.

So something about the file as compressed by the other person keeps pbzip2 from decompressing it in parallel. Very odd.

Unfortunately I can't upload this particular test file but I'll try to see if I can construct one that exhibits the same issue.

Yavor Nikolov (yavor-nikolov) wrote :

pbzip2 is able to decompress in parallel only if the archive is in a format like [bz2 chunk1][bz2 chunk2][bz2 chunk3]...

pbzip2 produces archives of this kind. However, if an archive was created with another utility (e.g. bzip2), it may consist of a single monolithic [bz2 chunk]. pbzip2 does not currently parallelize decompression of such single-chunk archives.
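The [chunk1][chunk2]... layout can be checked directly. Below is a minimal Python sketch, not pbzip2's own code: `count_bz2_streams` and `compress_in_chunks` are hypothetical helper names and the 300,000-byte chunk size is arbitrary. It counts concatenated bzip2 streams by decoding each one with a fresh `bz2.BZ2Decompressor` and continuing from its `unused_data`. An archive made in one piece (as plain bzip2 does) should report 1; a chunked one reports several.

```python
import bz2


def count_bz2_streams(data: bytes) -> int:
    """Count the concatenated bzip2 streams ("chunks") in `data`.

    Each stream is decoded with a fresh BZ2Decompressor; bytes after a
    stream's end-of-stream marker land in `unused_data` and are treated as
    the start of the next stream. Decodes everything, so use on small tests.
    """
    count = 0
    while data:
        decomp = bz2.BZ2Decompressor()
        decomp.decompress(data)
        if not decomp.eof:
            raise ValueError("truncated bzip2 stream")
        count += 1
        data = decomp.unused_data
    return count


def compress_in_chunks(payload: bytes, chunk_size: int) -> bytes:
    """Hypothetical helper: compress fixed-size slices independently and
    concatenate them, giving the [chunk1][chunk2]... layout."""
    return b"".join(
        bz2.compress(payload[i:i + chunk_size])
        for i in range(0, len(payload), chunk_size)
    )


if __name__ == "__main__":
    payload = b"hello world\n" * 100_000            # 1.2 MB of sample data
    single = bz2.compress(payload)                   # one monolithic stream
    chunked = compress_in_chunks(payload, 300_000)   # four independent streams
    print(count_bz2_streams(single), count_bz2_streams(chunked))  # 1 4
    assert bz2.decompress(chunked) == payload        # still round-trips
```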

haikuty (tyler-q) wrote :

Ah, I wondered if it was something like that. That's unfortunate, since I have no control over how these files are created. Guess I'll be looking at different solutions to speed this all up. Thanks for the information!

Changed in pbzip2:
status: New → Invalid
Yavor Nikolov (yavor-nikolov) wrote :

As far as I remember, lbzip2 was able to decompress even single-chunk archives using multiple threads.

haikuty (tyler-q) wrote :

Hey, thanks for this! lbzip2 does indeed make fuller use of multiple cores, even for the single-chunk archives I have to deal with. Not 100% of all 4 cores, but 250-300% CPU, which is still a big improvement. Thanks for the tip!

Yavor Nikolov (yavor-nikolov) wrote :

Both pbzip2 and lbzip2 have command-line options for setting the number of processing threads. You can try increasing it manually if the automatically chosen value doesn't saturate your CPUs.
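A higher thread count only helps when there is more than one independent piece to hand out. The following is a minimal Python sketch of that idea using CPython's `bz2` module; it is not how pbzip2 or lbzip2 are implemented. `split_streams` and `parallel_decompress` are hypothetical names, and the boundary search is a simple heuristic based on the bzip2 stream header ("BZh" plus a block-size digit) followed by the first block's magic bytes. On a single-chunk archive it yields one segment, so extra workers sit idle, which matches the behaviour reported in this bug.

```python
import bz2
import re
from concurrent.futures import ThreadPoolExecutor

# A bzip2 stream starts byte-aligned with "BZh" + a block-size digit, then the
# first block's 48-bit magic 0x314159265359 (the ASCII bytes "1AY&SY").
# Heuristic: these 10 bytes could in principle also occur inside compressed data.
STREAM_START = re.compile(rb"BZh[1-9]1AY&SY")


def split_streams(data: bytes) -> list[bytes]:
    """Cut `data` at each apparent stream start (hypothetical helper)."""
    offsets = [m.start() for m in STREAM_START.finditer(data)]
    if not offsets or offsets[0] != 0:
        offsets.insert(0, 0)
    bounds = offsets + [len(data)]
    return [data[a:b] for a, b in zip(bounds, bounds[1:])]


def parallel_decompress(data: bytes, workers: int = 4) -> bytes:
    """Decompress each stream as a separate task; `workers` plays the role of
    a thread-count option. A single-stream archive gives one task, so only
    one worker ever runs, however large `workers` is."""
    segments = split_streams(data)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so output chunks join correctly.
        return b"".join(pool.map(bz2.decompress, segments))
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` is one option to try if threads do not scale on a given interpreter.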
