pbunzip2 --ignore-trailing-garbage=1 hangs with large enough -p# - consumers hang after producer is interrupted

Bug #762464 reported by David James on 2011-04-16
This bug affects 2 people
Affects: pbzip2
Status: Fix Released
Importance: Medium
Assigned to: Yavor Nikolov
Milestone: 1.1.4

Bug Description

I'm seeing regular hangs with pbzip2 on a 16-core machine when using it to decompress lots of Portage packages. I can reproduce the hang about once every 200 decompressions if 5 or more processors are used; with 4 or fewer, I can't reproduce it. Specifying -p4 works around the issue, but -p5 or higher leads to crashes...

Reproduction recipe:

$ pbzip2 --version
Parallel BZIP2 v1.1.3 - by: Jeff Gilchrist [http://compression.ca]
[Mar. 27, 2011] (uses libbzip2 by Julian Seward)
Major contributions: Yavor Nikolov <email address hidden>
$ wget http://commondatastorage.googleapis.com/chromeos-prebuilt/board/x86-generic/full-15.04.11.141852/packages/dev-db/sqlite-3.6.22-r2.tbz2
$ for i in $(seq 1 1000); do echo $i; pbzip2 -dc --ignore-trailing-garbage=1 sqlite-3.6.22-r2.tbz2 > /dev/null; done

After 103 tries, pbzip2 hung on the archive without using any CPU. I could exit by pressing Ctrl-C. Here is example output:
pbzip2: *WARNING: Trailing garbage after EOF ignored!
^C
 *Control-C or similar caught [sig=2], quitting...
Terminator thread: premature exit requested - quitting..

I've attached a sample archive that reproduces the crash. I've reproduced the crash with several different archives, all of which have trailing garbage in xpak format (from Gentoo Portage).

David James (davidjames) wrote :
Yavor Nikolov (yavor-nikolov) wrote :

Thanks for reporting that. Does the issue also occur with --ignore-trailing-garbage=0?

David James (davidjames) wrote :

Here's a crashing output using -k:

Parallel BZIP2 v1.1.3 - by: Jeff Gilchrist [http://compression.ca]
[Mar. 27, 2011] (uses libbzip2 by Julian Seward)
Major contributions: Yavor Nikolov <email address hidden>

         # CPUs: 16
 Maximum Memory: 100 MB
 Ignore Trailng Garbage: on
-------------------------------------------
         File #: 1 of 1
     Input Name: sqlite-3.6.22.tbz2
    Output Name: <stdout>

 BWT Block Size: 900k
     Input Size: 1619402 bytes
Decompressing data...
    Output Size: 3614720 bytes
pbzip2: *WARNING: Trailing garbage after EOF ignored!
Completed: 100%

^C
 *Control-C or similar caught [sig=2], quitting...
Terminator thread: premature exit requested - quitting...

... and the last few lines of output of the same command using strace:

mmap(NULL, 430080, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fecf0c7d000
read(3, "\325Z\26\17\341\244\377~#o\246 \347'\331\353Ct\241\364\352\360\325\223\332l\233\360\233\24v?"..., 1048567) = 570826
read(3, "", 477741) = 0
mmap(NULL, 536576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fecf0bfa000
munmap(0x7fecf0c7d000, 430080) = 0
futex(0x1639484, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1639480, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
mmap(NULL, 536576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fecfbf38000
futex(0x1639484, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1639480, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
mmap(NULL, 536576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fecfbeb5000
futex(0x1639484, FUTEX_WAKE_OP_PRIVATE, 1, 1, 0x1639480, {FUTEX_OP_SET, 0, FUTEX_OP_CMP_GT, 1}) = 1
mmap(NULL, 536576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fecf02e1000
read(3, "", 1048567) = 0
close(3) = 0
munmap(0x7fecf164e000, 1052672) = 0
    Output Size: 3614720 bytesIT, 20044, NULLCompleted: 24%
pbzip2: *WARNING: Trailing garbage after EOF ignored!
) = 0eted: 100%
futex(0x7fecf9f5f9e0, FUTEX_WAIT, 20028, NULL

David James (davidjames) wrote :

I tested, and pbzip2 works fine with archives that don't have trailing garbage at the end. This hang only happens when pbzip2 is actually ignoring trailing garbage.

Yavor Nikolov (yavor-nikolov) wrote :

I also managed to reproduce the issue. It seems to always be a hang (by "crash" I would expect pbzip2 to abort abnormally on its own).

Changed in pbzip2:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Yavor Nikolov (yavor-nikolov)
milestone: none → 1.1.4
Changed in pbzip2:
status: Confirmed → In Progress
summary: - pbunzip2 --ignore-trailing-garbage=1 still hangs with 1.1.3
+ pbunzip2 --ignore-trailing-garbage=1 hangs with large enough -p# - consumers hang after producer is interrupted
Yavor Nikolov (yavor-nikolov) wrote :

Root cause: some consumers wait forever when the input queue is empty and the producer has been interrupted (due to trailing-garbage detection).

Proposed solution:
the producer should register that it has ended when it exits prematurely
and/or
the consumers' interrupt check should take into account consumers waiting for brand-new blocks after garbage has already been detected
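The first proposed fix (the producer registers its end even when it exits prematurely) maps onto a standard condition-variable work queue. The following is a minimal, hypothetical C++ sketch, not pbzip2's actual code: `BlockQueue`, `markProducerDone`, `runDecompression` and `garbageAtBlock` are invented names, integers stand in for bzip2 blocks, and the early `break` stands in for trailing-garbage detection.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical work queue shared by one producer and N consumers.
struct BlockQueue {
    std::mutex mtx;
    std::condition_variable notEmpty;
    std::queue<int> blocks;     // stand-in for compressed bzip2 blocks
    bool producerDone = false;  // set on normal end AND on premature exit

    void push(int block) {
        {
            std::lock_guard<std::mutex> lock(mtx);
            blocks.push(block);
        }
        notEmpty.notify_one();
    }

    // Producer registers that it has ended and wakes every waiting consumer.
    void markProducerDone() {
        {
            std::lock_guard<std::mutex> lock(mtx);
            producerDone = true;
        }
        notEmpty.notify_all();
    }

    // Blocks until a block is available or the producer has ended.
    // Returns false once no more blocks will ever arrive.
    bool pop(int& block) {
        std::unique_lock<std::mutex> lock(mtx);
        notEmpty.wait(lock, [this] { return !blocks.empty() || producerDone; });
        if (blocks.empty()) {
            assert(producerDone);  // guaranteed by the wait predicate
            return false;          // producer finished and queue is drained
        }
        block = blocks.front();
        blocks.pop();
        return true;
    }
};

// Runs one producer and numConsumers consumers. The producer stops early at
// garbageAtBlock (simulated trailing-garbage detection); pass -1 to read all
// blocks. Returns the number of blocks the consumers processed.
std::size_t runDecompression(int numConsumers, int totalBlocks, int garbageAtBlock) {
    BlockQueue queue;
    std::mutex countMtx;
    std::size_t consumed = 0;

    std::vector<std::thread> consumers;
    for (int i = 0; i < numConsumers; ++i) {
        consumers.emplace_back([&] {
            int block;
            while (queue.pop(block)) {
                std::lock_guard<std::mutex> lock(countMtx);
                ++consumed;  // stand-in for decompressing the block
            }
        });
    }

    std::thread producer([&] {
        for (int b = 0; b < totalBlocks; ++b) {
            if (b == garbageAtBlock) break;  // garbage found: stop reading early
            queue.push(b);
        }
        // The fix: register the end on every exit path, including the early one.
        queue.markProducerDone();
    });

    producer.join();
    for (auto& t : consumers) t.join();
    return consumed;
}
```

For example, `runDecompression(16, 10, 3)` returns 3 and then all 16 consumer threads exit. In this simplified model, calling `markProducerDone()` only on the normal end-of-input path would leave the consumers blocked in `notEmpty.wait()` indefinitely once the queue drains, which resembles the idle `FUTEX_WAIT` at the end of the strace output above.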

I'll publish a fix today

Yavor Nikolov (yavor-nikolov) wrote :

I'm uploading a patch which seems to fix this bug.

Changed in pbzip2:
status: In Progress → Fix Committed
Yavor Nikolov (yavor-nikolov) wrote :

I've just committed a fix for this bug to the 1.1 branch (lp:pbzip2/1.1).
The first official release including the fix will be 1.1.4, scheduled for 2011-04-23.

David James (davidjames) wrote :

I've tested this patch with 1000 iterations on files that occasionally hung before the patch. I also untarred a few hundred other packages to check and I didn't reproduce any hangs. The patch looks good to me.

Changed in pbzip2:
status: Fix Committed → Fix Released