pbzip2 1.1.5 hangs during compression
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
pbzip2 | Fix Released | High | Yavor Nikolov | 1.1.6
Bug Description
pbzip2 1.1.5 occasionally hangs during compression. The hang is rare.
Here's the strace output from a hang:
[pid 6756] futex(0x60f4e4, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 6891] futex(0x961564, FUTEX_WAIT_PRIVATE, 39, NULL <unfinished ...>
[pid 6755] rt_sigtimedwait
[pid 6549] futex(0x7f4c9a8
The output file is completely written, and stdin is closed.
Pid 6549 is in pthread_join, likely waiting for 6891 to exit.
Pid 6755 is the signal thread.
Pid 6756 is the terminator thread.
Pid 6891 is in pthread_cond_wait.
I investigated the pbzip2 source code and found a situation where this hang can occur:
1. Consumer thread grabs the fifo->mut lock, sees that the fifo is empty, and sees that the producer is not done.
2. Producer thread sets the done state and broadcasts FifoQueue->notEmpty.
3. fileWriter thread finishes and broadcasts FifoQueue->notEmpty.
4. Consumer thread only now starts waiting on FifoQueue->notEmpty. Both broadcasts have already been sent, so nothing wakes it.
5. Main thread blocks in pthread_join, waiting for the consumer thread to exit.
The hang is very rare in practice because this interleaving is unlikely. I have attached a patch that makes the hang more likely, so that it can be reproduced more easily.
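The interleaving in steps 1 to 4 can also be forced in a small standalone program. The following is only a sketch, not pbzip2's code: `RacyQueue`, `racyProducer`, `wakeupWasLost`, and the two handshake flags are hypothetical names introduced for illustration, and the consumer uses `pthread_cond_timedwait` with a one-second deadline so that the demo reports the missed wakeup instead of hanging.

```cpp
#include <pthread.h>
#include <sched.h>
#include <atomic>
#include <cassert>
#include <cerrno>
#include <ctime>

// Hypothetical stand-in for the queue state; not pbzip2's real structure.
struct RacyQueue {
    pthread_mutex_t mut;
    pthread_cond_t notEmpty;
    std::atomic<bool> producerDone{false};
    // Demo-only flags used to force the problematic interleaving.
    std::atomic<bool> consumerChecked{false};
    std::atomic<bool> broadcastSent{false};

    RacyQueue() {
        pthread_mutex_init(&mut, nullptr);
        pthread_cond_init(&notEmpty, nullptr);
    }
    ~RacyQueue() {
        pthread_cond_destroy(&notEmpty);
        pthread_mutex_destroy(&mut);
    }
};

// Step 2: set the done state and broadcast WITHOUT holding the mutex.
static void* racyProducer(void* arg) {
    RacyQueue* q = static_cast<RacyQueue*>(arg);
    while (!q->consumerChecked.load()) sched_yield();  // wait for step 1
    q->producerDone.store(true);
    pthread_cond_broadcast(&q->notEmpty);  // nobody is waiting yet
    q->broadcastSent.store(true);
    return nullptr;
}

// Returns true if the consumer missed the broadcast (its wait timed out).
bool wakeupWasLost() {
    RacyQueue q;
    pthread_t producer;
    int rc = pthread_create(&producer, nullptr, racyProducer, &q);
    assert(rc == 0);
    (void)rc;

    bool lost = false;
    pthread_mutex_lock(&q.mut);
    if (!q.producerDone.load()) {                  // step 1: not done yet
        q.consumerChecked.store(true);
        // Hold the mutex until the unlocked broadcast has been sent.
        while (!q.broadcastSent.load()) sched_yield();
        timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 1;                      // bounded wait for the demo
        int waitRc = pthread_cond_timedwait(&q.notEmpty, &q.mut, &deadline);
        lost = (waitRc == ETIMEDOUT);              // step 4: signal was missed
    }
    pthread_mutex_unlock(&q.mut);
    pthread_join(producer, nullptr);
    return lost;
}
```

With an untimed `pthread_cond_wait` in place of the timed wait, the consumer in this sketch would block indefinitely, which matches the state of pid 6891 in the strace output.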
This is tracked on the Chromium OS bug tracker at http://
Changed in pbzip2:
assignee: nobody → Yavor Nikolov (yavor-nikolov)
Changed in pbzip2:
status: New → Confirmed
Changed in pbzip2:
status: Confirmed → Fix Committed
milestone: none → 1.1.6
Changed in pbzip2:
status: Fix Committed → Fix Released
Here's a patch that fixes the issue. The patch ensures that we always grab the associated mutex before broadcasting a signal. A thread that has checked the condition holds that mutex until pthread_cond_wait releases it, so the broadcaster cannot acquire the mutex, and therefore cannot broadcast, until that thread has reached the 'wait' stage. The waiting thread can no longer miss the signal.
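The locking pattern described here can be sketched as follows. This is not the attached patch: `SafeQueue`, `lockingProducer`, and `consumerWokeUp` are hypothetical names, and setting the done flag while the mutex is held is a choice made for this sketch (the description above only states that the mutex is taken before the broadcast).

```cpp
#include <pthread.h>
#include <sched.h>
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the queue state; not pbzip2's real structure.
struct SafeQueue {
    pthread_mutex_t mut;
    pthread_cond_t notEmpty;
    bool producerDone = false;                 // guarded by mut
    std::atomic<bool> consumerChecked{false};  // demo-only handshake flag

    SafeQueue() {
        pthread_mutex_init(&mut, nullptr);
        pthread_cond_init(&notEmpty, nullptr);
    }
    ~SafeQueue() {
        pthread_cond_destroy(&notEmpty);
        pthread_mutex_destroy(&mut);
    }
};

// Producer takes the mutex before changing state and broadcasting.
static void* lockingProducer(void* arg) {
    SafeQueue* q = static_cast<SafeQueue*>(arg);
    while (!q->consumerChecked.load()) sched_yield();
    // The consumer holds mut until pthread_cond_wait releases it, so this
    // lock cannot succeed before the consumer is actually waiting.
    pthread_mutex_lock(&q->mut);
    q->producerDone = true;
    pthread_cond_broadcast(&q->notEmpty);
    pthread_mutex_unlock(&q->mut);
    return nullptr;
}

// Returns true once the consumer has observed the done state.
bool consumerWokeUp() {
    SafeQueue q;
    pthread_t producer;
    int rc = pthread_create(&producer, nullptr, lockingProducer, &q);
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&q.mut);
    q.consumerChecked.store(true);  // the check happens with mut held
    // Re-check the predicate in a loop to tolerate spurious wakeups.
    while (!q.producerDone) {
        pthread_cond_wait(&q.notEmpty, &q.mut);
    }
    bool done = q.producerDone;
    pthread_mutex_unlock(&q.mut);
    pthread_join(producer, nullptr);
    return done;
}
```

Because the consumer's check and its entry into `pthread_cond_wait` happen under the same mutex the producer must acquire, the broadcast in this sketch can only be sent either before the check (in which case the consumer sees the done state) or after the consumer is waiting.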