Stress test caused infinite loop in repack()

Bug #1005206 reported by Peter Beaman
This bug affects 1 person
Affects: Akiban Persistit
Importance: Critical
Assigned to: Peter Beaman

Bug Description

During the nightly run of persistit-stress-tests, the mixture_txn_1 plan induced a failure in which two threads were stuck looping in Buffer#repack(). Assets are attached.

Peter Beaman (pbeaman) wrote :

The thread dump.

Nathan Williams (nwilliams) wrote :

We should probably update the run_stress_tests.py script to add JMX and debug flags.

Peter Beaman (pbeaman)
Changed in akiban-persistit:
importance: Undecided → Critical
assignee: nobody → Peter Beaman (pbeaman)
Peter Beaman (pbeaman)
Changed in akiban-persistit:
milestone: none → 3.1.2
status: New → Fix Committed
Peter Beaman (pbeaman) wrote :

This bug was introduced by lp:~pbeaman/akiban-persistit/fix_1003578_Out-of-order-PageNode-entries. In that branch we added code to prune Long Record values using a separate copy of the Buffer, so that timestamp ordering is preserved while pruning long MVVs. Once pruned, the copy is folded back into the primary buffer atomically so that the timestamp protocol is observed.

Unfortunately, the process of copying back the modified buffer neglected to include the _alloc and _slack fields, which the pruning process changes in the copy.
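To illustrate the failure mode, here is a minimal sketch of a copy-and-fold-back step. It is not the Persistit code: ToyPage, foldBackBytesOnly() and foldBack() are hypothetical names, and the alloc and slack fields only stand in for the real _alloc and _slack bookkeeping, whose exact meaning is assumed here.

```java
// Toy stand-in for a page buffer; NOT the real Persistit Buffer class.
final class ToyPage {
    final byte[] bytes;
    int alloc; // assumed: offset where allocated space begins
    int slack; // assumed: bytes freed but not yet reclaimed

    ToyPage(int size) {
        bytes = new byte[size];
        alloc = size;
        slack = 0;
    }

    // Make an independent working copy that pruning can modify.
    ToyPage copy() {
        ToyPage c = new ToyPage(bytes.length);
        System.arraycopy(bytes, 0, c.bytes, 0, bytes.length);
        c.alloc = alloc;
        c.slack = slack;
        return c;
    }

    // Incomplete fold-back: copies page bytes but leaves the bookkeeping stale.
    void foldBackBytesOnly(ToyPage pruned) {
        System.arraycopy(pruned.bytes, 0, bytes, 0, bytes.length);
    }

    // Complete fold-back: copies page bytes and the bookkeeping fields.
    void foldBack(ToyPage pruned) {
        System.arraycopy(pruned.bytes, 0, bytes, 0, bytes.length);
        alloc = pruned.alloc;
        slack = pruned.slack;
    }
}
```

After foldBackBytesOnly() the primary page holds the pruned bytes but still carries the alloc and slack values from before pruning, so any later code that trusts those fields works from a wrong picture of the page. In a real buffer the fold-back would also need to run while holding whatever lock protects the page.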

This branch:

(a) fixes the bug by copying these fields
(b) adds a new test to demonstrate the bug and its fix
(c) adds a new call to fatal() in the repack() method to eliminate the infinite loop should there ever be another cause.
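The guard described in (c) can be sketched as a progress check inside the loop. This is an illustration under assumptions, not the actual repack() logic: RepackGuardSketch, reclaimOne() and the gaps array are hypothetical, and IllegalStateException stands in for Persistit's fatal() call, whose behavior is not shown in this report.

```java
// Hypothetical sketch of a loop that fails fast when it cannot make progress.
final class RepackGuardSketch {

    // Reclaims the first non-empty gap and returns its size, or 0 if none remain.
    static int reclaimOne(int[] gaps) {
        for (int i = 0; i < gaps.length; i++) {
            if (gaps[i] > 0) {
                int size = gaps[i];
                gaps[i] = 0;
                return size;
            }
        }
        return 0;
    }

    // Loops until the bookkeeping total `slack` has been reclaimed.
    // Returns the number of bytes actually reclaimed.
    static int repack(int[] gaps, int slack) {
        int reclaimed = 0;
        while (slack > 0) {
            int size = reclaimOne(gaps);
            if (size == 0) {
                // Bookkeeping claims reclaimable space that the page does not
                // contain. Without this check the loop would never terminate.
                throw new IllegalStateException(
                        "repack made no progress; " + slack + " bytes unaccounted for");
            }
            slack -= size;
            reclaimed += size;
        }
        return reclaimed;
    }
}
```

With consistent inputs, such as gaps of 4 and 8 bytes and a slack of 12, the loop terminates normally. With a stale slack value that exceeds what the page really holds, the guard turns a silent spin into an immediate, diagnosable failure.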

visibility: private → public
Changed in akiban-persistit:
status: Fix Committed → Fix Released