Corruption in stress test mixture_txn_1
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Akiban Persistit | Fix Released | Critical | Peter Beaman |
Bug Description
Various symptoms found while running stress test mixture_txn_1, including CorruptVolumeEx
Examples:
[JOURNAL_FLUSHER] WARNING Journal flush operation took 2,541ms
FAILED WITH EXCEPTION
com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
[Thread-13] ERROR BTree structure error Volume persistit(
Exchange(
0: Buffer=<Page 11,916 in volume persistit(
1: Buffer=<Page 38 in volume persistit(
FAILED WITH EXCEPTION
com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
Related branches
- Akiban Build User: Needs Fixing
- Nathan Williams: Approve
Diff: 299 lines (+125/-30), 5 files modified
src/main/java/com/persistit/Buffer.java (+7/-20)
src/main/java/com/persistit/Exchange.java (+4/-4)
src/main/java/com/persistit/IntegrityCheck.java (+1/-1)
src/test/java/com/persistit/Bug1010079Test.java (+108/-0)
src/test/java/com/persistit/MVCCPruneBufferTest.java (+5/-5)
visibility: private → public
description: updated
Changed in akiban-persistit:
status: In Progress → Fix Released
Changed in akiban-persistit:
assignee: nobody → Peter Beaman (pbeaman)
Diagnosis:
The various symptoms all stem from a single root cause. Exchange.LevelCache members hold references to existing Buffers. On the fast path, Exchange verifies that the Buffer referenced by a LevelCache element has not changed since the thread last used it by comparing generation numbers. If the Buffer can be claimed and still has the same volume, page address, and generation number, Exchange assumes its content is unchanged and simply uses it.
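The fast-path check described above can be sketched as follows. This is a minimal, hypothetical model: `Page`, `CachedLevel`, and `looksUnchanged` are names invented for illustration and are not the Persistit classes or methods. It only shows why a writer that changes content without advancing the generation leaves a cached reader trusting stale content.

```java
// Hypothetical, simplified model of a generation check; NOT the real Persistit classes.
final class GenerationCheckSketch {

    /** Stand-in for a buffer holding one page. */
    static final class Page {
        final String volume;
        final long address;
        long generation;
        String content;

        Page(String volume, long address, String content) {
            this.volume = volume;
            this.address = address;
            this.content = content;
        }

        /** Changes the content; a correct writer always passes bumpGeneration = true. */
        void modify(String newContent, boolean bumpGeneration) {
            content = newContent;
            if (bumpGeneration) {
                generation++;
            }
        }
    }

    /** Stand-in for one cached level: remembers the page identity and generation last seen. */
    static final class CachedLevel {
        final Page page;
        final String volume;
        final long address;
        final long generation;

        CachedLevel(Page page) {
            this.page = page;
            this.volume = page.volume;
            this.address = page.address;
            this.generation = page.generation;
        }

        /** Fast-path test: same volume, address and generation is taken to mean "unchanged". */
        boolean looksUnchanged() {
            return page.volume.equals(volume)
                    && page.address == address
                    && page.generation == generation;
        }
    }

    public static void main(String[] args) {
        Page page = new Page("persistit", 38L, "v1");

        CachedLevel cached = new CachedLevel(page);
        page.modify("v2", true);
        System.out.println(cached.looksUnchanged()); // false: the change is detected

        cached = new CachedLevel(page);
        page.modify("v3", false);
        System.out.println(cached.looksUnchanged()); // true: stale content would be trusted
    }
}
```

The second case in `main` is the failure mode: the check itself is sound, but it is only as reliable as every writer's discipline in advancing the generation.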
The new code added to prune long MVV values fails to update the generation correctly. The bug is subtle because the Buffer#pruneMvvValues method does adjust the generation number appropriately. However, the Buffer copy constructor used in this code path does not copy the generation number, so the update attempted in pruneMvvValues has no effect.
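To illustrate the copy-constructor problem, here is a small sketch under stated assumptions. `Buf`, `copyDroppingGeneration`, `copyKeepingGeneration`, and `bumpGeneration` are hypothetical names, not Persistit code, and the report does not describe how the pruned copy is folded back into the live Buffer, so the sketch shows only the effect of omitting the field: a generation bump applied to such a copy is unrelated to the generation that readers recorded from the original.

```java
// Hypothetical model of a copy that drops a field; NOT the real Persistit Buffer class.
final class CopyConstructorSketch {

    static final class Buf {
        final byte[] bytes;
        long generation;

        Buf(byte[] bytes, long generation) {
            this.bytes = bytes;
            this.generation = generation;
        }

        /** Buggy copy: duplicates the bytes but leaves generation at the default of 0. */
        static Buf copyDroppingGeneration(Buf original) {
            return new Buf(original.bytes.clone(), 0L);
        }

        /** Corrected copy: carries the generation along with the bytes. */
        static Buf copyKeepingGeneration(Buf original) {
            return new Buf(original.bytes.clone(), original.generation);
        }

        /** Stand-in for the generation adjustment made while pruning. */
        void bumpGeneration() {
            generation++;
        }
    }

    public static void main(String[] args) {
        Buf original = new Buf(new byte[] {1, 2, 3}, 41L);

        Buf buggy = Buf.copyDroppingGeneration(original);
        buggy.bumpGeneration();
        System.out.println(buggy.generation); // 1, unrelated to the original's 41

        Buf fixed = Buf.copyKeepingGeneration(original);
        fixed.bumpGeneration();
        System.out.println(fixed.generation); // 42, one past the original's 41
    }
}
```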
In addition, pruneLongMvvValues uses a Value object obtained from a ThreadLocal. However, that same Value is already in use by Exchange#storeInternal.
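The hazard of sharing a thread-local scratch object between a caller and a helper on the same thread can be sketched like this. The names are hypothetical, and a `StringBuilder` stands in for the Value object; the real Persistit methods and their fix are not reproduced here.

```java
// Hypothetical sketch of thread-local scratch reuse; NOT the real Persistit code.
final class ThreadLocalReuseSketch {

    /** One scratch object per thread, standing in for the shared Value. */
    static final ThreadLocal<StringBuilder> SCRATCH =
            ThreadLocal.withInitial(StringBuilder::new);

    /** Caller fills the scratch object, then invokes a helper that reuses it. */
    static String storeWithSharedScratch(String userValue) {
        StringBuilder value = SCRATCH.get();
        value.setLength(0);
        value.append(userValue);
        pruneWithSharedScratch(); // overwrites the caller's in-progress value
        return value.toString();
    }

    static void pruneWithSharedScratch() {
        StringBuilder temp = SCRATCH.get(); // same instance as the caller's
        temp.setLength(0);
        temp.append("prune-temp");
    }

    /** Same flow, but the helper allocates its own scratch object. */
    static String storeWithPrivateScratch(String userValue) {
        StringBuilder value = SCRATCH.get();
        value.setLength(0);
        value.append(userValue);
        pruneWithPrivateScratch();
        return value.toString();
    }

    static void pruneWithPrivateScratch() {
        StringBuilder temp = new StringBuilder(); // independent of the caller's value
        temp.append("prune-temp");
    }

    public static void main(String[] args) {
        System.out.println(storeWithSharedScratch("user-value"));  // prune-temp
        System.out.println(storeWithPrivateScratch("user-value")); // user-value
    }
}
```

A thread-local protects against interference between threads, not between nested calls on the same thread, which is why the helper in the second variant uses an object of its own.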