Corruption in stress test mixture_txn_1

Bug #1010079 reported by Peter Beaman
This bug affects 1 person
Affects: Akiban Persistit
Status: Fix Released
Importance: Critical
Assigned to: Peter Beaman

Bug Description

Running stress test mixture_txn_1 produced various symptoms, including CorruptVolumeException, CorruptValueException, and several different assertion failures. All of them stem from new code introduced in lp:~pbeaman/akiban-persistit/fix_1006576_long_record_pruning. This bug is related to bugs #1005206 and #1006576.

Examples:

[JOURNAL_FLUSHER] WARNING Journal flush operation took 2,541ms
FAILED WITH EXCEPTION
com.persistit.exception.CorruptValueException: MVV Value is corrupt at index: 8514
 at com.persistit.MVV.prune(MVV.java:477)
 at com.persistit.Exchange.storeInternal(Exchange.java:1484)
 at com.persistit.Exchange.store(Exchange.java:1288)
 at com.persistit.Exchange.store(Exchange.java:2530)
 at com.persistit.stress.Stress3txn.executeTest(Stress3txn.java:174)
 at com.persistit.test.AbstractTestRunnerItem.runIt(AbstractTestRunnerItem.java:156)
 at com.persistit.test.TestRunner$TestThread.run(TestRunner.java:361)

[Thread-13] ERROR BTree structure error Volume persistit(/tmp/persistit_tests/mixture_txn_1.plan/persistit) page 11916 invalid page type 31: should be 1
Exchange(Volume=/tmp/persistit_tests/mixture_txn_1.plan/persistit,Tree=shared,,Key=<{"stress6",10,531,"xxxxxxxxxxxxxxx"}>)
0: Buffer=<Page 11,916 in volume persistit(/tmp/persistit_tests/mixture_txn_1.plan/persistit) at index 3,263 timestamp=7,000,530 status=v type=LongRec>, keyGeneration=82679, bufferGeneration=132764, foundAt=<40:exact:depth=33:after>>
1: Buffer=<Page 38 in volume persistit(/tmp/persistit_tests/mixture_txn_1.plan/persistit) at index 105 timestamp=7,000,509 status=vdr1 type=Index1>, keyGeneration=82679, bufferGeneration=359, foundAt=<1596:depth=14:ebc=13:db=134:tail=2232>>
FAILED WITH EXCEPTION
com.persistit.exception.CorruptVolumeException: Volume persistit(/tmp/persistit_tests/mixture_txn_1.plan/persistit) page 11916 invalid page type 31: should be 1
 at com.persistit.Exchange.corrupt(Exchange.java:3880)
 at com.persistit.Exchange.checkPageType(Exchange.java:3726)
 at com.persistit.Exchange.searchLevel(Exchange.java:1216)
 at com.persistit.Exchange.searchTree(Exchange.java:1119)
 at com.persistit.Exchange.storeInternal(Exchange.java:1437)
 at com.persistit.Exchange.store(Exchange.java:1288)
 at com.persistit.Exchange.store(Exchange.java:2530)
 at com.persistit.stress.Stress6.executeTest(Stress6.java:123)
 at com.persistit.test.AbstractTestRunnerItem.runIt(AbstractTestRunnerItem.java:156)
 at com.persistit.test.TestRunner$TestThread.run(TestRunner.java:361)


Peter Beaman (pbeaman)
visibility: private → public
description: updated
Peter Beaman (pbeaman) wrote :

Diagnosis:

The various symptoms all stem from the same root cause. Exchange.LevelCache members refer to existing Buffers. The fast path compares generation numbers to verify that the Buffer referenced by a LevelCache element has not changed since the thread last used it. If the Buffer can be claimed and still has the same volume, page address, and generation number, Exchange assumes its content is unchanged and uses it directly.
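The fast-path check described above can be modeled roughly as follows. This is a simplified, hypothetical sketch: PageBuffer and CachedLevel are illustrative names, not Persistit classes, and buffer claiming (latching) is omitted. It shows why every content change must bump the generation: any change that skips it leaves the cached state looking valid.

```java
/**
 * Hypothetical model of a pooled page buffer. Not Persistit's Buffer class.
 */
final class PageBuffer {
    private long volumeId;
    private long pageAddress;
    private long generation;

    /** Reassign this pooled buffer to a different page; new content means a new generation. */
    void load(long volumeId, long pageAddress) {
        this.volumeId = volumeId;
        this.pageAddress = pageAddress;
        generation++;
    }

    /** Every operation that changes the page content must call this. */
    void changed() {
        generation++;
    }

    long volumeId() { return volumeId; }
    long pageAddress() { return pageAddress; }
    long generation() { return generation; }
}

/**
 * Hypothetical stand-in for one LevelCache element: remembers which buffer
 * it saw and that buffer's identity and generation at the time.
 */
final class CachedLevel {
    private PageBuffer buffer;
    private long volumeId;
    private long pageAddress;
    private long generation;

    /** Record the state observed during a full tree search. */
    void remember(PageBuffer b) {
        buffer = b;
        volumeId = b.volumeId();
        pageAddress = b.pageAddress();
        generation = b.generation();
    }

    /** Fast path: reuse cached search state only if volume, page and generation all match. */
    boolean canReuse() {
        return buffer != null
                && buffer.volumeId() == volumeId
                && buffer.pageAddress() == pageAddress
                && buffer.generation() == generation;
    }
}
```

With this design, correctness rests entirely on writers calling `changed()`; the reader has no other way to notice that the bytes under a cached reference were modified.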

New code added to prune long MVV values fails to update the generation number correctly. The bug is subtle because Buffer#pruneMvvValues does adjust the generation number appropriately. However, the Buffer copy constructor used on this code path does not copy the generation number, so the update attempted in pruneMvvValues has no effect.
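The mechanism proposed in this paragraph (which the follow-up comment later says was not the actual cause) can be illustrated with a hypothetical model. GenBuffer and CopyPruneDemo are invented names, and the sketch assumes the pruned copy is written back into the original page; it only shows how a generation bump applied to a copy never reaches the page that other threads cache.

```java
/** Hypothetical buffer model: page bytes plus a generation counter. */
final class GenBuffer {
    final byte[] bytes;
    long generation;

    GenBuffer(int size) {
        bytes = new byte[size];
    }

    /** Copy constructor that copies the bytes but not the generation number. */
    GenBuffer(GenBuffer other) {
        bytes = other.bytes.clone();
        // generation is left at 0 rather than copied from other
    }

    void bumpGeneration() {
        generation++;
    }
}

final class CopyPruneDemo {
    /** Prune on a copy and bump the copy's generation: the original's generation never moves. */
    static void pruneViaCopy(GenBuffer original) {
        GenBuffer copy = new GenBuffer(original);
        copy.bytes[0] = 0;        // stand-in for removing pruned versions
        copy.bumpGeneration();    // updates the copy only
        System.arraycopy(copy.bytes, 0, original.bytes, 0, copy.bytes.length);
    }

    /** Same operation, but the generation change is applied to the page other threads see. */
    static void pruneViaCopyFixed(GenBuffer original) {
        GenBuffer copy = new GenBuffer(original);
        copy.bytes[0] = 0;
        System.arraycopy(copy.bytes, 0, original.bytes, 0, copy.bytes.length);
        original.bumpGeneration();
    }
}
```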

In addition, pruneLongMvvValues uses a Value object obtained from a ThreadLocal, but Exchange#storeInternal is already using that same Value on the same thread, so the two overwrite each other's contents.
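The ThreadLocal hazard can be reproduced in a few lines. This is a hypothetical sketch: ScratchValue stands in for a reusable Persistit Value, and store/pruneHelper are invented names playing the roles of Exchange#storeInternal and pruneLongMvvValues. Because a ThreadLocal hands the same instance to every caller on a thread, a callee that borrows it clobbers whatever its caller was holding.

```java
/** Hypothetical mutable scratch object standing in for a reusable Value. */
final class ScratchValue {
    private final StringBuilder contents = new StringBuilder();

    void put(String s) {
        contents.setLength(0);
        contents.append(s);
    }

    String get() {
        return contents.toString();
    }
}

final class ThreadLocalAliasDemo {
    private static final ThreadLocal<ScratchValue> SCRATCH =
            ThreadLocal.withInitial(ScratchValue::new);

    /** Callee borrows the per-thread scratch value for its own work. */
    static void pruneHelper() {
        SCRATCH.get().put("pruned-long-record");
    }

    /** Caller already holds the same per-thread instance when it calls the helper. */
    static String store(String payload) {
        ScratchValue v = SCRATCH.get();
        v.put(payload);
        pruneHelper();    // same thread, same instance: v is overwritten
        return v.get();
    }

    /** One possible fix: the helper uses an instance nobody else holds. */
    static void pruneHelperFixed() {
        new ScratchValue().put("pruned-long-record");
    }

    static String storeFixed(String payload) {
        ScratchValue v = SCRATCH.get();
        v.put(payload);
        pruneHelperFixed();
        return v.get();
    }
}
```

Other options are passing the working Value explicitly to the helper or giving each role its own ThreadLocal; the key point is that a per-thread cache is only safe when no frame on the stack is already using it.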

Peter Beaman (pbeaman) wrote :

The generation number analysis is incorrect. The actual issue appears to be a failure to invalidate the FastIndex of the original page when keys are removed from the copy.
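A stale derived index of this kind can be sketched as below. This is a hypothetical model, not Persistit code: KeyPage keeps sorted keys plus a lazily built lookup array standing in for the FastIndex, and the sketch assumes the pruned copy is written back into the original page. If the original's cached index is not invalidated, lookups keep using the old key positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical page model: sorted keys plus a lazily built lookup cache. */
final class KeyPage {
    final List<String> keys = new ArrayList<>();
    private String[] fastIndex; // illustrative stand-in for a FastIndex, not the real structure

    KeyPage(String... sortedKeys) {
        keys.addAll(Arrays.asList(sortedKeys));
    }

    /** Copy constructor: the copy starts with no cached index. */
    KeyPage(KeyPage other) {
        keys.addAll(other.keys);
    }

    /** Returns the slot of key (negative if absent), building the cache on first use. */
    int findSlot(String key) {
        if (fastIndex == null) {
            fastIndex = keys.toArray(new String[0]);
        }
        return Arrays.binarySearch(fastIndex, key);
    }

    void invalidateFastIndex() {
        fastIndex = null;
    }

    /** Replace this page's keys with the copy's keys. */
    void copyFrom(KeyPage other, boolean invalidate) {
        keys.clear();
        keys.addAll(other.keys);
        if (invalidate) {
            invalidateFastIndex();
        }
    }
}

final class FastIndexDemo {
    /** Remove a key on a copy, then write the copy back into the original page. */
    static void pruneViaCopy(KeyPage original, String victim, boolean invalidate) {
        KeyPage copy = new KeyPage(original);
        copy.keys.remove(victim);
        original.copyFrom(copy, invalidate);
    }
}
```

In the buggy case the stale index still reports the removed key at its old slot, and that slot now holds a different key; in a real page the analogous stale offset points into unrelated bytes, which is consistent with the corruption symptoms reported above.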

Peter Beaman (pbeaman)
Changed in akiban-persistit:
status: In Progress → Fix Released
Changed in akiban-persistit:
assignee: nobody → Peter Beaman (pbeaman)