Very long recovery time due to many temporary tree IT records
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Akiban Persistit | Fix Released | High | Peter Beaman |
Bug Description
An Akiban Server instance running in an AMI is taking 20+ minutes to start. During this time the CPU is 100% busy doing this:
"main" prio=10 tid=0x00007f499
java.
at sun.nio.
at sun.nio.
at sun.nio.
at sun.nio.
at java.nio.
at java.lang.
at java.lang.
at java.lang.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
at com.persistit.
- locked <0x000000035049
at com.persistit.
at com.akiban.
- locked <0x000000035048
at com.akiban.
at com.akiban.
at com.akiban.
at com.akiban.
- locked <0x000000035048
at com.akiban.
at com.akiban.
at com.akiban.
Upon further investigation we found that the journal file contains over 300 MB of IT (Identify Tree) records, almost all of them for trees created in temporary volumes during sort operations. There are 8,406,774 IT records in the journal.
Temporary trees and volumes should not be identified in the journal at all.
Further, this system is exhibiting the behavior predicted in https:/
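To illustrate the intended behavior described above (no IT records for temporary trees), here is a minimal sketch. All names in it (`TreeInfo`, `JournalSketch`, `identify`) are hypothetical stand-ins and are not Persistit's actual classes or methods; the real change is spread across the files listed in the diff below.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical descriptor for a tree; not a Persistit class.
final class TreeInfo {
    final String volumeName;
    final String treeName;
    final boolean temporary; // true if the tree lives in a temporary volume

    TreeInfo(String volumeName, String treeName, boolean temporary) {
        this.volumeName = volumeName;
        this.treeName = treeName;
        this.temporary = temporary;
    }
}

// Hypothetical journal that records IT (Identify Tree) entries as strings.
final class JournalSketch {
    private final List<String> itRecords = new ArrayList<>();
    private int nextHandle = 1;

    /**
     * Writes an IT record for the tree unless it is temporary.
     * Returns true if a record was written.
     */
    boolean identify(TreeInfo tree) {
        if (tree.temporary) {
            // Temporary trees are discarded at shutdown, so recovery
            // never needs to know about them.
            return false;
        }
        int handle = nextHandle++;
        itRecords.add("IT " + handle + " " + tree.volumeName + "/" + tree.treeName);
        return true;
    }

    List<String> itRecords() {
        return itRecords;
    }
}
```

With a guard like this, a sort that creates millions of temporary trees adds nothing to the journal, so recovery time no longer grows with the number of sorts performed.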
Related branches
- Nathan Williams: Needs Fixing
- Peter Beaman: Needs Resubmitting
- Akiban Build User: Needs Fixing
- Diff: 447 lines (+201/-29), 12 files modified
src/main/java/com/persistit/Buffer.java (+3/-1)
src/main/java/com/persistit/Exchange.java (+3/-1)
src/main/java/com/persistit/IntegrityCheck.java (+5/-4)
src/main/java/com/persistit/JournalManager.java (+13/-1)
src/main/java/com/persistit/Persistit.java (+2/-4)
src/main/java/com/persistit/RecoveryManager.java (+18/-6)
src/main/java/com/persistit/Transaction.java (+10/-7)
src/main/java/com/persistit/Tree.java (+11/-0)
src/main/java/com/persistit/Volume.java (+26/-1)
src/main/java/com/persistit/VolumeStructure.java (+6/-1)
src/test/java/com/persistit/Bug1018526Test.java (+104/-0)
src/test/java/com/persistit/Bug932097Test.java (+0/-3)
Changed in akiban-persistit: | |
assignee: | nobody → Peter Beaman (pbeaman) |
description: | updated |
Changed in akiban-persistit: | |
status: | Confirmed → In Progress |
Changed in akiban-persistit: | |
status: | In Progress → Fix Committed |
visibility: | private → public |
Changed in akiban-persistit: | |
milestone: | none → 3.1.2 |
Changed in akiban-persistit: | |
status: | Fix Committed → Fix Released |
The proposed fix incorporates logic to clean up existing databases. On a site with this problem, the first startup recovery using the modified version of Persistit will delete all the temporary trees from the tree map. The next journal file rollover after that will write a tree map without the temporary tree IT records, and from then on they will not reappear.
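The two cleanup steps can be sketched roughly as follows. This is an illustration only: `TreeMapCleanupSketch`, `TreeEntry`, `pruneTemporaryTrees` and `writeTreeMapAtRollover` are hypothetical names, and the tree map is modeled as a plain `Map` from handle to entry rather than Persistit's internal structures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TreeMapCleanupSketch {

    // Hypothetical tree map entry; not a Persistit class.
    static final class TreeEntry {
        final String name;
        final boolean temporary;

        TreeEntry(String name, boolean temporary) {
            this.name = name;
            this.temporary = temporary;
        }
    }

    /**
     * Step 1 (first recovery with the fix): remove every temporary tree
     * from the recovered tree map. Returns the number of entries removed.
     * The map must be mutable.
     */
    static int pruneTemporaryTrees(Map<Integer, TreeEntry> treeMap) {
        int before = treeMap.size();
        treeMap.values().removeIf(entry -> entry.temporary);
        return before - treeMap.size();
    }

    /**
     * Step 2 (next journal rollover): emit one IT record per surviving
     * entry, so the new journal file no longer carries the temporary trees.
     */
    static List<String> writeTreeMapAtRollover(Map<Integer, TreeEntry> treeMap) {
        List<String> records = new ArrayList<>();
        for (Map.Entry<Integer, TreeEntry> e : treeMap.entrySet()) {
            records.add("IT " + e.getKey() + " " + e.getValue().name);
        }
        return records;
    }
}
```

Because the pruning happens in memory during recovery, the old journal files are not rewritten; the smaller tree map only reaches disk at the next rollover.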