Comment 3 for bug 1126297

Revision history for this message
Nathan Williams (nwilliams) wrote :

Eureka. This is trivially explainable and not due to any race.

The code running on the customer machine had a worker thread that could, potentially, exit without closing the transaction (no commit, rollback, or end). That leads to an error in the CleanupManager (see comment #2) but not before the manager marks that status as notified. Since it got notified, the next cleanup of the bucket will put it onto the free list in a locked state. This is how we wind up with a locked status on the free list, making that bucket essentially unusable.

This quick test code demonstrates all three aspects:
public static class EmptyTxnThread extends Thread {
        private final Persistit persistit;
        private final boolean doCommit;
        public Throwable error;

        public EmptyTxnThread(Persistit persistit, boolean doCommit) {
            this.persistit = persistit;
            this.doCommit = doCommit;
        }

        @Override
        public void run() {
            Transaction txn = persistit.getTransaction();
            try {
                txn.begin();
                if(doCommit) {
                    txn.commit();
                    txn.end();
                } else {
                    System.out.println("Abandoning thread: " + Thread.currentThread().getName());
                }
            } catch(Throwable t) {
                error = t;
            }
        }
    }

    @Test
    public void abandonedTest() throws Throwable {
        for(int i = 0; ; ++i) {
            EmptyTxnThread thread = new EmptyTxnThread(_persistit, i > 0);
            thread.start();
            while(thread.isAlive()) {
                Thread.sleep(10);
            }
            if(thread.error != null) {
                throw thread.error;
            }
        }
    }

And output from running it locally:
Abandoning thread: Thread-2
[CLEANUP_MANAGER] ERROR java.lang.IllegalMonitorStateException
     at java.util.concurrent.locks.ReentrantLock$Sync.tryRelease(ReentrantLock.java:155)
     at java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1260)
     at java.util.concurrent.locks.ReentrantLock.unlock(ReentrantLock.java:460)
     at com.persistit.TransactionStatus.wwUnlock(TransactionStatus.java:378)
     at com.persistit.TransactionIndexBucket.notifyCompleted(TransactionIndexBucket.java:263)
     at com.persistit.TransactionIndex.notifyCompleted(TransactionIndex.java:614)
     at com.persistit.Transaction.rollback(Transaction.java:726)
     at com.persistit.Transaction.close(Transaction.java:511)
     at com.persistit.Persistit.cleanup(Persistit.java:1512)
     at com.persistit.CleanupManager.poll(CleanupManager.java:156)
     at com.persistit.CleanupManager.runTask(CleanupManager.java:87)
     at com.persistit.IOTaskRunnable.run(IOTaskRunnable.java:143)
     at java.lang.Thread.run(Thread.java:722)

java.lang.AssertionError: status:<<ts=115 tc=ABORTED mvv=0>> wwLock:<java.util.concurrent.locks.ReentrantLock@51037076[Locked by thread Thread-2]> curThread:<Thread-240>
 at com.persistit.TransactionIndexBucket.allocateTransactionStatus(TransactionIndexBucket.java:180)
 at com.persistit.TransactionIndex.registerTransaction(TransactionIndex.java:560)
 at com.persistit.TransactionIndex.registerTransaction(TransactionIndex.java:527)
 at com.persistit.Transaction.begin(Transaction.java:596)
 at com.persistit.unit.ExchangeTest$EmptyTxnThread.run(ExchangeTest.java:427)

The only other modification made was to change the assert in allocateTransactionStatus() to print the info from the wwLock to prove the theory.