Build fails: persistit-coverage hangs with deadlock

Bug #1043536 reported by Peter Beaman
This bug affects 1 person
Affects: Akiban Persistit
Importance: Critical
Assigned to: Peter Beaman

Bug Description

Persistit r359

Persistit builds correctly, but the Clover coverage run fails consistently. I recorded this apparent deadlock on the EC2 machine:

"CLEANUP_MANAGER" prio=10 tid=0x00007f83b06e1000 nid=0x22f3 waiting on condition [0x00007f83b555f000]
   java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x00000000d822f398> (a java.util.concurrent.Semaphore$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
 at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
 at com.persistit.util.ThreadSequencer$EnabledSequencer.sequence(ThreadSequencer.java:693)
 at com.persistit.util.ThreadSequencer.sequence(ThreadSequencer.java:186)
 at com.persistit.JournalManager.lookupUpPageNode(JournalManager.java:1381)
 at com.persistit.JournalManager.readPageFromJournal(JournalManager.java:1303)
 at com.persistit.VolumeStorageV2.readPage(VolumeStorageV2.java:784)
 at com.persistit.Buffer.load(Buffer.java:582)
 at com.persistit.BufferPool.get(BufferPool.java:1455)
 at com.persistit.Exchange.searchLevel(Exchange.java:2025)
 at com.persistit.Exchange.searchTree(Exchange.java:1842)
 at com.persistit.Exchange.storeInternal(Exchange.java:2423)
 at com.persistit.Exchange.store(Exchange.java:2163)
 at com.persistit.Exchange.store(Exchange.java:4380)
 at com.persistit.VolumeStructure.storeTreeStatistics(VolumeStructure.java:457)
 at com.persistit.VolumeStructure.flushStatistics(VolumeStructure.java:696)
 at com.persistit.Persistit.cleanup(Persistit.java:2419)
 at com.persistit.CleanupManager.poll(CleanupManager.java:283)
 at com.persistit.CleanupManager.runTask(CleanupManager.java:133)
 at com.persistit.IOTaskRunnable.run(IOTaskRunnable.java:297)
 at java.lang.Thread.run(Thread.java:662)

"READ_THREAD" prio=10 tid=0x00007f83b02de800 nid=0x22c7 waiting on condition [0x00007f83aeae8000]
   java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x00000000d822f398> (a java.util.concurrent.Semaphore$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
 at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
 at com.persistit.util.ThreadSequencer$EnabledSequencer.sequence(ThreadSequencer.java:693)
 at com.persistit.util.ThreadSequencer.sequence(ThreadSequencer.java:186)
 at com.persistit.JournalManager.lookupUpPageNode(JournalManager.java:1381)
 at com.persistit.JournalManager.readPageFromJournal(JournalManager.java:1303)
 at com.persistit.VolumeStorageV2.readPage(VolumeStorageV2.java:784)
 at com.persistit.Buffer.load(Buffer.java:582)
 at com.persistit.BufferPool.get(BufferPool.java:1455)
 at com.persistit.Exchange.traverse(Exchange.java:3580)
 at com.persistit.Exchange.traverse(Exchange.java:3336)
 at com.persistit.Exchange.traverse(Exchange.java:3266)
 at com.persistit.Exchange.next(Exchange.java:4125)
 at com.persistit.JournalManagerTest$2.run(JournalManagerTest.java:1541)
 at com.persistit.unit.ConcurrentUtil$1.run(ConcurrentUtil.java:77)
 at java.lang.Thread.run(Thread.java:662)
 …

"main" prio=10 tid=0x0000000041953000 nid=0x1a0a waiting on condition [0x00007f83bbf61000]
   java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for <0x00000000d822f398> (a java.util.concurrent.Semaphore$NonfairSync)
 at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
 at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
 at java.util.concurrent.Semaphore.acquire(Semaphore.java:286)
 at com.persistit.util.ThreadSequencer$EnabledSequencer.sequence(ThreadSequencer.java:693)
 at com.persistit.util.ThreadSequencer.sequence(ThreadSequencer.java:186)
 at com.persistit.JournalManager.lookupUpPageNode(JournalManager.java:1381)
 at com.persistit.JournalManager.readPageFromJournal(JournalManager.java:1303)
 at com.persistit.VolumeStorageV2.readPage(VolumeStorageV2.java:784)
 at com.persistit.Buffer.load(Buffer.java:582)
 at com.persistit.BufferPool.get(BufferPool.java:1455)
 at com.persistit.Exchange.searchLevel(Exchange.java:2025)
 at com.persistit.Exchange.searchTree(Exchange.java:1842)
 at com.persistit.Exchange.storeInternal(Exchange.java:2423)
 at com.persistit.Exchange.store(Exchange.java:2163)
 at com.persistit.Exchange.store(Exchange.java:4380)
 at com.persistit.SplitPolicyTest.__CLR3_0_1ewwe7owzt(SplitPolicyTest.java:564)
 at com.persistit.SplitPolicyTest.testPackBiasPacking(SplitPolicyTest.java:502)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

It appears that JournalManagerTest did not shut down correctly, so the ThreadSequencer was never disabled and threads from later tests were left hanging on it.

Peter Beaman (pbeaman) wrote :

Marking Critical since this affects the build.

Changed in akiban-persistit:
assignee: nobody → Peter Beaman (pbeaman)
status: New → Confirmed
importance: Undecided → Critical
Peter Beaman (pbeaman) wrote :

Apparent cause:

In JournalManagerTest, this code sequence:

        enableSequencer(true);
        addSchedules(PAGE_MAP_READ_INVALIDATE_SCHEDULE);
        startAndJoinAssertSuccess(5000, thread1, thread2);
        disableSequencer();

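is susceptible to an AssertionError thrown by startAndJoinAssertSuccess: if that call fails, disableSequencer() is never reached and the sequencer stays enabled for the tests that follow. With Clover instrumentation it appears possible for the threads to run too long and time out; I saw this in some earlier runs.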

The fix is (a) to wrap the sequence in try/finally or move the disable call to tearDown(), and (b) either to lengthen the timeouts or otherwise to handle unexpectedly slow execution under Clover.
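
A minimal sketch of part (a), not the committed change. Every name in it (SequencerCleanupSketch, sequencerEnabled, joinOrFail, runUnguarded, runGuarded) is a hypothetical stand-in for illustration; none of it is Persistit's ThreadSequencer or the real test helpers. It only shows why a failed join leaves the global flag set unless the disable call sits in a finally block:

```java
// Hypothetical stand-ins only; this is not Persistit code.
class SequencerCleanupSketch {

    // Models the global "sequencer enabled" state shared by all tests.
    static volatile boolean sequencerEnabled = false;

    static void enableSequencer() {
        sequencerEnabled = true;
    }

    static void disableSequencer() {
        sequencerEnabled = false;
    }

    // Stand-in for startAndJoinAssertSuccess: fails when the workers time out.
    static void joinOrFail(boolean timedOut) {
        if (timedOut) {
            throw new AssertionError("threads did not finish in time");
        }
    }

    // Original shape: the disable call is skipped when the join throws.
    static boolean runUnguarded(boolean timedOut) {
        enableSequencer();
        try {
            joinOrFail(timedOut);
            disableSequencer();
        } catch (AssertionError e) {
            // Swallowed only so the sketch can inspect the leaked state.
        }
        return sequencerEnabled;
    }

    // Guarded shape: finally disables the sequencer on success and on failure.
    static boolean runGuarded(boolean timedOut) {
        enableSequencer();
        try {
            try {
                joinOrFail(timedOut);
            } finally {
                disableSequencer();
            }
        } catch (AssertionError e) {
            // Swallowed only so the sketch can inspect the state afterwards.
        }
        return sequencerEnabled;
    }

    public static void main(String[] args) {
        System.out.println("unguarded, after timeout: enabled=" + runUnguarded(true));
        System.out.println("guarded, after timeout: enabled=" + runGuarded(true));
    }
}
```

Moving the disable call to tearDown() would have the same effect in a JUnit test, since tearDown() also runs after a failed test method.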

Changed in akiban-persistit:
status: Confirmed → Fix Committed
milestone: none → 3.1.7
visibility: private → public
Peter Beaman (pbeaman)
Changed in akiban-persistit:
status: Fix Committed → Fix Released