Akiban Persistit

Value returned from Accumulator#update() is non-transactional

Bug #1167056 reported by Nathan Williams on 2013-04-09

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Akiban Persistit	Fix Released	High	Peter Beaman	Akiban Persistit 3.3.0

Bug Description

Persistit r428

Note that the return value is not documented.

Test case below fails with:
java.lang.AssertionError: newValB expected:<1> but was:<2>

@Test
public void updateRetVal() throws PersistitException {
    Transaction txn = _persistit.getTransaction();
    Exchange ex = _persistit.getExchange(UnitTestProperties.VOLUME_NAME, "accum_bug", true);
    Accumulator accum = ex.getTree().getAccumulator(Type.SUM, 3);

    txn.begin();
    assertEquals("init live", 0, accum.getLiveValue());
    assertEquals("init snap", 0, accum.getSnapshotValue(txn));
    txn.commit();
    txn.end();

    txn.begin();
    long newValA = accum.update(1, txn);
    assertEquals("newValA", 1, newValA);
    assertEquals("snap a", 1, accum.getSnapshotValue(txn));
    txn.rollback();
    txn.end();

    txn.begin();
    assertEquals("snap b", 0, accum.getSnapshotValue(txn));
    long newValB = accum.update(1, txn);
    assertEquals("newValB", 1, newValB);
    assertEquals("snap c", 1, accum.getSnapshotValue(txn));
    txn.commit();
    txn.end();

    txn.begin();
    assertEquals("snap d", 1, accum.getSnapshotValue(txn));
    txn.commit();
    txn.end();
}

Looking at the code, it appears it is actually the contents of the live value. If that is "as intended", we at least need to document it. I would vote for remove the behavior, though, so that any live value get is explicitly an opt-in.

Related branches

lp:~pbeaman/akiban-persistit/fix-accumulator-javadoc

Merged into lp:akiban-persistit at revision 429

Nathan Williams: Needs Fixing on 2013-04-11

lp:~nwilliams/akiban-server/seq-accum-fix

Merged into lp:~akiban-technologies/akiban-server/trunk at revision 2619

Mike McMahon: Needs Information on 2013-04-11

Akiban Build User: Needs Fixing on 2013-04-11

Thomas Jones-Low: Approve on 2013-04-11

Nathan Williams (nwilliams) on 2013-04-09

information type:

Proprietary → Public

Revision history for this message

Peter Beaman (pbeaman) wrote on 2013-04-10:

Download full text (5.7 KiB)

Yes, it's interesting that the Javadoc in Accumulator is actually malformed. I don't know why the javadoc task doesn't kick out a warning.

So everything Nathan wrote is correct. This is an attempt to add to it.

The fundamental issue is that every generated auto-increment value needs to be unique for all time (subject to the maxValue wrapping behavior of sequences).

There are two ways to do this:

(a) Serialize all update and insert transactions that increment an auto-increment value. That way, any transaction that needs a new value is able to see all prior committed updates, and the newly incremented value it commits will be unique because no other transaction is running concurrently.

(b) Allocate a provisional ID value (for example, by incrementing an AtomicLong) and then record the transaction in such a way that this value is permanently retired and will never be generated again. As a corollary to this behavior, a transaction that allocates and ID and then rolls back will leave a "hole" in the sequence numbers. We believe this is a reasonable outcome preferable to running all auto-incrementing transactions serially.

Currently the Sequence code uses a Persistit SUM accumulator to maintain the count. The semantics of SUM are to provide a transactionally accurate count. So if three transactions concurrently increment a SUM accumulator, and then two commit and one aborts, the result will be 2.

What confuses the issue is that Accumulators offer two values: the "snapshot" value, which is consistent with the state of the database at the start of the transaction, and the "live" value, which is modified as updates are executed. The live value is not transactionally correct; it is an "estimate" of the value accurate only if ever transaction has committed. (I note that there's no attempt to adjust the live value even when an aborted transaction is pruned. We might want to change that.) The live value of a SUM accumulator is not maintained transactionally - in fact it explicitly diverges from any consistent snapshot view.

What further confuses the issue, as Nathan notes, is that the Accumulator#update operation always returns the live value, but is not documented. What the code today is actually doing is using the live value of a SUM accumulator, which appears to work just fine for awhile. To see a failure there must be one or more update transactions that increment the accumulator and then abort, leaving the live value larger than the snapshot value by N where N is the number of transactions that aborted. Everything continues to work correctly until the server is restarted; when that happens the live value is reset to the snapshot value that existed at the time of the shutdown, and therefore after restarting, the next auto-increment value obtained is smaller than one produced prior to shutdown - hence the duplicate key exceptions.

Yes, it's interesting that the Javadoc in Accumulator is actually malformed.  I don't know why the javadoc task doesn't kick out a warning.

So everything Nathan wrote is correct. This is an attempt to add to it.

The fundamental issue is that every generated auto-increment value needs to be unique for all time (subject to the maxValue wrapping behavior of sequences).

There are two ways to do this:

(a) Serialize all update and insert transactions that increment an auto-increment value.  That way, any transaction that needs a new value is able to see all prior committed updates, and the newly incremented value it commits will be unique because no other transaction is running concurrently.

(b) Allocate a provisional ID value (for example, by incrementing an AtomicLong) and then record the transaction in such a way that this value is permanently retired and will never be generated again. As a corollary to this behavior, a transaction that allocates and ID and then rolls back will leave a "hole" in the sequence numbers.  We believe this is a reasonable outcome preferable to running all auto-incrementing transactions serially.

Currently the Sequence code uses a Persistit SUM accumulator to maintain the count.  The semantics of SUM are to provide a transactionally accurate count.  So if three transactions concurrently increment a SUM accumulator, and then two commit and one aborts, the result will be 2.

What confuses the issue is that Accumulators offer two values: the "snapshot" value, which is consistent with the state of the database at the start of the transaction, and the "live" value, which is modified as updates are executed. The live value is not transactionally correct; it is an "estimate" of the value accurate only if ever transaction has committed. (I note that there's no attempt to adjust the live value even when an aborted transaction is pruned.  We might want to change that.) The live value of a SUM accumulator is not maintained transactionally - in fact it explicitly diverges from any consistent snapshot view.

What further confuses the issue, as Nathan notes,  is that the Accumulator#update operation always returns the live value, but is not documented. What the code today is actually doing is using the live value of a SUM accumulator, which appears to work just fine for awhile. To see a failure there must be one or more update transactions that increment the accumulator and then abort, leaving the live value larger than the snapshot value by N where N is the number of transactions that aborted.  Everything continues to work correctly until the server is restarted; when that happens the live value is reset to the snapshot value that existed at the time of the shutdown, and therefore after restarting, the next auto-increment value obtained is smaller than one produced prior to shutdown - hence the duplicate key exceptions.

The only way to use a SUM accumulator to generate unique ID values is to use its snapshot value.  But unfortunately, this will work only if all update transactions that perform auto-increment are constrained to execute serially.  That is, to take out a big lock (or force a w-w-conflict) to prevent concurrent executions.  The reason is Write Skew: two concurrent transactions read the same initial snapshot value S.  Both increment it, and once the two transactions commit the end result will be S+2.  But neither transaction can "see" the existence of the other transaction, and so each transaction presumably assigns the ID value S+1 which clearly results in non-unique keys.

The solution I would propose is to change the Sequence code to use a SEQ accumulator, which is designed specifically to handle this use case. Specifically, it implements the strategy described as (b) above.  A SEQ accumulator differs from a SUM accumulator in that the Delta record that records the operation in the journal contains the maximum live value observed by the transaction, not the value by which the accumulator is to be incremented.  Therefore upon restart after either a clean or dirty shutdown, the starting state of a SEQ accumulator's live value is the maximum of any value ever assigned to a transaction that actually committed. And therefore the live value of a SEQ accumulator provides the semantic that it is unique for all time.

However, there are a couple of problems with using a SEQ accumulator:

(a) We have no way of rolling the value back; it always, for all time, monotonically increases.

(b) Existing databases have stored SUM accumulators, and we will need to do work to convert them.

Currently the Sequence code path in Sequence#nextValue  wraps by resetting the accumulator.  It does so by adding the difference between the desired value and the current live value. For all the reasons noted above, this will result in incorrect values.  I think the easiest fix is to use a SEQ accumulator that always increments by 1 (not the increment value specified for the sequence), and then to compute the sequence value by multiplying and mod-ding appropriately.

Fixing up existing databases, including the try and db1 databases, will take some effort.  Simply changing the AccumulatorAdaptor#AccumInfo enum is not sufficient because the stored SUM accumulator will conflict with the SEQ accumulator specified in code causing an IllegalStateException.  I think our patch will need to delete the Sequence trees, recompute the correct values, and recreate the accumulators with their correct values.

With respect to the Accumulator#update method, the minimum Persistit change required is to document carefully what the return value means.  We actually need the return value from SEQ accumulator, but as Nathan points out it is not a very useful result from an update to SUM accumulator.  I would entertain proposal on how to adjust the API.

Revision history for this message

Nathan Williams (nwilliams) wrote on 2013-04-10:

Thanks for the excellent summary Peter.

What does incrementing by 1 and mul/mod buy us? More runway before needing a reset?

The upgrade procedure you proposed sounds about right: drop, scan for max, and recreate. Care will need to be taken with reusing the index (i.e. a new one and deprecate the old is probably simpler), but that's just details.

As far as the API, I would propose an "opt-in" for the live value. Something like update() return void and add a new updateAndGetLive() that is documented as such.

Revision history for this message

Thomas Jones-Low (tjoneslo) wrote on 2013-04-10:

I distinctly recall having the discussion of the use of the SEQ vs SUM accumulators for the sequences. I had assumed that both accumulators were the same transactionally.

>
> (a) We have no way of rolling the value back; it always, for all time, monotonically increases.
>

This is the reason we chose SUM over SEQ. We couldn't implement cycle nor increment by -1 as SEQ accumulators. I felt it was better to have the accumulator hold the true value of the sequence, rather than having a two step process to derive the value.

if, in order to get the transactional support we require for the underlying accumulator, sequences need to use SEQ accumulators the process to derive the true sequence value will need to multiply the SEQ accumulator value by the increment and mod by the cycle value.

Nathan Williams (nwilliams) on 2013-04-10

Changed in akiban-persistit:
status:	New → Fix Committed
importance:	Undecided → High
assignee:	nobody → Peter Beaman (pbeaman)
milestone:	none → 3.2.8

Nathan Williams (nwilliams) on 2013-04-11

Changed in akiban-server:
status:	New → Fix Committed
importance:	Undecided → High
assignee:	nobody → Nathan Williams (nwilliams)
milestone:	none → 1.6.1

Revision history for this message

Peter Beaman (pbeaman) wrote on 2013-04-11:

The outcome:

Persistit Version 3.2.9 has a modified API for Accumulators. The Javadoc has been reworked, hopefully to clarify everything.

You now get an accumulator of a specific type with code such as this:

SeqAccumulator sa = tree.getSeqAccumulator(index);

You update the accumulator with a method that is specific to that type; for example, SeqAccumulator#allocate() is the method used to allocate a new sequence number, while SumAccumulator#add(long) increments to a SumAccumulator. Only SeqAccumulator#allocate() returns a value - the other updating methods are are of return type void. This is because only in the case of the SeqAccumulator#allocate() method is the returned value useful.

We think these changes will make the specific use cases more natural to use and will help prevent confusion and error.

Changed in akiban-persistit:
status:	Fix Committed → Fix Released

Nathan Williams (nwilliams) on 2013-04-11

Changed in akiban-server:
status:	Fix Committed → Fix Released

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.