Comment 2 for bug 798213

Kristian Nielsen (knielsen) wrote :

I found the root cause of this.

The problem only occurs when --innodb-release-locks-early=1 is set and the
binlog is also enabled.

Suppose transaction A modifies row R, then starts to commit. The transaction
is first prepared, then row locks are released just after prepare due to
--innodb-release-locks-early=1.

It is now possible for another transaction B to also modify R, start
committing, and get prepared before transaction A has been written into the
binlog and committed. This leaves two transactions A and B in the prepared
state, both of which modified row R.

Suppose now that we crash at this point, then restart the server and initiate
crash recovery. We will find transactions A and B both in the prepared state
but not in the binlog, so both will need to be rolled back.

In this case it is possible for InnoDB to roll back the transactions in the
wrong order. If A is rolled back first and then B, R will end up with the data
from transaction A, since that is what B saw in the row before it made its
own modification. This is wrong; it leaves transaction A committed with
respect to row R and rolled back for all other rows, totally (and silently)
breaking transactional consistency.
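
To illustrate the ordering problem, here is a minimal Python sketch (not
InnoDB code). It assumes a simplified model where each transaction keeps a
single undo record holding the value of R it saw before modifying it, and
where rollback just restores that value. The names run, undo and the
string values are made up for illustration.

```python
def run(rollback_order):
    """Simulate A then B modifying row R, then roll back in the given order."""
    row = {"R": "original"}
    undo = {}  # hypothetical: one before-image of R per transaction

    # Transaction A modifies R, prepares, and releases its row lock early.
    undo["A"] = row["R"]
    row["R"] = "from_A"

    # Transaction B can now modify R and prepare before A is committed.
    undo["B"] = row["R"]
    row["R"] = "from_B"

    # Crash recovery: both are prepared but not in the binlog, so both
    # are rolled back by restoring their before-images.
    for trx in rollback_order:
        row["R"] = undo[trx]
    return row["R"]


wrong = run(["A", "B"])  # A undone first, then B restores A's value
right = run(["B", "A"])  # reverse order of modification
print(wrong, right)
```

In this model, rolling back A before B leaves R holding A's value, while
rolling back B before A restores the original value.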

It appears that rollback in InnoDB will happen in reverse order of transaction
start, so if transaction B started before transaction A we get this
corruption. I will attach a mysql-test-run test case that reliably shows this
failure.

Note that xtrabackup (and InnoDB Hot Backup) work by running crash recovery
on copied data files, so they suffer from the same corruption on restore
(this is what triggers the original failure in the RQG test). So both crash
recovery and normal xtrabackup are affected.

I think that to fix this problem, it is necessary to change the order in
which InnoDB rolls back transactions during crash recovery. First any
non-prepared transactions must be rolled back, then all transactions in XA
prepared state must be rolled back in reverse order of prepare (as defined by
the order of "prepare" records in the transaction log). This affects both
innobase_xa_recover() and xtrabackup --prepare, as these handle XA prepared
transactions differently. Note that the prepare records may occur so far back
in the transaction log that they are before the last checkpoint, or have even
been overwritten by newer log data due to the cyclic nature of the InnoDB
transaction log. So I am not sure how hard it will be to get the correct
rollback order; it will require some knowledge of InnoDB internals to
implement this.
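
As a rough sketch of the proposed ordering only (not of how InnoDB would
obtain it), the following Python assumes that recovery can somehow attach a
prepare sequence number to each prepared transaction. The RecoveredTrx type
and the prepare_seq field are hypothetical; as noted above, recovering that
order from the log is the hard part and is not addressed here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecoveredTrx:
    name: str
    # Hypothetical: position of the prepare record; None if never prepared.
    prepare_seq: Optional[int]


def rollback_order(trxs: List[RecoveredTrx]) -> List[str]:
    """Non-prepared transactions first, then prepared ones, latest prepare first."""
    not_prepared = [t for t in trxs if t.prepare_seq is None]
    prepared = [t for t in trxs if t.prepare_seq is not None]
    prepared.sort(key=lambda t: t.prepare_seq, reverse=True)
    return [t.name for t in not_prepared + prepared]


order = rollback_order([
    RecoveredTrx("A", prepare_seq=10),    # prepared first
    RecoveredTrx("B", prepare_seq=20),    # prepared after A
    RecoveredTrx("C", prepare_seq=None),  # still active at crash time
])
print(order)
```

With this input, C is rolled back first, then B, then A.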

Alternatively, we could remove the --innodb-release-locks-early=1 feature
completely.