--innodb-release-locks-early=1 breaks InnoDB crash recovery

Bug #798213 reported by Philip Stoev
This bug affects 1 person
Affects: MariaDB
Status: Won't Fix
Importance: High
Assigned to: Kristian Nielsen

Bug Description

As noted by knielsen, the group commit + SBR + release_locks_early test suffers sporadic failures. I was able to repeat against the latest mysql-5.3.

All the failures seem to have the following in common:
* the diff reports that the master has 2 or more rows more than the slave;
* the diverging tables are always the ones that have no PK.

Logs will be attached shortly.

Revision history for this message
Philip Stoev (pstoev-askmonty) wrote :

Datadir and logs. The rqg log is in rqg.log.failing. The files for the cloned slave are in clonedslave. The files in slave-data are from a normal slave that is not affected by xtrabackup+restore.

Changed in maria:
assignee: nobody → Kristian Nielsen (knielsen)
Changed in maria:
status: New → In Progress
milestone: none → 5.3
summary: - Sporadic failure in the rqg_rpl_sbr_xtrabackup test
+ --innodb-release-locks-early=1 breaks InnoDB crash recovery
Changed in maria:
status: In Progress → Confirmed
importance: Undecided → High
description: updated
Revision history for this message
Kristian Nielsen (knielsen) wrote :

I found the root cause of this.

The problem only occurs when both --innodb-release-locks-early=1 and the
binlog are enabled.

Suppose transaction A modifies row R, then starts to commit. The transaction
is first prepared, then row locks are released just after prepare due to
--innodb-release-locks-early=1.

It is now possible for another transaction B to also modify R, start
committing, and get prepared before transaction A has had time to be written
into the binlog and committed. This leaves two transactions A and B in the
prepared state, both of which modified row R.

Suppose now that we crash at this point, then restart the server and initiate
crash recovery. We will find transactions A and B both in the prepared state
but not in the binlog, so they will need to be rolled back.

In this case it is possible for InnoDB to roll back the transactions in the
wrong order. If A is rolled back first, then B, R will end up with the data
from transaction A, which is what B saw in the row before it did its
modifications. This is wrong; it leaves transaction A committed with respect to
row R, and rolled back for all other rows, totally (and silently) breaking
transactional consistency.
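
The ordering problem can be illustrated with a toy model. This is only a sketch, not InnoDB code: the `Txn` class, its per-transaction undo list of before-images, and the `recover` helper are all hypothetical names made up for illustration, and the model assumes rollback simply restores each before-image in reverse.

```python
# Toy model (NOT InnoDB code): each transaction records the before-image
# of every row it changes, and rollback restores those images in reverse.

class Txn:
    def __init__(self, name):
        self.name = name
        self.undo = []  # (key, before_image) pairs, oldest first

    def update(self, table, key, value):
        # Remember what the row looked like before this change.
        self.undo.append((key, table[key]))
        table[key] = value

    def rollback(self, table):
        for key, before in reversed(self.undo):
            table[key] = before
        self.undo.clear()


def recover(rollback_order):
    """Replay the A/B scenario, 'crash', then roll back in the given order."""
    table = {"R": "Base0"}
    a, b = Txn("A"), Txn("B")
    a.update(table, "R", "Update A")  # A prepares and releases its lock early
    b.update(table, "R", "Update B")  # B modifies R and prepares before A commits
    # Crash: neither transaction reached the binlog, so both must be undone.
    txns = {"A": a, "B": b}
    for name in rollback_order:
        txns[name].rollback(table)
    return table["R"]


wrong = recover(["A", "B"])  # A first, then B
right = recover(["B", "A"])  # reverse order of modification
print("A then B:", wrong)
print("B then A:", right)
```

Rolling back A first restores R to its original value, but B's rollback then writes back the value B saw, which is A's uncommitted update.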

It appears that rollback in InnoDB will happen in reverse order of transaction
start, so if transaction B started before transaction A we get this
corruption. I will attach a mysql-test-run test case that reliably shows this
failure.

Note that xtrabackup (and innobase hot backup) works by running the crash
recovery on copied data files, so it suffers from the same corruption on
restore (this is what triggers the original failure in the RQG test). So both
crash recovery and normal xtrabackup are affected.

I think to fix this problem, it is necessary to change the order in which
InnoDB rolls back transactions during crash recovery. First any non-prepared
transactions must be rolled back, then all transactions in XA prepared state
must be rolled back in reverse order of prepare (as defined by the order of
`prepare' records in the transaction log). This affects both
innobase_xa_recover() as well as xtrabackup --prepare, as these handle XA
prepared transactions differently. Note that the prepare records may occur so
long back in the transaction log that they are beyond the last checkpoint, or
even overwritten by newer log data due to the cyclic nature of the InnoDB
transaction log. So I am not sure how hard it will be to get the correct
rollback order; implementing this will require some knowledge of InnoDB
internals.
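
The proposed ordering could be sketched roughly as follows. This is an assumption-laden illustration, not a patch: `RecoveredTxn`, `prepare_seq` and `rollback_order` are hypothetical names, and the sketch assumes the position of each prepare record is still available, which (as noted above) may not hold in practice. The order among non-prepared transactions is not specified here, so the sketch leaves them in input order.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecoveredTxn:
    name: str
    # Hypothetical: position of the transaction's prepare record in the log,
    # or None if the transaction never reached the prepared state.
    prepare_seq: Optional[int]


def rollback_order(txns: List[RecoveredTxn]) -> List[RecoveredTxn]:
    """Non-prepared transactions first, then prepared ones in reverse
    order of prepare (latest prepare rolled back first)."""
    unprepared = [t for t in txns if t.prepare_seq is None]
    prepared = sorted(
        (t for t in txns if t.prepare_seq is not None),
        key=lambda t: t.prepare_seq,
        reverse=True,
    )
    return unprepared + prepared


found = [
    RecoveredTxn("A", prepare_seq=10),    # prepared first
    RecoveredTxn("C", prepare_seq=None),  # still active at crash time
    RecoveredTxn("B", prepare_seq=20),    # prepared after A
]
order = [t.name for t in rollback_order(found)]
print(order)
```

With A prepared before B, B is rolled back before A, which matches the order needed in the scenario described earlier.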

Alternatively, we could remove the --innodb-release-locks-early=1 feature
completely.

Revision history for this message
Kristian Nielsen (knielsen) wrote :

The attached test case (3 files in total) fails as follows in current MariaDB 5.3:

main.innodb_release_locks_early_recovery [ fail ]
        Test ended at 2011-06-22 10:26:18

CURRENT_TEST: main.innodb_release_locks_early_recovery
--- /home/knielsen/my/5.3/mariadb-5.3/mysql-test/r/innodb_release_locks_early_recovery.result 2011-06-22 10:26:00.000000000 +0200
+++ /home/knielsen/my/5.3/mariadb-5.3/mysql-test/r/innodb_release_locks_early_recovery.reject 2011-06-22 10:26:18.000000000 +0200
@@ -58,7 +58,7 @@
 Got one of the listed errors
 SELECT * FROM t1;
 a b
-1 Base0
+1 Update c4
 SELECT * FROM t2 ORDER by a;
 a
 DROP TABLE t1;

mysqltest: Result length mismatch

As explained above, we see here transaction c4 visible in table t1, even though it should have been rolled back (and is rolled back in table t2).

Revision history for this message
Kristian Nielsen (knielsen) wrote :

The --innodb-release-locks-early feature has been removed from MariaDB 5.3 because of a fundamental issue with InnoDB crash recovery.

Changed in maria:
status: Confirmed → Won't Fix