RBR Replication with concurrent XA in READ-COMMITTED takes supremum pseudo-records and breaks replication

Bug #1735555 reported by Kenny Gryp
Affects: Percona Server (moved to https://jira.percona.com/projects/PS); status tracked in 5.7
  5.5: New, Importance: High, Assigned to: Vlad Lesin
  5.6: New, Importance: High, Assigned to: Vlad Lesin
  5.7: In Progress, Importance: High, Assigned to: Vlad Lesin

Bug Description

The problem we are facing is similar to what is described in https://bugs.mysql.com/bug.php?id=85447, which was fixed in 5.7.18 (and the test case no longer fails).

This is what happens:

Two XA transactions deadlock, which causes replication to fail with the error:

```
       Last_Error: Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
```

There are several ongoing XA Transactions at the same time.

When we look at the locks that are being held, we see:

```
mysql> select * from INNODB_LOCKS;
+---------------------------+-------------+-----------+-----------+---------------------+------------+------------+-----------+----------+------------------------+
| lock_id                   | lock_trx_id | lock_mode | lock_type | lock_table          | lock_index | lock_space | lock_page | lock_rec | lock_data              |
+---------------------------+-------------+-----------+-----------+---------------------+------------+------------+-----------+----------+------------------------+
| 28024945201:8439:938215:1 | 28024945201 | X         | RECORD    | `databse`.`tablezz` | PRIMARY    | 8439       | 938215    | 1        | supremum pseudo-record |
| 28024945046:8439:938215:1 | 28024945046 | S         | RECORD    | `databse`.`tablezz` | PRIMARY    | 8439       | 938215    | 1        | supremum pseudo-record |
+---------------------------+-------------+-----------+-----------+---------------------+------------+------------+-----------+----------+------------------------+
2 rows in set, 1 warning (0.00 sec)

mysql> select * from INNODB_LOCK_WAITS;
+-------------------+---------------------------+-----------------+---------------------------+
| requesting_trx_id | requested_lock_id         | blocking_trx_id | blocking_lock_id          |
+-------------------+---------------------------+-----------------+---------------------------+
| 28024945201       | 28024945201:8439:938215:1 | 28024945046     | 28024945046:8439:938215:1 |
+-------------------+---------------------------+-----------------+---------------------------+
1 row in set, 1 warning (0.00 sec)
```

There is an S lock on a supremum pseudo-record in one XA transaction which conflicts with the X lock another XA transaction wants to take.

How does this happen?

What happens on the original master is as follows...

Requirements:

- We have the following rows: ...,80000,100000,...
- row 80000 is, for example, on InnoDB page 99 and row 100000 is on page 100; there is still some space available to add a row to page 99.
- make sure the slave is using log_slave_updates & binlog_format=mixed

You can simulate it like this:

```
CREATE TABLE t1 (t1_pk DECIMAL(20,0) PRIMARY KEY , t1_blob BLOB) ENGINE=InnoDB;

INSERT INTO t1 VALUES (10000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (20000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (30000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (40000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (50000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (60000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (70000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (80000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (90000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (100000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (110000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (120000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (130000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (140000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (150000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (160000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (170000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (180000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (190000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (200000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (210000, REPEAT("a", 2165));

DELETE FROM t1 WHERE t1_pk IN (90000, 80000);

CREATE TABLE t2 (t2_pk INT PRIMARY KEY, t1_pk DECIMAL(20,0),
  FOREIGN KEY (t1_pk) REFERENCES t1 (t1_pk)) ENGINE=InnoDB;

--connect(con2,localhost,root)
XA START '2';
INSERT INTO t1 VALUES (85000, NULL);

#-- We are inserting a value between 80000,100000

--connection master
XA START '1';
INSERT INTO t2 VALUES (1, 100000);

#-- This causes an S lock on 100000

XA END '1';
XA PREPARE '1';

--connection con2
XA END '2';
XA PREPARE '2';

--connection con2
XA COMMIT '2';

--connection master
XA COMMIT '1';
```

Now depending on binlog_format and tx_isolation, you will get different results:

- binlog_format=MIXED, tx_isolation=REPEATABLE READ: the events will be logged as SBR and replication will break. You will have supremum pseudo-record S locks on both master and slave.
- binlog_format=MIXED, tx_isolation=READ COMMITTED: the events will be logged as RBR and replication will not break. There will be no supremum pseudo-record S locks.
- binlog_format=ROW, tx_isolation=REPEATABLE READ: the events will be logged as RBR and replication will not break. You will have supremum pseudo-record S locks on the master, but not on the slave, as RBR binlog events are applied in READ-COMMITTED isolation.
- binlog_format=ROW, tx_isolation=READ COMMITTED: the events will be logged as RBR and replication will not break. You will have supremum pseudo-record S locks on the master, but not on the slave.

All this is to be expected. (This basically means always use RBR with XA)

I have tried everything to reproduce the issue the customer is facing, but I have not been able to reproduce it yet.
The customer is using binlog_format=MIXED with tx_isolation=READ COMMITTED and the events are RBR, so the test case above does not fail, yet the customer's similar-looking workload does fail.

You can see with verbose locks in `SHOW ENGINE INNODB STATUS` that there is an S lock on a row in page 938216, and then a supremum S lock on page 938215:

```
RECORD LOCKS space id 8439 page no 938216 n bits 80 index PRIMARY of table `databse`.`tablezz` trx id 28024945046 lock mode S locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 45; compact format; info bits 0
 0: len 13; hex 8000000000000001561a65f747; asc V e G;;
 ... REDACTED ...

RECORD LOCKS space id 8439 page no 938215 n bits 112 index PRIMARY of table `databse`.`tablezz` trx id 28024945046 lock mode S
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
```

It must be that somehow replication is putting an S lock on a supremum record, even though the TRX is in READ-COMMITTED. I do not yet know when this happens.

Kenny Gryp (gryp)
summary: RBR Replication with concurrent XA in READ-COMMITTED takes supremum
- pseudo-records
+ pseudo-records and breaks replication
Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

This is not the complete answer, just a note: the MySQL documentation says:

"If you use READ COMMITTED, you must use row-based binary logging. "

Vlad Lesin (vlad-lesin)
Changed in percona-server:
assignee: nobody → Vlad Lesin (vlad-lesin)
Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

I am working with 5.7.19-17.

I modified the code a little bit with the following patch:

==============patch==============================
--- a/storage/innobase/lock/lock0lock.cc
+++ b/storage/innobase/lock/lock0lock.cc
@@ -51,6 +51,27 @@ Created 5/7/1996 Heikki Tuuri

 #include <set>

+extern "C" LEX_CSTRING thd_query_unsafe(MYSQL_THD thd);
+
+static void inline print_info_if_supremum(
+	const buf_block_t*	block,
+	ulint			heap_no,
+	ulint			mode)
+{
+	const char	*query =
+		current_thd ? thd_query_unsafe(current_thd).str : "";
+
+	if (!query)
+		query = "";
+
+	if (heap_no == PAGE_HEAP_NO_SUPREMUM) {
+		ib::info() << ">>>>>>>> supremum is locked for page "
+			<< block->page.id.page_no()
+			<< ", mode " << mode
+			<< ", query: " << query;
+	}
+}
+
 /* Flag to enable/disable deadlock detector. */
 my_bool innobase_deadlock_detect = TRUE;

@@ -2058,6 +2079,8 @@ lock_rec_lock(
              || mode - (LOCK_MODE_MASK & mode) == 0);
        ut_ad(dict_index_is_clust(index) || !dict_index_is_online_ddl(index));

+ print_info_if_supremum(block, heap_no, mode);
+
        /* We try a simplified and faster subroutine for the most
        common cases */
        switch (lock_rec_lock_fast(impl, mode, block, heap_no, index, thr)) {

=================patch=============================

This patch writes a message to the error log whenever a supremum lock is requested.

Then I modified the above test case, just added header, footer and the ability to change isolation level:

====================test=========================
--source include/master-slave.inc
--source include/have_innodb.inc
--source include/have_binlog_format_mixed.inc

#--let $isolation_level=REPEATABLE READ
--let $isolation_level=READ COMMITTED

--connection slave

--eval SET TRANSACTION ISOLATION LEVEL $isolation_level;

--connection master

--eval SET TRANSACTION ISOLATION LEVEL $isolation_level;

CREATE TABLE t1 (t1_pk DECIMAL(20,0) PRIMARY KEY , t1_blob BLOB) ENGINE=InnoDB;

INSERT INTO t1 VALUES (10000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (20000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (30000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (40000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (50000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (60000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (70000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (80000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (90000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (100000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (110000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (120000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (130000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (140000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (150000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (160000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (170000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (180000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (190000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (200000, REPEAT("a", 2165));
INSERT INTO t1 VALUES (210000, REPEAT("a", 2165));

DELETE FROM t1 WHERE t1_pk IN (90000, 80000);

CREATE TABLE t2 (t2_pk INT PRIM...

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

Forgot to mention that the bug can be repeated on 5.7.19-17, but I can't repeat it on the latest 5.7 branch. This does not mean it was fixed, though, because our test results are unpredictable.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

The test stops failing (at least for --repeat=100) on the following commit:

commit afb6d4b56f8e57b9a7f72022e912b401e570124b
Author: Laurynas Biveinis <email address hidden>
Date: Thu Aug 31 11:48:44 2017 +0300

  Fix bug 1711781 (Redundant GTID unsafe mark for CREATE/DROP TEMPORARY TABLE in RBR/MBR)

  After fixing bug 1668602, bug 1539504, and bug 1313901, CREATE
  TEMPORARY TABLE is only logged under statement binary log mode, and
  DROP TEMPORARY TABLE is only logged if the corresponding CREATE
  TEMPORARY TABLE has been logged. However, a corresponding
  enforce_gtid_consistency check in THD::is_ddl_gtid_compatible has not
  been relaxed accordingly. This resulted that CREATE/DROP TEMPORARY
  TABLE statements were forbidden incorrectly in transactional contexts,
  including function and trigger calls, even when they required no
  binary logging at all.

  Fix by keeping only the CREATE TEMPORARY TABLE check in
  THD::is_ddl_gtid_compatible and lifting its restriction for row/mixed
  binary log modes. For DROP TEMPORARY TABLE, since its check requires
  knowing the binlog format at the corresponding CREATE TEMPORARY TABLE,
  move it to mysql_rm_table.

  Convert binlog.binlog_enforce_gtid_consistency test to an include file
  that is shared between two new binlog format-dependent tests
  binlog_stm_enforce_gtid_consistency and
  binlog_row_mix_enforce_gtid_consistency.

Revision history for this message
Kenny Gryp (gryp) wrote :

The customer's problem (XA replication deadlock) can now be reproduced on both 5.7.19 and 5.7.20.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

There is some progress in understanding the case.

Here https://github.com/vlad-lesin/percona-server/tree/lp-1735555-xa-transaction-lock-5.7.19-logging-and-stable-test is the branch with additional logging and some debugging facilities. It's based on 5.7.19 and all the following analysis is for 5.7.19 too. There is also a stable mtr test which uses DEBUG_SYNC and the DEBUG system variable. The code is quite dirty, so don't pay attention to cosmetics; it's just for debugging.

First of all, the bug description matches what I see in the error log after adding some diagnostics.

What is happening:

1) When the slave thread executes "INSERT INTO t2 VALUES (1, 100000)", the supremum of the page which contains the records of t1 for pk values 40000-90000 is S-locked. The lock should be released on commit, but this transaction has not committed yet, so the supremum stays S-locked.

2) After some binlog events, "INSERT INTO t1 VALUES (85000, NULL)" is executed and an attempt is made to X-lock the supremum. It cannot be X-locked because it was already S-locked in step 1. Since the transaction from (1) has not committed, the supremum stays S-locked, and after a timeout the current transaction tries to X-lock the supremum again with the same result.

That's why the slave thread is stopped.

The question is why the test is unstable. The answer is: because of purge. When records are deleted from an InnoDB page they are only marked as "deleted" and still consume space inside the page; the purge process deallocates this space later. So what's going on when the test passes?

1) When "DELETE FROM t1 WHERE t1_pk IN (90000, 80000)" is executed, the corresponding records inside the page are marked as "deleted".

2) When slave thread executes "INSERT INTO t2 VALUES (1, 100000)" the supremum of the page which contains records of t1 for pk values 40000-90000 is S-locked.

3) After some binlog events, "INSERT INTO t1 VALUES (85000, NULL)" is executed. As purge has not run on the page, there is not enough space in the page for the new record, so the record is inserted into another page and the supremum of that other page is X-locked successfully.

What was done to make the test stable (for 5.7.19)?

1) Wait until "DELETE FROM t1 WHERE t1_pk IN (90000, 80000)" is committed and deleted records are unlocked.

2) Wait until "purge" thread frees space from deleted records inside of the page.

3) Continue the test execution.

Although my test (mysql-test/suite/rpl/t/lp-1735555-sync-points.test in the above branch) is stable on 5.7.19, I can't repeat the bug on 5.7.20.

Vlad Lesin (vlad-lesin)
Changed in percona-server:
status: New → In Progress
importance: Undecided → High
Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

This is the answer to the question of why I can't repeat the bug on 5.7.20 while it can be repeated in the customer's environment with 5.7.20.

For this analysis I use the following code which contains my additional diagnostics and some variants of the test:

https://github.com/vlad-lesin/percona-server/tree/lp-1735555-xa-transaction-lock-5.7.19-logging-and-stable-test
https://github.com/vlad-lesin/percona-server/tree/lp-1735555-xa-transaction-lock-5.7.20-logging-and-stable-test

1) Let's look at the backtrace where S-lock is set on 5.7.19 slave:

==================bt 1.1==============================
#1 0x0000000001ae40bb in lock_rec_lock (impl=false, mode=2, block=0x7fffea66ff60, heap_no=1, index=0x7fff9c017610, thr=0x7fff9c02b300)
    at ./storage/innobase/lock/lock0lock.cc:2085
#2 0x0000000001aee27d in lock_clust_rec_read_check_and_lock (flags=0, block=0x7fffea66ff60, rec=0x7fffeac58070 "supremumu\210", index=0x7fff9c017610,
    offsets=0x7ffff10fcd30, mode=LOCK_S, gap_mode=0, thr=0x7fff9c02b300) at ./storage/innobase/lock/lock0lock.cc:6321
#3 0x0000000001b99e22 in row_ins_set_shared_rec_lock (type=0, block=0x7fffea66ff60, rec=0x7fffeac58070 "supremumu\210", index=0x7fff9c017610, offsets=0x7ffff10fcd30,
    thr=0x7fff9c02b300) at ./storage/innobase/row/row0ins.cc:1498
#4 0x0000000001b9a55d in row_ins_check_foreign_constraint (check_ref=1, foreign=0x7fff9c034e00, table=0x7fff9c02da90, entry=0x7fff9c02bba8, thr=0x7fff9c02b300)
    at ./storage/innobase/row/row0ins.cc:1725
#5 0x0000000001b9ac49 in row_ins_check_foreign_constraints (table=0x7fff9c02da90, index=0x7fff9c0350c0, entry=0x7fff9c02bba8, thr=0x7fff9c02b300)
    at ./storage/innobase/row/row0ins.cc:1964
#6 0x0000000001b9e7ac in row_ins_sec_index_entry (index=0x7fff9c0350c0, entry=0x7fff9c02bba8, thr=0x7fff9c02b300, dup_chk_only=false)
    at ./storage/innobase/row/row0ins.cc:3400
#7 0x0000000001b9ea7b in row_ins_index_entry (index=0x7fff9c0350c0, entry=0x7fff9c02bba8, thr=0x7fff9c02b300)
    at ./storage/innobase/row/row0ins.cc:3477
#8 0x0000000001b9efd5 in row_ins_index_entry_step (node=0x7fff9c02b088, thr=0x7fff9c02b300)
    at ./storage/innobase/row/row0ins.cc:3625
#9 0x0000000001b9f379 in row_ins (node=0x7fff9c02b088, thr=0x7fff9c02b300) at ./storage/innobase/row/row0ins.cc:3767
#10 0x0000000001b9f98d in row_ins_step (thr=0x7fff9c02b300) at ./storage/innobase/row/row0ins.cc:3952
#11 0x0000000001bc0895 in row_insert_for_mysql_using_ins_graph (mysql_rec=0x7fff9c0334b0 "\375\001", prebuilt=0x7fff9c02aae0)
    at ./storage/innobase/row/row0mysql.cc:2278
#12 0x0000000001bc0e8c in row_insert_for_mysql (mysql_rec=0x7fff9c0334b0 "\375\001", prebuilt=0x7fff9c02aae0)
    at ./storage/innobase/row/row0mysql.cc:2402
#13 0x0000000001a546da in ha_innobase::write_row (this=0x7fff9c02ee30, record=0x7fff9c0334b0 "\375\001")
    at ./storage/innobase/handler/ha_innodb.cc:8278
#14 0x0000000000fd7b48 in handler::ha_write_row (this=0x7fff9c02ee30, buf=0x7fff9c0334b0 "\375\001")
    at ./sql/handler.cc:8434
#15 0x000000000188a1af in write_record (thd=0x7fff9c000950, table=0x7fff9c02e470, info=0x7ffff10fdfb0, update=0x7ffff10fe030)
    at ./sql/sql_insert.cc:1875
#16 0x0000000001886ff9 in Sql...

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

What I see on the customer's host does not correspond to the bug
description. The main difference is that the earlier assumption about
the cause of the S-lock is wrong.

I used the following branch for debugging:
https://github.com/vlad-lesin/percona-server/tree/lp-1735555-xa-transaction-lock-5.7.20-logging-and-stable-test-cust

1) Here is the information from PFS about the locks:
mysql> select * from information_schema.INNODB_LOCKS;
+----------------------------+-------------+-----------+-----------+---------------------+------------+------------+-----------+----------+------------------------+
| lock_id                    | lock_trx_id | lock_mode | lock_type | lock_table          | lock_index | lock_space | lock_page | lock_rec | lock_data              |
+----------------------------+-------------+-----------+-----------+---------------------+------------+------------+-----------+----------+------------------------+
| 28911602068:8421:1056018:1 | 28911602068 | X         | RECORD    | `XXX`.`YYY`         | PRIMARY    | 8421       | 1056018   | 1        | supremum pseudo-record |
| 28911602059:8421:1056018:1 | 28911602059 | S         | RECORD    | `XXX`.`YYY`         | PRIMARY    | 8421       | 1056018   | 1        | supremum pseudo-record |
+----------------------------+-------------+-----------+-----------+---------------------+------------+------------+-----------+----------+------------------------+
2 rows in set, 1 warning (0.01 sec)

Pay attention to the transaction ids:
28911602059 - holds the S-lock
28911602068 - requests the X-lock

2) The following is the backtrace of the S-lock:

=========bt 2.1===============
Breakpoint 1, RecLock::init (this=this@entry=0x7ff49c058af0, page=<optimized out>)
    at ./storage/innobase/include/lock0priv.h:863
863         ib::info() << "^^^^^^^^^^^^^^^^^^SET BREAKPOINT HERE!";
(gdb) p m_trx
$1 = (trx_t *) 0x0
(gdb) bt
#0  RecLock::init (this=this@entry=0x7ff49c058af0, page=<optimized out>)
    at ./storage/innobase/include/lock0priv.h:863
#1  0x0000000000f3c5fd in RecLock (mode=<optimized out>, heap_no=140724951089696, block=0x7ffd14b88220, index=0x7ff40cb5e858, this=0x7ff49c058af0)
    at ./storage/innobase/include/lock0priv.h:697
#2  lock_rec_add_to_queue (type_mode=<optimized out>, block=block@entry=0x7ffd14b88220, heap_no=heap_no@entry=1, index=0x7ff40cb5e858, trx=0x7fffea8650c8, caller_owns_trx_mutex=false)
    at ./storage/innobase/lock/lock0lock.cc:1880
#3  0x0000000000f3c790 in lock_rec_inherit_to_gap (heir_block=heir_block@entry=0x7ffd14b88220, block=block@entry=0x7ffd14b91fa0, heir_heap_no=heir_heap_no@entry=1, heap_no=heap_no@entry=2)
    at ./storage/innobase/lock/lock0lock.cc:2744
#4  0x0000000000f3cf50 in lock_update_split_right (right_block=right_block@entry=0x7ffd14b91fa0, left_block=left_block@entry=0x7ffd14b88220)
    ...

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

The following is a draft of the stable mtr test case. The test case does not contain replication at all, because replication is not needed to reproduce the case; what matters is the sequence of locking caused by the sequence of transactions and statements, and that can be reproduced on either master or slave.

So what should the sequence be?

1) Set an S-lock on some record R1 (caused by the foreign key check, both in this test and on the customer's side).

2) Cause a page split for the page which contains R1. The current test causes a right split, which means a new page is created to the right of the old page. R1 must be located so that during the split it is moved to the first position of the new right page, i.e. just after the infimum.
See:
---------------
rec_t*
btr_page_split_and_insert(...) {
...
 } else if (btr_page_get_split_rec_to_right(cursor, &split_rec)) {
  direction = FSP_UP;
  hint_page_no = page_no + 1;

 } else if (btr_page_get_split_rec_to_left(cursor, &split_rec)) {
...
  left_block = block;
  right_block = new_block;

  if (!dict_table_is_locking_disabled(cursor->index->table)) {
   lock_update_split_right(right_block, left_block);
  }
....
}
----------------

3) If the above condition holds, R1's S-lock will be copied to the supremum of the left half-page, which causes a gap lock.
See
---------------
void
lock_update_split_right(...)
{
...
 /* Inherit the locks to the supremum of left page from the successor
 of the infimum on right page */

 lock_rec_inherit_to_gap(left_block, right_block,
    PAGE_HEAP_NO_SUPREMUM, heap_no);

...
}
---------------

There are several questions which have not yet been covered:

1) Is it possible to repeat the case with non-XA transactions? And what about the same scenario on the slave?

2) This test covers only the right-split case; the same issue might also exist for the left-split case, which needs to be checked.

The test itself:

==============
--source include/have_innodb.inc

--eval SET GLOBAL TRANSACTION ISOLATION LEVEL READ COMMITTED;

CREATE TABLE t1 (t1_pk DECIMAL(20,0) PRIMARY KEY , t1_blob BLOB) ENGINE=InnoDB;

--echo # Initial filling
INSERT INTO t1 VALUES (10000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (20000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (30000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (40000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (50000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (60000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (70000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (80000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (90000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (100000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (110000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (120000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (130000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (140000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (150000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (160000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (170000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (180000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (190000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (200000, REPEAT("a", 1082));
INSERT INTO t1 VALUES (210000, REPEAT("a", 1082));
...


Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

How to fix it?

I'm not sure if my suggestion is correct because I don't fully understand the following logic:

=====================
static
void
lock_rec_inherit_to_gap()
{
...
  if (!lock_rec_get_insert_intention(lock)
      && !((srv_locks_unsafe_for_binlog
     || lock->trx->isolation_level
     <= TRX_ISO_READ_COMMITTED)
    && lock_get_mode(lock) ==
    (lock->trx->duplicates ? LOCK_S : LOCK_X))) {
   lock_rec_add_to_queue(
    LOCK_REC | LOCK_GAP | lock_get_mode(lock),
    heir_block, heir_heap_no, lock->index,
    lock->trx, FALSE);
  }
...
}
===================

What was the intention for the following condition:

lock->trx->isolation_level <= TRX_ISO_READ_COMMITTED

Why not:

lock->trx->isolation_level < TRX_ISO_READ_COMMITTED

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

The isolation level condition was introduced by 450bbd7c3b3f and touched by bd92052b8f3f. Superficially, it seems that it indeed should not apply for RC.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

The condition in comment 11 can be rewritten as:

  !lock_rec_get_insert_intention(lock) &&
    (!(srv_locks_unsafe_for_binlog || lock->trx->isolation_level <= TRX_ISO_READ_COMMITTED) ||
  lock_get_mode(lock) == (lock->trx->duplicates ? LOCK_X : LOCK_S))

It can be divided into three parts:

bool cond1 = !lock_rec_get_insert_intention(lock);
bool cond2 = !(srv_locks_unsafe_for_binlog || lock->trx->isolation_level <= TRX_ISO_READ_COMMITTED);
bool cond3 = lock_get_mode(lock) == (lock->trx->duplicates ? LOCK_X : LOCK_S);

So, it can be rewritten as:

cond1 && (cond2 || cond3)

If we consider our test case then:

cond1 = true - i.e. the the lock caused by foreign key check is not "insert intention lock";

cond2 = false - as our isolation level is READ COMMITTED, and the lock should not be inherited to avoid gap lock;

cond3 = true - because lock->trx->duplicates is 0 (if I understand correctly, this means the lock was not caused by "REPLACE" or "INSERT ... ON DUPLICATE KEY UPDATE ...") and the lock itself is LOCK_S.

The only thing which I don't fully understand is cond3. Here is the comment for the above code:

 /* If srv_locks_unsafe_for_binlog is TRUE or session is using
 READ COMMITTED isolation level, we do not want locks set
 by an UPDATE or a DELETE to be inherited as gap type locks. But we
 DO want S-locks/X-locks(taken for replace) set by a consistency
 constraint to be inherited also then */

The first sentence corresponds to the (cond1 && cond2).

The second sentence corresponds to cond3.

This condition should be modified, because it means that any S-lock which is not caused by "REPLACE" or "INSERT ... ON DUPLICATE KEY UPDATE ..." must be inherited regardless of cond2 (i.e. regardless of isolation level). I think this does not correspond to the initial intention.

I think the "REPLACE" / "INSERT ... ON DUPLICATE KEY UPDATE ..." case should be considered separately, as described in the code comment above, but this specific condition cond3 is wrong. To fix it we need to understand the initial intention behind the S-lock in cond3.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

My suggestion is to use the following condition:
=============================
if (!lock_rec_get_insert_intention(lock) &&
   ((!srv_locks_unsafe_for_binlog &&
     (lock->trx->isolation_level > TRX_ISO_READ_COMMITTED)) ||
   (lock->trx->duplicates ?
     lock_get_mode(lock) == LOCK_X :
     ((lock->trx->isolation_level > TRX_ISO_READ_COMMITTED) ?
       lock_get_mode(lock) == LOCK_S :
       false)))) {
   lock_rec_add_to_queue(
    LOCK_REC | LOCK_GAP | lock_get_mode(lock),
    heir_block, heir_heap_no, lock->index,
    lock->trx, FALSE);
  }
======================

instead of the original
=======================
  if (!lock_rec_get_insert_intention(lock)
      && !((srv_locks_unsafe_for_binlog
     || lock->trx->isolation_level
     <= TRX_ISO_READ_COMMITTED)
    && lock_get_mode(lock) ==
    (lock->trx->duplicates ? LOCK_S : LOCK_X))) {
   lock_rec_add_to_queue(
    LOCK_REC | LOCK_GAP | lock_get_mode(lock),
    heir_block, heir_heap_no, lock->index,
    lock->trx, FALSE);
  }
===========================

in lock_rec_inherit_to_gap(). Testing now.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

Waiting for upstream fix.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1130
