Innodb deadlock with already locked row (non IX)

Bug #1323235 reported by Nickolay Ihalainen
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.1
Won't Fix
Medium
Unassigned
5.5
Triaged
Medium
Unassigned
5.6
Triaged
Medium
Unassigned
5.7
Triaged
Medium
Unassigned

Bug Description

Percona Server 5.5.20

The deadlock happens during pt-online-schema-change and includes triggers on DELETE, INSERT, UPDATE between two INSERT or UPDATE trigger actions.

Rolled back transaction:
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 50538 page no 6778 n bits 224 index `KEY01` of table `test`.`_tbl_new` trx id 549F20D92 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 50538 page no 6778 n bits 224 index `KEY01` of table `test`.`_tbl_new` trx id 549F20D92 lock_mode X waiting

It seems like the transaction is trying to obtain the already locked one. The problem looks similar to "not a bug" https://bugs.launchpad.net/percona-server/+bug/796610
and http://bugs.mysql.com/bug.php?id=69835 ,
but there is no "intention" waiting.

Transaction isolation level: repeatable-read

140516 10:26:00
*** (1) TRANSACTION:
TRANSACTION 549F20D91, ACTIVE 0 sec inserting
mysql tables in use 2, locked 2
LOCK WAIT 7 lock struct(s), heap size 1248, 5 row lock(s), undo log entries 3
MySQL thread id 3036282, OS thread handle 0x7f51d4c21700, query id 20738656662 172.18.32.20 zwebapps update
REPLACE INTO `test`.`_tbl_new` (`id`, `c1`, `c2`, `c3`, `c4`, `c5`, `c6`, `c7`, `c8`, `c9`, `c10`, `c11`, `c12`, `c13`, `c14`, `c15`, `c16`, `c17`, `c18`, `c19`, `c20`, `c21`, `c22`, `c23`, `c24`, `c25`, `c26`, `c27`, `c28`, `c29`, `c30`, `c31`, `c32`, `c33`, `c34`) VALUES (NEW.`id`, NEW.`c1`, NEW.`c2`, NEW.`c3`, NEW.`c4`, NEW.`c5`, NEW.`c6`, NEW.`c7`, NEW.`c8`, NEW.`c9`, NEW.`c10`, NEW.`c11`, NEW.`number_
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 50538 page no 6778 n bits 224 index `KEY01` of table `test`.`_tbl_new` trx id 549F20D91 lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION 549F20D92, ACTIVE 0 sec inserting, thread declared inside InnoDB 497
mysql tables in use 2, locked 2
6 lock struct(s), heap size 1248, 4 row lock(s), undo log entries 3
MySQL thread id 3036211, OS thread handle 0x7f51ab0af700, query id 20738656663 172.18.32.21 zwebapps update
REPLACE INTO `test`.`_tbl_new` (`id`, `c1`, `c2`, `c3`, `c4`, `c5`, `c6`, `c7`, `c8`, `c9`, `c10`, `c11`, `c12`, `c13`, `c14`, `c15`, `c16`, `c17`, `c18`, `c19`, `c20`, `c21`, `c22`, `c23`, `c24`, `c25`, `c26`, `c27`, `c28`, `c29`, `c30`, `c31`, `c32`, `c33`, `c34`) VALUES (NEW.`id`, NEW.`c1`, NEW.`c2`, NEW.`c3`, NEW.`c4`, NEW.`c5`, NEW.`c6`, NEW.`c7`, NEW.`c8`, NEW.`c9`, NEW.`c10`, NEW.`c11`, NEW.`number_
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 50538 page no 6778 n bits 224 index `KEY01` of table `test`.`_tbl_new` trx id 549F20D92 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 50538 page no 6778 n bits 224 index `KEY01` of table `test`.`_tbl_new` trx id 549F20D92 lock_mode X waiting
*** WE ROLL BACK TRANSACTION (2)

CREATE TABLE `tbl` (
`id` varchar(32) NOT NULL,
`c1` int(11) NOT NULL,
`c2` varchar(50) NOT NULL,
`c3` varchar(20) DEFAULT NULL,
`c4` varchar(32) DEFAULT NULL,
`c5` bit(1) NOT NULL DEFAULT b'0',
`c6` date DEFAULT NULL,
`c7` date DEFAULT NULL,
`c8` datetime DEFAULT NULL,
`c9` varchar(32) DEFAULT NULL,
`c10` varchar(20) NOT NULL,
`c11` varchar(255) DEFAULT NULL,
`c12` int(11) NOT NULL DEFAULT '0',
`c13` int(11) NOT NULL DEFAULT '0',
`c14` varchar(32) NOT NULL,
`c15` bit(1) NOT NULL DEFAULT b'0',
`c16` varchar(32) NOT NULL,
`c17` datetime NOT NULL,
`c18` varchar(32) DEFAULT NULL,
`c19` datetime DEFAULT NULL,
`c20` date DEFAULT NULL,
`c21` datetime DEFAULT NULL,
`c22` datetime DEFAULT NULL,
`c23` bigint(20) DEFAULT '0',
`c24` int(11) DEFAULT '0',
`c25` int(10) NOT NULL DEFAULT '0',
`c26` bit(1) NOT NULL DEFAULT b'0',
`c27` varchar(100) DEFAULT NULL,
`c28` bit(1) NOT NULL DEFAULT b'0',
`c29` bit(1) NOT NULL DEFAULT b'0',
`c30` bit(1) DEFAULT b'0',
`c31` datetime DEFAULT NULL,
`c32` varchar(32) DEFAULT NULL,
`c33` varchar(20) DEFAULT NULL,
`c34` varchar(32) DEFAULT NULL,
`c35` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `KEY01` (`c14`,`c2`),
KEY `KEY02` (`c9`),
KEY `KEY03` (`c14`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Triggers created by pt-online-schema-change:
CREATE TRIGGER `pt_osc_test_tbl_del` AFTER DELETE ON `test`.`trigtest` FOR EACH ROW DELETE IGNORE FROM `test`.`_tbl_new` WHERE `test`.`_tbl_new`.`id` <=> OLD.`id`
CREATE TRIGGER `pt_osc_test_tbl_upd` AFTER UPDATE ON `test`.`trigtest` FOR EACH ROW REPLACE INTO `test`.`_tbl_new` (`id`, `c1`, `c2`, `c3`, `c4`, `c5`, `c6`, `c7`, `c8`, `c9`, `c10`, `c11`, `c12`, `c13`, `c14`, `c15`, `c16`, `c17`, `c18`, `c19`, `c20`, `c21`, `c22`, `c23`, `c24`, `c25`, `c26`, `c27`, `c28`, `c29`, `c30`, `c31`, `c32`, `c33`, `c34`) VALUES (NEW.`id`, NEW.`c1`, NEW.`c2`, NEW.`c3`, NEW.`c4`, NEW.`c5`, NEW.`c6`, NEW.`c7`, NEW.`c8`, NEW.`c9`, NEW.`c10`, NEW.`c11`, NEW.`number_
CREATE TRIGGER `pt_osc_test_tbl_ins` AFTER INSERT ON `test`.`trigtest` FOR EACH ROW REPLACE INTO `test`.`_tbl_new` (`id`, `c1`, `c2`, `c3`, `c4`, `c5`, `c6`, `c7`, `c8`, `c9`, `c10`, `c11`, `c12`, `c13`, `c14`, `c15`, `c16`, `c17`, `c18`, `c19`, `c20`, `c21`, `c22`, `c23`, `c24`, `c25`, `c26`, `c27`, `c28`, `c29`, `c30`, `c31`, `c32`, `c33`, `c34`) VALUES (NEW.`id`, NEW.`c1`, NEW.`c2`, NEW.`c3`, NEW.`c4`, NEW.`c5`, NEW.`c6`, NEW.`c7`, NEW.`c8`, NEW.`c9`, NEW.`c10`, NEW.`c11`, NEW.`number_
(includes all fields)

I'm not able to reproduce the bug on my own hardware, maybe it's important what there are many rollback operations and transaction could contain several statements on production system.
There are also similar rollbacks with different tables (also having many fields).

Revision history for this message
lisheng.liu (lisheng-liu) wrote :

I found some other customer get similar problem by 3 threads running in parallel to delete same row in high concurrency.
and the reason is the second and third thread both get into no gap lock wait before this record not be marked to delate by thread , not X lock wait.
Then after thread1 commit thread 2 and 3 get gap lock but because of index id unique index it need none gap lock then they trying to request X lock which caused deadlock.

I found the symptom slimiar in deadlock info, so just for reference while you analyze checking source code, hope it could help.

---------------------------
LATEST DETECTED DEADLOCK
------------------------
140123 12:20:50
*** (1) TRANSACTION:
TRANSACTION 2E10, ACTIVE 5325 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 376, 1 row lock(s)
MySQL thread id 3, OS thread handle 0x1008, query id 43 192.168.xx.x username upd
ating
delete from abcdefg WHERE abc= 'vars' and def= 'vars' and ghi= '2012-12-19 00:00:00' and jkl= '2012-12-20 00:00:00';
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 23415 n bits 528 index `uniqeindex` of table
`deadlock`.`abcdefg` trx id 2E10 lock_mode X locks rec but not gap waiti
ng
Record lock, heap no 167 PHYSICAL RECORD: n_fields 4; compact format;

*** (2) TRANSACTION:
TRANSACTION 2E0E, ACTIVE 5325 sec starting index read
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1248, 2 row lock(s)
MySQL thread id 1, OS thread handle 0x1190, query id 41 192.168.xx.xx username upd
ating
delete from abcdefg WHERE abc= 'vars' and def= 'vars' and ghi= '2012-12-19 00:00:00' and jkl= '2012-12-20 00:00:00';
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 23415 n bits 528 index `uniqeindex` of table
`deadlock`.`abcdefg` trx id 2E0E lock_mode X locks rec but not gap
Record lock, heap no 167 PHYSICAL RECORD: n_fields 4; compact format;

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 23415 n bits 528 index `uniqeindex` of table
`deadlock`.`abcdefg` trx id 2E0E lock_mode X waiting
Record lock, heap no 167 PHYSICAL RECORD: n_fields 4; compact format;
*** WE ROLL BACK TRANSACTION (1)
---------------------------

Revision history for this message
lisheng.liu (lisheng-liu) wrote :

"Then after thread1 commit thread 2 and 3 get none gap lock but because of index id unique index it the code path are different then they trying to request X lock which caused deadlock." sorry, to correct the my expressing.

Revision history for this message
Peiran Song (peiran-song) wrote :

Lisheng,

Can you provide the details for the case of three concurrent deletions on the same row? I tried it as below and didn't observe deadlock.

session1: (unique index on u_id)
mysql>begin
mysql>delete from t1 where u_id = 9

session2 and session3, same command:
mysql>begin
mysql>delete from t1 where u_id = 9

Before session1 commit, the locks are:
session 1:
TABLE LOCK table `test`.`t1` trx id 7D7 lock mode IX
RECORD LOCKS space id 0 page no 324 n bits 72 index `idx_u_id` of table `test`.`t1` trx id 7D7 lock_mode X locks rec but not gap
RECORD LOCKS space id 0 page no 322 n bits 72 index `PRIMARY` of table `test`.`t1` trx id 7D7 lock_mode X locks rec but not gap

session2/3:
RECORD LOCKS space id 0 page no 324 n bits 72 index `idx_u_id` of table `test`.`t1` trx id 7D8 lock_mode X waiting

After session1 commit, session2 and 3 both continue with deletion of 0 row, before commit, the locks held were:
session 2/3:
3 lock struct(s), heap size 376, 1 row lock(s)
TABLE LOCK table `test`.`t1` trx id 7D9 lock mode IX
RECORD LOCKS space id 0 page no 324 n bits 72 index `idx_u_id` of table `test`.`t1` trx id 7D9 lock_mode X
RECORD LOCKS space id 0 page no 324 n bits 72 index `idx_u_id` of table `test`.`t1` trx id 7D9 lock_mode X locks gap before rec

Both session 2 and session 3 held two X locks on unique idx_u_id, but no dead lock or "not gap lock" as you suggested.

Revision history for this message
Peiran Song (peiran-song) wrote :

Test case provided by Lisheng:
--------------------------------------
1) create table test1(id varchar(32), uid varchar(32), primary key(id), unique key uk(uid)) engine = innodb;
2) insert 100000 rows into test1:
select replace(uuid(),'-','') into @oneid; insert into test1 values (@oneid, @oneid);
3) run the following perl script in three sessions. The script does REPLACE INTO for each row and invokes 5 threads to do the same.

#!/usr/bin/perl

use DBI;
use threads;

sub executeStmt {
my ($dbh, $stmt) = @_;
my $sth = $dbh->prepare($stmt);
print "$stmt\n";
$sth->execute();
return $sth;
}

sub doStmt {
my ($dbh, $stmt) = @_;
print "$stmt\n";
my $sth = $dbh->do($stmt);
return $sth;
}

sub getList {
my ($dbh)=@_;
my $stmt= "select uid from test1" ;
my ($ref,$sth);
my $list=();
$sth=executeStmt($dbh,$stmt);
while( my $ref=$sth->fetchrow_arrayref()){
push @${list},@{$ref} ;
}
return $list;
}

sub deleteRow{
my ($dsn,$user,$password,$uid)=@_;
my $dbh=DBI->connect($dsn,$user,$password, {RaiseError=>1});
my $stmt="replace into test1 (id,uid) values(\'$uid\',\'$uid\')";
doStmt($dbh,$stmt);
}

my($dbh,$dbh1,$dbh2,$dbh3,$dbh4,$dbh5);
my $user="user";
my $password="password";
my $dsn="DBI:mysql:database=test:host=127.0.0.1:port=3306";
$dbh = DBI->connect($dsn,$user,$password, {RaiseError=>1});

my $ref=getList($dbh);
foreach my $uid(@{$ref}){
my $thread_1_01 = threads->create(\&deleteRow,$dsn,$user,$password,$uid);
my $thread_1_02 = threads->create(\&deleteRow,$dsn,$user,$password,$uid);
my $thread_1_03 = threads->create(\&deleteRow,$dsn,$user,$password,$uid);
my $thread_1_04 = threads->create(\&deleteRow,$dsn,$user,$password,$uid);
my $thread_1_05 = threads->create(\&deleteRow,$dsn,$user,$password,$uid);
$thread_1_01->join();
$thread_1_02->join();
$thread_1_03->join();
$thread_1_04->join();
$thread_1_05->join();
}

Revision history for this message
Peiran Song (peiran-song) wrote :
Download full text (4.6 KiB)

I reproduced the following deadlocks on PS 5.5.24 and community v5.6.19. They look the same as the one in the original reporting and could theoretically happen in the trigger situation with pt-osc: 1) deadlocks happened on two different rows,(in the pt-osc case, two UPDATEs on different rows in the base table, they proceeded, but the triggered REPLACE statements got into deadlock). 2) the record locks held and waiting are all on unique key locks.

I tried it on community version because the deadlock section would give more information on the particular record lock. As below, the “heap no” is the same which means the index record was the same even though the rows to be updated were different. This suggested that the lock was on a non-leave record, to my understanding. It certainly looked interesting that Transaction2 held X lock not gap and requested X lock again. Maybe between the DELETEs and INSERTs of the concurrent REPLACE statements, the shared non-leave record later needed X gap lock?

Here is the deadlock from PS 5.5.24
------------------------
LATEST DETECTED DEADLOCK
------------------------
140624 7:10:30
*** (1) TRANSACTION:
TRANSACTION D6B79, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1248, 4 row lock(s), undo log entries 2
MySQL thread id 496045, OS thread handle 0x7fbe1c9ae700, query id 1605727 localhost 127.0.0.1 msandbox update
replace into test1 (id,uid) values('3d294b45fb5611e3889f001e67998d81','3d294b45fb5611e3889f001e67998d81')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 22 page no 1733 n bits 184 index `uk` of table `test`.`test1` trx id D6B79 lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION D6B78, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 2
MySQL thread id 496044, OS thread handle 0x7fbe1958a700, query id 1605726 localhost 127.0.0.1 msandbox update
replace into test1 (id,uid) values('3d29f881fb5511e3889f001e67998d81','3d29f881fb5511e3889f001e67998d81')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 22 page no 1733 n bits 184 index `uk` of table `test`.`test1` trx id D6B78 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 22 page no 1733 n bits 184 index `uk` of table `test`.`test1` trx id D6B78 lock_mode X waiting
*** WE ROLL BACK TRANSACTION (2)

Here is the deadlock from community version 5.6.19 with more lock details:
------------------------
LATEST DETECTED DEADLOCK
------------------------
2014-06-20 01:06:38 7f488c064700
*** (1) TRANSACTION:
TRANSACTION 383294, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1184, 4 row lock(s), undo log entries 2
MySQL thread id 182941, OS thread handle 0x7f485fa02700, query id 558909 localhost 127.0.0.1 root update
replace into test1 (id,uid) values('30fdc71df7bc11e39110080027303a4b','30fdc71df7bc11e39110080027303a4b')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 112 page no 23 n bits 296 index `uk` of table `test`.`test1` trx id 383294 lock_mode X waiting
Record lock, heap no 76 PHYSICAL RECORD: n_fields ...

Read more...

Revision history for this message
Peiran Song (peiran-song) wrote :

Reproduced on PS 5.1.68 and latest PS 5.6.17. I am not sure if this is a bug, or a corner case that unfortunately causes deadlock. So I prefer not filing an upstream bug till confirm by developer that it is indeed an unexpected(buggy) behavior.

On PS 5.1.68
------------------------
LATEST DETECTED DEADLOCK
------------------------
140625 13:39:46
*** (1) TRANSACTION:
TRANSACTION 80115, ACTIVE 0 sec, process no 24713, OS thread id 139694711215872 inserting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1248, 4 row lock(s), undo log entries 2
MySQL thread id 291509, query id 934736 localhost 127.0.0.1 msandbox update
replace into test1 (id,uid) values('46a2c93bfc4e11e390ed001e67998d81','46a2c93bfc4e11e390ed001e67998d81')
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 6527 n bits 296 index `uk` of table `test`.`test1` trx id 80115 lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION 80116, ACTIVE 0 sec, process no 24713, OS thread id 139694707488512 inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 2
MySQL thread id 291510, query id 934737 localhost 127.0.0.1 msandbox update
replace into test1 (id,uid) values('46a35d00fc4f11e390ed001e67998d81','46a35d00fc4f11e390ed001e67998d81')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 6527 n bits 296 index `uk` of table `test`.`test1` trx id 80116 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 6527 n bits 296 index `uk` of table `test`.`test1` trx id 80116 lock_mode X waiting
*** WE ROLL BACK TRANSACTION (2)

On PS 5.6.17
------------------------
LATEST DETECTED DEADLOCK
------------------------
2014-06-25 13:24:21 7f84b67fc700
*** (1) TRANSACTION:
TRANSACTION 481777, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1184, 4 row lock(s), undo log entries 2
MySQL thread id 289291, OS thread handle 0x7f84bfe03700, query id 967656 localhost 127.0.0.1 msandbox update
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 6 page no 353 n bits 296 index `uk` of table `test`.`test1` trx id 481777 lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION 481776, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1184, 3 row lock(s), undo log entries 2
MySQL thread id 289290, OS thread handle 0x7f84b67fc700, query id 967655 localhost 127.0.0.1 msandbox update
replace into test1 (id,uid) values('23a82463fc5311e3887b001e67998d81','23a82463fc5311e3887b001e67998d81')
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 6 page no 353 n bits 296 index `uk` of table `test`.`test1` trx id 481776 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 6 page no 353 n bits 296 index `uk` of table `test`.`test1` trx id 481776 lock_mode X waiting
*** WE ROLL BACK TRANSACTION (2)

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1503

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.