Duplicate UK values in READ-COMMITTED (again)
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| MySQL Server | Unknown | Unknown | | |
| MySQL patches by Codership | New | Undecided | Unassigned | |
| Percona Server (moved to https://jira.percona.com/projects/PS) | Fix Released | Medium | Alexey Kopytov | |
| 5.1 | Won't Fix | Undecided | Unassigned | |
| 5.5 | Triaged | Medium | Unassigned | |
| 5.6 | Fix Released | Medium | Alexey Kopytov | |
| Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Status tracked in 5.6 | | | |
| 5.5 | Triaged | Medium | Unassigned | |
| 5.6 | Fix Released | Medium | Alexey Kopytov | |
Bug Description
On a table with both a PK and a UK defined, it is possible to crash nodes with consistency errors or to lock the whole cluster for writes.
This is a result of InnoDB behaviour as reported in the upstream MySQL bug: bugs.mysql.
The good thing is that it is harder to break a Galera cluster under the same conditions than normal asynchronous replication. For example, I am not able to break a PXC cluster with just two concurrent sessions running the p1() procedure from Kevin Lewis' test case. But with 3 or more sessions running on the same node, PXC crashes due to compromised consistency. This happens faster at the READ-COMMITTED isolation level, but it also happens with REPEATABLE-READ. The resulting error looks like this:
2014-04-15 13:19:53 9570 [ERROR] Slave SQL: Could not execute Write_rows event on table test.t1; Duplicate entry '1' for key 'a', Error_code: 1062; handler error HA_ERR_
2014-04-15 13:19:53 9570 [Warning] WSREP: RBR event 4 Write_rows apply warning: 121, 649369
2014-04-15 13:19:53 9570 [ERROR] WSREP: Failed to apply trx: source: fe30ab01-
2014-04-15 13:19:53 9570 [ERROR] WSREP: Failed to apply trx 649369 4 times
2014-04-15 13:19:53 9570 [ERROR] WSREP: Node consistency compromized, aborting...
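To see why the applier fails, here is a toy model (my own sketch, not InnoDB or Galera internals; all names in it are invented): under READ-COMMITTED the origin node takes no gap locks on the unique key, so two concurrent REPLACEs can commit rows that share the same UK value under different PKs, and when the row events reach an applier the second plain insert hits a duplicate-key error on key 'a', as in the log above.

```python
# Toy model of an RBR applier hitting a duplicate unique-key value.
# Illustration of the failure mode only; the event format and class
# names are invented, not Galera's actual internals.

class DuplicateKeyError(Exception):
    """Stands in for MySQL error 1062 / HA_ERR_FOUND_DUPP_KEY."""

class Applier:
    def __init__(self):
        self.by_pk = {}   # primary key b -> row
        self.by_uk = {}   # unique key a -> pk

    def write_row(self, row):
        a, b = row["a"], row["b"]
        # A Write_rows event is a plain insert: no conflict resolution.
        if a in self.by_uk and self.by_uk[a] != b:
            raise DuplicateKeyError(f"Duplicate entry '{a}' for key 'a'")
        self.by_pk[b] = row
        self.by_uk[a] = b

# The origin committed two transactions that each REPLACEd a=1, but
# under READ-COMMITTED neither saw or locked the other's row, so both
# rows survived with distinct primary keys:
events = [{"a": 1, "b": 1, "who": 77}, {"a": 1, "b": 2, "who": 33}]

applier = Applier()
applier.write_row(events[0])
try:
    applier.write_row(events[1])
except DuplicateKeyError as e:
    print(e)   # the applier aborts the node at this point
```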
When I call the procedure from two different nodes under REPEATABLE-READ, the cluster gets locked for writes, and the only way to recover is to kill -9 mysqld on one of the nodes. Example hung state:
percona1 mysql> show processlist;
+----+-
| Id | User | Host | db | Command | Time | State | Info | Rows_sent | Rows_examined |
+----+-
| 1 | system user | | NULL | Sleep | 3905 | wsrep aborter idle | NULL | 0 | 0 |
| 2 | system user | | NULL | Sleep | 508 | System lock | NULL | 0 | 0 |
| 3 | system user | | NULL | Sleep | 508 | committed 356488 | NULL | 0 | 0 |
| 4 | cmon | percona5:48670 | NULL | Sleep | 79 | | NULL | 1 | 1 |
| 8 | root | localhost | test | Killed | 266 | wsrep in pre-commit stage | insert into t1 values (22,22,22,22,22,22) | 0 | 0 |
| 9 | root | localhost | test | Killed | 508 | wsrep in pre-commit stage | start transaction | 0 | 16 |
| 15 | root | localhost | test | Query | 0 | init | show processlist | 0 | 0 |
+----+-
7 rows in set (0.00 sec)
Unfortunately the upstream bug is marked as "not a bug", but maybe there is a way to fix this in Galera replication?
tags: added i40914
summary: changed from "Duplicate UK values can be replicated under concurrent wokload" to "Duplicate UK values can be replicated under concurrent wokload with READ-COMMITTED"
summary: changed to "Duplicate UK values can be replicated under concurrent workload with READ-COMMITTED"
summary: changed to "Duplicate UK values can be replicated under concurrent workload with READ-COMMITTED with multi-node configuration"
summary: changed back to "Duplicate UK values can be replicated under concurrent workload with READ-COMMITTED"
The test case I used from the upstream bug report:
drop table if exists t1;
create table t1(a tinyint not null, b tinyint not null, who int, rep_count int, trx_count int, trx_started int, primary key(b), unique key(a)) engine=innodb;
drop procedure if exists p1;
delimiter $
create procedure p1(me int)
begin
declare continue handler for 1062 begin end;
declare continue handler for 1213 begin end;
set transaction isolation level repeatable read;
set @trx_started:=0;
set @trx_count:=0;
set @replace_count:=0;
set @cnt_a:=null, @cnt_b:=null, @a:=null;
repeat
  if rand() > 0.5 then
    if @trx_started = 0 then
      set @trx_count:=(@trx_count + 1);
    end if;
    set @trx_started:=1;
    start transaction;
  end if;
  if rand() > 0.5 then
    set @replace_count:=(@replace_count + 1);
    replace into t1(a,b,who,rep_count,trx_count,trx_started)
    values(floor(3*rand()), floor(3*rand()), me, @replace_count, @trx_count, @trx_started);
  end if;
  select count(*) cnt_a,a into @cnt_a,@a from t1 group by a having cnt_a > 1 limit 1;
  select count(*) cnt_b,b into @cnt_b,@b from t1 group by b having cnt_b > 1 limit 1;
  if (@cnt_a is not null) || (@cnt_b is not null) then
    select * from t1;
  end if;
  set @cnt_a:=null, @cnt_b:=null, @a:=null;
  if rand() > 0.5 then
    set @trx_started:=0;
    commit;
  end if;
until 1=2 end repeat;
end $
delimiter ;
session1 > call p1(77);
session2 > call p1(33);
session3 > call p1(55);
etc.
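The procedure's consistency check (the two GROUP BY ... HAVING cnt > 1 queries) can also be run offline against a table dump. A minimal Python sketch of the same check, assuming rows are given as (a, b) tuples (the function name is mine):

```python
from collections import Counter

def duplicate_keys(rows):
    """Return (column_name, value) pairs that occur more than once,
    mirroring the procedure's GROUP BY ... HAVING cnt > 1 checks on
    the unique key a and the primary key b."""
    dups = []
    for name, idx in (("a", 0), ("b", 1)):
        counts = Counter(row[idx] for row in rows)
        dups.extend((name, v) for v, c in counts.items() if c > 1)
    return dups

# A dump with the duplicated unique-key value a=1, as in the error log:
rows = [(1, 1), (1, 2), (2, 3)]
print(duplicate_keys(rows))   # [('a', 1)]
```

On a healthy node this should always return an empty list, since both keys are declared unique.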