"Certification failed for TO isolated action" with frequent TRUNCATE TABLE
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC) | Status tracked in 5.6 | | |
5.6 | Fix Committed | Undecided | Unassigned |
5.7 | Fix Committed | Undecided | Unassigned |
Bug Description
On a workload where one PXC node receives DML writes, a second node receives frequent TRUNCATE TABLE commands (against unrelated tables), and a third node restarts repeatedly, the whole cluster fails with a "Certification failed for TO isolated action" error.
Reproduced on PXC 5.6.37 with a fairly basic configuration:
[mysqld]
binlog_format = ROW
innodb_
innodb_
innodb_flush_method = O_DIRECT
datadir = /var/lib/mysql
innodb_
wsrep_cluster_
wsrep_provider = /usr/lib64/
wsrep_slave_threads = 1
wsrep_cluster_name = Cluster
wsrep_node_name = Node1
wsrep_node_address = 172.28.128.3
wsrep_sst_auth = "root:"
* How to reproduce *
Prepare simple tables:
Node1 > show create table t1\G
*******
Table: t1
Create Table: CREATE TABLE `t1` (
`id` int(11) DEFAULT NULL,
`a` char(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
Node1 > show create table t2\G
*******
Table: t2
Create Table: CREATE TABLE `t2` (
`id` int(11) DEFAULT NULL,
`a` char(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
Run the following in parallel, each command on its respective node:
-- node1:
for i in {1..10000}; do mysql test -e "TRUNCATE table t2"; sleep 0.1; done
-- node2:
./mysql_
-- node3:
for i in {1..40}; do systemctl restart mysql; sleep 2; done
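To see at which iteration the TRUNCATE node drops out, the node1 loop could be wrapped as in the sketch below. This is only a convenience wrapper, not part of the original test: the function name `truncate_until_failure`, its arguments, and the `MYSQL_CMD` variable are hypothetical, and it stops on any client error, not only on the certification failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report): repeat TRUNCATE TABLE
# against the "test" schema until the mysql client returns an error.
# MYSQL_CMD is an assumed override hook so a stub can replace the client.
MYSQL_CMD=${MYSQL_CMD:-mysql}

truncate_until_failure() {
    local table=$1      # table to truncate, e.g. t2
    local max_iter=$2   # upper bound on iterations, e.g. 10000
    local delay=${3:-0.1}
    local i
    for ((i = 1; i <= max_iter; i++)); do
        # Any non-zero exit from the client ends the loop.
        if ! "$MYSQL_CMD" test -e "TRUNCATE TABLE ${table}"; then
            echo "failed at iteration ${i}"
            return 1
        fi
        sleep "$delay"
    done
    echo "completed ${max_iter} iterations"
}
```

Calling `truncate_until_failure t2 10000 0.1` on node1 mirrors the original loop, with the difference that it reports the iteration at which the client first failed.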
The resulting logs are in the attachment. In short, nodes 1 and 2 fail with:
2017-12-12 11:36:15 11545 [Note] WSREP: Assign initial position for certification: 767581, protocol version: 3
2017-12-12 11:36:15 11545 [Note] WSREP: Service thread queue flushed.
2017-12-12 11:36:15 11545 [ERROR] WSREP: Certification failed for TO isolated action: source: f55d4498-
Also reproduced on PXC 5.7.19-17-57-log:
2017-12-12T12:39:47.150706Z 5 [Note] WSREP: New cluster view: global state: ea51527b-df36-11e7-a529-3e7b80537600:92030, view# 8: Primary, number of nodes: 2, my index: 1, protocol version 3
2017-12-12T12:39:47.150714Z 5 [Note] WSREP: Setting wsrep_ready to true
2017-12-12T12:39:47.159516Z 5 [Note] WSREP: REPL Protocols: 7 (3, 2)
2017-12-12T12:39:47.159996Z 5 [Note] WSREP: Assign initial position for certification: 92030, protocol version: 3
2017-12-12T12:39:47.161030Z 0 [Note] WSREP: Service thread queue flushed.
2017-12-12T12:39:47.161814Z 5 [ERROR] WSREP: Certification failed for TO isolated action: source: 1b19a14d-d689-11e7-8bab-9ebe5b9af4cc version: 3 local: 0 state: CERTIFYING flags: 65 conn_id: 416 trx_id: -1 seqnos (l: 93906, g: 92031, s: 92020, d: -1, ts: 1151558439465456)
2017-12-12T12:39:47.161848Z 5 [Note] WSREP: Closing send monitor...
2017-12-12T12:39:47.161854Z 5 [Note] WSREP: Closed send monitor.
2017-12-12T12:39:47.161861Z 5 [Note] WSREP: gcomm: terminating thread
2017-12-12T12:39:47.161870Z 5 [Note] WSREP: gcomm: joining thread
2017-12-12T12:39:47.162019Z 5 [Note] WSREP: gcomm: closing backend
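When comparing error logs from several nodes, it can help to pull just the sequence numbers out of the failing line and set them against the "Assign initial position for certification" value logged just before it. The sketch below does that with `sed`. The function name `toi_failure_seqnos` is hypothetical, and the field letters are kept exactly as logged (to my understanding they are the local, global, last-seen and depends-on seqnos, but treat that reading as an assumption).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original report): read an error log on
# stdin and print the l/g/s/d seqnos of every
# "Certification failed for TO isolated action" line, one result per line.
toi_failure_seqnos() {
    sed -nE 's/.*Certification failed for TO isolated action.*seqnos \(l: (-?[0-9]+), g: (-?[0-9]+), s: (-?[0-9]+), d: (-?[0-9]+).*/l=\1 g=\2 s=\3 d=\4/p'
}
```

Fed the 5.7.19 error line above, for example with `toi_failure_seqnos < error.log` (the file name is a placeholder), this prints `l=93906 g=92031 s=92020 d=-1`. In that log the initial certification position is 92030, while the failing action carries g: 92031 and s: 92020.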