Multi-row inserts into a table with auto_increment column when PXC node is in async replication and binlog_format is STATEMENT, causes data inconsistencies
Affects | Status | Importance | Assigned to | Milestone | ||
---|---|---|---|---|---|---|
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Status tracked in 5.6 | |||||
| 5.5 |
Won't Fix
|
High
|
Unassigned | ||
| 5.6 |
Fix Released
|
High
|
Unassigned |
Bug Description
Suppose we have a PXC 3 node cluster which is replicating from an async master. Now if the binlog_format is STATEMENT and multi-row inserts are executed on a table with an auto_increment column such that values are automatically generated by MySQL, then the PXC node generates wrong auto_increment values which are different from what was generated on the async master.
The test case follows:
master> use test
Database changed
master> drop table t1;
Query OK, 0 rows affected (0.14 sec)
master> CREATE TABLE `t1` (
-> `i` int(11) NOT NULL AUTO_INCREMENT,
-> `c` char(32) DEFAULT 'dummy_text',
-> PRIMARY KEY (`i`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.37 sec)
master> select * from t1;
Empty set (0.00 sec)
master> insert into t1(i) values(null);
Query OK, 1 row affected (0.27 sec)
master> select * from t1;
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
+---+------------+
1 row in set (0.00 sec)
pxc_node1> select * from t1; -- all good here because only a single value was inserted
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
+---+------------+
1 row in set (0.00 sec)
master> insert into t1(i) values(null), (null), (null); -- OK we now insert 3 values such that all 3 have to be automatically generated
Query OK, 3 rows affected (0.51 sec)
Records: 3 Duplicates: 0 Warnings: 0
master> select * from t1;
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
| 2 | dummy_text |
| 3 | dummy_text |
| 4 | dummy_text |
+---+------------+
4 rows in set (0.00 sec)
pxc_node1> select * from t1; -- and there is an inconsistency for the auto_inc value generated for the fourth row
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
| 2 | dummy_text |
| 3 | dummy_text |
| 6 | dummy_text |
+---+------------+
4 rows in set (0.00 sec)
The variable wsrep_auto_
pxc_node1> show variables like '%increment%';
+------
| Variable_name | Value |
+------
| auto_increment_
| auto_increment_
| div_precision_
| innodb_
| wsrep_auto_
+------
The problem can be resolved by using binlog_format=ROW on the async master or by disabling wsrep_auto_
It looks to me as though PXC node does not completely obey all the contextual information that is logged in case of SBR. For example, when I look at the relay log on the PXC node it has the following relevant variables set:
SET @@session.
However, either that is not applied correctly to the slave_sql_thread or the variable 'wsrep_
tags: | added: i30888 |
Ovais Tariq (ovais-tariq) wrote : | #2 |
Hi Raghu,
Yes looks more similar to 587170
Ovais Tariq (ovais-tariq) wrote : | #3 |
I think if PXC node can correctly make use of the contextual information added when SBR is used then this issue can be fixed. For example in normal async replication, the slave thread sets session variables according to what it sees in the relay-log, in this case and in the case of lp:587170 it appears that those variables are not properly set:
SET @@session.
tags: | added: auto-inc |
The problem here is that, you have two set of values:
a) auto-inc from master
b) auto-inc determined from cluster - determined from wsrep_auto_
It would be incorrect to take a) because if
wsrep_auto_
values get inconsistent with what master has provided.
Now, it may be possible to account for master while calculating
the auto-inc, however, the problem remains in that master is not
aware of this.
It may be better to make this combination fail, ie. STMT +
wsrep_auto_
the cluster.
Is there any other way this inconsistency can be resolved - ie.
taking into account both the value derived from
wsrep_auto_
master?
Ovais Tariq (ovais-tariq) wrote : | #5 |
Hi Raghu,
For events that come in via replication, its perfectly valid to use the auto_increment* variables as set on the master. I do not see why we shouldn't be doing that, this is how its managed in async replication, the master also injects its auto_increment* variables and the slave SQL_threads sets those for its session, I would think of that as the correct way in the context of PXC cluster as well. Any SET SESSION .... in the binary log should be honoured and should not be overridden by the global variable if the variable is allowed to be changed dynamically because that's how variable scoping works in MySQL :)
Nilnandan Joshi (nilnandan-joshi) wrote : | #6 |
Verified with PXC 5.5.37.
On aync master:
mysql> use test;
Database changed
mysql>
mysql> CREATE TABLE `t1` (
-> `i` int(11) NOT NULL AUTO_INCREMENT,
-> `c` char(32) DEFAULT 'dummy_text',
-> PRIMARY KEY (`i`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.06 sec)
mysql> select * from t1;
Empty set (0.00 sec)
mysql> insert into t1(i) values(null);
Query OK, 1 row affected (0.05 sec)
mysql> select * from t1;
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
+---+------------+
1 row in set (0.00 sec)
mysql> insert into t1(i) values(null), (null), (null);
Query OK, 3 rows affected (0.03 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select * from t1;
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
| 2 | dummy_text |
| 3 | dummy_text |
| 4 | dummy_text |
+---+------------+
4 rows in set (0.00 sec)
on slave (PXC clusternode1/2/3)
mysql> show variables like '%increment%';
+------
| Variable_name | Value |
+------
| auto_increment_
| auto_increment_
| div_precision_
| innodb_
| wsrep_auto_
+------
5 rows in set (0.00 sec)
mysql> use test
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql>
mysql>
mysql> select * from t1;
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
| 2 | dummy_text |
| 4 | dummy_text |
| 7 | dummy_text |
+---+------------+
4 rows in set (0.00 sec)
mysql>
Nilnandan Joshi (nilnandan-joshi) wrote : | #7 |
Verified same with PXC 5.6.19
On async master:
mysql> use test
Database changed
mysql>
mysql> CREATE TABLE `t1` (
-> `i` int(11) NOT NULL AUTO_INCREMENT,
-> `c` char(32) DEFAULT 'dummy_text',
-> PRIMARY KEY (`i`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.33 sec)
mysql> select * from t1;
Empty set (0.00 sec)
mysql> insert into t1(i) values(null);
Query OK, 1 row affected (0.02 sec)
mysql> select * from t1;
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
+---+------------+
1 row in set (0.00 sec)
mysql> insert into t1(i) values(null), (null), (null);
Query OK, 3 rows affected (0.03 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select * from t1;
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
| 2 | dummy_text |
| 3 | dummy_text |
| 4 | dummy_text |
+---+------------+
4 rows in set (0.00 sec)
mysql>
on slave (PXC clusternode1/2/3)
mysql> show variables like '%increment%';
+------
| Variable_name | Value |
+------
| auto_increment_
| auto_increment_
| div_precision_
| innodb_
| wsrep_auto_
+------
5 rows in set (0.01 sec)
mysql> use test
Database changed
mysql> select * from t1;
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
| 2 | dummy_text |
| 4 | dummy_text |
| 6 | dummy_text |
+---+------------+
4 rows in set (0.00 sec)
Krunal Bauskar (krunal-bauskar) wrote : | #8 |
commit f3e541aa0355c6e
Merge: 1e75388 3bf804b
Author: Krunal Bauskar <email address hidden>
Date: Tue Feb 9 10:08:54 2016 +0530
Merge pull request #86 from natsys/5.6-pxc-474
- #PXC-474: Async replication + auto-increment + log-format (STMT) = inconsistent cluster node
commit 3bf804b48eb6860
Author: Julius Goryavsky <email address hidden>
Date: Wed Feb 3 21:28:25 2016 -0500
This is patch for the PXC-474 issue: suppose we have a PXC 2+ node cluster which is replicating from an async master. Now if the binlog_format is STATEMENT and multi-row inserts are executed on a table with an auto_increment column such that values are automatically generated by MySQL, then the PXC node generates wrong auto_increment values which are different from what was generated on the async master.
The causes and fixes:
1. We must ignore wsrep_auto_
2. If wsrep auto_increment_
3. If wsrep auto_increment_
Shahriyar Rzayev (rzayev-sehriyar) wrote : | #9 |
Percona now uses JIRA for bug reports so this bug report is migrated to: https:/
Looks similar to lp:587170 in terms of workaround.
Also, there seem to be issues with STATEMENT binlog_format as
reported here:
lp:1161060
lp:1160854