Percona XtraDB Cluster - HA scalable solution for MySQL

Multi-row inserts into a table with auto_increment column when PXC node is in async replication and binlog_format is STATEMENT, causes data inconsistencies

Reported by Ovais Tariq on 2013-04-12
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona XtraDB Cluster
Undecided
Unassigned

Bug Description

Suppose we have a PXC 3 node cluster which is replicating from an async master. Now if the binlog_format is STATEMENT and multi-row inserts are executed on a table with an auto_increment column such that values are automatically generated by MySQL, then the PXC node generates wrong auto_increment values which are different from what was generated on the async master.

The test case follows:

master> use test
Database changed
master> drop table t1;
Query OK, 0 rows affected (0.14 sec)

master> CREATE TABLE `t1` (
    -> `i` int(11) NOT NULL AUTO_INCREMENT,
    -> `c` char(32) DEFAULT 'dummy_text',
    -> PRIMARY KEY (`i`)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.37 sec)

master> select * from t1;
Empty set (0.00 sec)

master> insert into t1(i) values(null);
Query OK, 1 row affected (0.27 sec)

master> select * from t1;
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
+---+------------+
1 row in set (0.00 sec)

pxc_node1> select * from t1; -- all good here because only a single value was inserted
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
+---+------------+
1 row in set (0.00 sec)

master> insert into t1(i) values(null), (null), (null); -- OK we now insert 3 values such that all 3 have to be automatically generated
Query OK, 3 rows affected (0.51 sec)
Records: 3 Duplicates: 0 Warnings: 0

master> select * from t1;
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
| 2 | dummy_text |
| 3 | dummy_text |
| 4 | dummy_text |
+---+------------+
4 rows in set (0.00 sec)

pxc_node1> select * from t1; -- and there is an inconsistency for the auto_inc value generated for the fourth row
+---+------------+
| i | c |
+---+------------+
| 1 | dummy_text |
| 2 | dummy_text |
| 3 | dummy_text |
| 6 | dummy_text |
+---+------------+
4 rows in set (0.00 sec)

The variable wsrep_auto_increment_control is enabled:
pxc_node1> show variables like '%increment%';
+------------------------------+-------+
| Variable_name | Value |
+------------------------------+-------+
| auto_increment_increment | 3 |
| auto_increment_offset | 3 |
| div_precision_increment | 4 |
| innodb_autoextend_increment | 8 |
| wsrep_auto_increment_control | ON |
+------------------------------+-------+

The problem can be resolved by using binlog_format=ROW on the async master or by disabling wsrep_auto_increment_control. But suppose the async master is 5.0 then RBR is not available, and as such this is a serious issue.

It looks to me as though PXC node does not completely obey all the contextual information that is logged in case of SBR. For example, when I look at the relay log on the PXC node it has the following relevant variables set:
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;

However, either that is not applied correctly to the slave_sql_thread or the variable 'wsrep_auto_increment_control' overrides that.

tags: added: i30888

Looks similar to lp:587170 in terms of workaround.

Also, there seem to be issues with STATEMENT binlog_format as
reported here:

lp:1161060
lp:1160854

Ovais Tariq (ovais-tariq) wrote :

Hi Raghu,

Yes looks more similar to 587170

Ovais Tariq (ovais-tariq) wrote :

I think if PXC node can correctly make use of the contextual information added when SBR is used then this issue can be fixed. For example in normal async replication, the slave thread sets session variables according to what it sees in the relay-log, in this case and in the case of lp:587170 it appears that those variables are not properly set:
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;

tags: added: auto-inc

The problem here is that, you have two set of values:

a) auto-inc from master
b) auto-inc determined from cluster - determined from wsrep_auto_increment_control

It would be incorrect to take a) because if
wsrep_auto_increment_control is enabled, the cluster auto-inc
values get inconsistent with what master has provided.

Now, it may be possible to account for master while calculating
the auto-inc, however, the problem remains in that master is not
aware of this.

It may be better to make this combination fail, ie. STMT +
wsrep_auto_increment_control than introducing inconsistency in
the cluster.

Is there any other way this inconsistency can be resolved - ie.
taking into account both the value derived from
wsrep_auto_increment_control and the value from
master?

Ovais Tariq (ovais-tariq) wrote :

Hi Raghu,

For events that come in via replication, its perfectly valid to use the auto_increment* variables as set on the master. I do not see why we shouldn't be doing that, this is how its managed in async replication, the master also injects its auto_increment* variables and the slave SQL_threads sets those for its session, I would think of that as the correct way in the context of PXC cluster as well. Any SET SESSION .... in the binary log should be honoured and should not be overridden by the global variable if the variable is allowed to be changed dynamically because that's how variable scoping works in MySQL :)

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers