GTID holes created on cluster peers of async slave node

Bug #1522385 reported by Przemek on 2015-12-03
This bug affects 2 people

Affects: Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC)
Status: Fix Released (tracked in 5.6)
Importance: Undecided
Assigned to: Krunal Bauskar

Bug Description

In a replication topology like this:
async master -> node1<->nodeN...
when the node1 slave ignores some row updates, for example due to slave_skip_errors=1032, it creates an empty trx to stay consistent with the GTID sequence. However, its Galera peers end up with GTID holes.

-- async master:
db1 {root} (test) > create table test.g1 (id int auto_increment primary key);
Query OK, 0 rows affected (0.06 sec)

db1 {root} (test) > insert into test.g1 values (null),(null),(null);
Query OK, 3 rows affected (0.03 sec)
Records: 3 Duplicates: 0 Warnings: 0

db1 {root} (test) > select * from test.g1;
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
+----+
3 rows in set (0.00 sec)

-- async slave node:
percona1 mysql> select * from test.g1;
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
+----+
3 rows in set (0.00 sec)

percona1 mysql> delete from test.g1 where id=2;
Query OK, 1 row affected (0.12 sec)

percona1 mysql> flush logs;
Query OK, 0 rows affected (0.07 sec)

-- async master:
db1 {root} (test) > flush logs;
Query OK, 0 rows affected (0.07 sec)

db1 {root} (test) > delete from test.g1 where id=2;
Query OK, 1 row affected (0.02 sec)

db1 {root} (test) > delete from test.g1 where id=3;
Query OK, 1 row affected (0.00 sec)

db1 {root} (test) > show binlog events in 'binlog.000014';
+---------------+-----+----------------+-----------+-------------+------------------------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+---------------+-----+----------------+-----------+-------------+------------------------------------------------------------------------+
| binlog.000014 | 4 | Format_desc | 1 | 120 | Server ver: 5.6.24-72.2-log, Binlog ver: 4 |
| binlog.000014 | 120 | Previous_gtids | 1 | 191 | 12446bf7-3219-11e5-9434-080027079e3d:1-963126 |
| binlog.000014 | 191 | Gtid | 1 | 239 | SET @@SESSION.GTID_NEXT= '12446bf7-3219-11e5-9434-080027079e3d:963127' |
| binlog.000014 | 239 | Query | 1 | 311 | BEGIN |
| binlog.000014 | 311 | Table_map | 1 | 356 | table_id: 90 (test.g1) |
| binlog.000014 | 356 | Delete_rows | 1 | 396 | table_id: 90 flags: STMT_END_F |
| binlog.000014 | 396 | Xid | 1 | 427 | COMMIT /* xid=1114262 */ |
| binlog.000014 | 427 | Gtid | 1 | 475 | SET @@SESSION.GTID_NEXT= '12446bf7-3219-11e5-9434-080027079e3d:963128' |
| binlog.000014 | 475 | Query | 1 | 547 | BEGIN |
| binlog.000014 | 547 | Table_map | 1 | 592 | table_id: 90 (test.g1) |
| binlog.000014 | 592 | Delete_rows | 1 | 632 | table_id: 90 flags: STMT_END_F |
| binlog.000014 | 632 | Xid | 1 | 663 | COMMIT /* xid=1114265 */ |
+---------------+-----+----------------+-----------+-------------+------------------------------------------------------------------------+
12 rows in set (0.00 sec)

db1 {root} (test) > show global variables like 'gtid_executed';
+---------------+-----------------------------------------------+
| Variable_name | Value |
+---------------+-----------------------------------------------+
| gtid_executed | 12446bf7-3219-11e5-9434-080027079e3d:1-963128 |
+---------------+-----------------------------------------------+
1 row in set (0.00 sec)

-- async slave node:
percona1 mysql> show binlog events in 'binlog.000004';
+---------------+-----+----------------+-----------+-------------+---------------------------------------------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+---------------+-----+----------------+-----------+-------------+---------------------------------------------------------------------------------------------+
| binlog.000004 | 4 | Format_desc | 3 | 120 | Server ver: 5.6.26-74.0-56-log, Binlog ver: 4 |
| binlog.000004 | 120 | Previous_gtids | 3 | 231 | 052077a5-b6f4-ee1b-61ec-d80a8b27d749:1-36,
12446bf7-3219-11e5-9434-080027079e3d:8058-963126 |
| binlog.000004 | 231 | Gtid | 1 | 279 | SET @@SESSION.GTID_NEXT= '12446bf7-3219-11e5-9434-080027079e3d:963127' |
| binlog.000004 | 279 | Query | 1 | 347 | BEGIN |
| binlog.000004 | 347 | Query | 1 | 416 | COMMIT |
| binlog.000004 | 416 | Gtid | 1 | 464 | SET @@SESSION.GTID_NEXT= '12446bf7-3219-11e5-9434-080027079e3d:963128' |
| binlog.000004 | 464 | Query | 1 | 532 | BEGIN |
| binlog.000004 | 532 | Table_map | 1 | 577 | table_id: 86 (test.g1) |
| binlog.000004 | 577 | Delete_rows | 1 | 617 | table_id: 86 flags: STMT_END_F |
| binlog.000004 | 617 | Xid | 1 | 648 | COMMIT /* xid=1441296 */ |
+---------------+-----+----------------+-----------+-------------+---------------------------------------------------------------------------------------------+
10 rows in set (0.00 sec)

percona1 mysql> show global variables like 'gtid%';
+----------------------+---------------------------------------------------------------------------------------------+
| Variable_name | Value |
+----------------------+---------------------------------------------------------------------------------------------+
| gtid_deployment_step | OFF |
| gtid_executed | 052077a5-b6f4-ee1b-61ec-d80a8b27d749:1-36,
12446bf7-3219-11e5-9434-080027079e3d:8058-963128 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | 052077a5-b6f4-ee1b-61ec-d80a8b27d749:1-34,
12446bf7-3219-11e5-9434-080027079e3d:8058-963124 |
+----------------------+---------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

-- Galera peer node:
percona2 mysql> show global variables like 'gtid%';
+----------------------+----------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+----------------------+----------------------------------------------------------------------------------------------------+
| gtid_deployment_step | OFF |
| gtid_executed | 052077a5-b6f4-ee1b-61ec-d80a8b27d749:1-36,
12446bf7-3219-11e5-9434-080027079e3d:8058-963126:963128 |
| gtid_mode | ON |
| gtid_owned | |
| gtid_purged | 052077a5-b6f4-ee1b-61ec-d80a8b27d749:1-34,
12446bf7-3219-11e5-9434-080027079e3d:8058-963124 |
+----------------------+----------------------------------------------------------------------------------------------------+
5 rows in set (0.01 sec)

So the 963127 trx is missing on the cluster peers, making the GTID numbering inconsistent within the cluster.
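As a side note, the hole is easy to spot mechanically by diffing the gtid_executed sets of the async slave and a peer. A minimal sketch in plain Python (not part of any Percona tooling; `parse_gtid_set` is a hypothetical helper), using the values shown above:

```python
def parse_gtid_set(gtid_executed):
    """Parse a MySQL 5.6 gtid_executed string like
    'uuid:1-5:7,uuid2:1-3' into {uuid: set_of_seqnos}."""
    result = {}
    for part in gtid_executed.replace("\n", "").split(","):
        uuid, *intervals = part.strip().split(":")
        seqnos = set()
        for interval in intervals:
            lo, _, hi = interval.partition("-")
            seqnos.update(range(int(lo), int(hi or lo) + 1))
        result[uuid] = seqnos
    return result

UUID = "12446bf7-3219-11e5-9434-080027079e3d"
slave = parse_gtid_set(UUID + ":8058-963128")        # percona1 (async slave)
peer = parse_gtid_set(UUID + ":8058-963126:963128")  # percona2 (Galera peer)

holes = slave[UUID] - peer[UUID]  # GTIDs the peer never received
print(sorted(holes))              # -> [963127]
```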

Krunal Bauskar (krunal-bauskar) wrote :

Scenario-1:
=========
I tried reproducing the issue and found some interesting observations.
Let me list them:
1. Started master server. Some important configuration to note.
[mysqld]
log-bin=mysql-bin
server-id=1
log-slave-updates=true
gtid-mode=on
enforce-gtid-consistency=true
-------------
2. Started pxc-node-1 and 2.
[mysqld]
log-bin=mysql-bin
server-id=2
log-slave-updates=true
gtid-mode=on
enforce-gtid-consistency=true
-------------
3. Select one of the PXC nodes (node-1 in my case) and make it an async slave
pointing to the master we started in step 1.
------------
4. Execute the following workload on the master:
create table t (i int) engine=innodb;
insert into t values (1), (2), (3), (4);
select * from t;
As expected, the GTID has been properly captured to cover the executed actions:
mysql> show global variables like 'gtid_executed';
gtid_executed e20191fe-9fdc-11e5-9ff6-340286b888be:1-5
------------
5. Moving to the slave. Check that async replication was applied.
(The same can be confirmed on node-2, which is replicating from node-1 via Galera.)
mysql> show global variables like 'gtid_executed';
gtid_executed e20191fe-9fdc-11e5-9ff6-340286b888be:4-5
mysql> select * from t;
+------+
| i    |
+------+
|    1 |
|    2 |
|    3 |
|    4 |
+------+
4 rows in set (0.00 sec)
------------
6. Execute the following workload on node-1:
delete from t where i = 2;
This action too is reflected as expected (on node-2 as well):
mysql> show global variables like 'gtid_executed';
gtid_executed 122fdc30-6023-ee1a-7a64-f545ef74c7ac:1,
e20191fe-9fdc-11e5-9ff6-340286b888be:4-5
Note: there are two UUIDs. One is the GTID from MySQL, the other is the GTID/UUID from Galera, and each follows
its own seqno sequence. Also note that by directly modifying the slave we have put the slave and master out of sync.
------------
7. Run the same DELETE statement on the master. The statement will pass on the master and the seqno will increment
as expected.
mysql> show global variables like 'gtid_executed';
gtid_executed e20191fe-9fdc-11e5-9ff6-340286b888be:1-6
But when this statement is replicated to the async slave, the record is not found, which causes
the slave SQL thread to raise the following error and stop:
<snippet>
2015-12-11 13:31:32 8881 [ERROR] Slave SQL: Could not execute Delete_rows event on table test.t; Can't find record in 't', Error_code: 1032; handler error HA_ERR_END_OF_FILE; the event's master log mysql-bin.000001, end_log_pos 1645, Error_code: 1032
2015-12-11 13:31:32 8881 [Warning] Slave: Can't find record in 't' Error_code: 1032
2015-12-11 13:31:32 8881 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000001' position 1441
</snippet>
This is expected, as the user has deliberately made them inconsistent.
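For reference, the reporter's scenario relies on the slave being configured to suppress exactly this error. A my.cnf fragment along these lines (value taken from the bug description) would do it:

```ini
[mysqld]
# Suppress "Can't find record" (error 1032) in the slave SQL thread,
# as in the original report. slave_skip_errors is not dynamic in 5.6,
# so a server restart is required.
slave_skip_errors = 1032
```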
-------------
From this point on, the master and slave are out of sync.
Master operations can continue and its seqno increases, but this doesn't affect the async slave, which in this case is galera-node-1.
Operations on galera-node-1 are logged under a separate UUID, which is replicated as expected.
This behavior looks perfectly fine and in line with expectations.


Krunal Bauskar (krunal-bauskar) wrote :

With all that analysis in place:
a. GTID holes can happen only if the user explicitly configures the server to suppress errors. With the normal flow this is not an issue.
b. Inserting a dummy trx is a slave-thread-specific action that is outside Galera's purview. (Also, it is not a user-generated action for Galera to take care of.) Maybe it is not correct to expect such actions to be replicated across the Galera cluster.
------
There are 2 different eco-systems (and both of them operate independently):
1. Master-Slave eco-system
2. Galera eco-system (Multi-Master Eco-system)
Expecting an event in one eco-system to trigger a series of events in the other doesn't look feasible and creates an unneeded dependency path
(especially when these are error events triggered by a configuration setting made by the user).
I don't foresee that we should even try to explore this dependency path.

Krunal Bauskar (krunal-bauskar) wrote :

Some more investigational notes:
The workload is executed on the async master (delete from t where i = 2).
Given that the same workload was already executed independently on the async slave, re-execution of it by the slave thread has no effect.
(Ideally, it should cause the SLAVE thread to stop, as it has detected an inconsistency.)
The user has configured the async slave to ignore the error from this action, allowing the trx to proceed (instead of causing it to fail and roll back).
Here is the relevant function in the SLAVE thread that handles masking of the returned error:
Rows_log_event::handle_idempotent_and_ignored_errors() ..... sql/log_event.cc
So even if the trx fails, it is allowed to proceed as usual, which in turn increments the MySQL GTID seqno.
This being an out-of-course action (an error-correction path), it is not replicated by Galera.
(The action at the SE level failed, so nothing was written to the binlog for Galera to replicate; the corrective action, which is slave-specific,
is applied in SLAVE space after the SE confirms the error.)
Another interesting aspect: if I change the master workload from a point query to "delete from t where i >= 2" and keep the rest the same,
the async slave fails on the first delete condition itself, so the rest is not executed (the follow-up rows (3), (4) could be deleted but aren't,
causing further inconsistency). [Now the MASTER has rows with i < 2 .... but the SLAVE has rows with i > 2]
(Trying i <= 2 has a different effect, as the SLAVE hits the error condition last, so the earlier rows matching i <= 2 are deleted.)
In short, if the user tries to skip such errors then he/she should be well aware of the consequences and inconsistency it may cause.
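The range-delete divergence described above can be illustrated with a toy row-by-row applier (a sketch for illustration only, not MySQL code): the slave SQL thread stops at the first missing row, so later rows in the same event survive.

```python
def apply_delete_rows(table, ids_in_master_order):
    """Toy model of the slave SQL thread applying a Delete_rows event
    row by row: stop at the first missing row (error 1032)."""
    for row_id in ids_in_master_order:
        if row_id not in table:
            return "ERROR 1032"  # SQL thread stops; later rows untouched
        table.remove(row_id)
    return "OK"

slave_rows = {1, 3, 4}          # row 2 was already deleted on the slave
# master ran: delete from t where i >= 2  -> event carries rows 2, 3, 4
status = apply_delete_rows(slave_rows, [2, 3, 4])
print(status, slave_rows)       # rows 3 and 4 survive: slave diverges further
```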

Przemek (pmalkowski) wrote :

OK, it appears that skipped replication errors are not the only thing that can cause a dummy, empty trx on the async slave, and hence a GTID hole on the Galera peers.

When a row in a table with a PK is updated first on the slave, and then the same update is done on the master, the PK value identifying the row is still present on the slave, so no replication error happens. But as the other columns already have the same values, no real update happens, only an empty begin/commit. I made the test below without any slave_skip_errors setting.

 CREATE TABLE `g1` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `a` char(10) DEFAULT 'test',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB

--slave:
percona1 mysql> update g1 set a="aa" where id=3;
Query OK, 1 row affected (0.01 sec)
Rows matched: 1 Changed: 1 Warnings: 0

--master:
db1 {root} (test) > update g1 set a="aa" where id=3;
Query OK, 1 row affected (0.02 sec)
Rows matched: 1 Changed: 1 Warnings: 0

-- master's binlog:
| binlog.000009 | 689 | Gtid | 1 | 737 | SET @@SESSION.GTID_NEXT= '12446bf7-3219-11e5-9434-080027079e3d:7206' |
| binlog.000009 | 737 | Query | 1 | 809 | BEGIN |
| binlog.000009 | 809 | Table_map | 1 | 857 | table_id: 71 (test.g1) |
| binlog.000009 | 857 | Update_rows | 1 | 908 | table_id: 71 flags: STMT_END_F |
| binlog.000009 | 908 | Xid | 1 | 939 | COMMIT /* xid=32 */ |

-- slave's binlog:
| binlog.000005 | 1929751 | Gtid | 1 | 1929799 | SET @@SESSION.GTID_NEXT= '12446bf7-3219-11e5-9434-080027079e3d:7206' |
| binlog.000005 | 1929799 | Query | 1 | 1929867 | BEGIN |
| binlog.000005 | 1929867 | Query | 1 | 1929936 | COMMIT |

And the 7206 seqno is missing on cluster peers.

Krunal Bauskar (krunal-bauskar) wrote :

commit 71bf847450c8dc9b1e5992cf4d5b0858ac77646e
Merge: f6f18fc 8181e9e
Author: Krunal Bauskar <email address hidden>
Date: Wed Jan 20 11:55:41 2016 +0530

    Merge pull request #79 from kbauskar/5.6-pxc-489

    - PXC#489: GTID holes created on cluster peers of async slave node (C…

commit 8181e9e9efe5a993977890f3c000d088fb7fd9b8
Author: Krunal Bauskar <email address hidden>
Date: Mon Jan 18 14:36:33 2016 +0530

    - PXC#489: GTID holes created on cluster peers of async slave node (CID#62236)

      Issue:
      -----

      Let's understand the issue using a topology.

      - Independent Master
      - Galera Nodes (node#1 and node#2).
      - Node#1 is acting as async-slave to Independent Master.

      Let's say the user executes some action on node#1. This action will get replicated
      in the galera-eco-system, but at the same time data is being modified directly
      on the slave, putting the slave out of sync with the master.

      Now the same action is executed on the master. This action is replicated to
      the slave, but given that the slave has already executed it, the slave would
      raise an error. This can be handled by an installed error handler that allows
      the slave to suppress such errors.

      There are other use-cases wherein an "UPDATE" sets a record to the same value,
      in which case the UPDATE simply doesn't execute the action on the slave.

      In all such cases, even though the action is suppressed, the GTID sequence
      update still needs to be replicated in the galera-eco-system, as it is done
      in the Master-Slave eco-system.

      (There could be different views on whether an event in the master-slave
       eco-system should be propagated to the galera-eco-system, but for now
       let's assume this is needed.)

      Solution:
      ---------

      When such a situation arises in the Master-Slave eco-system, the Slave will
      create an empty GTID event that is then replicated if the Slave is acting
      as Master to some other Slave.

      Galera will follow the same approach. Such empty GTID events will now also
      be replicated in the Galera eco-system.
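The fix as described can be paraphrased in a few lines of Python (names assumed for illustration; this is not the PXC source): replicate a GTID-only event through Galera even when the transaction carries no write-set, so the peers keep a contiguous sequence.

```python
replicated_to_peers = []

def galera_replicate(gtid, write_set):
    """Stand-in for Galera replication: record which GTIDs reach the peers."""
    replicated_to_peers.append(gtid)

def slave_commit(gtid, write_set):
    """After the fix: an empty transaction (suppressed error, no-op UPDATE)
    still ships its GTID, instead of being silently skipped."""
    galera_replicate(gtid, write_set or None)

slave_commit(963127, [])                            # the suppressed DELETE
slave_commit(963128, [("Delete_rows", {"id": 3})])  # a normal transaction
print(replicated_to_peers)  # both seqnos present -> no hole on the peers
```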

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1868
