Async slave fails after master full SST
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC | Status tracked in 5.6 | | |
5.6 | Invalid | Undecided | Unassigned |
5.7 | Invalid | Undecided | Unassigned |
Bug Description
Tested in 5.6.36-82.0-56-log and 5.7.18-15-57-log Percona XtraDB Cluster.
When the Percona cluster node that acts as the async replication master is shut down while writes occur on the other nodes, and it then rejoins the cluster through a full SST, the slave connected to this node reports the following error:
Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_
Test case:
1) Bootstrap a master cluster (2 nodes) with the following settings:
server_id=200
log_bin=percona-bin
log_slave_updates
binlog_format=ROW
enforce_
gtid_mode=on
2) Start a slave with the following settings:
server_id=201
log_bin=percona-bin
log_slave_updates
binlog_format=ROW
enforce_
gtid_mode=on
3) Point the async slave (server 201) to node 1 of the master cluster (server 200).
4) Shut down node 1 of the master cluster
5) Drop all files in the datadir of node 1 to force SST
6) Execute some writes on node 2 of the master cluster
7) Start node 1 of the master cluster
After failed connection attempts, the slave threads show the following error in the MySQL error log:
2017-08-
2017-08-
2017-08-
Possible workarounds are:
1) Avoid SST; IST doesn't show the same behavior.
2) Change master to the PXC node that generated new rows.
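As a rough illustration of workaround 2, the sketch below only builds the statements one might run on the slave; it does not connect to any server. The host name `pxc-node2` and the helper name are hypothetical, and it assumes the replication user and password stay the same and that the slave keeps using GTID auto-positioning.

```python
def repoint_slave_statements(host, port=3306):
    """Build the SQL to re-point a GTID-based slave at another PXC node.

    Hypothetical helper: it returns the statements as strings so they can
    be reviewed before being run on the slave by hand.
    """
    return [
        "STOP SLAVE;",
        # Only host/port change here; credentials are assumed unchanged.
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
        "MASTER_AUTO_POSITION=1;",
        "START SLAVE;",
    ]


# 'pxc-node2' is a placeholder for a node that was NOT rebuilt by SST.
for statement in repoint_slave_statements("pxc-node2"):
    print(statement)
```

After running the statements, `SHOW SLAVE STATUS\G` on the slave should show whether both replication threads are running again.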
tags: | added: i201425 |
Verified as described.
> 2) Change master to the PXC node that generated new rows.
Change master to any node except the one recovered by SST.
The slave can be re-attached to the original master node once it has received all the "offline" changes.
GTID auto-positioning requires binary logs that contain the corresponding events in order to work.
Currently, PXC nodes can have different names and locations for their binary logs.
Binary logs can also be large, and transferring them would increase SST time.
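To make the auto-positioning requirement concrete, here is a minimal sketch that compares two GTID sets: the master's `gtid_purged` and the slave's `gtid_executed`. If the master has purged (or never had in its binary logs) GTIDs that the slave has not executed, auto-positioning cannot serve them. The function names and the sample UUID are illustrative assumptions, and the parser expands ranges into Python sets, so it only suits small sets. MySQL's built-in `GTID_SUBTRACT()` function performs the same comparison on the server.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            # A single number such as '7' is an interval of length one.
            numbers.update(range(int(first), int(last or first) + 1))
    return result


def missing_from_slave(master_purged, slave_executed):
    """Return GTIDs purged on the master that the slave has not executed."""
    purged = parse_gtid_set(master_purged)
    executed = parse_gtid_set(slave_executed)
    missing = {}
    for uuid, numbers in purged.items():
        gap = numbers - executed.get(uuid, set())
        if gap:
            missing[uuid] = sorted(gap)
    return missing


# Made-up values: the master has purged transactions 1-120,
# while the slave stopped at transaction 100.
uuid = "3e11fa47-71ca-11e1-9e33-c80aa9429562"
gap = missing_from_slave(f"{uuid}:1-120", f"{uuid}:1-100")
print("slave cannot auto-position" if gap else "slave can auto-position")
```

An empty result means every purged transaction is already on the slave, so re-attaching it to that node with auto-positioning should be possible.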