Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC

Async slave fails after master full SST

Bug #1710297 reported by Juan Pablo Arruti on 2017-08-11

This bug affects 2 people

	Status	Importance	Assigned to
Percona XtraDB Cluster moved to https://jira.percona.com/projects/PXC	Status tracked in 5.6
5.6	Invalid	Undecided	Unassigned
5.7	Invalid	Undecided	Unassigned

Bug Description

Tested in 5.6.36-82.0-56-log and 5.7.18-15-57-log Percona XtraDB Cluster.

When the PXC node that acts as master for async replication is shut down, writes occur on the other nodes, and the node then starts up again with a full SST, the slave connected to that node fails with the following error:

Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

Test case:

1) Bootstrap a master cluster (2 nodes) with the following settings:

server_id=200
log_bin=percona-bin
log_slave_updates
binlog_format=ROW
enforce_gtid_consistency=1
gtid_mode=on

2) Start a slave with the following settings:

server_id=201
log_bin=percona-bin
log_slave_updates
binlog_format=ROW
enforce_gtid_consistency=1
gtid_mode=on

3) Point the async slave (server 201) to node 1 of the master cluster (server 200).

4) Shut down node 1 of the master cluster.

5) Delete all files in the datadir of node 1 to force SST.

6) Execute some writes on node 2 of the master cluster.

7) Start node 1 of the master cluster.

After failed connection attempts, the slave I/O thread shows the following error in the MySQL error log:

2017-08-11T19:39:36.389754Z 14 [ERROR] Error reading packet from server for channel '': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2017-08-11T19:39:36.389767Z 14 [ERROR] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
2017-08-11T19:39:36.389787Z 14 [Note] Slave I/O thread exiting for channel '', read up to log 'percona-bin.000012', position 194

Possible workarounds:

1) Avoid SST; IST doesn't show the same behavior.
2) Change the master to the PXC node that generated the new rows.
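
As a rough illustration of workaround 2, the following Python sketch builds the statements that would re-point the slave at another node while keeping GTID auto-positioning. The function name and the host `pxc-node2.example` are hypothetical; replication credentials are assumed to be already stored on the slave from the earlier CHANGE MASTER TO.

```python
import re


def repoint_statements(host, port=3306):
    """Return the SQL statements to run on the slave, in order.

    `host` is a hypothetical hostname of a PXC node that still has the
    binary logs for the writes made while the old master was down.
    """
    # Only accept plain hostnames/IPs so the value is safe to embed in SQL.
    if not re.fullmatch(r"[A-Za-z0-9.\-]+", host):
        raise ValueError("unexpected characters in host: %r" % host)
    return [
        "STOP SLAVE",
        "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=%d, "
        "MASTER_AUTO_POSITION=1" % (host, int(port)),
        "START SLAVE",
    ]


for statement in repoint_statements("pxc-node2.example"):
    print(statement + ";")
```

The output can be pasted into a mysql client session on the slave.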

Nickolay Ihalainen (ihanick) wrote on 2017-08-12:

Verified as described.

> 2) Change master to the PXC node that generated new rows.

Change master to any node except the one recovered by SST.
The slave can be re-attached to the original master node after it has received all the "offline" changes.

GTID auto-positioning requires binary logs containing the corresponding events to work.

Currently, PXC nodes can have different names/locations for binary logs.
Binary logs can also be of significant size and would increase SST time.
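
To illustrate why the I/O thread stops: with MASTER_AUTO_POSITION = 1, the master refuses to serve a slave when the master's gtid_purged set contains transactions the slave has not executed yet. The Python sketch below models that comparison on plain GTID-set strings. The function names and the sample UUID are hypothetical, and it assumes (not confirmed in this report) that after SST node 1 lists the transactions written while it was offline in gtid_purged, because its fresh binary logs do not contain them.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: set of transaction numbers}.

    Expands every interval into a Python set, which is fine for a small
    illustration but not for very large GTID ranges.
    """
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            first, _, last = interval.partition("-")
            last = last or first  # a single number such as '7'
            numbers.update(range(int(first), int(last) + 1))
    return result


def missing_from_slave(master_purged, slave_executed):
    """Return {uuid: sorted transaction numbers} purged on the master
    but not yet executed on the slave. A non-empty result corresponds
    to the situation reported as error 1236."""
    purged = parse_gtid_set(master_purged)
    executed = parse_gtid_set(slave_executed)
    missing = {}
    for uuid, numbers in purged.items():
        gap = numbers - executed.get(uuid, set())
        if gap:
            missing[uuid] = sorted(gap)
    return missing
```

For example, calling `missing_from_slave("<uuid>:1-10", "<uuid>:1-7")` with the same UUID on both sides reports transactions 8, 9 and 10 as missing. On a live server the same difference can be computed with the GTID_SUBTRACT() SQL function.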

Nickolay Ihalainen (ihanick) on 2017-08-12

tags added: i201425

Krunal Bauskar (krunal-bauskar) wrote on 2017-08-14:

Not sure if anything can be done from the PXC side.

The solution lies in the complete setup, wherein ProxySQL (or whatever you use) should redirect the async slave to another member of the cluster if the existing async master fails or is shut down.
Redirection can be done only after the original master is back in SYNCED state.
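
A minimal sketch of the node-selection part of such a setup, assuming the monitoring layer has already collected each node's wsrep_local_state_comment status value into a dictionary. The function name and node names are hypothetical, and this is not how ProxySQL itself implements it.

```python
def pick_replication_source(node_states, exclude=()):
    """Return the first node reporting 'Synced' that is not excluded.

    node_states maps a node name to its wsrep_local_state_comment value,
    for example {"node1": "Joined", "node2": "Synced"}.
    exclude lists nodes to skip, such as one just rebuilt by SST.
    Returns None when no node qualifies.
    """
    for name, state in node_states.items():
        if name in exclude:
            continue
        if state.strip().lower() == "synced":
            return name
    return None


states = {"node1": "Joined", "node2": "Synced", "node3": "Synced"}
print(pick_replication_source(states, exclude={"node1"}))
```

Keeping the SST-recovered node in `exclude` until the slave has received the "offline" changes from another node matches the advice given earlier in this thread.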

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-18:

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXC-1998
