innobackupex --slave-info doesn't handle slave_parallel_workers>0

Bug #1372679 reported by Kolbe
This bug affects 4 people
Affects | Status | Importance | Assigned to | Milestone
Percona XtraBackup moved to https://jira.percona.com/projects/PXB | Fix Released | Medium | Alexey Kopytov |
2.1 | Won't Fix | Undecided | Unassigned |
2.2 | Fix Released | Medium | Alexey Kopytov |
2.3 | Fix Released | Medium | Alexey Kopytov |

Bug Description

From http://dev.mysql.com/doc/refman/5.6/en/show-slave-status.html:

  "When using a multi-threaded slave (by setting slave_parallel_workers to a nonzero value in MySQL 5.6.3 and later), the value in [the Exec_Master_Log_Pos] column actually represents a “low-water” mark, before which no uncommitted transactions remain. Because the current implementation allows execution of transactions on different databases in a different order on the slave than on the master, this is not necessarily the position of the most recently executed transaction."

So, it is not safe to rely on Exec_Master_Log_Pos for "latest executed position in master binary log" if parallel replication is used, i.e. slave_parallel_workers>0.
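To make the difference concrete, here is a small sketch of how a "low-water" mark can diverge from the most recently executed position. The positions and the `(end_pos, committed)` representation are hypothetical and are not taken from MySQL internals.

```python
def low_water_mark(transactions):
    """Return the end position of the last transaction before which
    nothing is left uncommitted.

    `transactions` is a list of (end_pos, committed) pairs in master
    binary log order (a hypothetical representation for illustration).
    """
    mark = 0
    for end_pos, committed in transactions:
        if not committed:
            break  # first gap: everything before it is committed
        mark = end_pos
    return mark


def latest_executed(transactions):
    """Return the highest end position among committed transactions."""
    return max((pos for pos, committed in transactions if committed), default=0)


# Hypothetical slave state: one worker is still applying the transaction
# ending at 300, while another has already committed the one ending at 400.
state = [(100, True), (200, True), (300, False), (400, True)]
```

With this state, `low_water_mark(state)` and `latest_executed(state)` disagree. A clone started from the first would re-apply the transaction ending at 400, and a clone started from the second would never apply the one ending at 300.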

innobackupex should refuse to allow the use of the --slave-info option if slave_parallel_workers>0.

A new option to innobackupex could be offered that would execute a sequence such as this:

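-- Remember the current setting so it can be restored afterwards.
SET @old_slave_parallel_workers:=@@slave_parallel_workers;
-- Switch to single-threaded apply; the new value takes effect on the next START SLAVE.
SET GLOBAL slave_parallel_workers=0;
STOP SLAVE;
START SLAVE;
-- ...execute backup logic...
-- Restore the original setting and restart the slave again.
SET GLOBAL slave_parallel_workers=@old_slave_parallel_workers;
STOP SLAVE;
START SLAVE;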
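If a wrapper script drove this sequence, it would probably want to restore the original setting even when the backup fails. A minimal sketch follows. It assumes two hypothetical callables supplied by the caller, `execute(sql)` and `query_scalar(sql)`, and it is not how innobackupex is implemented. It also does not settle whether the resulting Exec_Master_Log_Pos can be trusted.

```python
from contextlib import contextmanager


@contextmanager
def single_threaded_slave(execute, query_scalar):
    """Run the enclosed block with slave_parallel_workers set to 0.

    `execute(sql)` runs a statement and `query_scalar(sql)` returns one
    value; both are hypothetical helpers, not part of any real library.
    """
    old_workers = int(query_scalar("SELECT @@GLOBAL.slave_parallel_workers"))
    try:
        execute("SET GLOBAL slave_parallel_workers=0")
        execute("STOP SLAVE")
        execute("START SLAVE")
        yield old_workers
    finally:
        # Always restore the previous setting, even if the backup raised.
        execute("SET GLOBAL slave_parallel_workers=%d" % old_workers)
        execute("STOP SLAVE")
        execute("START SLAVE")
```

The backup logic would go inside a `with single_threaded_slave(execute, query_scalar):` block.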


Nilnandan Joshi (nilnandan-joshi) wrote :

I tested this with slave_parallel_workers=4 and, yes, the backup silently starts without any warning, which is a problem.
Either innobackupex should not allow --slave-info in this case, or it could use the sequence above to take the backup.

Changed in percona-xtrabackup:
status: New → Confirmed

Alexey Kopytov (akopytov) wrote :

Exec_Master_Log_Pos indeed cannot be trusted with slave_parallel_workers > 0. Currently no backup utilities warn users about this fact (I have checked mysqldump, mydumper and mylvmbackup).

The only available option to clone a multi-threaded slave seems to be using GTID (though I could not find any confirmation in the MySQL manual).

I'm not sure about the suggested workaround. Does setting slave_parallel_workers to 0 and then restarting the slave guarantee that Exec_Master_Log_Pos will correspond to the most recently executed transaction? The manual says the following:

"START SLAVE UNTIL SQL_AFTER_MTS_GAPS should be used before switching the slave from multi-threaded mode to single-threaded mode (that is, when resetting slave_parallel_workers back to 0 from a positive, nonzero value) after slave has failed with errors in multi-threaded mode."

If the slave is GTID-enabled, write_slave_info() will automatically write "SET GLOBAL gtid_purged=...; CHANGE MASTER TO MASTER_AUTO_POSITION=1" to xtrabackup_slave_info, i.e. Exec_Master_Log_Pos will not be used.
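As a rough illustration of the two code paths, the sketch below builds the text that would go into xtrabackup_slave_info. The function name `build_slave_info`, its parameters, and the exact non-GTID statement layout are assumptions for illustration, not the actual write_slave_info() code. The `gtid_set` value is simply passed in by the caller.

```python
def build_slave_info(slave_status, gtid_enabled, gtid_set=""):
    """Return the statements a restored clone would run to start replicating.

    `slave_status` is a dict of SHOW SLAVE STATUS columns. `gtid_set` is
    whatever GTID set the caller decided to record (an assumption here).
    """
    if gtid_enabled:
        # GTID path: binary log coordinates are not used at all.
        return (
            "SET GLOBAL gtid_purged='%s';\n"
            "CHANGE MASTER TO MASTER_AUTO_POSITION=1\n" % gtid_set
        )
    # Position path: relies on Exec_Master_Log_Pos, which is only a
    # low-water mark when slave_parallel_workers > 0.
    return "CHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%d\n" % (
        slave_status["Relay_Master_Log_File"],
        int(slave_status["Exec_Master_Log_Pos"]),
    )
```

Only the second branch depends on Exec_Master_Log_Pos, which is why the GTID case is unaffected by the multi-threaded slave problem.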

I'm going to add a check to innobackupex so that if slave_parallel_workers is non-zero and GTID is not enabled, the --slave-info option would fail with a descriptive error.
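The condition described here could look roughly like the following sketch. It assumes the server variables have already been fetched into a dict (for example from SHOW VARIABLES). The function and exception names are hypothetical and do not come from the innobackupex source.

```python
class SlaveInfoError(Exception):
    """Raised when --slave-info cannot produce a trustworthy position."""


def check_slave_info_allowed(variables):
    """Fail if the slave is multi-threaded and GTID is not enabled.

    `variables` maps server variable names to their values, as strings
    or integers (a hypothetical input format).
    """
    workers = int(variables.get("slave_parallel_workers", 0) or 0)
    gtid_on = str(variables.get("gtid_mode", "OFF")).upper() == "ON"
    if workers > 0 and not gtid_on:
        raise SlaveInfoError(
            "--slave-info is not supported with slave_parallel_workers=%d "
            "unless GTID is enabled: Exec_Master_Log_Pos is only a "
            "low-water mark on a multi-threaded slave" % workers
        )
```

Servers that do not report slave_parallel_workers at all are treated as single-threaded, so the check passes for them.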

Kolbe (kolbe) wrote :

I'm not sure about the GTID case.

I found that using --safe-slave-backup is also OK, since STOP SLAVE causes all worker threads to execute up to a particular position and stop. If you add a check to innobackupex, please allow --slave-info if --safe-slave-backup is enabled.

Alexey Kopytov (akopytov) wrote :

I found that STOP SLAVE does not guarantee any specific position for Exec_Master_Log_Pos, which means --safe-slave-backup is not a solution to this problem.

See http://bugs.mysql.com/bug.php?id=74528

Rick Pizzi (pizzi) wrote :

Hi Alexey,

since Oracle confirmed that stopping the slave SQL thread is actually safe... can you please allow xtrabackup to use --slave-info when --safe-slave-backup is also there, in presence of MTS?

Thanks
Rick

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXB-704
