Percona XtraBackup moved to https://jira.percona.com/projects/PXB

Safe-slave-backup gets stuck with a long query

Bug #1717158 reported by markus_albe on 2017-09-14

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Percona XtraBackup moved to https://jira.percona.com/projects/PXB	Status tracked in 2.4
2.3	Fix Released	Wishlist	Vasily Nemkov	2.3.10
2.4	Fix Released	Wishlist	Vasily Nemkov	2.4.9

Bug Description

When using --safe-slave-backup the following can happen:

t1: a temporary table is created
t2: a long DML statement starts
t3: safe-slave-backup's STOP SLAVE arrives
t4: the slave SQL thread is killed and the long DML is rolled back
t5: safe-slave-backup's START SLAVE arrives, and replication restarts from the beginning of the long DML
t6: safe-slave-backup sleeps a bit, but far less than the long DML takes, so it issues STOP SLAVE again while the long DML is still running, repeating the above ad infinitum

The proposed fix is for safe-slave-backup to check the replication position when it stops the slave and make sure the position has moved forward since the last stop. This way, at least, it won't loop over the exact same query every time, depleting all retries.
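
To illustrate the loop and the proposed check, here is a minimal Python simulation. It is only a sketch, not XtraBackup's actual code: FakeSlave, naive_stop, position_aware_stop and the max_polls limit are hypothetical names made up for this example, and the "position" and "temp_tables" attributes merely stand in for the replica's executed position and its open temporary table count.

```python
import time
from typing import Callable, Optional


class FakeSlave:
    """Toy stand-in for a replica; not a real MySQL client."""

    def __init__(self, dml_ticks: int) -> None:
        self.dml_ticks = dml_ticks  # uninterrupted ticks the long DML needs
        self.progress = 0           # ticks of work done on the current DML
        self.position = 100         # pretend executed replication position
        self.temp_tables = 1        # pretend count of open temporary tables
        self.running = True
        self.stops = 0              # how many times STOP SLAVE was issued

    def tick(self, _seconds: float) -> None:
        """Used in place of time.sleep: lets the SQL thread make progress."""
        if self.running and self.temp_tables:
            self.progress += 1
            if self.progress >= self.dml_ticks:
                self.position += 1   # DML committed, temporary table dropped
                self.temp_tables = 0

    def stop(self) -> None:
        self.running = False
        self.stops += 1
        if self.temp_tables:
            self.progress = 0        # the in-flight DML is rolled back

    def start(self) -> None:
        self.running = True


def naive_stop(slave: FakeSlave, retries: int, pause: float,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Stop, check, restart, sleep: the behaviour described in t3..t6."""
    for _ in range(retries):
        slave.stop()
        if slave.temp_tables == 0:
            return True
        slave.start()
        sleep(pause)
    return False


def position_aware_stop(slave: FakeSlave, retries: int, pause: float,
                        max_polls: int,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Only stop again once the position has moved since the last stop."""
    last_stop_position: Optional[int] = None
    attempts = polls = 0
    while attempts < retries and polls < max_polls:
        if slave.position == last_stop_position:
            # No progress yet: stopping now would kill the same statement.
            sleep(pause)
            polls += 1
            continue
        slave.stop()
        last_stop_position = slave.position
        if slave.temp_tables == 0:
            return True
        slave.start()
        attempts += 1
        sleep(pause)
        polls += 1
    return False


naive = FakeSlave(dml_ticks=5)
fixed = FakeSlave(dml_ticks=5)
naive_ok = naive_stop(naive, retries=3, pause=1.0, sleep=naive.tick)
fixed_ok = position_aware_stop(fixed, retries=3, pause=1.0, max_polls=50,
                               sleep=fixed.tick)
print(naive_ok, naive.stops, fixed_ok, fixed.stops)
```

In this toy model the naive loop uses up all three retries on the same statement, while the position-aware loop waits without consuming a retry and succeeds on its second stop. The max_polls bound is an added assumption so that the sketch cannot wait forever.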

Sergei Glushchenko (sergei.glushchenko) wrote on 2017-09-14:

#1

It makes sense to me. Before executing STOP SLAVE we will check whether the binlog position has changed and, if it hasn't, we will wait for 3 more seconds. However, if the master is running

CREATE tmp_table; INSERT INTO SELECT; DROP tmp_table

in a loop, it will not help much.

The chances of catching the moment right after tmp_table is dropped are small.

markus_albe (markus-albe) wrote on 2017-09-18:

#2

Yeah, the loop with CREATE TEMPORARY...; INSERT...SELECT; DROP TEMPORARY was an example to show the effect; I reckon it would be impossible to make --safe-slave-backup work in such a case. I guess the best alternative (better than waiting for the position to advance) is to inspect the binlog, find the DROP TEMPORARY, stop immediately after it and check... This would be the ultimate fix, as it only makes sense to check Slave_open_temp_tables again after a DROP TEMPORARY.
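
A rough sketch of the "find the DROP TEMPORARY" step, in Python. It assumes the events have already been read from the binlog or relay log by some other means and reduced to an end position plus statement text; BinlogEvent and next_drop_temporary_pos are hypothetical names for this example, and the pattern only covers the plain spelling of the statement, so real logs may need a broader match.

```python
import re
from typing import Iterable, NamedTuple, Optional


class BinlogEvent(NamedTuple):
    """Hypothetical, already-parsed statement event."""
    end_pos: int     # position right after the event
    statement: str   # SQL text of the event


# Matches only the plain form "DROP TEMPORARY TABLE ..." (illustrative).
_DROP_TEMP = re.compile(r"^\s*DROP\s+TEMPORARY\s+TABLE\b", re.IGNORECASE)


def next_drop_temporary_pos(events: Iterable[BinlogEvent],
                            after_pos: int) -> Optional[int]:
    """Return the end position of the first DROP TEMPORARY past after_pos."""
    for event in events:
        if event.end_pos > after_pos and _DROP_TEMP.match(event.statement):
            return event.end_pos
    return None


events = [
    BinlogEvent(120, "CREATE TEMPORARY TABLE tmp_table (id INT)"),
    BinlogEvent(240, "INSERT INTO t SELECT * FROM tmp_table"),
    BinlogEvent(300, "DROP TEMPORARY TABLE tmp_table"),
    BinlogEvent(420, "CREATE TEMPORARY TABLE tmp_table (id INT)"),
    BinlogEvent(560, "drop temporary table tmp_table"),
]

first_target = next_drop_temporary_pos(events, after_pos=0)
second_target = next_drop_temporary_pos(events, after_pos=300)
no_target = next_drop_temporary_pos(events, after_pos=560)
print(first_target, second_target, no_target)
```

The returned position would be the point at which to stop the SQL thread and re-check the temporary table count; how the replica is made to stop exactly there is left out of this sketch.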

Vasily Nemkov (vasily.nemkov) wrote on 2017-10-11:

#3

PR: https://github.com/percona/percona-xtrabackup/pull/441

Vasily Nemkov (vasily.nemkov) wrote on 2017-10-23:

#4

PR to 2.4: https://github.com/percona/percona-xtrabackup/pull/442

Vasily Nemkov (vasily.nemkov) wrote on 2017-10-27:

#5

PR merged to 2.3

Vasily Nemkov (vasily.nemkov) wrote on 2017-10-27:

#6

PR merged to 2.4

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-20:

#7

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PXB-1039
