Percona Toolkit moved to https://jira.percona.com/projects/PT

pt-slave-restart skips binlog position on lock timeout

Bug #1463314 reported by Przemek on 2015-06-09

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Percona Toolkit moved to https://jira.percona.com/projects/PT	Confirmed	Undecided	Unassigned

Bug Description

By default, when the error numbers are not explicitly listed with --error-numbers, pt-slave-restart uses SQL_SLAVE_SKIP_COUNTER=1 before resuming replication, even or errors like 'Lock wait timeout exceeded' (1205). Such behaviour is dangerous and introduces data inconsistencies on the slave.
IMHO for the slave thread stopped on lock timeout, the tool should just do 'start slave sql_thread' couple of times before taking further action.
Then, either abort with warning that some user thread is blocking the slave sql_thread, or skip the event. Though the second option I think should be only used when user consciously forces it knowing it may (or almost for sure it will) introduce new data inconsistencies on the slave. Hence a new option like '--force-skip-on-lock-timout' maybe could be added.

Current behaviour example(version 2.2.14):
$ pt-slave-restart --user=msandbox --password=msandbox -S /tmp/mysql_sandbox20485.sock
2015-06-09T04:41:11 S=/tmp/mysql_sandbox20485.sock,p=...,u=msandbox mysql_sandbox20485-relay-bin.000026 6987 1205

2015-06-09 04:37:01 5614 [Warning] Slave SQL: Could not execute Write_rows event on table test.address_detail; Lock wait timeout exceeded; try restarting transaction, Error_code: 1205; handler error HA_ERR_LOCK_WAIT_TIMEOUT; the event's master log mysql-bin.000005, end_log_pos 97297, Error_code: 1205
2015-06-09 04:37:01 5614 [ERROR] Slave SQL: Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable. Error_code: 1205
2015-06-09 04:37:01 5614 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
2015-06-09 04:37:01 5614 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000005' position 95652
2015-06-09 04:41:11 5614 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000005' at position 95652, relay log './mysql_sandbox20485-relay-bin.000026' position: 6987
2015-06-09 04:41:11 5614 [Note] 'SQL_SLAVE_SKIP_COUNTER=1' executed at relay_log_file='./mysql_sandbox20485-relay-bin.000026', relay_log_pos='6987', master_log_name='mysql-bin.000005', master_log_pos='95652' and new position at relay_log_file='./mysql_sandbox20485-relay-bin.000026', relay_log_pos='8663', master_log_name='mysql-bin.000005', master_log_pos='97328'

See original description

Tags:

Przemek (pmalkowski) on 2015-06-09

description:

updated

Valerii Kravchuk (valerii-kravchuk) on 2015-06-18

Changed in percona-toolkit:
status:	New → Confirmed

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-24:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-1290

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.