pt-slave-restart skips binlog position on lock timeout
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Percona Toolkit moved to https://jira.percona.com/projects/PT | Confirmed | Undecided | Unassigned | |
Bug Description
By default, when the error numbers are not explicitly listed with --error-numbers, pt-slave-restart uses SQL_SLAVE_
IMHO, when the slave SQL thread has stopped on a lock wait timeout, the tool should just run 'start slave sql_thread' a couple of times before taking further action.
If that does not help, it should either abort with a warning that some user thread is blocking the slave sql_thread, or skip the event. The second option, I think, should only be used when the user consciously forces it, knowing it may (or almost certainly will) introduce new data inconsistencies on the slave. Hence a new option like '--force-
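The behaviour proposed above can be sketched roughly as follows. This is only an illustration of the retry-then-abort-or-skip idea, not pt-slave-restart's actual code: the function name, the three callables, and the `force_skip` parameter are all hypothetical stand-ins (the last one for the kind of "force" option suggested above), and the caller is assumed to supply callables that talk to the server.

```python
import time

LOCK_WAIT_TIMEOUT = 1205  # error code shown in the log excerpt in this report


def retry_after_lock_timeout(start_sql_thread, sql_thread_running, skip_event,
                             retries=3, wait=1.0, force_skip=False):
    """Hypothetical handling for a slave SQL thread stopped on error 1205.

    start_sql_thread   -- callable that restarts the SQL thread
                          (e.g. by issuing 'start slave sql_thread')
    sql_thread_running -- callable returning True if the SQL thread is running
    skip_event         -- callable that skips the failing event
    Returns "restarted", "skipped" or "aborted".
    """
    for _ in range(retries):
        start_sql_thread()
        time.sleep(wait)  # give the blocking user transaction time to finish
        if sql_thread_running():
            return "restarted"  # nothing was skipped, so no data was lost

    if force_skip:
        # Only when the user explicitly accepts possible inconsistencies.
        skip_event()
        start_sql_thread()
        return "skipped"

    # Default: leave the slave stopped and let the caller warn that some
    # user thread is blocking the slave SQL thread.
    return "aborted"
```

With this shape, skipping the event is never the default outcome for a lock wait timeout; it happens only when the caller passes `force_skip=True`.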
Current behaviour example (version 2.2.14):
$ pt-slave-restart --user=msandbox --password=msandbox -S /tmp/mysql_
2015-06-09T04:41:11 S=/tmp/
2015-06-09 04:37:01 5614 [Warning] Slave SQL: Could not execute Write_rows event on table test.address_
2015-06-09 04:37:01 5614 [ERROR] Slave SQL: Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transacti
2015-06-09 04:37:01 5614 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
2015-06-09 04:37:01 5614 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000005' position 95652
2015-06-09 04:41:11 5614 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000005' at position 95652, relay log './mysql_
2015-06-09 04:41:11 5614 [Note] 'SQL_SLAVE_
description: updated
Changed in percona-toolkit:
status: New → Confirmed
Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PT-1290