pt-table-checksum deadlock
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Toolkit moved to https://jira.percona.com/projects/PT | Fix Released | Medium | Frank Cizmich |
Bug Description
Percona Toolkit 2.2.6
We are currently using pt-table-checksum to check data integrity across our whole OLTP cluster (6 servers with 8k QPS), and we started to get deadlocks on one of the biggest tables when the tool tries to checksum some of its chunks. I already lowered the chunk time, but that doesn't completely fix the problem :-/.
Command line used:
pt-table-checksum --host=
Checksumming tewn.landing_
02-28T15:30:44 Error checksumming table tewn.landing_
TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
02-28T15:30:44 1 0 80558617 16297 0 886.019 tewn.landing_
A suggestion: while reading the script I noticed that MySQL recommends simply re-executing the statement when this kind of deadlock occurs. In pt-table-checksum, around line 10510, there is a "fail" callback that checks the type of error and decides whether the chunk should be retried; the tool's retry logic (the --retries option) then re-executes the last SQL statement. Since that is exactly what MySQL suggests, treating the deadlock error as retryable, as in the change below, should fix the problem.
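fail => sub {
    my (%args) = @_;
    my $error = $args{error};
    # Transient InnoDB errors: return 1 so the chunk is retried
    # instead of being reported as a checksum error.
    if (   $error =~ m/Lock wait timeout exceeded/
        || $error =~ m/Query execution was interrupted/
        || $error =~ m/Deadlock found when trying to get lock/  # proposed addition
    ) {
        return 1;
    }
    # ... (rest of the callback not included in the report)
},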
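To illustrate the retry-on-deadlock idea outside the tool, here is a minimal Perl sketch. `retry_on_lock_error` is a hypothetical helper, not part of Percona Toolkit: it re-runs a code reference when it dies with one of the same three transient InnoDB errors the `fail` callback matches, up to a fixed number of attempts, pausing briefly in between. In real use the code reference would execute a DBI statement; here the statement is simulated by a closure so the example runs without a database, and the attempt count and pause lengths are assumed values.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);

# Hypothetical helper, not part of Percona Toolkit: run $args{code} and,
# if it dies with one of the transient InnoDB errors matched by the
# fail callback above, run it again, up to $args{tries} attempts in total.
sub retry_on_lock_error {
    my (%args) = @_;
    my $tries = $args{tries} || 3;
    my $code  = $args{code} or die "code argument is required\n";

    for my $attempt (1 .. $tries) {
        my $result;
        # Detect success via the eval's return value rather than $@,
        # which can be clobbered while the stack unwinds.
        my $ok = eval { $result = $code->(); 1 };
        return $result if $ok;

        my $error = $@ || "unknown error\n";
        my $retryable = $error =~ m/Lock wait timeout exceeded/
                     || $error =~ m/Query execution was interrupted/
                     || $error =~ m/Deadlock found when trying to get lock/;

        # Non-transient error, or out of attempts: propagate it unchanged.
        die $error if !$retryable || $attempt == $tries;

        # Short, growing pause (assumed values) before re-executing.
        sleep(0.1 * $attempt);
    }
    return;
}

# Usage: simulate a chunk query that deadlocks twice, then succeeds.
my $calls  = 0;
my $result = retry_on_lock_error(
    tries => 3,
    code  => sub {
        $calls++;
        die "Deadlock found when trying to get lock\n" if $calls < 3;
        return "checksummed";
    },
);
print "$result after $calls attempts\n";    # prints "checksummed after 3 attempts"
```

MySQL's documentation advises re-issuing a transaction that fails with a deadlock, because InnoDB rolls the victim transaction back. The sketch assumes each wrapped call is a complete transaction (as a single autocommit statement would be). Errors that don't match are re-thrown immediately, so genuine failures are not masked by retries.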
Related branches
- Daniel Nichter: Approve
Diff: 75 lines (+8/-6), 5 files modified:
- bin/pt-table-checksum (+4/-2)
- t/pt-table-checksum/samples/default-results-5.5.txt (+1/-1)
- t/pt-table-checksum/samples/default-results-5.6.txt (+1/-1)
- t/pt-table-checksum/samples/static-chunk-size-results-5.5.txt (+1/-1)
- t/pt-table-checksum/samples/static-chunk-size-results-5.6.txt (+1/-1)
Changed in percona-toolkit:
- milestone: none → 2.2.10
- status: New → Fix Committed
- assignee: nobody → Frank Cizmich (frank-cizmich)
- importance: Undecided → Medium

Changed in percona-toolkit:
- status: Fix Committed → Fix Released
Patch for this problem