pt-table-checksum doesn't reconnect the slave $dbh
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona Toolkit moved to https://jira.percona.com/projects/PT | Fix Released | High | Daniel Nichter | | |
Bug Description
When replication is very delayed, pt-table-checksum does not keep its connection to the replica alive, so when the replica catches up, or if the connection dies for some other reason, we get an error. It looks like this:
================
08-27T09:44:10 Error waiting for the last checksum of table <...> to replicate to replica <...>: DBD::mysql::db selectrow_array failed: MySQL server has gone away [for Statement "SELECT MAX(chunk) FROM `percona`
Check that the replica is running and has the replicate table `percona`
08-27T09:44:10 Error checking for checksum differences of table <...> on replica <...>: DBD::mysql::db selectall_arrayref failed: MySQL server has gone away [for Statement "SELECT CONCAT(db, '.', tbl) AS `table`, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(
Check that the replica is running and has the replicate table `percona`
================
I think the tool needs to reconnect to replicas.
[redacted: I think the tool needs to do a keepalive SELECT 1 or something like that.]
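To make the reconnect idea concrete, here is a minimal sketch of "retry once on a dead connection". It is not pt-table-checksum's code (the tool is Perl and uses DBI); it is a self-contained Python illustration, and every name in it (`GoneAway`, `FakeReplica`, `ReconnectingHandle`, `make_factory`) is hypothetical. The fake connection simply raises on its first use to mimic a handle that timed out while the replica was lagging.

```python
class GoneAway(Exception):
    """Stand-in for a driver error such as 'MySQL server has gone away'."""


class FakeReplica:
    """Hypothetical connection; a stale one fails every query."""

    def __init__(self, stale=False):
        self.stale = stale

    def query(self, sql):
        if self.stale:
            raise GoneAway("MySQL server has gone away")
        return [(7,)]  # canned result, enough for the illustration


class ReconnectingHandle:
    """Wraps a connection factory and reconnects when the handle has died."""

    def __init__(self, connect, max_retries=1):
        self._connect = connect          # callable returning a fresh connection
        self._max_retries = max_retries  # reconnect attempts per query
        self._conn = connect()
        self.reconnects = 0

    def query(self, sql):
        attempts = 0
        while True:
            try:
                return self._conn.query(sql)
            except GoneAway:
                if attempts >= self._max_retries:
                    raise  # give up and surface the original error
                attempts += 1
                self._conn = self._connect()  # replace the dead handle
                self.reconnects += 1


def make_factory(stale_count=1):
    """Factory whose first `stale_count` connections are already dead."""
    made = []

    def connect():
        conn = FakeReplica(stale=len(made) < stale_count)
        made.append(conn)
        return conn

    return connect


handle = ReconnectingHandle(make_factory())
rows = handle.query("SELECT 1")
```

One caveat with this pattern: a new connection starts with fresh session state, so any session variables the tool had set on the old replica handle would need to be applied again after reconnecting.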
Changed in percona-toolkit:
status: Confirmed → In Progress
assignee: nobody → Daniel Nichter (daniel-nichter)
importance: Undecided → High
Changed in percona-toolkit:
milestone: none → 2.2.15
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Changed in percona-toolkit:
importance: High → Medium
importance: Medium → High
I wonder what would happen if, instead of keeping the connection alive, we used $dbh->{mysql_auto_reconnect} = 1. Does anyone have any experience with that?