ptc 2.0 replicate-check error does not include hostname
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Toolkit moved to https://jira.percona.com/projects/PT | Fix Released | High | Daniel Nichter |
Bug Description
Version: http://
Description:
The following error should include the hostname against which the query failed:
1-29T17:00:38 Error checksumming table foo.bar: DBD::mysql::db selectrow_array failed: Table 'percona.checksums' doesn't exist [for Statement "SELECT MAX(chunk) FROM `percona`
How to repeat:
In an environment with some replicas up to date and others very far behind, where the --replicate table does not exist, run pt-table-checksum --replicate --create-
By the time pttc finishes checksumming a table, the lagging replica(s) will not yet have replayed the CREATE TABLE percona.checksums query. Therefore, pttc fails to SELECT from percona.checksums on any replica that is sufficiently far behind in replication.
Suggested solution:
The error is valid -- this table does not exist. However, the error should inform the user as to which host it occurred on, and possibly indicate why pttc was not able to determine checksum results on that (very lagged) replica.
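To illustrate the suggestion, here is a minimal sketch of attaching the host name to a failed query's error. It is not code from pt-table-checksum: the helper name with_host_context and the host name replica2 are hypothetical, and a plain die stands in for the failing DBI call.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of pt-table-checksum): run a code ref on
# behalf of a named host and, on failure, rethrow the error with the host
# name prepended.
sub with_host_context {
    my ($host, $code) = @_;
    my @result = eval { $code->() };
    if ( my $err = $@ ) {
        chomp $err;
        die "Error on host $host: $err\n";
    }
    return @result;
}

# Simulated failure standing in for a DBI query against a lagging replica.
my $ok = eval {
    with_host_context('replica2', sub {
        die "Table 'percona.checksums' doesn't exist\n";
    });
    1;
};
print $@ unless $ok;
```

With this kind of wrapper the original error text is preserved, and the user can see which replica produced it.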
summary: |
- pttc 2.0 replicate-check error does not include hostname
+ ptc 2.0 replicate-check error does not include hostname |
Changed in percona-toolkit: | |
importance: | Undecided → High |
assignee: | nobody → Daniel Nichter (daniel-nichter) |
milestone: | none → 2.0-beta1 |
tags: | added: pt-table-checksum |
Changed in percona-toolkit: | |
status: | Fix Committed → Fix Released |
New error:
if ( $o->get('quiet') < 2 ) {
   warn ts("Error waiting for the last checksum of table "
      . "$tbl->{db}.$tbl->{tbl} to replicate to "
      . "replica " . $slave->name() . ": $EVAL_ERROR\n"
      . "Check that the replica is running and has the "
      . "replicate table $repl_table. Checking the replica "
      . "for checksum differences will probably cause "
      . "another error.\n");
}
The last sentence refers to what follows:
eval {
   my $diffs = $rc->find_replication_differences(
...
}
So if there's an error waiting for the last chunk, the slave is skipped. Then checking for diffs happens and, as the warning says, checking for diffs is probably going to cause an error too. Maybe the bad slave will catch up between checks, which is why I made them separate.
In any case, the error messages are more helpful now (with hostname and table) and the tool will continue working despite a bad/extremely lagging slave.
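The two separate checks described above can be sketched roughly as follows. This is a simulation under stated assumptions, not the tool's actual code: query_checksums, check_replicas, and the has_table flag are hypothetical stand-ins, and each replica is a plain hash rather than a real connection.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a query against the replicate table. It dies
# when the simulated replica has not yet replayed the CREATE TABLE.
sub query_checksums {
    my ($replica) = @_;
    die "Table 'percona.checksums' doesn't exist\n"
        unless $replica->{has_table};
    return [];    # no differences in this simulation
}

sub check_replicas {
    my (@replicas) = @_;
    my (@warnings, %diffs);

    # Step 1: wait for the last chunk on each replica. A failure is
    # reported with the replica name and the loop moves on.
    foreach my $replica (@replicas) {
        my $ok = eval { query_checksums($replica); 1 };
        push @warnings, "Error waiting on replica $replica->{name}: $@"
            if !$ok;
    }

    # Step 2: look for differences in a separate eval, so a replica that
    # caught up in the meantime can still be checked.
    foreach my $replica (@replicas) {
        my $found = eval { query_checksums($replica) };
        if ( !$found ) {
            push @warnings, "Error checking replica $replica->{name}: $@";
            next;
        }
        $diffs{ $replica->{name} } = $found;
    }
    return (\@warnings, \%diffs);
}

my ($warnings, $diffs) = check_replicas(
    { name => 'replica1', has_table => 1 },
    { name => 'replica2', has_table => 0 },
);
print @$warnings;
print "Checked: ", join(', ', sort keys %$diffs), "\n";
```

Because each step has its own eval, one bad replica produces warnings that name it but does not stop the remaining replicas from being checked.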