ptc 2.0 replicate-check error does not include hostname

Bug #897961 reported by aeva black
This bug affects 1 person
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: High
Assigned to: Daniel Nichter

Bug Description

Version: http://bazaar.launchpad.net/~percona-toolkit-dev/percona-toolkit/pt-table-checksum-2.0/revision/240

Description:

The following error should include the hostname against which the query failed:

1-29T17:00:38 Error checksumming table foo.bar: DBD::mysql::db selectrow_array failed: Table 'percona.checksums' doesn't exist [for Statement "SELECT MAX(chunk) FROM `percona`.`checksums` WHERE db='foo' AND tbl='bar' AND master_crc IS NOT NULL"] at bin/pt-table-checksum2.0-beta0 line 6568.

How to repeat:

In an environment with some replicas up to date and others very far behind, where the --replicate table does not exist, run pt-table-checksum --replicate --create-replicate-table --check-slave-lag=<an_up_to_date_replica>.

By the time pttc finishes checksumming a table, the laggy replica(s) will not yet have replayed the CREATE TABLE percona.checksums query. Therefore, pttc fails to SELECT from percona.checksums on any replica that is sufficiently far behind in replication.

Suggested solution:

The error is valid -- this table does not exist. However, the error should inform the user as to which host it occurred on, and possibly indicate why pttc was not able to determine checksum results on that (very lagged) replica.

summary: - pttc 2.0 replicate-check error does not include hostname
+ ptc 2.0 replicate-check error does not include hostname
Changed in percona-toolkit:
importance: Undecided → High
assignee: nobody → Daniel Nichter (daniel-nichter)
milestone: none → 2.0-beta1
tags: added: pt-table-checksum
Daniel Nichter (daniel-nichter) wrote :

New error:

            if ( $o->get('quiet') < 2 ) {
               warn ts("Error waiting for the last checksum of table "
                  . "$tbl->{db}.$tbl->{tbl} to replicate to "
                  . "replica " . $slave->name() . ": $EVAL_ERROR\n"
                  . "Check that the replica is running and has the "
                  . "replicate table $repl_table. Checking the replica "
                  . "for checksum differences will probably cause "
                  . "another error.\n");
            }

The last sentence refers to what follows:

               eval {
                  my $diffs = $rc->find_replication_differences(
...

                  if ( $o->get('quiet') < 2 ) {
                     warn ts("Error checking for checksum differences of table "
                        . "$tbl->{db}.$tbl->{tbl} on replica " . $slave->name()
                        . ": $EVAL_ERROR\n"
                        . "Check that the replica is running and has the "
                        . "replicate table $repl_table.\n");
                  }

So if there's an error waiting for the last chunk to replicate, that slave is skipped. Then the check for diffs runs and, as the first warning says, it is probably going to cause an error on that slave too. The bad slave might catch up between the two checks, which is why I made them separate.

In any case, the error messages are more helpful now (with hostname and table) and the tool will continue working despite a bad/extremely lagging slave.
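For illustration only, here is a minimal, self-contained sketch of the pattern described above: each replica is checked inside its own eval, a failure is reported with the replica's name and the table, and the loop moves on to the next replica. This is not the pt-table-checksum source. The Replica package, max_chunk(), and check_replicas() are hypothetical stand-ins; only the name() accessor mirrors the $slave->name() call in the snippets above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a replica connection (assumption, not the
# tool's real class). Only name() mirrors $slave->name() from the thread.
package Replica;

sub new  { my ($class, %args) = @_; return bless {%args}, $class }
sub name { return $_[0]->{name} }

# Simulates SELECT MAX(chunk) against the replicate table; dies when the
# table has not been replicated to this host yet.
sub max_chunk {
   my ($self, $repl_table) = @_;
   die "Table '$repl_table' doesn't exist\n" unless $self->{has_table};
   return $self->{max_chunk};
}

package main;

# Checks every replica independently. Returns a hashref of results keyed
# by replica name and an arrayref of warnings that name the failing host.
sub check_replicas {
   my ($repl_table, $db, $tbl, @replicas) = @_;
   my (%max_chunk, @warnings);
   foreach my $replica ( @replicas ) {
      my $chunk = eval { $replica->max_chunk($repl_table) };
      if ( my $err = $@ ) {
         # Include host and table so the user knows where the query failed.
         push @warnings, "Error checking table $db.$tbl on replica "
            . $replica->name() . ": $err"
            . "Check that the replica is running and has the "
            . "replicate table $repl_table.\n";
         next;   # skip this replica, keep checking the others
      }
      $max_chunk{ $replica->name() } = $chunk;
   }
   return (\%max_chunk, \@warnings);
}

my ($results, $warnings) = check_replicas(
   'percona.checksums', 'foo', 'bar',
   Replica->new(name => 'replica1', has_table => 1, max_chunk => 7),
   Replica->new(name => 'replica2', has_table => 0),
);
warn $_ for @$warnings;
```

With these sample objects, the up-to-date replica still yields a result, and the single warning identifies replica2 as the host missing the replicate table.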

Changed in percona-toolkit:
status: New → Fix Committed
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-285
