Update documentation and/or implementation of pt-archiver --check-interval

Bug #1443763 reported by Jaime Sicam
This bug affects 2 people
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: Medium
Assigned to: Frank Cizmich
Milestone: (none)

Bug Description

"--check-interval" does not control how often the lag is checked, as its
name suggests. It controls how long the tool sleeps once lag has been
detected. This conclusion is based on examining the code:

    if ( $lag_dbh ) {
        my $lag = $ms->get_slave_lag($lag_dbh);
        while ( !defined $lag || $lag > $o->get('max-lag') ) {
            PTDEBUG && _d('Sleeping: slave lag is', $lag);
            sleep($o->get('check-interval'));   # check-interval is used as the sleep time
            $lag = $ms->get_slave_lag($lag_dbh);
        }
    }

The documentation is incorrect:

--check-interval
type: time; default: 1s
*How often to check for slave lag if "--check-slave-lag" is
given.*

and inconsistent:

--max-lag
type: time; default: 1s

Pause archiving if the slave given by "--check-slave-lag" lags.

This option causes pt-archiver to look at the slave every
time it's about to fetch another row. If the slave's lag
is greater than the option's value, or if the slave isn't
running (so its lag is NULL), *pt-table-checksum sleeps for*
* "--check-interval" seconds and then looks at the lag again.*
It repeats until the slave is caught up, then proceeds to
fetch and archive the row.

To recap:
1. Fix the man page entry for --check-interval. The name suggests an
interval between checks, but in the code it actually behaves as a
"sleep-interval".
2. Patch the code so that there is a real check-interval and a separate
sleep-interval.
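
One way the second point could look is sketched below. This is only an illustration, not the patch that was eventually released: "--sleep-interval" is a hypothetical option, and make_lag_waiter, get_lag, now, and sleep are made-up names. The clock and the sleep call are injected so the logic can be run without a database.

```perl
use strict;
use warnings;

# Hypothetical sketch: separate "how often to check lag" (check_interval)
# from "how long to sleep while lagging" (sleep_interval).
# get_lag, now and sleep are injected callbacks (assumed names).
sub make_lag_waiter {
    my (%args) = @_;
    my $last_check;    # time of the most recent lag check, undef if none yet

    return sub {
        my $now = $args{now}->();

        # Skip the check entirely if the last one was recent enough.
        return 0
            if defined $last_check
            && $now - $last_check < $args{check_interval};

        my $checks = 1;
        my $lag    = $args{get_lag}->();

        # An undefined lag means the slave is not running; treat it as lagging.
        while ( !defined $lag || $lag > $args{max_lag} ) {
            $args{sleep}->( $args{sleep_interval} );
            $lag = $args{get_lag}->();
            $checks++;
        }
        $last_check = $args{now}->();
        return $checks;    # number of lag queries issued by this call
    };
}

# Usage with a fake clock and canned lag readings instead of a real slave.
my $clock = 0;
my @lags  = ( 5, undef, 0, 0 );
my $wait  = make_lag_waiter(
    max_lag        => 1,
    check_interval => 10,
    sleep_interval => 2,
    get_lag        => sub { shift @lags },
    now            => sub { $clock },
    sleep          => sub { $clock += $_[0] },
);

my $first  = $wait->();    # lag readings 5, undef, 0: sleeps twice
my $second = $wait->();    # still within check_interval: no lag query
print "first=$first second=$second clock=$clock\n";
```

With the two intervals separated, the sleep time while the slave is behind can be tuned independently of how frequently the slave is queried when it is healthy.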

Miguel Angel Nieto (miguelangelnieto) wrote :

Another thing to fix, which the developer has acknowledged, is that the tool checks the replication lag for every row fetched. This is inefficient and should be corrected.
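
To make the cost concrete, the toy comparison below counts how many lag queries would be issued while archiving rows, once with a check before every row and once with checks gated by a time interval. The row count, the per-row time, and the count_lag_checks helper are all made up for illustration; no real pt-archiver code is involved.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lag queries needed to archive $rows rows
# when each row takes $row_ms milliseconds and lag is checked at most once
# every $check_ms milliseconds (0 means "check before every row").
sub count_lag_checks {
    my ( $rows, $row_ms, $check_ms ) = @_;
    my ( $clock, $checks, $last_check ) = ( 0, 0, undef );
    for ( 1 .. $rows ) {
        if ( !defined $last_check || $clock - $last_check >= $check_ms ) {
            $checks++;
            $last_check = $clock;
        }
        $clock += $row_ms;    # simulated time spent archiving one row
    }
    return $checks;
}

my $per_row  = count_lag_checks( 1000, 10, 0 );       # check before every row
my $interval = count_lag_checks( 1000, 10, 1000 );    # check once per second
print "per-row: $per_row, once per second: $interval\n";
```

Under these assumed numbers the per-row strategy queries the slave 1000 times, while the interval-gated strategy queries it 10 times for the same work.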

tags: added: pt-archiver
Changed in percona-toolkit:
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Frank Cizmich (frank-cizmich)
milestone: none → 2.3.1
Frank Cizmich (frank-cizmich) wrote :

Hello Miguel Angel,

I noticed replication lag checking "aggressiveness" is already reported here:
https://bugs.launchpad.net/percona-toolkit/+bug/1056507

will fix

Changed in percona-toolkit:
milestone: 2.3.1 → none
Changed in percona-toolkit:
status: In Progress → Fix Committed
milestone: none → 2.2.15
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-679
