Comment 4 for bug 1193474

Steven (libby-steven) wrote:

After some investigation, I found the root cause of this problem, along with a temporary workaround...

In the pt-archiver Perl script, around line 6248, I found the following code:

    my ($val) = $dbh->selectrow_array("SELECT MAX($col) FROM $src->{db_tbl}");

I realized it was using the MAX(id) of the entire table to determine the upper bound that pt-archiver uses when evaluating the final two SELECT statements. However, the WHERE clause I pass to pt-archiver limits the relevant rows to anything older than 35 days. And since we ingest approximately the same volume of data each day (in some tables, > 500M rows per day), those final two SELECT statements were scanning the entire table when they only needed to look at roughly 1/35th of it.
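To illustrate the effect, here is a minimal sketch in Python/SQLite (not pt-archiver code; the table, column names, and age cutoff are all hypothetical). Because AUTO_INCREMENT-style ids grow over time, bounding MAX(id) by the same filter used for archiving yields an upper bound that covers only the archivable rows rather than the whole table:

```python
# Sketch: why bounding MAX() by the archiving WHERE clause shrinks the
# upper bound. Table name, columns, and the 35-day cutoff are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, age_days INTEGER)")
# Simulate 100 days of inserts, one row per day; the newest rows have
# the highest AUTO_INCREMENT-style ids (and the lowest age).
conn.executemany(
    "INSERT INTO events (id, age_days) VALUES (?, ?)",
    [(i, 100 - i) for i in range(1, 101)],
)

# What pt-archiver does by default: MAX(id) over the entire table.
(max_all,) = conn.execute("SELECT MAX(id) FROM events").fetchone()

# The workaround: apply the same filter as the archiving WHERE clause,
# so only rows actually eligible for archiving set the upper bound.
(max_old,) = conn.execute(
    "SELECT MAX(id) FROM events WHERE age_days > 35"
).fetchone()

print(max_all, max_old)  # the bounded MAX is far lower than the global one
```

With the bounded MAX, the final SELECTs never consider ids belonging to rows newer than the cutoff, which is where the speedup comes from.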

Knowing only enough about Perl to be dangerous, I tried the following modification to that line, and it worked just as I expected. The final SELECT statement, which had been taking > 8 hours, now completes in a matter of seconds:

    # Bound the MAX() lookup by the user-supplied --where clause
    my $selective_where = {
        where => $o->get('where')
    };
    my ($val) = $dbh->selectrow_array("SELECT MAX($col) FROM $src->{db_tbl} WHERE $selective_where->{where}");

Note that this workaround only applies when your WHERE clause is relatively selective and the --safe-auto-increment option is enabled (which it is, by default).

CAUTION: There may be some negative implications of using this workaround, as the pt-archiver documentation states the following about --safe-auto-increment: "...This guards against re-using AUTO_INCREMENT values if the server restarts...". However, in our case the benefit FAR outweighs the potential consequences.