pt-archiver hangs on last rows

Bug #1193474 reported by Moody Youssef on 2013-06-21
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Percona Toolkit

Bug Description

Percona-Toolkit 2.2.2
Percona-Server 5.5.31

When running pt-archiver, I am filtering by created_date on a large table. Anything older than 4 days I am archiving. This is the command I am running within a bash script.

ptarchiver -u $MyUSER -p$MyPASS --source h=$MyHOST,D=$DEST_DB,t=$DEST_TABLE --file $DEST_FILE --why-quit --progress 50000 --charset utf8 --retries 3 --commit-each --limit 2000 --where "CREATED_DATE <= DATE_SUB(NOW(), INTERVAL 4 DAY)"

These are the results :

Starting PT-Archiver NOW
2013-06-21T15:10:58 0 0
2013-06-21T15:11:44 46 50000
2013-06-21T15:12:06 67 100000
2013-06-21T15:12:27 89 150000
2013-06-21T15:12:52 113 200000
2013-06-21T15:13:15 137 250000
2013-06-21T15:13:36 158 300000
2013-06-21T15:13:55 176 350000
2013-06-21T15:14:15 196 400000
2013-06-21T15:14:33 215 450000
2013-06-21T15:14:51 233 500000
2013-06-21T15:16:42 344 502034

For the last ~2k rows, it took 111 seconds to archive. It took 500k rows 233 seconds.

Amol (ajkedar) on 2013-06-21
tags: added: pt-archiver
Daniel Nichter (daniel-nichter) wrote :

Interesting, I've not heard of or seen this before, but we'll try to reproduce it.

summary: - Pt-Archiver hangs on last rows
+ pt-archiver hangs on last rows
Changed in percona-toolkit:
status: New → Confirmed
Amol (ajkedar) wrote :

This is still a problem in
$ dpkg --list | grep percona
ii percona-toolkit 2.2.5-2 Advanced MySQL and system command-line tools
 on ubuntu 12.04 LTS

can i help you in anyway?

Steven (libby-steven) wrote :

We are also having this issue. We have 6 different tables containing syslog data from numerous network devices.
All tables are affected by this behavior, the smallest containing about 4M rows, the largest containing about 2B rows.
During most of the process, we average anywhere between 300K-450K rows archived per minute. It seems that once
the last block of rows is reached, pt-archiver just stalls during the final SELECT and basically spins its wheels
for hours and hours on end.

Complete command-line used to run the tool:
pt-archiver --commit-each --bulk-insert --bulk-delete --limit 1000 --progress 10000 --no-check-charset \
--source h=localhost,S=$liveSock,D=$database,t=$table,u=$user,p=$pass \
--dest h=localhost,S=$archiveSock,D=$database,t=$table,u=$user,p=$pass \
--where "deviceReportedTime < DATE_SUB(NOW(), INTERVAL 35 DAY)"

Tool --version:
pt-archiver 2.2.16

MySQL version of all servers involved:
Percona Server using TokuDB storage engine (tokudb_lzma compression) version 5.6.28-76.1

Output from the tool including STDERR:
2016-04-07T03:00:02 0 0
2016-04-07T03:00:07 5 10000
2016-04-07T03:00:09 7 20000
2016-04-07T03:00:11 9 30000
2016-04-07T03:00:13 11 40000
2016-04-07T03:00:15 13 50000
2016-04-07T03:00:16 14 60000
2016-04-07T03:00:18 16 70000
2016-04-07T03:00:19 17 80000
2016-04-07T03:00:21 19 90000
2016-04-07T03:00:23 21 100000
2016-04-07T03:00:24 22 110000
2016-04-07T03:00:26 24 120000
2016-04-07T03:00:28 26 130000
2016-04-07T03:00:29 27 140000
2016-04-07T03:00:31 29 150000
2016-04-07T03:00:32 30 160000
2016-04-07T03:00:34 31 170000
2016-04-07T03:00:35 33 180000
2016-04-07T03:00:36 34 190000
2016-04-07T03:00:38 36 200000
2016-04-07T03:00:39 37 210000
2016-04-07T03:00:41 39 220000
2016-04-07T03:00:42 40 230000
2016-04-07T03:00:44 42 240000
2016-04-07T03:00:46 43 250000
2016-04-07T03:00:47 45 260000
2016-04-07T03:00:49 46 270000
2016-04-07T03:00:50 48 280000
2016-04-07T03:00:52 49 290000
2016-04-07T03:00:53 51 300000
2016-04-07T03:00:54 52 310000
2016-04-07T03:00:56 54 320000
2016-04-07T03:00:57 55 330000
2016-04-07T07:12:17 15135 340000
2016-04-07T10:39:44 27582 345581

Input files (log/dump/config files, etc.):

Steven (libby-steven) wrote :

After some investigation, I found the root cause of this problem, along with a temporary workaround...

In the pt-archiver perl script, around line 6248, I found the following code:

    my ($val) = $dbh->selectrow_array("SELECT MAX($col) FROM $src->{db_tbl}");

I realized it was using the MAX(id) of the entire table to determine the upper bound that pt-archiver will use when evaluating the final two SELECT statements. However, the WHERE clause I'm passing to pt-archiver is limiting the relevant rows to anything older than 35 days. And since we ingest approximately the same amount of data each day (in some tables, consisting of > 500M rows per day), those final two SELECT statements were looking at the entire table when they really only needed to look at 1/35th of the table.

Knowing only enough about perl to be dangerous, I tried the following modification to that line, and it worked just as I expected. What was taking > 8 hours to run the final SELECT statement is now completing in a matter of seconds:

      my $selective_where = {
          where => $o->get('where')
      my ($val) = $dbh->selectrow_array("SELECT MAX($col) FROM $src->{db_tbl} WHERE $selective_where->{where}");

Note that this is only applicable when your WHERE statement is relatively selective, and the --safe-auto-increment option is enabled (which it is, by default).

CAUTION: There may be some negative implications of using this workaround, as the pt-archiver documentation states the following about --safe-auto-increment: "...This guards against re-using AUTO_INCREMENT values if the server restarts...". However, in our case the benefit FAR outweighs the potential consequences.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers