pt-archiver --bulk-delete does not imply --commit-each as intended
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona Toolkit moved to https://jira.percona.com/projects/PT |
New
|
Undecided
|
Unassigned |
Bug Description
pt-archiver version 2.2.15
MySQL version: Percona Server 5.6.28-76.1
Storage Engine: TokuDB
When using --bulk-delete and --limit options together, one would assume that each chunk of rows is committed, based on the documentation.
From pt-archiver 2.2 documentation:
--bulk-delete
Delete each chunk with a single statement (implies --commit-each).
...
--commit-each
...
Commits transactions and flushes --file after each set of rows has been archived, before fetching
the next set of rows, and before sleeping if --sleep is specified. Disables --txn-size;
use --limit to control the transaction size with --commit-each.
...
However, unless you explicitly pass the --commit-each option on the command line in this scenario, pt-archiver is performing a commit on every single row in each chunk. Most use cases probably won't even notice the additional overhead caused by this, but when trying to purge hundreds of millions of rows each day, the added time for this commit overhead becomes significant.
To reproduce:
mysql> use debug;
mysql> create table a (id int unsigned not null primary key auto_increment) ENGINE=TokuDB;
mysql> insert into a values(); -- repeat 100 times
[user@device /]$ pt-archiver --purge --primary-key-only --bulk-delete --bulk-delete-limit --limit 100000 --progress 1000000 --no-check-charset --safe-
Started at 2016-08-
Source: D=debug,
SELECT 99
INSERT 0
DELETE 99
Action Count Time Pct
commit 100 0.0189 65.08
bulk_deleting 1 0.0030 10.20
select 2 0.0015 5.04
other 0 0.0057 19.68
Exiting because there are no more rows.
Now repopulate the table with 100 rows, and run pt-archiver again, but this time explicitly pass the --commit-each option on the command line:
[user@device /]$ pt-archiver --commit-each --purge --primary-key-only --bulk-delete --bulk-delete-limit --limit 100000 --progress 1000000 --no-check-charset --safe-
Started at 2016-08-
Source: D=debug,
SELECT 99
INSERT 0
DELETE 99
Action Count Time Pct
commit 2 0.0010 37.77
select 2 0.0006 22.92
bulk_deleting 1 0.0005 18.09
other 0 0.0005 21.22
Exiting because there are no more rows.
The reason seems to be related to the following code in sub main {}:
if ( $bulk_del || $o->get(
$
}
There are multiple lines further down in sub main {} which then use the variable $commit_each to determine whether or not to commit:
...
my $ac = !$txnsize && !$commit_each;
...
commit($o) unless $commit_each;
...
commit($o, $txnsize || $commit_each);
The problem is that $commit_each is still set to 0 because $o->set(
To fix this, I simply changed:
if ( $bulk_del || $o->get(
$
}
..to now read:
if ( $bulk_del || $o->get(
$commit_each = 1;
}
Now repopulate the table with 100 rows, and run pt-archiver again, but this time do NOT explicitly pass the --commit-each option on the command line:
[user@device /]$ pt-archiver --purge --primary-key-only --bulk-delete --bulk-delete-limit --limit 100000 --progress 1000000 --no-check-charset --safe-
Started at 2016-08-
Source: D=debug,
SELECT 99
INSERT 0
DELETE 99
Action Count Time Pct
commit 2 0.0007 30.59
select 2 0.0005 20.01
bulk_deleting 1 0.0004 17.05
other 0 0.0007 32.35
Exiting because there are no more rows.
SUCCESS!
Percona now uses JIRA for bug reports so this bug report is migrated to: https:/ /jira.percona. com/browse/ PT-1377