pt-archiver causing statements to appear non-deterministic
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona Toolkit moved to https://jira.percona.com/projects/PT |
Won't Fix
|
Medium
|
Unassigned |
Bug Description
When using binlog_format=MIXED the pt-archiver's delete statements are being saved as ROW format. This can be problematic so I attempted to force statement format using --set-vars binlog_
I am using pt-archiver 3.0.4:
I created and filled a table with:
CREATE TABLE `joinit` (
`i` int(11) NOT NULL AUTO_INCREMENT,
`s` varchar(64) DEFAULT NULL,
`t` time NOT NULL,
`g` int(11) NOT NULL,
PRIMARY KEY (`i`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO joinit VALUES (NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )));
INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit;
The pt-archiver I test with was:
pt-archiver --source h=localhost,
The binlog entries were similar to:
# at 499
#171018 12:23:11 server id 1 end_log_pos 679 CRC32 0x2f3266cd Query thread_id=4 exec_time=0 error_code=0
SET TIMESTAMP=
DELETE FROM `percona`.`joinit` WHERE (((`i` >= '158'))) AND (((`i` <= '320'))) AND (i<500) LIMIT 100
Assuming the limit 100 is necessary would mean this could cause different results on the slave than a master, especially since it does not include an order by clause. If the limit 100 is redundant, which is my assumption, then the limit should not be added.
Changed in percona-toolkit: | |
importance: | Undecided → Medium |
status: | New → Triaged |
status: | Triaged → Won't Fix |
Pt-archiver attempts to make the statements deterministic, however there are situations where this is not feasible.
suppose you have this table:
CREATE SCHEMA IF NOT EXISTS `test`;
USE `test`;
DROP TABLE IF EXISTS `t1`;
CREATE TABLE `test`.`t1` (
t1id INT(11) NOT NULL,
t2id INT(11) NOT NULL,
t3id INT(11) NOT NULL,
description varchar(30),
KEY idx1 (t1id, t2id , t3id)
) ENGINE=InnoDB;
INSERT INTO `test`.`t1` VALUES (1,1,1,"a"), (1,1,2,"b"), (1,1,3,"c");
pt_archiver:6338 15477 Index for DELETE: idx1
# TableNibbler:3098 15477 Will ascend index idx1
# TableNibbler:3110 15477 Will ascend columns t1id, t2id, t3id
# TableNibbler:3121 15477 Will ascend, in ordinal position: 0, 1, 2
pt-archiver will use the composite index for the deletes, however it is not possible to calculate the exact range.
In the example provided the primary key is a single int field, and so pt-archiver is able to identify the appropriate range, and the limit clause is redundant. There is an option to disable the limit clause, however it should be used when you are absolutely certain that the statements will be deterministic.
from the docs: delete- limit. --limit must still be specified.
This is an advanced option and you should not disable it unless you know what you are doing and why! By default, --bulk-delete appends a --limit clause to the bulk delete SQL statement. In certain cases, this clause can be omitted by specifying --no-bulk-