pt-archiver misses data when only copying data

Bug #1096274 reported by Pritam Mungse on 2013-01-05
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona Toolkit
Undecided
Unassigned

Bug Description

when using '--no-delete' option in pt-archiver (to only copy data to destination) if the where clause column can have multiple records then pt-archiver will miss records in copying.
e.g. create table with following columns:
id, timestamp
data:
| 001-25941882 | 1280490242452 |
| 001-25941883 | 1280490242452 |
| 001-25941884 | 1280490242452 |
| 001-25941885 | 1280490242452 |
| 001-25941886 | 1280495048106 |
| 001-25941887 | 1280495048108 |
| 001-25942993 | 1280495048108 |
| 001-25942994 | 1280495048108 |
| 001-25942995 | 1280490776956 |
| 001-25942996 | 1280493558772 |

use batch size of 2 (limit) and with commit-each option, the pt-archiver will miss copying few records.
pt version- 2.1.7
mysql version - 5.1.40
command:
/usr/bin/pt-archiver --source h=127.0.0.1,P=3306,D=test_db_1,u=root,p=xxx,t=test,i=idx_timestamp,A=utf8 --dest h=127.0.0.1,P=3306,D=test_db_2,u=root,p=xxx,t=test,A=utf8 --where "created < 1280493558772 " --limit 2--sentinel /tmp/sentinel --set-vars "SQL_MODE=''" --no-delete --commit-each --skip-foreign-key-checks --no-safe-auto-increment --why-quit --statistics --progress 2

It's pretty easy to reproduce and happens due to optimized SQL (which add in above case timestamp > [max value in last run]) in the next sql clause - which cause the next run to miss few records.

Daniel Nichter (daniel-nichter) wrote :

Thanks for looking into this. We'll try to reproduce it on our end too. Bug 820079 is similar, though unconfirmed yet.

tags: added: risk
Changed in percona-toolkit:
status: New → Triaged
Karoly Bujtor (bujtor-karoly) wrote :

It might be caused by the WHERE expression in file pt-archive on line 5768 which uses a less than comparision instead of less than equal.

walito (walito) wrote :

I have the same problem.
I'm trying to copy all table from serverA to serverB, this process copy all rows, but the last one, this don't copy and then I need insert this last row manually.
I tested in pt-archiver 2.2.4 and 2.2.8

Steps
- In serverA
mysql> use sample
Database changed
mysql> show tables;
+------------------+
| Tables_in_sample |
+------------------+
| random_data |
+------------------+
1 row in set (0.00 sec)

mysql> select count(*) from random_data;
+----------+
| count(*) |
+----------+
| 30000 |
+----------+
1 row in set (0.06 sec)

mysql> select * from random_data order by id desc limit 5;
+-------+-------+-------+-------+-------+
| id | c1 | c2 | c3 | c4 |
+-------+-------+-------+-------+-------+
| 30000 | 67938 | 80318 | 97774 | 47916 | <------------------------- this row not copy
| 29999 | 50083 | 36562 | 32562 | 53125 |
| 29998 | 74207 | 20502 | 79891 | 37951 |
| 29997 | 69171 | 88837 | 36671 | 16844 |
| 29996 | 75947 | 45070 | 97510 | 52340 |
+-------+-------+-------+-------+-------+
5 rows in set (0.00 sec)

- Run pt-archiver in serverA
pt-archiver --source h=localhost,D=sample,t=random_data --where "1=1" --dest h=192.168.88.22 --user sync --ask-pass --txn-size 10000 --no-delete --progress 10000 --limit 10000
Enter password:
Enter password:
TIME ELAPSED COUNT
2014-06-05T08:18:00 0 0
2014-06-05T08:18:10 9 10000
2014-06-05T08:18:20 19 20000
2014-06-05T08:18:29 28 29999

- Test serverB
mysql> select * from random_data order by id desc limit 5;
+-------+-------+-------+-------+-------+
| id | c1 | c2 | c3 | c4 |
+-------+-------+-------+-------+-------+
| 29999 | 50083 | 36562 | 32562 | 53125 |
| 29998 | 74207 | 20502 | 79891 | 37951 |
| 29997 | 69171 | 88837 | 36671 | 16844 |
| 29996 | 75947 | 45070 | 97510 | 52340 |
| 29995 | 82478 | 74942 | 27276 | 11555 |
+-------+-------+-------+-------+-------+
5 rows in set (0.00 sec)

mysql> select count(*) from random_data;
+----------+
| count(*) |
+----------+
| 29999 |
+----------+
1 row in set (0.02 sec)

Thanks in advance

Code in recent version, 2.2.14, still says:

   5804 if ( $o->get('safe-auto-increment')
   5805 && $sel_stmt->{index}
   5806 && scalar(@{$src->{info}->{keys}->{$sel_stmt->{index}}->{cols}}) == 1
   5807 && $src->{info}->{is_autoinc}->{
   5808 $src->{info}->{keys}->{$sel_stmt->{index}}->{cols}->[0]
   5809 }
   5810 ) {
   5811 my $col = $q->quote($sel_stmt->{scols}->[0]);
   5812 my ($val) = $dbh->selectrow_array("SELECT MAX($col) FROM $src->{db_tbl}");
   5813 $first_sql .= " AND ($col < " . $q->quote_val($val) . ")";
   5814 }

So, "last" row is missing.

tags: added: i57076
Dave Gregg (dgregg) wrote :

I have a similar issue....
when there are records remaining that are LESS than the LIMIT - they are not copied.

In the following - there are 20111 records in the data set under the WHERE condition. But I only end up with
20000 records copied over.

pt-archiver --source h=10.0.10.140,D=db,t=tbl,u=perconatool,p=pswd--dest h=10.0.2.116,D=db,t=tbl_2014Q3_Cont,u=perconatool,p=pswd--where "DTime<='2014-08-19' and DTime>'2014-07-30'" --limit 500 --txn-size 500 --statistics --progress 500 --max-lag 10 --retries 10 --no-delete --password pswd --check-slave-lag h=10.0.10.137 --check-slave-lag h=10.0.10.138 --check-slave-lag h=10.0.10.139

I have played with this for a while now....seems that every time - if the LIMIT is set higher than the remaining records - it will just copy over everything except for the very last records that are under the LIMIT.

Also if you attempt to copy over a small number of records - say 100 and your LIMIT is set to 500 - the tool will not Copy ANYTHING! It just hangs!!!

Dave Gregg (dgregg) wrote :

Also this does not matter if you are just copying or trying to Delete....
it still works the same - it leaves the remaining records LESS the LIMIT value.

Tried ROW by ROW and I tried BULK - both are the same result. Records are left behind!

Dave Gregg (dgregg) wrote :

Here is my example BULK COPY and DELETE -

pt-archiver --source h=10.0.10.140,D=db,t=table,u=perconatool,p=pswd--dest h=10.0.2.116,D=DB,t=table,u=perconatool,p=pswd,L=yes --where "DTime<='2014-08-21' and DTime>'2014-07-30'" --limit 400 --txn-size 400 --statistics --progress 1000 --max-lag 10 --retries 10 --bulk-insert --bulk-delete --password pswd --check-slave-lag h=10.0.10.137 --check-slave-lag h=10.0.10.138 --check-slave-lag h=10.0.10.139

Dave Gregg (dgregg) wrote :

Another thing - the tool just HANGS after if finishes the copy or delete.......hangs forever. You need to kill it.

Frank Cizmich (frank-cizmich) wrote :

I'm hearing you Dave. As soon as I have a moment I'll look into this.
Meanwhile if you can provide a debug log that would be great. i.e: PTDEBUG=1 pt-archiver params > logfile

Dave Gregg (dgregg) wrote :

Also - this is 2.2.14 - latest release.
Interestingly - sometimes the jobs just hang right away....as below....after 10 minutes it times out......
THEN - if I restart the job again - it will work??????

TIME ELAPSED COUNT
2015-07-31T16:33:14 0 0
Exhausted retries while bulk_deleting; rolling back 0 rows.
2015-07-31T16:33:15 0 400
Started at 2015-07-31T16:33:14, ended at 2015-07-31T16:42:36
Source: D=DB,h=10.0.10.140,p=...,t=table,u=perconatool
Dest: D=DB,L=yes,h=10.0.2.116,p=...,t=Summary_2014Q3_Cont,u=perconatool
SELECT 400
INSERT 400
DELETE 0
Action Count Time Pct
select 2 0.5062 0.09
rollback 2 0.0568 0.01
bulk_inserting 1 0.0427 0.01
commit 4 0.0020 0.00
print_bulkfile 400 0.0013 0.00
other 0 561.6044 99.89

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers