pt-table-sync division by zero error with varchar primary key

Bug #1034717 reported by Ryan Brothers
This bug affects 1 person
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: High
Assigned to: Brian Fraser

Bug Description

When using pt-table-sync 2.1.2 to sync a table with a varchar primary key, I am receiving the following errors at times:

- Failed to prepare TableSyncChunk plugin: Illegal division by zero at /usr/bin/pt-table-sync line 3937. while doing db2.table1 on 127.0.0.1

- Failed to prepare TableSyncChunk plugin: Use of uninitialized value in join or string at /usr/bin/pt-table-sync line 3954. while doing db2.table1 on 127.0.0.1

I attached a reproduce table for the "Illegal division by zero" error, but I'm having difficulty narrowing down a reproduce table for the 2nd error without it having thousands of rows. I believe both issues relate to the distribution of values in some form as it seems to be very intermittent based on the data in my table.

My sync script is:

pt-table-sync --execute h=localhost,P=3306,u=test,p=test,D=db1,t=table1 D=db2

Please let me know if this is enough to investigate, or if you need more details. Thanks.

Ryan Brothers (ryan-brothers) wrote :
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

Reproduced as follows:

mysql -e 'create database test1;'
mysql test < reproduce.sql
mysql test1 < reproduce.sql

pt-table-sync --execute h=localhost,P=3306,D=test,t=table1 D=test1

It is failing here: my $highest_power = floor(log($n)/log($base)); When $base is 1, log($base) is 0, so the denominator is zero. That may be because the ord of '1001' and '10873' (the first and last column values) is the same.

This seems to be fixing it:

=========================

diff -u bin/pt-table-sync /tmp/pt-table-sync
--- bin/pt-table-sync 2012-08-08 11:52:07.869453000 +0530
+++ /tmp/pt-table-sync 2012-08-09 16:51:41.673466697 +0530
@@ -4438,7 +4438,7 @@
    }
    my ($n, $base, $symbols) = @args{@required_args};

- return $symbols->[0] if $n == 0;
+ return $symbols->[0] if $n == 0 || $base == 1;

    my $highest_power = floor(log($n)/log($base));
    if ( $highest_power == 0 ){

=================

Tested with different values.
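
To make the failure mode concrete, here is a minimal standalone sketch of the same expression outside the tool. The sub names and the sample numbers are assumptions chosen for illustration, and the guarded variant returns 0 where the patch above returns $symbols->[0]; it is not the code from pt-table-sync itself.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Same expression as the failing line; the sub name is illustrative only.
sub highest_power {
    my ($n, $base) = @_;
    return floor(log($n) / log($base));
}

# Variant with the guard from the proposed patch: skip the logarithm
# entirely when $n is 0 or $base is 1. Returning 0 is a simplification;
# the patch returns the first symbol instead.
sub highest_power_guarded {
    my ($n, $base) = @_;
    return 0 if $n == 0 || $base == 1;
    return floor(log($n) / log($base));
}

# With a base above 1 the denominator is non-zero (7**2 <= 100 < 7**3).
my $normal = highest_power(100, 7);

# With a base of 1, log(1) is 0 and Perl dies inside the eval.
my $crashed = eval { highest_power(100, 1); 1 } ? 0 : 1;
my $error   = $@;

# The guarded variant returns before dividing.
my $guarded = highest_power_guarded(100, 1);

print "normal=$normal crashed=$crashed guarded=$guarded\n";
print "error: $error";
```

The eval wrapper is only there so the script can report the error text instead of aborting; the tool itself surfaces it as the "Failed to prepare TableSyncChunk plugin" message quoted in the description.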

Changed in percona-toolkit:
status: New → Confirmed
Ryan Brothers (ryan-brothers) wrote :

Thanks for your help. Attached in reproduce2.sql is a reproduce table for the 2nd error above "Use of uninitialized value in join or string".

Changed in percona-toolkit:
importance: Undecided → Medium
milestone: none → 2.1.4
tags: added: chunking crash pt-table-sync
Changed in percona-toolkit:
importance: Medium → High
Changed in percona-toolkit:
assignee: nobody → Brian Fraser (fraserbn)
Brian Fraser (fraserbn) wrote :

The second bug is... odd. The only way I can think of to trigger it is if your MySQL's latin1 charset differs from the standard charset. I'm really at a loss for how to test or avoid it; it's one of those "shouldn't happen" scenarios. Ryan, could you post the PTDEBUG=1 output for the "Use of uninitialized value in join or string" error? It would really help in getting this fixed.

Daniel and I discussed the division by zero bug, and the consensus was that if the base is equal to one, the chunking algorithm can't handle the table, so the tool will just exit and suggest picking a different algorithm. We'll add a note in the documentation about cases like this.
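
As a rough sketch of that "exit and suggest a different algorithm" approach, a check along these lines could run before any logarithm is taken. The sub name, the message wording and the assumption that the base is a defined non-negative integer are all hypothetical; none of it is taken from the released fix.

```perl
use strict;
use warnings;

# Hypothetical pre-check: neither the sub name nor the message text comes
# from pt-table-sync. It assumes $base is a defined, non-negative integer.
sub check_chunk_base {
    my ($base) = @_;
    if ( $base <= 1 ) {
        die "Cannot chunk on this column (base is $base); "
          . "try a different sync algorithm\n";
    }
    return $base;
}

# A usable base passes through unchanged.
my $accepted = check_chunk_base(36);

# A base of 1 is rejected with a readable message instead of a
# division by zero further down.
my $passed  = eval { check_chunk_base(1); 1 };
my $message = $passed ? '' : $@;

print $message;
```

Failing early like this trades automatic recovery for a clear message, which matches the consensus described above.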

Ryan Brothers (ryan-brothers) wrote :

Brian - thanks for checking into the issues. I actually was able to create a reproduce script for the 2nd bug and I posted it above in comment #3 in reproduce2.sql. Does that reproduce the problem for you?

For the division by zero bug, rather than returning an error asking me to pick a different algorithm, could pt-table-sync recognize this scenario and pick a different algorithm on its own? In my call to pt-table-sync I'm not specifying the --algorithm, so I have no preference about which algorithm is used; I would prefer that pt-table-sync pick a working algorithm itself so the data gets synced. Thanks.

Brian Fraser (fraserbn)
Changed in percona-toolkit:
status: Confirmed → In Progress
Brian Fraser (fraserbn) wrote :

Ryan, unfortunately, no dice, the second sql didn't reproduce the bug for me. Since it does for you, mind sending the command line and if possible, the PTDEBUG output?

For the division by zero bug, admittedly that would be ideal, but by the time the base==1 case comes up we're deep inside the tool, and because of some design limitations, changing the algorithm at that point just isn't feasible. pt-table-sync is long overdue for an overhaul, but that's not going to happen in the near future, I'm afraid.
(Although personally, I think it's a bit of a blessing in disguise: the tools are already quite magical, so being explicit about the algo earns points in my book.)

Ryan Brothers (ryan-brothers) wrote :

Brian - interesting, I just noticed that I can only reproduce it with MySQL 5.5.25a from mysql.com. If I try it with Percona Server 5.5.25a, it works fine. Attached is the PTDEBUG output from running against MySQL 5.5.25a from mysql.com.

I am running:

pt-table-sync --execute h=localhost,P=3306,u=test,p=test,D=db1,t=table1 D=db2

Brian Fraser (fraserbn)
Changed in percona-toolkit:
status: In Progress → Fix Committed
Brian Fraser (fraserbn) wrote :

Ryan: Ah, that's really helpful, thank you! To keep things a bit more organized, I've opened a new bug report for the second crash (https://bugs.launchpad.net/percona-toolkit/+bug/1042036). With some luck, this will get a fix by the time 2.1.4 rolls out in the next week or two.

Brian Fraser (fraserbn)
summary: - pt-table-sync errors with varchar primary key
+ pt-table-sync division by zero error with varchar primary key
Brian Fraser (fraserbn)
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-324
