pt-table-sync does not detect data difference using CRC32 hash

Bug #1030053 reported by LCM
This bug affects 3 people
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Triaged
Importance: Undecided
Assigned to: Unassigned

Bug Description

While syncing a slave to its master, I noticed that some sets of rows are not getting synced. A table on the slave had been loaded incorrectly, and its timestamp field differed from the master's by 7 hours. I ran the following command to do the initial sync:

pt-table-sync --execute h=dbslave01,P=3306,D=zappos,t=style_image --sync-to-master --verbose

The tool reported that about half the rows were replaced to fix the time discrepancies, but not all of them. While troubleshooting, I found an anomaly in the calculation performed by this query, which pt-table-sync runs:

SELECT
  /*zappos.style_image:8924/60176*/
  8923 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `style_image_id`, `style_id`, `image_type_code`, `zima_recipe_id`, `image_format`, `filename`, `width`, `height`, `updated_at` + 0)) AS UNSIGNED)), 10, 16)), 0) AS crc
FROM `zappos`.`style_image` FORCE INDEX (`PRIMARY`)
WHERE (`style_image_id` >= '1429'
    AND `style_image_id` < '1435'
  ) FOR UPDATE ;

Whenever this query runs against an even number of rows that all have the same time error, it calculates the same checksum value on master and slave. Here is a sample of the data I'm comparing, with the CRC32 value for each row and the cumulative XOR value. As you can see, every other row generates the same XOR value on both servers. Note that other columns also went into the CRC calculation, but I did not include them here since they are verified to be the same on master and slave.

Data from master:

style_image_id  updated_at        CRC32       XOR
1429            10/19/2011 9:41   3952427013
1430            10/19/2011 9:41   407848744   4091152173
1431            10/19/2011 9:41   2677520393  1817034532
1432            10/19/2011 9:41   2468571109  4285454529
1433            10/19/2011 9:41   1208292545  3077295104
1434            10/19/2011 9:41   3915229246  1580622910

Data from slave:

style_image_id  updated_at        CRC32       XOR
1429            10/19/2011 16:41  2727937137
1430            10/19/2011 16:41  1363346268  4091152173
1431            10/19/2011 16:41  3600546941  625081168
1432            10/19/2011 16:41  3660522385  4285454529
1433            10/19/2011 16:41  17387701    4268198004
1434            10/19/2011 16:41  2689723466  1580622910
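
The pattern in the two tables is consistent with CRC32 being linear over XOR: if every row differs between master and slave by the same byte pattern at the same distance from the end of the concatenated string, each row's CRC32 differs by the same constant, and that constant cancels out whenever an even number of rows are XORed together. The following is a minimal Python sketch of that effect, not the tool's own code. The rows and filenames are made up for illustration, and `row_crc` and `chunk_crc` are hypothetical helpers that only approximate the `CRC32(CONCAT_WS('#', ...))` and `BIT_XOR(...)` parts of the query above.

```python
import zlib
from functools import reduce

def row_crc(row):
    # Rough model of CRC32(CONCAT_WS('#', col1, col2, ...)) for one row.
    return zlib.crc32("#".join(str(col) for col in row).encode("ascii"))

def chunk_crc(rows):
    # Rough model of BIT_XOR(...) over all rows in a chunk.
    return reduce(lambda acc, row: acc ^ row_crc(row), rows, 0)

# Hypothetical rows: (style_image_id, filename, updated_at + 0).
master = [
    (1429, "a.jpg", 20111019094100),
    (1430, "bb.jpg", 20111019094100),
    (1431, "ccc.jpg", 20111019094100),
]
# Same rows with updated_at shifted by 7 hours (09:41:00 -> 16:41:00).
slave = [(i, name, ts + 70000) for i, name, ts in master]

# Every row differs, but always by the same XOR constant.
deltas = {row_crc(m) ^ row_crc(s) for m, s in zip(master, slave)}
print("distinct per-row differences:", len(deltas))

# Even number of rows: the constant cancels and the chunk checksums match.
print("2 rows match:", chunk_crc(master[:2]) == chunk_crc(slave[:2]))
# Odd number of rows: one copy of the constant survives.
print("3 rows match:", chunk_crc(master) == chunk_crc(slave))
```

In this sketch a two-row chunk compares equal even though both rows differ, while a three-row chunk does not, which mirrors the alternating XOR column in the tables.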

This problem does not seem to occur with MD5, but because CRC32 is the default, it may affect many users of the tool.
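
As a rough illustration of why MD5 would behave differently, here is a small Python sketch that XORs whole MD5 digests across rows. This is a simplified model and an assumption on my part, not the SQL that pt-table-sync generates for MD5. The rows are made up, and `row_md5` and `chunk_md5` are hypothetical helpers. Because MD5 is not linear over XOR, a uniform change to every row does not produce a constant per-row difference, so an even number of changed rows does not cancel out.

```python
import hashlib
from functools import reduce

def row_md5(row):
    # MD5 of the '#'-joined columns, taken as one 128-bit integer.
    data = "#".join(str(col) for col in row).encode("ascii")
    return int.from_bytes(hashlib.md5(data).digest(), "big")

def chunk_md5(rows):
    # XOR of the per-row digests, analogous to BIT_XOR over a chunk.
    return reduce(lambda acc, row: acc ^ row_md5(row), rows, 0)

# Hypothetical rows: (style_image_id, filename, updated_at + 0).
master = [
    (1429, "a.jpg", 20111019094100),
    (1430, "bb.jpg", 20111019094100),
]
# Same rows with updated_at shifted by 7 hours.
slave = [(i, name, ts + 70000) for i, name, ts in master]

# Even number of rows, same time error on each: still detected.
print("2 rows match:", chunk_md5(master) == chunk_md5(slave))
```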

This is with pt-table-sync version 2.1.2.
In my case the master is MySQL Percona 5.1.56-rel12.7-log and the slave is MySQL 5.5.25a-enterprise-commercial-advanced

mark s (mark-stafford) wrote :

I have a pair of 5.5.25a-rel27.1.277.rhel6 servers that show similar symptoms, though I would have logged it under pt-table-checksum 2.1.2, not sync.

Daniel Nichter (daniel-nichter) wrote :

We've seen this type of CRC collision before; I'll dig up the previous bug (it's on the Maatkit bug tracker, I think). I'm not sure it can be solved, because it's an inherent limitation of CRC checksumming: just the right combination of rows produces the same CRC.

tags: added: chunking pt-table-sync
Changed in percona-toolkit:
status: New → Triaged
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports, so this bug report has been migrated to: https://jira.percona.com/browse/PT-1007
