pt-table-sync does not detect data differences when using the CRC32 hash

Reported by LCM on 2012-07-27
This bug affects 3 people
Affects: Percona Toolkit
Importance: Undecided
Assigned to: Unassigned

Bug Description

While syncing a slave to its master, I noticed that some sets of rows were not getting synced. A table on the slave had been loaded incorrectly, and its timestamp column differed from the master's by 7 hours. I ran the following command to do the initial sync:

pt-table-sync --execute h=dbslave01,P=3306,D=zappos,t=style_image --sync-to-master --verbose

It reported that about half of the rows were replaced to fix the time discrepancy, but not all of them. While troubleshooting, I found an anomaly in the checksum calculated by this query, which pt-table-sync runs:

SELECT
  /*zappos.style_image:8924/60176*/
  8923 AS chunk_num, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `style_image_id`, `style_id`, `image_type_code`, `zima_recipe_id`, `image_format`, `filename`, `width`, `height`, `updated_at` + 0)) AS UNSIGNED)), 10, 16)), 0) AS crc
FROM `zappos`.`style_image` FORCE INDEX (`PRIMARY`)
WHERE (`style_image_id` >= '1429'
    AND `style_image_id` < '1435'
  ) FOR UPDATE ;
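To reason about the crc column without a MySQL server, the aggregate can be modeled in Python. This is a minimal sketch under stated assumptions: MySQL's CRC32() is taken to agree with Python's zlib.crc32 (both compute the standard IEEE CRC-32), each row is assumed to be already rendered to the strings CONCAT_WS would see, and chunk_checksum is a hypothetical helper, not part of Percona Toolkit.

```python
import zlib
from functools import reduce
from typing import Iterable, Optional, Sequence


def chunk_checksum(rows: Iterable[Sequence[Optional[str]]], sep: str = "#") -> str:
    """Hypothetical model of
    COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS(sep, ...)) AS UNSIGNED)), 10, 16)), 0).
    Assumes every column value has already been rendered as text."""
    # CONCAT_WS skips NULL arguments, so None values are dropped before joining.
    crcs = [
        zlib.crc32(sep.join(c for c in cols if c is not None).encode("utf-8"))
        for cols in rows
    ]
    # BIT_XOR over an empty set yields 0, and CONV(0, 10, 16) is '0'.
    total = reduce(lambda acc, crc: acc ^ crc, crcs, 0)
    # CONV(..., 10, 16) followed by LOWER() gives lowercase hex without leading zeros.
    return format(total, "x")


if __name__ == "__main__":
    # Hypothetical two-column rows; the real query concatenates nine columns.
    chunk = [["1429", "20111019094100"], ["1430", "20111019094100"]]
    print(chunk_checksum(chunk))
```

Because the per-row CRCs are combined with XOR, row order does not matter, and any two rows whose CRCs are equal cancel each other out completely.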

Whenever this query ran against an even number of rows that all had the same time error, it calculated the same checksum on the master and the slave. Below is a sample of the data I compared, with each row's CRC32 value and the cumulative XOR value. As you can see, the cumulative XOR matches after every second row. Note: other columns also went into the CRC calculation, but I did not include them because they were verified to be identical on the master and the slave.

Data from master:

style_image_id  updated_at         CRC32        Cumulative XOR
1429            10/19/2011 9:41    3952427013
1430            10/19/2011 9:41    407848744    4091152173
1431            10/19/2011 9:41    2677520393   1817034532
1432            10/19/2011 9:41    2468571109   4285454529
1433            10/19/2011 9:41    1208292545   3077295104
1434            10/19/2011 9:41    3915229246   1580622910

Data from slave:

style_image_id  updated_at         CRC32        Cumulative XOR
1429            10/19/2011 16:41   2727937137
1430            10/19/2011 16:41   1363346268   4091152173
1431            10/19/2011 16:41   3600546941   625081168
1432            10/19/2011 16:41   3660522385   4285454529
1433            10/19/2011 16:41   17387701     4268198004
1434            10/19/2011 16:41   2689723466   1580622910
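The tables above can be checked directly. XOR-ing each row's master CRC32 with the corresponding slave CRC32 (values copied from the tables) gives the same 32-bit difference, 0x490C4474, for all six rows, so each pair of rows cancels and the running XOR agrees after every even row count. A short Python check of that arithmetic:

```python
# CRC32 values copied from the master and slave tables above (style_image_id 1429-1434).
master = [3952427013, 407848744, 2677520393, 2468571109, 1208292545, 3915229246]
slave = [2727937137, 1363346268, 3600546941, 3660522385, 17387701, 2689723466]

# Per-row difference between the master and slave CRC32 values.
deltas = [m ^ s for m, s in zip(master, slave)]


def running_xor(values):
    """Cumulative XOR, i.e. what BIT_XOR() yields over the first n rows."""
    out, acc = [], 0
    for v in values:
        acc ^= v
        out.append(acc)
    return out


master_xor = running_xor(master)
slave_xor = running_xor(slave)

if __name__ == "__main__":
    print([hex(d) for d in deltas])
    for n, (mx, sx) in enumerate(zip(master_xor, slave_xor), start=1):
        print(n, mx, sx, "match" if mx == sx else "differ")
```

The chunk in the query above (style_image_id 1429 through 1434) contains six such rows, so its master and slave checksums are identical and the chunk is reported as in sync.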

This problem does not seem to occur with MD5, but CRC32 is the default, so it may affect many users of the tool.
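One way to see why a non-linear hash behaves differently is to aggregate per-row MD5 digests the same way. This Python sketch uses hypothetical row strings (the column values and the 14-digit timestamp rendering are assumptions, not the reporter's data) and XORs 128-bit MD5 digests. It is an illustration only and does not claim to reproduce how pt-table-sync builds its MD5 checksum.

```python
import hashlib
import zlib


def row_string(row_id: int, ts: str) -> str:
    # Hypothetical CONCAT_WS('#', ...) output; only the id and the timestamp vary.
    return f"{row_id}#7#MAIN#0#jpg#f{row_id}.jpg#100#100#{ts}"


def xor_crc32(rows):
    """XOR of per-row CRC32 values, like BIT_XOR(CRC32(...))."""
    acc = 0
    for r in rows:
        acc ^= zlib.crc32(r.encode())
    return acc


def xor_md5(rows):
    """XOR of per-row MD5 digests, each read as a 128-bit integer."""
    acc = 0
    for r in rows:
        acc ^= int.from_bytes(hashlib.md5(r.encode()).digest(), "big")
    return acc


# Two rows with the same 7-hour error on the "slave" side.
master = [row_string(i, "20111019094100") for i in (1429, 1430)]
slave = [row_string(i, "20111019164100") for i in (1429, 1430)]

crc_collides = xor_crc32(master) == xor_crc32(slave)
md5_collides = xor_md5(master) == xor_md5(slave)

if __name__ == "__main__":
    print("CRC32 aggregates equal:", crc_collides)
    print("MD5 aggregates equal:", md5_collides)
```

With CRC32 the two-row aggregates match, because both rows change by the same CRC difference. MD5 has no such structure, so the per-row differences are unrelated and do not cancel (except with negligible probability).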

This is with pt-table-sync version 2.1.2.
In my case, the master is Percona MySQL 5.1.56-rel12.7-log and the slave is MySQL 5.5.25a-enterprise-commercial-advanced.

obbyyoyo (mark-stafford) wrote:

I have a pair of 5.5.25a-rel27.1.277.rhel6 servers that show similar symptoms, though I would have logged it under pt-table-checksum 2.1.2, not sync.

Daniel Nichter (daniel-nichter) wrote:

We've seen this type of CRC collision before; I'll dig up the previous bug (it's on the Maatkit bug tracker, I think). I'm not sure it can be solved, because it's an inherent limitation of CRC checksumming, i.e., just the right combination of rows produces the same CRC.
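One concrete form of this inherent limitation is that CRC32 is affine over GF(2): for byte strings a, b, c of equal length, crc32(a) ^ crc32(b) ^ crc32(c) == crc32(a ^ b ^ c). Consequently, when equal-length rows all receive the same byte-level edit, each row's CRC changes by the same amount, and an even number of identical changes XOR to zero. The Python sketch below checks the identity on random data and then applies a 7-hour edit to n rows; xor_bytes and aggregates_match are hypothetical helpers, and the two-field row strings are stand-ins for the real CONCAT_WS output.

```python
import os
import zlib


def xor_bytes(*parts: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    out = bytearray(len(parts[0]))
    for p in parts:
        for i, b in enumerate(p):
            out[i] ^= b
    return bytes(out)


# Affine property: holds for any three byte strings of the same length.
a, b, c = (os.urandom(32) for _ in range(3))
identity_holds = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c) == zlib.crc32(xor_bytes(a, b, c))


def aggregates_match(n_rows: int) -> bool:
    """Apply the same hypothetical 7-hour edit to n equal-length rows and
    compare the XOR of CRC32 values before and after the edit."""
    before = after = 0
    for i in range(n_rows):
        good = f"{1429 + i}#20111019094100"  # hypothetical master row
        bad = f"{1429 + i}#20111019164100"   # same row with the time error
        before ^= zlib.crc32(good.encode())
        after ^= zlib.crc32(bad.encode())
    return before == after


if __name__ == "__main__":
    print("identity holds:", identity_holds)
    for n in range(1, 7):
        print(n, "rows -> checksums match:", aggregates_match(n))
```

Odd row counts still differ (the per-row CRC change is nonzero), which matches the alternating pattern in the tables from the bug description.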

tags: added: chunking pt-table-sync
Changed in percona-toolkit:
status: New → Triaged