Comment 20 for bug 1080765

Dan Farrell (dfarrell) wrote :

I have been running pt-table-checksum every 15 minutes in a test environment. The environment uses master-slave replication, but the data never changes. Nevertheless, once every few days or so, I also get a false positive on the checksums.

My ears perked up when you hypothesized that pt-table-checksum might check the slave before it has finished replicating the checksum from the master. Since the data in my test environment never changes, it seems you've hit the nail on the head.

Mostly I just wanted to say that your theory is very much in line with my experience. However, if you're short of ideas, may I suggest that the tool be modified to compare the master's log file and position with the slave's log file and position, to ensure that the checksums table has replicated? A timestamp in the checksums table could also be used, but it seems to make at least as much sense to read the position on the master and then wait until the slave reaches that position.
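To make the suggestion concrete, here is a rough sketch of the "read the master position, then wait for the slave" idea. It is not how pt-table-checksum works internally; the function names (binlog_key, slave_has_reached, wait_for_slave) are made up for illustration, and read_slave is a hypothetical callable that would return Relay_Master_Log_File and Exec_Master_Log_Pos from SHOW SLAVE STATUS. It also assumes binary log names end in a numeric suffix such as mysql-bin.000042.

```python
import re
import time
from typing import Callable, Tuple

# (binary log file name, byte offset), e.g. ("mysql-bin.000042", 1337).
Coordinates = Tuple[str, int]


def binlog_key(coords: Coordinates) -> Tuple[int, int]:
    """Turn binlog coordinates into a tuple that sorts in replication order.

    Assumption: the file name ends in a numeric sequence suffix.
    """
    file_name, position = coords
    match = re.search(r"\.(\d+)$", file_name)
    if match is None:
        raise ValueError(f"unexpected binlog file name: {file_name!r}")
    return (int(match.group(1)), position)


def slave_has_reached(master: Coordinates, slave: Coordinates) -> bool:
    """True once the slave has executed up to (or past) the master position."""
    return binlog_key(slave) >= binlog_key(master)


def wait_for_slave(
    master: Coordinates,
    read_slave: Callable[[], Coordinates],  # hypothetical: wraps SHOW SLAVE STATUS
    timeout: float = 60.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll the slave until it reaches the master position or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if slave_has_reached(master, read_slave()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

MySQL's built-in MASTER_POS_WAIT(log_name, log_pos, timeout) function performs the same wait on the slave itself, so a real implementation might prefer it over polling.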

I also theorize that the option to only check, but not compute, the checksums might facilitate a workaround: the end user could run the checksum, wait externally until the slave reaches the master position, and then run the tool in check-only mode to verify the checksums.
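A rough sketch of that workaround as three separate steps follows. It assumes a pt-table-checksum version that provides --no-replicate-check and --replicate-check-only, and it uses MySQL's MASTER_POS_WAIT() for the external wait. The helper names and the DSN value are placeholders of mine, and the log file and position would come from SHOW MASTER STATUS on the master after step 1 finishes.

```python
from typing import List, Tuple


def checksum_command(master_dsn: str) -> List[str]:
    """Step 1: compute checksums on the master without checking the slaves yet."""
    return ["pt-table-checksum", "--no-replicate-check", master_dsn]


def wait_query(log_file: str, log_pos: int, timeout_s: int) -> Tuple[str, tuple]:
    """Step 2: query to run on each slave; it blocks until the slave has
    executed up to the given master position or the timeout expires."""
    return ("SELECT MASTER_POS_WAIT(%s, %s, %s)", (log_file, log_pos, timeout_s))


def verify_command(master_dsn: str) -> List[str]:
    """Step 3: compare the already-replicated checksums without recomputing them."""
    return ["pt-table-checksum", "--replicate-check-only", master_dsn]


if __name__ == "__main__":
    dsn = "h=master.example.test"  # placeholder DSN, not a real host
    print(" ".join(checksum_command(dsn)))
    print(wait_query("mysql-bin.000042", 1337, 300))
    print(" ".join(verify_command(dsn)))
```

The commands are only built here, not executed; running them (for example with subprocess.run and a MySQL client library) is left out so the sketch does not depend on a live server.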

I look forward to seeing the fix! Thanks for all your hard work!