pt-table-checksum isn't atomic
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Toolkit moved to https://jira.percona.com/projects/PT | Triaged | Undecided | Unassigned |
Bug Description
Much to my surprise, I realized that pt-table-checksum isn't completely atomic.
This causes issues on high-load replicated systems using InnoDB, where single rows may change while the checksum is in progress, leading to false-positive diffs.
With the general log enabled on the slave (and --log-slave-updates on), I found this (table/db names obscured for security):
REPLACE INTO `percona`
2 Query COMMIT /* implicit, from Xid_log_event */
followed by a few queries to the table being checksummed, and then shortly after:
UPDATE `percona`
2 Query COMMIT /* implicit, from Xid_log_event */
This causes a problem for 2 reasons.
1.) This particular table has writes anywhere from 1-30 times a second.
2.) Even though I can confirm by hand-checking the table (it's only a single row) that it is identical on slave and master, pt-table-checksum seems to always get it wrong and reports them as different.
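To make the suspected failure mode concrete, here is a minimal toy model of the timing described above. It is only a sketch under stated assumptions, not pt-table-checksum's actual implementation: the dictionaries, the `row_checksum` helper, and the order of steps are all hypothetical, and the only thing assumed is that the two checksums end up being computed on opposite sides of a replicated write.

```python
import zlib


def row_checksum(row):
    """Hypothetical helper: CRC32 over the row's column values joined with '#'."""
    joined = "#".join(str(value) for value in row.values())
    return zlib.crc32(joined.encode("utf-8"))


# Toy stand-ins for the single-row table on each server (assumed values).
master = {"ss_row_id": 1, "ss_total_views": 100, "ss_total_edits": 10}
replica = dict(master)

# Step 1: one side's checksum is computed and stored.
replica_crc = row_checksum(replica)

# Step 2: an ordinary write hits the master and replicates normally,
# so both servers still hold identical data afterwards.
for server in (master, replica):
    server["ss_total_views"] += 1

# Step 3: the other side's checksum is computed after the write.
master_crc = row_checksum(master)

# The data is identical, yet the stored checksums disagree.
data_identical = master == replica
false_positive = data_identical and replica_crc != master_crc
print(data_identical, false_positive)
```

With 1-30 writes a second against a single row, a window like step 2 would be hit on almost every run in this model, which would match the "always different" result reported above.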
tags: added: false-positive-error pt-table-checksum
Here's the relevant create table statement fwiw:
CREATE TABLE `site_stats` (
  `ss_row_id` int(8) unsigned NOT NULL DEFAULT '0',
  `ss_total_views` bigint(20) unsigned DEFAULT '0',
  `ss_total_edits` bigint(20) unsigned DEFAULT '0',
  `ss_good_articles` bigint(20) unsigned DEFAULT '0',
  `ss_total_pages` bigint(20) DEFAULT '-1',
  `ss_users` bigint(20) DEFAULT '-1',
  `ss_admins` int(10) DEFAULT '-1',
  `ss_images` int(10) DEFAULT '0',
  `ss_active_users` bigint(20) DEFAULT '-1',
  UNIQUE KEY `ss_row_id` (`ss_row_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;