pt-table-checksum issued a full table checksum ,and caused slave delay

Bug #1697584 reported by zhang zheng
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Expired
Undecided
Unassigned

Bug Description

mysql master/slave version:
mysql> select version();
+-----------------+
| version() |
+-----------------+
| 5.5.52-38.3-log |
+-----------------+
1 row in set (0.00 sec)

pt-table-checksum version:
pt-table-checksum --version
pt-table-checksum 2.2.17

executed command:
pt-table-checksum --nocheck-replication-filters --replicate=test_checksum.checksums --ignore-databases=information_schema,mysql,performance_schema --host=xx --port=xx -udba_checksum -pxx --recursion-method dsn=h=xx,u=dba_checksum,p=xx,D=test_checksum,t=dsns --no-check-binlog-format --chunk-index=PRIMARY --chunk-index-columns=3 --chunk-size-limit=1000 --max-lag=1

description:
I issued a command ,listed above ,to do master/slave checksum.It worked well at first and no slave delay.But after a while ,I found a big replication delay in slave.By executing show full processlist in slave ,I saw the sql_thread was running:
REPLACE INTO `test_checksum`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'xxx', 'mmm_tab', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, `uid`, `type`, `data`, `flag`, `status`, `create_time`, `update_time`)) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `xxx`.`mmm_tab` /*checksum table*/
It was a full table checksum ,which lasted for about 1200 seconds, and the table mmm_tab sizes 100G,and the lag continued to increase.

I also found this statement in master's error log:
170612 17:16:33 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. REPLACE... SELECT is unsafe because the order in which rows are retrieved by th
e SELECT determines which (if any) rows are replaced. This order cannot be predicted and may differ on master and the slave. Statement: REPLACE INTO `test_checksum`.`checksums` (db, tbl, chunk, chunk_inde
x, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'xxx', 'mmm_tab', '234', 'PRIMARY', '48046052', '48269634', COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `
id`, `uid`, `type`, `data`, `flag`, `status`, `create_time`, `update_time`)) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `xxx`.`mmm_tab` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '48046052')) AND ((`id
` <= '48269634')) /*checksum chunk*/
170612 17:16:34 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. REPLACE... SELECT is unsafe because the order in which rows are retrieved by th
e SELECT determines which (if any) rows are replaced. This order cannot be predicted and may differ on master and the slave. Statement: REPLACE INTO `test_checksum`.`checksums` (db, tbl, chunk, chunk_inde
x, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'xxx', 'mmm_tab', '235', 'PRIMARY', '48269635', '48505239', COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `
id`, `uid`, `type`, `data`, `flag`, `status`, `create_time`, `update_time`)) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `xxx`.`mmm_tab` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '48269635')) AND ((`id
` <= '48505239')) /*checksum chunk*/
170612 17:16:39 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. REPLACE... SELECT is unsafe because the order in which rows are retrieved by th
e SELECT determines which (if any) rows are replaced. This order cannot be predicted and may differ on master and the slave. Statement: REPLACE INTO `test_checksum`.`checksums` (db, tbl, chunk, chunk_inde
x, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'xxx', 'mmm_tab', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `id`, `uid`, `type`,
 `data`, `flag`, `status`, `create_time`, `update_time`)) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `xxx`.`mmm_tab` /*checksum table*/
170612 17:16:40 [Warning] Aborted connection 2832926 to db: 'xxx' user: 'dba_checksum' host: 'xx.xx.xx.xx' (Got an error reading communication packets)
170612 17:16:41 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. REPLACE... SELECT is unsafe because the order in which rows are retrieved by th
e SELECT determines which (if any) rows are replaced. This order cannot be predicted and may differ on master and the slave. Statement: REPLACE INTO `test_checksum`.`checksums` (db, tbl, chunk, chunk_inde
x, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'xxx', 'mmm_tab', '236', 'PRIMARY', '48505240', '48746455', COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `
id`, `uid`, `type`, `data`, `flag`, `status`, `create_time`, `update_time`)) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `xxx`.`mmm_tab` FORCE INDEX(`PRIMARY`) WHERE ((`id` >= '48505240')) AND ((`id
` <= '48746455')) /*checksum chunk*/

It seemed that,normally pt-table-checksum would issue sql statement by force index(primary) where id xxx and xxx, but in this case, it issued a full table checksum.

Revision history for this message
zhang zheng (zhangzheng2040) wrote :

in the error log,we can see : pt-table-checksum issued a full table checksum without a where condition between mmm_tab chunk 235 and 236.

Revision history for this message
Jericho Rivera (jericho-rivera) wrote :

hi Zhang,

Can you test with Percona Toolkit 3.0.3, and share the table structure of mmm_tab (SHOW CREATE TABLE). If you have a full log when running the tool with PTDEBUG=1 that would also be very welcome.

Changed in percona-toolkit:
status: New → Invalid
status: Invalid → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for Percona Toolkit because there has been no activity for 60 days.]

Changed in percona-toolkit:
status: Incomplete → Expired
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-1432

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.