pt-table-checksum doesn't work if slaves use RBR
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona Toolkit moved to https://jira.percona.com/projects/PT | Fix Released | Low | Daniel Nichter | | |
Bug Description
Problem:
pt-table-checksum does not find slaves connected to slaves.
How to duplicate:
1. Need at least three servers with at least two levels of replication: master1 -> slave1 -> slave2
2. Create a simple table on the master and delete some records on each slave so that no DB has the same number of records
3. Run pt-table-checksum --create-
4. Inspect the testdb.pt_checksums table on each server. You should find that the slave2 copy of the table is identical to the slave1 copy and does not reflect the number of records on slave2.
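The symptom in step 4 can be illustrated with a small simulation. This is only a sketch of the presumed mechanism, not the tool's code: it assumes the master logs the checksum query as a statement, and that a slave binlogging in ROW format passes the resulting row (rather than the query) on to its own slaves. The function name `replicate_checksum`, its parameters, and the slave row counts are hypothetical.

```python
def replicate_checksum(chain, local_counts, row_based):
    """Return the this_cnt value each server ends up storing.

    chain        -- servers in replication order, master first
    local_counts -- rows actually present in the table on each server
    row_based    -- servers whose own binary log uses ROW format (assumption)
    """
    stored = {}
    event = ("statement", None)  # assumed: the master logs the query itself
    for server in chain:
        kind, value = event
        if kind == "statement":
            # The query re-executes here, so COUNT(*) sees the local data.
            stored[server] = local_counts[server]
        else:
            # A row image is applied verbatim; local data is never counted.
            stored[server] = value
        if kind == "row" or (server in row_based and server != chain[0]):
            # Downstream slaves receive the stored row, not the query.
            event = ("row", stored[server])
    return stored


chain = ["master1", "slave1", "slave2"]
counts = {"master1": 6, "slave1": 5, "slave2": 4}  # hypothetical row counts

all_sbr = replicate_checksum(chain, counts, row_based=set())
slave1_rbr = replicate_checksum(chain, counts, row_based={"slave1"})
print(all_sbr)     # every server reports its own count
print(slave1_rbr)  # slave2 repeats slave1's count
```

Under these assumptions, slave2 stores slave1's count whenever slave1 uses row-based logging, which matches what step 4 describes.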
Background:
I've tried it with both --recursion-method processlist and --recursion-method hosts.
I'm using version 1.0.1.
Related branches
- Daniel Nichter: Approve
Diff: 50 lines (+16/-5), 1 file modified: bin/pt-table-checksum (+16/-5)
tags: added: pt-table-checksum slave-recursion
Changed in percona-toolkit:
milestone: none → 2.1.4
Changed in percona-toolkit:
assignee: nobody → Daniel Nichter (daniel-nichter)
status: Confirmed → In Progress
summary:
- pt-table-checksum doesn't recurse beyond 1 level
+ pt-table-checksum doesn't work if slaves use RBR
Changed in percona-toolkit:
status: Fix Committed → Fix Released
OK, it seems to be finding all the slaves (except that it gets the port wrong on the one using port 3307),
but the checksums tables are incorrect for all slaves except the one directly connected to the master.
Here's the end of the debug output when MKDEBUG=1:
DATABASE TABLE CHUNK HOST ENGINE COUNT CHECKSUM TIME WAIT STAT LAG
# pt_table_checksum:6415 1247 DBI::db=HASH(0x2a52540) USE `fabdb`
# pt_table_checksum:5731 1247 DBI::db=HASH(0x2a52540) DELETE FROM fabdb.pt_checksums WHERE db=? AND tbl=? AND chunk > ? fabdb pt_test 0
# pt_table_checksum:5885 1247 Checksumming 1 chunks
# pt_table_checksum:6950 1247 Checking slave P=3306,S=/home/mysql/mysql.sock,h=apk-trinet-db-02.tqs.com,p=...,u=mfgis-siu lag for throttle
# MasterSlave:3838 1247 DBI::db=HASH(0x2b20568) SHOW SLAVE STATUS
# pt_table_checksum:6963 1247 Slave ready, lag 0 <= 1
# pt_table_checksum:6950 1247 Checking slave P=3306,S=/home/mysql/mysql.sock,h=amphitrite.tqs.com,p=...,u=mfgis-siu lag for throttle
# MasterSlave:3838 1247 DBI::db=HASH(0x2a52a80) SHOW SLAVE STATUS
# pt_table_checksum:6963 1247 Slave ready, lag 0 <= 1
# pt_table_checksum:6950 1247 Checking slave P=3306,S=/home/mysql/mysql.sock,h=sjo-trinet-ft.tqs.com,p=...,u=mfgis-siu lag for throttle
# MasterSlave:3838 1247 DBI::db=HASH(0x2b1b128) SHOW SLAVE STATUS
# pt_table_checksum:6963 1247 Slave ready, lag 0 <= 1
# pt_table_checksum:6950 1247 Checking slave P=3306,S=/home/mysql/mysql.sock,h=sjo-trinet-dell2950b.tqs.com,p=...,u=mfgis-siu lag for throttle
# MasterSlave:3838 1247 DBI::db=HASH(0x2b20238) SHOW SLAVE STATUS
# pt_table_checksum:6963 1247 Slave ready, lag 0 <= 1
# pt_table_checksum:5962 1247 Starting chunk 0 at 1323113447.84584
# pt_table_checksum:6542 1247 Replicating chunk 0 of table fabdb . pt_test on poseidon.sawtek.com : 3306
# pt_table_checksum:6415 1247 DBI::db=HASH(0x2a52540) USE `fabdb`
# TableChunker:3135 1247 Injecting chunk 0
# TableChunker:3150 1247 Parameters: $VAR1 = {DB_TBL => '`fabdb`.`pt_test`',INDEX_HINT => 'FORCE INDEX (`PRIMARY`)',WHERE => 'WHERE (1=1)'};
# pt_table_checksum:6564 1247 SQL for inject chunk 0: REPLACE /*fabdb.pt_test:1/1*/ INTO fabdb.pt_checksums (db, tbl, chunk, boundaries, this_cnt, this_crc) SELECT ?, ?, 0 AS chunk_num, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(`idpt_test`) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `fabdb`.`pt_test` FORCE INDEX (`PRIMARY`) WHERE (1=1)
# Retry:4791 1247 Retry 1 of 2
# Retry:4798 1247 Try code succeeded
# pt_table_checksum:6597 1247 SHOW WARNINGS
fabdb pt_test 0 poseidon.sawtek.com InnoDB 6 9e6495a3 0 NULL NULL NULL
# pt_table_checksum:6077 1247 Finished chunk at 1323113447.84817
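For readers unfamiliar with the injected query, the following sketch mimics what its SELECT computes for one chunk: a row count plus the XOR of the per-row CRC32 values, printed as lowercase hex. It assumes the `idpt_test` column is hashed via its string form and uses hypothetical ids 1 through 6, so it is not expected to reproduce the 9e6495a3 value above. The function name `chunk_checksum` is made up for illustration.

```python
import zlib
from functools import reduce


def chunk_checksum(values):
    """Mimic COUNT(*) and LOWER(CONV(BIT_XOR(CRC32(col)), 10, 16)) for one chunk."""
    # Assumption: each value is hashed as its decimal string, e.g. CRC32('1').
    crcs = [zlib.crc32(str(v).encode("utf-8")) for v in values]
    xor = reduce(lambda a, b: a ^ b, crcs, 0)  # BIT_XOR over no rows is 0
    return len(values), format(xor, "x")       # lowercase hex, no padding


master = chunk_checksum([1, 2, 3, 4, 5, 6])   # hypothetical ids on the master
replica = chunk_checksum([1, 2, 3, 4, 5])     # same table with one row deleted
print(master, replica)
```

Because the count and the XOR are computed from whatever rows exist where the query runs, a server that re-executes the query reports its own data, while a server that only receives the resulting row does not.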