pt-table-checksum doesn't work if slaves use RBR

Bug #899415 reported by Jason Welter
22
This bug affects 4 people
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Fix Released
Low
Daniel Nichter

Bug Description

Problem:
Does not find slaves connected to slaves.

How to duplicate:
1. Need at least three servers with at least two levels of replication: master1 -> slave1 -> slave2
2. Create a simple table on the master and delete some records on each slave so that no DB has the same number of records
3. Run pt-table-checksum --create-replicate-table --empty-replicate-table --nocheck-replication-filters --replicate testdb.pt_checksums --databases testdb --tables testtbl
4. Inspect testdb.pt_checksums table...you should find the slave2 copy of the table = slave1 copy of the table but does not reflect the number of records on slave2.

Background:
I've tried it with both --recursion-medhod processlist and hosts
I'm using version = 1.0.1

Related branches

Revision history for this message
Jason Welter (jason-welter) wrote :

OK, it seems to be finding all the slaves (except it gets to port wrong on the one using port 3307)
but the checksums tables are incorrect for all slaves except the one directly connected to the master.
Here's the end of the debug output when MKDEBUG=1:

DATABASE TABLE CHUNK HOST ENGINE COUNT CHECKSUM TIME WAIT STAT LAG
# pt_table_checksum:6415 1247 DBI::db=HASH(0x2a52540) USE `fabdb`
# pt_table_checksum:5731 1247 DBI::db=HASH(0x2a52540) DELETE FROM fabdb.pt_checksums WHERE db=? AND tbl=? AND chunk > ? fabdb pt_test 0
# pt_table_checksum:5885 1247 Checksumming 1 chunks
# pt_table_checksum:6950 1247 Checking slave P=3306,S=/home/mysql/mysql.sock,h=apk-trinet-db-02.tqs.com,p=...,u=mfgis-siu lag for throttle
# MasterSlave:3838 1247 DBI::db=HASH(0x2b20568) SHOW SLAVE STATUS
# pt_table_checksum:6963 1247 Slave ready, lag 0 <= 1
# pt_table_checksum:6950 1247 Checking slave P=3306,S=/home/mysql/mysql.sock,h=amphitrite.tqs.com,p=...,u=mfgis-siu lag for throttle
# MasterSlave:3838 1247 DBI::db=HASH(0x2a52a80) SHOW SLAVE STATUS
# pt_table_checksum:6963 1247 Slave ready, lag 0 <= 1
# pt_table_checksum:6950 1247 Checking slave P=3306,S=/home/mysql/mysql.sock,h=sjo-trinet-ft.tqs.com,p=...,u=mfgis-siu lag for throttle
# MasterSlave:3838 1247 DBI::db=HASH(0x2b1b128) SHOW SLAVE STATUS
# pt_table_checksum:6963 1247 Slave ready, lag 0 <= 1
# pt_table_checksum:6950 1247 Checking slave P=3306,S=/home/mysql/mysql.sock,h=sjo-trinet-dell2950b.tqs.com,p=...,u=mfgis-siu lag for throttle
# MasterSlave:3838 1247 DBI::db=HASH(0x2b20238) SHOW SLAVE STATUS
# pt_table_checksum:6963 1247 Slave ready, lag 0 <= 1
# pt_table_checksum:5962 1247 Starting chunk 0 at 1323113447.84584
# pt_table_checksum:6542 1247 Replicating chunk 0 of table fabdb . pt_test on poseidon.sawtek.com : 3306
# pt_table_checksum:6415 1247 DBI::db=HASH(0x2a52540) USE `fabdb`
# TableChunker:3135 1247 Injecting chunk 0
# TableChunker:3150 1247 Parameters: $VAR1 = {DB_TBL => '`fabdb`.`pt_test`',INDEX_HINT => 'FORCE INDEX (`PRIMARY`)',WHERE => 'WHERE (1=1)'};
# pt_table_checksum:6564 1247 SQL for inject chunk 0: REPLACE /*fabdb.pt_test:1/1*/ INTO fabdb.pt_checksums (db, tbl, chunk, boundaries, this_cnt, this_crc) SELECT ?, ?, 0 AS chunk_num, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(`idpt_test`) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `fabdb`.`pt_test` FORCE INDEX (`PRIMARY`) WHERE (1=1)
# Retry:4791 1247 Retry 1 of 2
# Retry:4798 1247 Try code succeeded
# pt_table_checksum:6597 1247 SHOW WARNINGS
fabdb pt_test 0 poseidon.sawtek.com InnoDB 6 9e6495a3 0 NULL NULL NULL
# pt_table_checksum:6077 1247 Finished chunk at 1323113447.84817

Revision history for this message
Jason Welter (jason-welter) wrote :

I've looked at the binary logs on two of the slaves and found no 'slave2' specific update to the checksums table, just the insert that would come from replicating the 'slave1' update.

Revision history for this message
Jason Welter (jason-welter) wrote :

Let me give you my workaround.

But first let me describe my replication topology in detail:

master1a -> slave2a -> slave3a -> slave4a
                    -> slave3b
                    -> slave3c

what I was expecting to be able to do was:

pt-table-checsum --replicate pt_checksums h=master1a
pt-table-sync --execute --replicate pt_checksums

and have slave2a,slave3a,slave3b,slave3c,slave4a all sync'd to master1a.
What happens instead is even though pt-table-checksum detects all the slaves it
only checksums master1a and slave2a. Likewise, when I checksum slave2a
I find correct checksums on slave3a,slave3b,slave3c but not slave4a.

So I created a shell script to handle master1a, slave2a & slave3a separately:

MASTER_SERVERS=(master1a slave2a slave3a)
for MASTER_SERVER in ${MASTER_SERVERS[@]}; do

    #Checksum tables
    echo "Checksumming $MASTER_SERVER and it's slaves"
    CHECKSUM_COMMAND=`which pt-table-checksum`
    CHECKSUM_COMMAND="$CHECKSUM_COMMAND --algorithm BIT_XOR --optimize-xor --create-replicate-table --empty-replicate-table --nocheck-replication-filters --replicate fabdb.pt_checksums "
    CHECKSUM_COMMAND="$CHECKSUM_COMMAND --databases $DATABASES "
    CHECKSUM_COMMAND="$CHECKSUM_COMMAND h=$MASTER_SERVER,P=3306,u=,p= "
    #echo $CHECKSUM_COMMAND
    $CHECKSUM_COMMAND

    echo "Finished Checksumming" `/bin/date`
    echo

    #Sync tables
    SYNC_COMMAND=`which pt-table-sync`
    SYNC_COMMAND="$SYNC_COMMAND --verbose --execute --replicate fabdb.pt_checksums "
    SYNC_COMMAND="$SYNC_COMMAND h=$MASTER_SERVER,P=3306,u=,p= "
    #echo $SYNC_COMMAND
    $SYNC_COMMAND

    echo "Finished Syncing" `/bin/date`
    echo

done

This does result in double checksumming of the tables on slave2a and slave3a
but it does get all the tables sync'd which is the final goal. I'd still prefer
just having to call everything once since it would eliminate the double checksuming.

tags: added: pt-table-checksum slave-recursion
Revision history for this message
Joe (joegrasse) wrote :

I ran into this problem with the original mk-table-checksum. The problem is setting the binlog_format in your session is not honored all the way through the replication hierarchy for some statements. I actually have a support request into Oracle about this. You can see the conversation I had with Baron about this earlier in the below discussion. It would be nice if this issue got a little more "press", maybe Oracle would be more likely to fix it.

http://groups.google.com/group/maatkit-discuss/browse_thread/thread/5749a5d7b22527d1

Revision history for this message
Joe (joegrasse) wrote :

Just thought that I would point out that the mysql docs have been updated, because of the support request I have open.

http://dev.mysql.com/doc/refman/5.5/en/replication-options-binary-log.html#sysvar_binlog_format

"Note
While you can change the logging format at runtime, it is not recommended that you change it while replication is ongoing. This is due in part to the fact that slaves do not honor the master's binlog_format setting; a given MySQL Server can change only its own logging format."

http://dev.mysql.com/doc/refman/5.5/en/binary-log-setting.html

"Note
Each MySQL Server can set its own and only its own binary logging format (true whether binlog_format is set with global or session scope). This means that changing the logging format on a replication master does not cause a slave to change its logging format to match. (When using STATEMENT mode, the binlog_format system variable is not replicated; when using MIXED or ROW logging mode, it is replicated but is ignored by the slave.) Changing the binary logging format on the master while replication is ongoing, or without also changing it on the slave can thus cause unexpected results, or even cause replication to fail altogether."

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

Looks like this is a duplicate of bug 938068? The same MySQL bin log "bug" (if they decide it's a bug and not just "how it works").

tags: added: binlog-format mysql-bug
Changed in percona-toolkit:
status: New → Confirmed
Revision history for this message
Jason Welter (jason-welter) wrote :

I don't know why pt-table-checksum supposedly wants to change the binlog format?!?
But I can confirm that all the servers in the scenereo I outlined use ROW format binary logs.
According to MySQL documentation this is the most reliable although voluminous option.

Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

Read the documentation.

Revision history for this message
Jason Welter (jason-welter) wrote :

Thanks...handn't thought of that

Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

Unfortunately, I believe this problem is not going to be fixed inside MySQL, and that means there is virtually nothing we can do about it inside pt-table-checksum. I've actually thought through some things we might be able to do, but ran into dead ends each time. We'll need to document this limitation in the tool's docs, but that's all I can do.

tags: added: docs
Changed in percona-toolkit:
importance: Undecided → Low
Changed in percona-toolkit:
milestone: none → 2.1.4
Changed in percona-toolkit:
assignee: nobody → Daniel Nichter (daniel-nichter)
status: Confirmed → In Progress
summary: - pt-table-checksum doesn't recurse beyond 1 level
+ pt-table-checksum doesn't work if slaves use RBR
Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

This limitation has been documented in the new LIMITATIONS section, referenced in the RISKS section.

Changed in percona-toolkit:
status: In Progress → Fix Committed
Brian Fraser (fraserbn)
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Revision history for this message
Joe (joegrasse) wrote :

Just to add a little clarification, this also applies to binlog_format=MIXED.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-763

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.