pt-table-checksum false reporting when used with 5.6 multithreaded replication

Bug #1278426 reported by Noel on 2014-02-10
Affects: Percona Toolkit

Bug Description

I recently started using MySQL 5.6 multithreaded replication in my environment. In the basic workflow of multithreaded replication, each worker thread applies changes per database schema. Suppose DB1 and DB2 are on the same host with three replication worker threads. If pt-table-checksum runs in that setup with its checksum table in a different database (say perconaDB), there are three databases in total on the host. Each worker thread then applies its database-specific SQL from the relay log with no ordering across databases, so the checksum does not see a consistent snapshot and may report false results. The only solution I have figured out is to write a script that keeps a checksum table in each database and runs pt-table-checksum per database in a for loop. I can't use this solution when there is a huge number of databases on the same host, and a setup with a checksum table in every database would be hard to manage.
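
The per-database workaround described above could look roughly like the following. This is only a sketch, not the reporter's actual script: the database list and the checksum table name `checksums` are placeholders, and connection options are left out. It relies on the `--databases` and `--replicate` options of pt-table-checksum, and it only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch only: one pt-table-checksum run per database, with the checksum
# table stored in the same database that is being checked.
# DATABASES and the table name "checksums" are placeholders.
DATABASES="DB1 DB2"

# Print the pt-table-checksum command for a single database.
# --databases limits the run to that schema; --replicate names the
# checksum table, here placed inside the same schema.
build_checksum_cmd() {
    printf 'pt-table-checksum --databases=%s --replicate=%s.checksums\n' "$1" "$1"
}

for db in $DATABASES; do
    # Dry run: print the command instead of executing it.
    # Add --user, --password, --socket and so on before running for real.
    build_checksum_cmd "$db"
done
```

Keeping the checksum table in the checked database is meant to keep the checksum writes and that database's own changes in a single per-schema stream on the slave, which is the point of the workaround. It does not remove the management overhead mentioned above.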

Noel (noelc) wrote :

Any update on this?

Noel (noelc) on 2014-04-17
tags: added: pt-table-checksum
tags: added: mysql-5.6
Frank Cizmich (frank-cizmich) wrote :

Hello Flavian,

How frequent are the false negatives turning up?
If it's only sporadic, maybe a workaround can be patched in until a definitive solution is developed.

Noel (noelc) wrote :

Hi Frank,
This usually happens on medium- or high-traffic masters. If you're lucky and the master gets no traffic while the checksum runs, it gives a proper result. Because multithreaded replication is not serialized, the checksum goes wrong, and most of the time I get a false result. I then have to set the worker threads to 0 to serialize replication, and then run the checksum to get proper output.
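
The "worker threads to 0" step mentioned above can be sketched as the statements below, to be run on the slave. This is an assumption about how it is done, not the exact commands used here: it uses the MySQL 5.6 system variable `slave_parallel_workers`, and restarts the SQL thread because a new value only applies on the next slave start. The helper name `serialize_slave_sql` is made up for illustration, and the script only prints the SQL.

```shell
#!/bin/sh
# Sketch only: print the SQL that switches a 5.6 slave back to
# single-threaded (serialized) apply. Nothing is executed here.
serialize_slave_sql() {
    cat <<'EOF'
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = 0;
START SLAVE SQL_THREAD;
EOF
}

# Dry run: show the statements. Pipe them to the mysql client on the
# slave (with your own connection options) to apply them.
serialize_slave_sql
```

After the checksum run, the previous worker count would be restored the same way.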


I have tried to reproduce this issue with PS 5.6.17 (multithreaded replication) + sysbench but was unable to do so. I set up master-slave replication with slave_parallel_workers=7 (as I have 7 databases), then ran sysbench on the master and ran the checksum on the master and the slave. Can you provide the exact test case?

nilnandan@Dell-XPS:~$ sysbench --test=/home/nilnandan/sysbench/sysbench/tests/db/oltp.lua --oltp-table-size=100000 --oltp-test-mode=complex --oltp-read-only=off --num-threads=50 --max-time=300 --max-requests=0 --mysql-db=dbtest --mysql-user=root --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox20082.sock run
sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 50
Random number generator seed is 0 and will be ignored

Threads started!

OLTP test statistics:
    queries performed:
        read: 6137096
        write: 1753434
        other: 876719
        total: 8767249
    transactions: 438355 (1461.11 per sec.)
    deadlocks: 9 (0.03 per sec.)
    read/write requests: 7890530 (26300.47 per sec.)
    other operations: 876719 (2922.25 per sec.)

General statistics:
    total time: 300.0148s
    total number of events: 438355
    total time taken by event execution: 14999.2038s
    response time:
         min: 4.92ms
         avg: 34.22ms
         max: 604.40ms
         approx. 95 percentile: 66.34ms

Threads fairness:
    events (avg/stddev): 8767.1000/56.69
    execution time (avg/stddev): 299.9841/0.00

nilnandan@Dell-XPS:~$ pt-table-checksum --user=root --password=msandbox --socket=/tmp/mysql_sandbox20082.sock --databases=dbtest
Cannot connect to h=,p=...,u=root
Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.
Pausing because Threads_running=50.
Checksumming dbtest.sbtest: 13% 31:32 remain

10-13T14:37:38 0 0 100000 6 0 295.748 dbtest.sbtest
10-13T14:37:38 0 0 100000 1 0 0.190 dbtest.sbtest1

nilnandan@Dell-XPS:~$ pt-table-checksum --user=root --password=msandbox --socket=/tmp/mysql_sandbox20083.sock --databases=dbtest
Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.
10-13T14:32:46 0 0 100000 5 0 0.684 dbtest.sbtest
10-13T14:32:47 0 0 100000 1 0 0.384 dbtest.sbtest1

Changed in percona-toolkit:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for Percona Toolkit because there has been no activity for 60 days.]

Changed in percona-toolkit:
status: Incomplete → Expired