pt-table-checksum false reporting when used with 5.6 multithreaded replication

Bug #1278426 reported by Noel
This bug affects 1 person
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

I recently started using MySQL 5.6 multithreaded replication in my environment. In the basic multithreaded replication workflow, each worker thread applies events for one database schema. Suppose DB1 and DB2 are on the same host with 3 multithreaded replication workers. When pt-table-checksum runs in this setup with its checksum table in a separate database (say perconaDB), the host has 3 databases in total. Each worker then applies the SQL for its own database from the relay log with no ordering across databases, so the checksum does not see a consistent snapshot and may report false results. The only solution I have found is a script that keeps a checksum table in each database and runs pt-table-checksum per database in a for loop to get correct output. This does not scale when a host has a large number of databases, and maintaining a separate checksum table under each database makes such a setup hard to manage.
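
The per-database workaround described above could be sketched roughly as follows. This is only an illustration: the socket path, user, and database names are placeholders, the `checksums` table name is an assumption, and credentials are left out (add `--password` or `--ask-pass` as needed). Whether this arrangement actually avoids the false reports is the reporter's observation and is not confirmed elsewhere in this thread.

```shell
#!/usr/bin/env bash
# Sketch: build one pt-table-checksum invocation per schema, each with its
# own checksum table inside that schema (via --replicate=<db>.checksums).
# SOCKET, DB_USER and the database names are placeholders.

SOCKET="${SOCKET:-/tmp/mysql.sock}"
DB_USER="${DB_USER:-root}"

# Print the command line for a single schema.
checksum_cmd() {
  local db="$1"
  printf 'pt-table-checksum --user=%s --socket=%s --databases=%s --replicate=%s.checksums\n' \
    "$DB_USER" "$SOCKET" "$db" "$db"
}

# Print one command per schema passed as an argument (dry run only).
run_per_database() {
  local db
  for db in "$@"; do
    checksum_cmd "$db"
  done
}

run_per_database DB1 DB2
```

The script only prints the commands, so they can be reviewed first; piping the output to `sh` would execute them one schema at a time.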

Noel (noelc) wrote :

Any update on this?

Noel (noelc)
tags: added: pt-table-checksum
tags: added: mysql-5.6
Frank Cizmich (frank-cizmich) wrote :

Hello Flavian,

How frequent are the false negatives turning up?
If it's only sporadic, maybe a workaround can be patched in until a definitive solution is developed.

Noel (noelc) wrote :

Hi Frank,
This usually happens on medium- or high-traffic masters. If you're lucky and the master gets no traffic while the checksum runs, it gives the proper result. Because the multithreaded applier is not serialized, the checksum goes wrong, and most of the time I get a false result. I then have to set the worker threads to 0 to serialize replication and then run the checksum to get proper output.
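
The "set the worker threads to 0" step above could be sketched as below. It assumes the MySQL 5.6 variable `slave_parallel_workers` (the one named later in this thread) and that a changed value only takes effect once the SQL thread is restarted. The helper name `set_workers_sql` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: emit the SQL that switches a 5.6 replica between parallel and
# serialized apply. The output is meant to be piped into the mysql client
# on the replica; nothing is executed here.

set_workers_sql() {
  local workers="$1"
  cat <<SQL
STOP SLAVE SQL_THREAD;
SET GLOBAL slave_parallel_workers = ${workers};
START SLAVE SQL_THREAD;
SQL
}

# 0 workers means single-threaded (serialized) apply.
set_workers_sql 0
```

A run would then be: pipe `set_workers_sql 0` into `mysql` on the replica, run pt-table-checksum against the master, and afterwards pipe `set_workers_sql 3` (or whatever worker count was in use) to restore parallel apply.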

Nilnandan Joshi (nilnandan-joshi) wrote :

Hi,

I tried to reproduce this issue with PS 5.6.17 (multithreaded replication) and sysbench, but was unable to. I set up master-slave replication with slave_parallel_workers = 7 (as I have 7 databases), then ran sysbench on the master and ran the checksum on the master and on the slave. Can you provide the exact test case?

nilnandan@Dell-XPS:~$ sysbench --test=/home/nilnandan/sysbench/sysbench/tests/db/oltp.lua --oltp-table-size=100000 --oltp-test-mode=complex --oltp-read-only=off --num-threads=50 --max-time=300 --max-requests=0 --mysql-db=dbtest --mysql-user=root --mysql-password=msandbox --mysql-socket=/tmp/mysql_sandbox20082.sock run
sysbench 0.5: multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 50
Random number generator seed is 0 and will be ignored

Threads started!

OLTP test statistics:
    queries performed:
        read: 6137096
        write: 1753434
        other: 876719
        total: 8767249
    transactions: 438355 (1461.11 per sec.)
    deadlocks: 9 (0.03 per sec.)
    read/write requests: 7890530 (26300.47 per sec.)
    other operations: 876719 (2922.25 per sec.)

General statistics:
    total time: 300.0148s
    total number of events: 438355
    total time taken by event execution: 14999.2038s
    response time:
         min: 4.92ms
         avg: 34.22ms
         max: 604.40ms
         approx. 95 percentile: 66.34ms

Threads fairness:
    events (avg/stddev): 8767.1000/56.69
    execution time (avg/stddev): 299.9841/0.00

nilnandan@Dell-XPS:~$
nilnandan@Dell-XPS:~$ pt-table-checksum --user=root --password=msandbox --socket=/tmp/mysql_sandbox20082.sock --databases=dbtest
Cannot connect to h=127.0.0.1,p=...,u=root
Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.
Pausing because Threads_running=50.
Checksumming dbtest.sbtest: 13% 31:32 remain

            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
10-13T14:37:38      0      0   100000       6       0 295.748 dbtest.sbtest
10-13T14:37:38      0      0   100000       1       0   0.190 dbtest.sbtest1
nilnandan@Dell-XPS:~$

nilnandan@Dell-XPS:~$ pt-table-checksum --user=root --password=msandbox --socket=/tmp/mysql_sandbox20083.sock --databases=dbtest
Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
10-13T14:32:46      0      0   100000       5       0   0.684 dbtest.sbtest
10-13T14:32:47      0      0   100000       1       0   0.384 dbtest.sbtest1
nilnandan@Dell-XPS:~$

Changed in percona-toolkit:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for Percona Toolkit because there has been no activity for 60 days.]

Changed in percona-toolkit:
status: Incomplete → Expired
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-1207
