pt-table-checksum: recursion method default is not correct for clusters

Bug #1169853 reported by Daniël van Eeden
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

The documentation says that the default recursion method is "processlist,hosts", but this does not seem to be true.

The "cluster" recursion method also seems to be used.

This can be tested as follows:
- Set up a 3-node PXC cluster that uses MySQL replication to replicate to another 3-node PXC cluster.
- Run pt-table-checksum against the node that has the master role in the MySQL replication setup.

Expected result:
pt-table-checksum detects "regular" MySQL replication with 1 master and 1 slave.

Actual result:
Cluster setup is detected
"xxx is a cluster node but no other nodes or regular replicas were found. Use --recursion-method=dsn to specify the other nodes in the cluster."

According to the PTDEBUG output, it does connect to the slave.
I can't upload the PTDEBUG output.
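
For readers unfamiliar with the `--recursion-method=dsn` suggestion in the error message above, here is a minimal sketch of what it could look like. The schema and table name `percona.dsns` and the hosts `master1` and `slave1` are placeholders, not values from this report, and the sketch assumes the `percona` schema already exists. Whether this resolves the cluster-to-cluster case described here is not confirmed by the report.

```shell
# Hypothetical names throughout: percona.dsns, master1 and slave1 are
# placeholders for illustration only.

# Print the SQL that creates a DSN table and registers one replica in it.
dsn_table_sql() {
    cat <<'SQL'
CREATE TABLE IF NOT EXISTS percona.dsns (
  id        INT NOT NULL AUTO_INCREMENT,
  parent_id INT DEFAULT NULL,
  dsn       VARCHAR(255) NOT NULL,
  PRIMARY KEY (id)
);
INSERT INTO percona.dsns (dsn) VALUES ('h=slave1,P=3306');
SQL
}

# Load the table on the node being checksummed, then point the tool at it:
#   dsn_table_sql | mysql -h master1 -u percona -p
#   pt-table-checksum --recursion-method dsn=h=master1,D=percona,t=dsns h=master1,u=percona,p=percona
```

With the dsn method the tool reads the replica list from the table instead of discovering it through the processlist or hosts methods, so a wrong entry in the table is the first thing to check if a replica is still not found.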

description: updated
tags: added: pt-table-checksum pxc slave-recursion
Changed in percona-toolkit:
status: New → Confirmed
milestone: none → 2.2.3
Changed in percona-toolkit:
importance: Undecided → Medium
Daniel Nichter (daniel-nichter) wrote :

Daniël, that message means the slave (i.e. the other cluster) was not found; if it had been found, the run would have worked:

$ ./pt-table-checksum h=127.1,P=12345,u=msandbox,p=msandbox -d mysql
Not checking replica lag on lucid32 because it is a cluster node.
            TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
06-27T08:44:16 0 0 0 1 0 0.353 mysql.columns_priv
...

It might also die saying "these nodes are in another cluster:" if the slave's cluster name isn't the same as the master's cluster name. Since you can't send the PTDEBUG output, could you double-check with something like "PTDEBUG=1 ... 2>&1 | grep MasterSlave" and then look for lines like:

# MasterSlave:5086 8584 Found 1 slaves
# MasterSlave:5063 8584 Recursing from P=12345,h=127.1,p=...,u=msandbox to P=2900,h=127.0.0.1,p=...,u=msandbox
# MasterSlave:5004 8584 Port number is non-standard; using only hosts method
# MasterSlave:5020 8584 Recursion methods: hosts
# MasterSlave:5030 8584 Connected to P=2900,h=127.0.0.1,p=...,u=msandbox
# MasterSlave:5039 8584 SELECT @@SERVER_ID
# MasterSlave:5041 8584 Working on server ID 2900
# MasterSlave:4973 8584 Found slave: P=2900,h=127.0.0.1,p=...,u=msandbox

Port 12345 is my master (1st cluster), port 2900 is the slave (2nd cluster).
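
The check described above can be narrowed to just the slave-discovery lines. This is a rough sketch: the function name `masterslave_summary` is made up for illustration, and the patterns are taken only from the sample debug lines quoted in this comment, so other toolkit versions may word them differently.

```shell
# Hypothetical helper: reads PTDEBUG output on stdin and keeps only the
# MasterSlave lines that report the recursion methods and discovered slaves.
masterslave_summary() {
    grep 'MasterSlave' | grep -E 'Found [0-9]+ slaves|Recursion methods:|Found slave:'
}

# Usage, with the sandbox DSN from the example above:
#   PTDEBUG=1 ./pt-table-checksum h=127.1,P=12345,u=msandbox,p=msandbox -d mysql 2>&1 | masterslave_summary
```

If the filter prints nothing, or no "Found slave:" line appears, the tool did not discover the replica through recursion.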

Changed in percona-toolkit:
status: Confirmed → In Progress
assignee: nobody → Daniel Nichter (daniel-nichter)
Daniel Nichter (daniel-nichter) wrote :

Daniël, are you able to check that ^?

Changed in percona-toolkit:
milestone: 2.2.4 → none
assignee: Daniel Nichter (daniel-nichter) → nobody
importance: Medium → Undecided
status: In Progress → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for Percona Toolkit because there has been no activity for 60 days.]

Changed in percona-toolkit:
status: Incomplete → Expired
Daniël van Eeden (dveeden) wrote :

Rechecked with Percona Toolkit 2.2.5. It is fixed now.

Setup
Cluster1: galera1, galera2, galera3
Cluster2: galera4, galera5, galera6

galera4 is a slave of galera1.

PXC Version: 5.5.34-23.7.6-565.precise

root@galera4:~# pt-table-checksum --recursion-method=cluster -d test1 h=galera1,u=percona,p=percona
Not checking replica lag on galera2 because it is a cluster node.
Not checking replica lag on galera3 because it is a cluster node.
            TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
11-12T12:56:33 0 0 2 1 0 0.063 test1.t1

root@galera4:~# pt-table-sync --print --replicate percona.checksums -d test1 h=galera1,u=percona,p=percona
REPLACE INTO `test1`.`t1`(`id`, `name`) VALUES ('1', 'test-galera2-2') /*percona-toolkit src_db:test1 src_tbl:t1 src_dsn:h=galera1,p=...,u=percona dst_db:test1 dst_tbl:t1 dst_dsn:h=galera4,p=...,u=percona lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:15519 user:root host:galera4*/;
REPLACE INTO `test1`.`t1`(`id`, `name`) VALUES ('4', 'test-galera3') /*percona-toolkit src_db:test1 src_tbl:t1 src_dsn:h=galera1,p=...,u=percona dst_db:test1 dst_tbl:t1 dst_dsn:h=galera4,p=...,u=percona lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:15519 user:root host:galera4*/;

root@galera4:~# pt-table-checksum --version
pt-table-checksum 2.2.5

Changed in percona-toolkit:
status: Expired → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-1098
