pt-table-checksum refuses to run on PXC if server_id is the same on all nodes

Reported by Fernando Ipar on 2013-08-27
This bug affects 1 person
Affects: Percona Toolkit
Importance: Undecided
Assigned to: Unassigned

Bug Description

I invoked pt-table-checksum on node1 of a 3-node PXC setup, like so:

[vagrant@node1 ~]$ ./pt-table-checksum h=localhost,u=root --recursion-method dsn=h=localhost,u=root,D=percona,t=dsns

And it gave me this error:

Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.
node1 is a cluster node but no other nodes or regular replicas were found. Use --recursion-method=dsn to specify the other nodes in the cluster.

The table indicated by --recursion-method exists on all nodes with this data:

node1 mysql> select * from dsns;
+----+-----------+-----------------------+
| id | parent_id | dsn                   |
+----+-----------+-----------------------+
|  2 |      NULL | h=192.168.70.2,u=root |
|  5 |      NULL | h=192.168.70.3,u=root |
|  8 |      NULL | h=192.168.70.4,u=root |
+----+-----------+-----------------------+
3 rows in set (0.00 sec)

Judging from the output when I run with PTDEBUG=1, the tool does find and connect to the nodes, but it then disconnects from them because it treats them as duplicates, based on server_id:

# Cxn:3638 3434 Removing duplicates from node1 node1 node2 node3
# Cxn:3644 3434 SELECT @@server_id
# Cxn:3646 3434 Server ID for node1 : 0
# Cxn:3644 3434 SELECT @@server_id
# Cxn:3646 3434 Server ID for node1 : 0
# Cxn:3652 3434 Removing node1 , ID 0 , because we've already seen it
# Cxn:3644 3434 SELECT @@server_id
# Cxn:3646 3434 Server ID for node2 : 0
# Cxn:3652 3434 Removing node2 , ID 0 , because we've already seen it
# Cxn:3644 3434 SELECT @@server_id
# Cxn:3646 3434 Server ID for node3 : 0
# Cxn:3652 3434 Removing node3 , ID 0 , because we've already seen it
# Cxn:3663 3434 Destroying cxn
# Cxn:3672 3434 DBI::db=HASH(0x2c83788) Disconnecting dbh on node3 h=192.168.70.4
# Cxn:3663 3434 Destroying cxn
# Cxn:3672 3434 DBI::db=HASH(0x2c856f0) Disconnecting dbh on node2 h=192.168.70.3
# Cxn:3663 3434 Destroying cxn
# Cxn:3672 3434 DBI::db=HASH(0x3127c48) Disconnecting dbh on node1 h=192.168.70.2
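
To make the behaviour in the debug output easier to follow, here is a minimal Python sketch of de-duplication keyed on server_id. It is not the tool's actual code (pt-table-checksum is written in Perl); the function name `remove_duplicate_nodes` and the dictionary layout are hypothetical, and the logic is only inferred from the log lines above.

```python
def remove_duplicate_nodes(main, candidates):
    """Drop every candidate whose server_id has already been seen.

    Hypothetical illustration only: `main` is the node the tool was
    started on, `candidates` are the nodes read from the dsns table.
    """
    seen = {main["server_id"]}
    kept = []
    for node in candidates:
        if node["server_id"] in seen:
            # Same server_id as an earlier node, so it is assumed
            # to be the same server and is discarded.
            continue
        seen.add(node["server_id"])
        kept.append(node)
    return kept


# Case from this report: every node has server_id 0.
main_same = {"name": "node1", "server_id": 0}
same_ids = [{"name": n, "server_id": 0} for n in ("node1", "node2", "node3")]

# Case after the workaround: each node has its own server_id.
main_unique = {"name": "node1", "server_id": 1}
unique_ids = [
    {"name": "node1", "server_id": 1},
    {"name": "node2", "server_id": 2},
    {"name": "node3", "server_id": 3},
]

print([n["name"] for n in remove_duplicate_nodes(main_same, same_ids)])
# []
print([n["name"] for n in remove_duplicate_nodes(main_unique, unique_ids)])
# ['node2', 'node3']
```

With identical IDs nothing survives the filter, which matches the "no slaves were found" error; with unique IDs only the entry for the local node is dropped.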

If I set server_id to a unique value on each node, the tool runs just fine with the same invocation.
PXC does not care about server_id, so the servers may well be set up with the same value.

There is no mention in the 'Percona XtraDB Cluster' section of the tool's manual about this requirement, and in any case, I think the error message is a bit misleading.

I don't think this merits a code change, but it would be a good idea to make it clear in the docs that if server_id is not unique for each cluster node, the tool won't work.

tags: added: pt-table-checksum pxc server-id
Changed in percona-toolkit:
status: New → Confirmed