pt-table-checksum refuses to run on PXC if server_id is the same on all nodes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Toolkit moved to https://jira.percona.com/projects/PT | Fix Released | Medium | Frank Cizmich |
Bug Description
I invoked pt-table-checksum on node1 of a 3-node PXC setup, like so:
[vagrant@node1 ~]$ ./pt-table-checksum h=localhost,u=root --recursion-method dsn=h=localhost
And it gave me this error:
Diffs cannot be detected because no slaves were found. Please read the --recursion-method documentation for information.
node1 is a cluster node but no other nodes or regular replicas were found. Use --recursion-
The table indicated by --recursion-method exists on all nodes with this data:
node1 mysql> select * from dsns;
+----+-
| id | parent_id | dsn |
+----+-
| 2 | NULL | h=192.168.
| 5 | NULL | h=192.168.
| 8 | NULL | h=192.168.
+----+-
3 rows in set (0.00 sec)
Judging from what I see when I run with PTDEBUG=1, the tool finds and connects to the nodes, but then disconnects from them because it treats them as duplicates based on server_id:
# Cxn:3638 3434 Removing duplicates from node1 node1 node2 node3
# Cxn:3644 3434 SELECT @@server_id
# Cxn:3646 3434 Server ID for node1 : 0
# Cxn:3644 3434 SELECT @@server_id
# Cxn:3646 3434 Server ID for node1 : 0
# Cxn:3652 3434 Removing node1 , ID 0 , because we've already seen it
# Cxn:3644 3434 SELECT @@server_id
# Cxn:3646 3434 Server ID for node2 : 0
# Cxn:3652 3434 Removing node2 , ID 0 , because we've already seen it
# Cxn:3644 3434 SELECT @@server_id
# Cxn:3646 3434 Server ID for node3 : 0
# Cxn:3652 3434 Removing node3 , ID 0 , because we've already seen it
# Cxn:3663 3434 Destroying cxn
# Cxn:3672 3434 DBI::db=
# Cxn:3663 3434 Destroying cxn
# Cxn:3672 3434 DBI::db=
# Cxn:3663 3434 Destroying cxn
# Cxn:3672 3434 DBI::db=
If I set server_id to a unique value on each node, the tool runs just fine with the same invocation.
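The debug output, and the fact that unique server_id values avoid the problem, are consistent with a deduplication pass keyed on `@@server_id`. The following is a minimal Python sketch of that behavior under that assumption. It is not the tool's actual Perl code, and the function name `remove_duplicates_by_server_id` and the node dictionaries are hypothetical, chosen only for illustration.

```python
def remove_duplicates_by_server_id(main, candidates):
    """Keep only candidates whose server_id has not been seen yet.

    Hypothetical illustration: the main connection's server_id is
    recorded first, so any candidate sharing it is dropped.
    """
    seen = {main["server_id"]}
    kept = []
    for node in candidates:
        if node["server_id"] in seen:
            # Mirrors the "Removing ..., because we've already seen it" log lines
            continue
        seen.add(node["server_id"])
        kept.append(node)
    return kept


# All nodes report server_id 0, as in the debug output above.
main = {"name": "node1", "server_id": 0}
same_ids = [{"name": n, "server_id": 0} for n in ("node1", "node2", "node3")]
remaining_same = remove_duplicates_by_server_id(main, same_ids)
print([n["name"] for n in remaining_same])  # prints []

# With a unique server_id per node, only the main node's own entry is dropped.
main_unique = {"name": "node1", "server_id": 1}
unique_ids = [
    {"name": "node1", "server_id": 1},
    {"name": "node2", "server_id": 2},
    {"name": "node3", "server_id": 3},
]
remaining_unique = remove_duplicates_by_server_id(main_unique, unique_ids)
print([n["name"] for n in remaining_unique])  # prints ['node2', 'node3']
```

In the first case every candidate is discarded, which would leave the tool with no nodes to compare against and explain the "no slaves were found" message.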
PXC does not care about server_id, so it may well happen that the servers are set up with the same value.
There is no mention in the 'Percona XtraDB Cluster' section of the tool's manual about this requirement, and in any case, I think the error message is a bit misleading.
I don't think this merits a code change, but it would be a good idea to make it clear in the docs that if server_id is not unique for each cluster node, the tool won't work.
Related branches
- Daniel Nichter: Approve
Diff: 752 lines (+321/-78), 10 files modified
bin/pt-config-diff (+22/-3)
bin/pt-deadlock-logger (+22/-3)
bin/pt-fk-error-logger (+22/-3)
bin/pt-kill (+22/-3)
bin/pt-online-schema-change (+43/-6)
bin/pt-table-checksum (+44/-6)
bin/pt-upgrade (+22/-3)
lib/Cxn.pm (+19/-2)
lib/Percona/XtraDB/Cluster.pm (+23/-2)
t/pt-table-checksum/pxc.t (+82/-47)
tags: | added: pt-table-checksum pxc server-id |
Changed in percona-toolkit: | |
status: | New → Confirmed |
Changed in percona-toolkit: | |
importance: | Undecided → Medium |
assignee: | nobody → Frank Cizmich (frank-cizmich) |
milestone: | none → 2.2.12 |
Changed in percona-toolkit: | |
status: | Confirmed → In Progress |
Changed in percona-toolkit: | |
status: | In Progress → Fix Committed |
Changed in percona-toolkit: | |
status: | Fix Committed → Fix Released |
Fixed using "wsrep_node_incoming_address" as a unique identifier for cluster nodes, instead of relying on "server_id".
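As a rough illustration of the idea behind the fix, the sketch below deduplicates nodes by `wsrep_node_incoming_address` when it is available. This is a Python approximation under stated assumptions, not the code from the merged branch: the helper names `node_identity` and `remove_duplicates`, the fallback to `server_id` for non-cluster servers, and the example addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical.

```python
def node_identity(node):
    """Return a key that identifies a node.

    Assumption: cluster nodes expose wsrep_node_incoming_address;
    other servers fall back to server_id.
    """
    addr = node.get("wsrep_node_incoming_address")
    if addr:
        return ("wsrep_node_incoming_address", addr)
    return ("server_id", node["server_id"])


def remove_duplicates(main, candidates):
    """Drop candidates whose identity matches the main node or an earlier candidate."""
    seen = {node_identity(main)}
    kept = []
    for node in candidates:
        key = node_identity(node)
        if key in seen:
            continue
        seen.add(key)
        kept.append(node)
    return kept


# Every node has server_id 0, but each has its own incoming address (made up here).
main = {"name": "node1", "server_id": 0,
        "wsrep_node_incoming_address": "192.0.2.1:3306"}
candidates = [
    {"name": "node1", "server_id": 0,
     "wsrep_node_incoming_address": "192.0.2.1:3306"},
    {"name": "node2", "server_id": 0,
     "wsrep_node_incoming_address": "192.0.2.2:3306"},
    {"name": "node3", "server_id": 0,
     "wsrep_node_incoming_address": "192.0.2.3:3306"},
]
remaining = remove_duplicates(main, candidates)
print([n["name"] for n in remaining])  # prints ['node2', 'node3']
```

With this keying, only the entry that points back at the node the tool is already connected to is removed, even though all three nodes share the same server_id.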