pt-table-sync doesn't recursively find slaves, unlike pt-table-checksum
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Percona Toolkit moved to https://jira.percona.com/projects/PT | Triaged | Undecided | Unassigned |
Bug Description
As per a discussion just had on irc:
15/11/2011 12:32:39 walterheck> hello, I have a slave 'x' which serves as a master for another cluster. the second cluster is a master 'y' with 4 slaves. I used pt-table-checksum from 'x' to show me the changes (I already know that x is in sync with its own master 'a', so I didn't want to run the checksum from 'a'). now when I run pt-table-sync from 'x', it's not doing anything. --verbose --execute shows just two lines and no actual queries. What's up there?
15/11/2011 12:40:54 gryp> walterheck: try with 'env PTDEBUG'
15/11/2011 12:53:33 gryp> walterheck: which version of pt-table-checksum did you use?
15/11/2011 12:54:46 walterheck> root@db05:~# ./pt-table-sync --version
15/11/2011 12:54:46 walterheck> pt-table-sync 1.0.1
15/11/2011 12:56:02 gryp> are there any inconsistencies?
15/11/2011 12:56:13 gryp> :)
[...]
15/11/2011 13:00:33 walterheck> gryp: interesting though that mk-table-checksum had no problem finding the correct slave IPs?
15/11/2011 13:00:46 walterheck> *pt-table-checksum
[..]
15/11/2011 13:04:24 gryp> voila, so pt-table-checksum checks replication lag
15/11/2011 13:04:27 gryp> and there's none
15/11/2011 13:04:33 gryp> it does not check where it is replicating from
15/11/2011 13:04:37 gryp> and doesn't check the checksum table
15/11/2011 13:04:41 gryp> unless you do --replicate-check=2
15/11/2011 13:05:01 walterheck> not sure I follow. I used this for checksumming:
15/11/2011 13:05:58 walterheck> ./pt-table-checksum -umaatkit --ask-pass --replicate=
15/11/2011 13:05:58 gryp> well, pt-table-checksum does not run queries on the slaves, it only logs in on the slaves to verify if the slave is lagging or not. if a slave is lagging, it will wait until it's back in sync. But in your case, the slave that it connects to is not the right one, so it's monitoring the wrong server.
15/11/2011 13:06:25 gryp> try doing ./pt-table-checksum -umaatkit --ask-pass --replicate=
15/11/2011 13:06:34 walterheck> and then ./pt-table-checksum -umaatkit --ask-pass --replicate=
15/11/2011 13:06:41 gryp> it didn't fail?
15/11/2011 13:06:50 gryp> the slave must have test.checksum by itself :)
15/11/2011 13:07:09 walterheck> that actually does not fail, it starts
15/11/2011 13:07:10 walterheck> root@db05:~# ./pt-table-checksum -umaatkit --ask-pass --replicate=
15/11/2011 13:07:10 walterheck> Differences on P=3306,
15/11/2011 13:07:10 walterheck> DB TBL CHUNK CNT_DIFF CRC_DIFF BOUNDARIES
15/11/2011 13:07:19 walterheck> and then a whole list of differences for all of the slaves
15/11/2011 13:07:25 gryp> and are they correct?
15/11/2011 13:07:29 walterheck> yup
15/11/2011 13:07:32 gryp> hmm.
15/11/2011 13:07:40 gryp> it also uses processlist to figure that out
15/11/2011 13:07:42 walterheck> gryp: my thought :)
15/11/2011 13:09:45 walterheck> gryp: MKDEBUG=1 ./pt-table-checksum -umaatkit --ask-pass --replicate=
15/11/2011 13:10:21 walterheck> that shows me that it does connect to 10.0.78.38 correctly, runs a SHOW PROCESSLIST there, and finds the other 4 slaves
15/11/2011 13:10:38 gryp> weird
15/11/2011 13:12:26 walterheck> gryp: ah, it seems that pt-table-sync just checks the one slave it finds, but doesn't recurse
15/11/2011 13:12:42 walterheck> and the one slave happens to have no differences, just the slaves of that one slave
15/11/2011 13:13:42 walterheck> gryp: and it doesn't seem to be able to do that
15/11/2011 13:14:35 walterheck> so basically I need to run pt-table-sync from the slave 'y' of the server 'x' I'm running it from now
15/11/2011 13:14:46 gryp> hmm
15/11/2011 13:14:46 walterheck> since y=x that makes no difference I guess?
15/11/2011 13:14:59 gryp> didn't know that it doesn't support recursion
15/11/2011 13:15:08 gryp> should not be a problem to fix the inconsistencies like that
15/11/2011 13:15:18 gryp> let's call it a missing feature? :)
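To make the difference discussed above concrete, here is a minimal Python sketch of single-level versus recursive slave discovery. It is not the toolkit's actual code (the tools are written in Perl and discover slaves by querying live servers); the `TOPOLOGY` mapping, the `find_slaves` function, and the host names `s1` to `s4` are hypothetical, chosen only to mirror the x → y → four-slaves layout described in the chat.

```python
# Hypothetical replication topology modeled on the discussion:
# 'x' has one slave 'y', and 'y' has four slaves of its own.
# The names s1..s4 are invented for illustration.
TOPOLOGY = {
    "x": ["y"],
    "y": ["s1", "s2", "s3", "s4"],
}


def find_slaves(master, topology, max_depth=None, _depth=0, _seen=None):
    """Return slaves reachable from `master`, depth-first.

    max_depth=None recurses through the whole tree; max_depth=1 stops
    after the direct slaves. `_seen` guards against replication rings.
    """
    seen = {master} if _seen is None else _seen
    if max_depth is not None and _depth >= max_depth:
        return []
    found = []
    for slave in topology.get(master, []):
        if slave in seen:
            continue  # already visited, e.g. in a master-master pair
        seen.add(slave)
        found.append(slave)
        found.extend(find_slaves(slave, topology, max_depth, _depth + 1, seen))
    return found


# Only the direct slave is found when recursion stops at one level.
single_level = find_slaves("x", TOPOLOGY, max_depth=1)
# Full recursion also reaches the slaves of the slave.
recursive = find_slaves("x", TOPOLOGY)

print(single_level)  # ['y']
print(recursive)     # ['y', 's1', 's2', 's3', 's4']
```

In this sketch, a tool that stops at one level only ever inspects 'y', which matches the reported symptom: 'y' has no differences, so nothing is synced, while the differing slaves below it are never visited.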
tags: added: pt-table-sync slave-recursion
Changed in percona-toolkit:
status: New → Triaged
I'm going to repeat the original post here for reference, and edit the text of the bug report to be more concise.
As per a discussion just had on irc:
15/11/2011 12:32:39 walterheck> hello, I have a slave 'x' which serves as a master for another cluster. the second cluster is a master 'y' with 4 slaves. I used pt-table-checksum from 'x' to show me the changes (I already know that x is in sync with its own master 'a', so I didn't want to run the checksum from 'a'). now when I run pt-table-sync from 'x', it's not doing anything. --verbose --execute shows just two lines and no actual queries. What's up there?
15/11/2011 12:40:54 gryp> walterheck: try with 'env PTDEBUG'
15/11/2011 12:53:33 gryp> walterheck: which version of pt-table-checksum did you use?
15/11/2011 12:54:46 walterheck> root@db05:~# ./pt-table-sync --version
15/11/2011 12:54:46 walterheck> pt-table-sync 1.0.1
15/11/2011 12:56:02 gryp> are there any inconsistencies?
15/11/2011 12:56:13 gryp> :)
[...]
15/11/2011 13:00:33 walterheck> gryp: interesting though that mk-table-checksum had no problem finding the correct slave IPs?
15/11/2011 13:00:46 walterheck> *pt-table-checksum
[..]
15/11/2011 13:04:24 gryp> voila, so pt-table-checksum checks replication lag
15/11/2011 13:04:27 gryp> and there's none
15/11/2011 13:04:33 gryp> it does not check where it is replicating from
15/11/2011 13:04:37 gryp> and doesn't check the checksum table
15/11/2011 13:04:41 gryp> unless you do --replicate-check=2
15/11/2011 13:05:01 walterheck> not sure I follow. I used this for checksumming:
15/11/2011 13:05:58 walterheck> ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --empty-replicate-table --create-replicate-table --no-check-replication-filters localhost
15/11/2011 13:05:58 gryp> well, pt-table-checksum does not run queries on the slaves, it only logs in on the slaves to verify if the slave is lagging or not. if a slave is lagging, it will wait until it's back in sync. But in your case, the slave that it connects to is not the right one, so it's monitoring the wrong server.
15/11/2011 13:06:25 gryp> try doing ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --replicate-check=2 localhost
15/11/2011 13:06:34 walterheck> and then ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --replicate-check=2 --no-check-replication-filters localhost
15/11/2011 13:06:41 gryp> it didn't fail?
15/11/2011 13:06:50 gryp> the slave must have test.checksum by itself :)
15/11/2011 13:07:09 walterheck> that actually does not fail, it starts
15/11/2011 13:07:10 walterheck> root@db05:~# ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --replicate-check=2 --no-check-replication-filters localhost Enter password for localhost:
15/11/2011 13:07:10 walterheck> Differences on P=3306,h=10.0.100.19
15/11/2011 13:07:10 walterheck> DB TBL CHUNK CNT_DIFF CRC_DIFF BOUNDARIES
15/11/2011 13:07:19 walterheck> and then a whole list of differences for all of the slaves
15/11/2011 13:07:25 gryp> and are they correct?
15/11/2011 13:07:29 walterheck> yup
15/11/2011 13:07:32 gryp> hmm.
15/11/2011 13:07:40 gryp> it al...