Comment 3 for bug 890650

Walter Heck (walterheck) wrote : Re: [Bug 890650] Re: pt-table-sync doesn't recursively find slaves as opposed to pt-table-checksum

Well, in my case it clearly didn't recurse to the slaves (C, D, E, F)
of the slave (B) of the server (A) I ran it from. It might be because A
and B were identical? I ended up running the sync from B and then it was
fine, so my conclusion was that pt-table-sync most likely doesn't recurse
through slave B to find C through F. I don't have the output
anymore and the setup has changed (this was during a migration), but
I'm fairly sure I concluded this after checking the DEBUG output
of pt-table-sync.
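
To make the difference concrete, here is a minimal sketch in Python of what I mean by recursing versus not recursing. This is not the tool's actual code: the TOPOLOGY map, the host names A through F, and the find_slaves function are all hypothetical and only model the setup described above.

```python
from typing import Dict, List, Optional

# Hypothetical topology from this report: A -> B -> C, D, E, F.
# Each key maps a server to the slaves that replicate directly from it.
TOPOLOGY: Dict[str, List[str]] = {
    "A": ["B"],
    "B": ["C", "D", "E", "F"],
}


def find_slaves(
    host: str,
    topology: Dict[str, List[str]],
    max_depth: Optional[int] = None,
) -> List[str]:
    """Return the slaves reachable from host, in discovery order.

    max_depth=1 looks only at direct slaves (no recursion);
    max_depth=None follows the hierarchy all the way down.
    """
    found: List[str] = []
    seen = {host}  # guards against loops such as ring replication

    def visit(node: str, depth: int) -> None:
        if max_depth is not None and depth >= max_depth:
            return
        for slave in topology.get(node, []):
            if slave in seen:
                continue
            seen.add(slave)
            found.append(slave)
            visit(slave, depth + 1)

    visit(host, 0)
    return found


if __name__ == "__main__":
    print("from A, direct slaves only:", find_slaves("A", TOPOLOGY, max_depth=1))
    print("from A, full recursion:    ", find_slaves("A", TOPOLOGY))
    print("from B, direct slaves only:", find_slaves("B", TOPOLOGY, max_depth=1))
```

With a depth of one, starting from A only turns up B, which matches what I saw; starting from B turns up C through F, which is why running the sync from B worked.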

On Wed, Nov 23, 2011 at 16:08, Baron Schwartz <email address hidden> wrote:
> Walter, can you state your feature request / bug more clearly?
> pt-table-sync does support finding and recursing to all slaves, but I am
> not sure how you want that to behave differently.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/890650
>
> Title:
>  pt-table-sync doesn't recursively find slaves as opposed to
>  pt-table-checksum
>
> Status in Percona Toolkit:
>  New
>
> Bug description:
>  As per a discussion just had on irc:
>
>  15/11/2011 12:32:39  walterheck> hello, I have a slave 'x' which serves as a master for another cluster. the second cluster is a master 'y' with 4 slaves. I used pt-table-checksum from 'x' to show me the changes (I already know that x is in sync with its own master 'a', so I didn't want to run the checksum from 'a'). now when I run pt-table-sync from 'x', it's not doing anything. --verbose --execute shows just two lines and no actual queries. What's up there?
>  15/11/2011 12:40:54  gryp> walterheck: try with 'env PTDEBUG'
>  15/11/2011 12:53:33  gryp> walterheck: which version of pt-table-checksum did you use?
>  15/11/2011 12:54:46  walterheck> root@db05:~# ./pt-table-sync --version
>  15/11/2011 12:54:46  walterheck> pt-table-sync 1.0.1
>  15/11/2011 12:56:02  gryp> are there any inconsistencies?
>  15/11/2011 12:56:13  gryp> :)
>  [...]
>  15/11/2011 13:00:33  walterheck> gryp: interesting though that mk-table-checksum had no problem finding the correct slave ip's?
>  15/11/2011 13:00:46  walterheck> *pt-table-checksum
>  [..]
>  15/11/2011 13:04:24  gryp> voila, so pt-table-checksum checks replication lag
>  15/11/2011 13:04:27  gryp> and there's none
>  15/11/2011 13:04:33  gryp> it does not check where it is replicating from
>  15/11/2011 13:04:37  gryp> and doesn't check the checksum table
>  15/11/2011 13:04:41  gryp> unless you do --replicate-check=2
>  15/11/2011 13:05:01  walterheck> not sure I follow. I used this for checksumming:
>  15/11/2011 13:05:58  walterheck> ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --empty-replicate-table --create-replicate-table --no-check-replication-filters localhost
>  15/11/2011 13:05:58  gryp> well, pt-table-checksum does not run queries on the slaves, it only logs in on the slaves to verify if the slave is lagging or not. if a slave is lagging, it will wait until it's back in sync. But in your case, the slave that it connects to is not the right one, so it is monitoring the wrong server.
>  15/11/2011 13:06:25  gryp> try doing ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --replicate-check=2 localhost
>  15/11/2011 13:06:34  walterheck> and then ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --replicate-check=2 --no-check-replication-filters localhost
>  15/11/2011 13:06:41  gryp> it didn't fail?
>  15/11/2011 13:06:50  gryp> the slave must have test.checksum by itself :)
>  15/11/2011 13:07:09  walterheck> that actually does not fail, it starts
>  15/11/2011 13:07:10  walterheck> root@db05:~# ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --replicate-check=2 --no-check-replication-filters localhost
>  15/11/2011 13:07:10  walterheck> Enter password for localhost:
>  15/11/2011 13:07:10  walterheck> Differences on P=3306,h=10.0.100.19
>  15/11/2011 13:07:10  walterheck> DB         TBL                       CHUNK CNT_DIFF CRC_DIFF BOUNDARIES
>  15/11/2011 13:07:19  walterheck> and then a whole list of differences for all of the slaves
>  15/11/2011 13:07:25  gryp> and are they correct?
>  15/11/2011 13:07:29  walterheck> yup
>  15/11/2011 13:07:32  gryp> hmm.
>  15/11/2011 13:07:40  gryp> it also uses processlist to figure that out
>  15/11/2011 13:07:42  walterheck> gryp: my thought :)
>  15/11/2011 13:09:45  walterheck> gryp: MKDEBUG=1 ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --replicate-check=2 --no-check-replication-filters localhost
>  15/11/2011 13:10:21  walterheck> that shows me that it does connect to 10.0.78.38 correctly and there runs a show processlist and finds the other 4 slaves
>  15/11/2011 13:10:38  gryp> weird
>  15/11/2011 13:12:26  walterheck> gryp: ah, it seems that pt-table-sync just checks the one slave it finds, but doesn't recurse
>  15/11/2011 13:12:42  walterheck> and the one slave happens to have no differences, just the slaves of that one slave
>  15/11/2011 13:13:42  walterheck> gryp: and it doesn't seem to be able to do that
>  15/11/2011 13:14:35  walterheck> so basically I need to run pt-table-sync from the slave 'y' of the server 'x' i'm running it from now
>  15/11/2011 13:14:46  gryp> hmm
>  15/11/2011 13:14:46  walterheck> since y=x that makes no difference I guess?
>  15/11/2011 13:14:59  gryp> didn't know that it doesn't support recursion
>  15/11/2011 13:15:08  gryp> should not be a problem to fix the inconsistencies like that
>  15/11/2011 13:15:18  gryp> let's call it a missing feature? :)
>

--
Walter Heck
