pt-table-sync doesn't recursively find slaves as opposed to pt-table-checksum

Reported by Walter Heck on 2011-11-15
This bug affects 2 people
Affects: Percona Toolkit
Importance: Undecided
Assigned to: Unassigned

Bug Description

As per a discussion just had on irc:

15/11/2011 12:32:39 walterheck> hello, I have a slave 'x' which serves as a master for another cluster. the second cluster is a master 'y' with 4 slaves. I used pt-table-checksum from 'x' to show me the changes (I already know that x is in sync with its own master 'a', so I didn't want to run the checksum from 'a'). now when I run pt-table-sync from 'x', it's not doing anything. --verbose --execute shows just two lines and no actual queries. What's up there?
15/11/2011 12:40:54 gryp> walterheck: try with 'env PTDEBUG'
15/11/2011 12:53:33 gryp> walterheck: which version of pt-table-checksum did you use?
15/11/2011 12:54:46 walterheck> root@db05:~# ./pt-table-sync --version
15/11/2011 12:54:46 walterheck> pt-table-sync 1.0.1
15/11/2011 12:56:02 gryp> are there any inconsistencies?
15/11/2011 12:56:13 gryp> :)
[...]
15/11/2011 13:00:33 walterheck> gryp: interesting though that mk-table-checksum had no problem finding the correct slave IPs?
15/11/2011 13:00:46 walterheck> *pt-table-checksum
[..]
15/11/2011 13:04:24 gryp> voila, so pt-table-checksum checks replication lag
15/11/2011 13:04:27 gryp> and there's none
15/11/2011 13:04:33 gryp> it does not check where it is replicating from
15/11/2011 13:04:37 gryp> and doesn't check the checksum table
15/11/2011 13:04:41 gryp> unless you do --replicate-check=2
15/11/2011 13:05:01 walterheck> not sure I follow. I used this for checksumming:
15/11/2011 13:05:58 walterheck> ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --empty-replicate-table --create-replicate-table --no-check-replication-filters localhost
15/11/2011 13:05:58 gryp> well, pt-table-checksum does not run queries on the slaves, it only logs in on the slaves to verify if the slave is lagging or not. if a slave is lagging, it will wait until it's back in sync. But in your case, the slave that it connects to is not the right one, so it is monitoring the wrong server.
15/11/2011 13:06:25 gryp> try doing ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --replicate-check=2 localhost
15/11/2011 13:06:34 walterheck> and then ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --replicate-check=2 --no-check-replication-filters localhost
15/11/2011 13:06:41 gryp> it didn't fail?
15/11/2011 13:06:50 gryp> the slave must have test.checksum by itself :)
15/11/2011 13:07:09 walterheck> that actually does not fail, it starts
15/11/2011 13:07:10 walterheck> root@db05:~# ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --replicate-check=2 --no-check-replication-filters localhost
15/11/2011 13:07:10 walterheck> Enter password for localhost:
15/11/2011 13:07:10 walterheck> Differences on P=3306,h=10.0.100.19
15/11/2011 13:07:10 walterheck> DB TBL CHUNK CNT_DIFF CRC_DIFF BOUNDARIES
15/11/2011 13:07:19 walterheck> and then a whole list of differences for all of the slaves
15/11/2011 13:07:25 gryp> and are they correct?
15/11/2011 13:07:29 walterheck> yup
15/11/2011 13:07:32 gryp> hmm.
15/11/2011 13:07:40 gryp> it also uses processlist to figure that out
15/11/2011 13:07:42 walterheck> gryp: my thought :)
15/11/2011 13:09:45 walterheck> gryp: MKDEBUG=1 ./pt-table-checksum -umaatkit --ask-pass --replicate=test.checksum --replicate-check=2 --no-check-replication-filters localhost
15/11/2011 13:10:21 walterheck> that shows me that it does connect to 10.0.78.38 correctly and there runs a SHOW PROCESSLIST and finds the other 4 slaves
15/11/2011 13:10:38 gryp> weird
15/11/2011 13:12:26 walterheck> gryp: ah, it seems that pt-table-sync just checks the one slave it finds, but doesn't recurse
15/11/2011 13:12:42 walterheck> and the one slave happens to have no differences, just the slaves of that one slave
15/11/2011 13:13:42 walterheck> gryp: and it doesn't seem to be able to do that
15/11/2011 13:14:35 walterheck> so basically I need to run pt-table-sync from the slave 'y' of the server 'x' i'm running it from now
15/11/2011 13:14:46 gryp> hmm
15/11/2011 13:14:46 walterheck> since y=x that makes no difference I guess?
15/11/2011 13:14:59 gryp> didn't know that it doesn't support recursion
15/11/2011 13:15:08 gryp> should not be a problem to fix the inconsistencies like that
15/11/2011 13:15:18 gryp> let's call it a missing feature? :)
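
The --replicate-check output quoted in the log lists CNT_DIFF and CRC_DIFF per chunk. The following is a minimal Python sketch of that kind of comparison, not the tool's actual code. It assumes the checksum table stores a row count and a CRC for both the slave (this_cnt, this_crc) and its master (master_cnt, master_crc) for each chunk; the ChecksumRow type, the chunk_diffs function, and the sample rows are hypothetical.

```python
from typing import NamedTuple, Optional


class ChecksumRow(NamedTuple):
    """One chunk as read from the checksum table on a slave (assumed layout)."""
    db: str
    tbl: str
    chunk: int
    this_crc: Optional[str]
    this_cnt: Optional[int]
    master_crc: Optional[str]
    master_cnt: Optional[int]


def chunk_diffs(rows):
    """Return (db, tbl, chunk, cnt_diff, crc_diff) for chunks that differ."""
    diffs = []
    for r in rows:
        # A missing count on either side is treated as "no count difference".
        if r.this_cnt is None or r.master_cnt is None:
            cnt_diff = 0
        else:
            cnt_diff = r.this_cnt - r.master_cnt
        # 1 when the CRCs differ (including one side being NULL), else 0.
        crc_diff = int(r.this_crc != r.master_crc)
        if cnt_diff != 0 or crc_diff:
            diffs.append((r.db, r.tbl, r.chunk, cnt_diff, crc_diff))
    return diffs


# Hypothetical sample data: chunk 0 matches, chunk 1 does not.
rows = [
    ChecksumRow("shop", "orders", 0, "aa11", 100, "aa11", 100),
    ChecksumRow("shop", "orders", 1, "bb22", 98, "cc33", 100),
]
print(chunk_diffs(rows))
```

The point for this report is that such a comparison only says something about the slaves whose checksum table is actually read, so a slave that is never visited never shows up as different.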

Baron Schwartz (baron-xaprb) wrote :

I'm going to repeat the original post here for reference, and edit the text of the bug report to be more concise.


Baron Schwartz (baron-xaprb) wrote :

Walter, can you state your feature request / bug more clearly? pt-table-sync does support finding and recursing to all slaves, but I am not sure how you want that to behave differently.


Well, in my case it clearly didn't recurse to the slaves (C, D, E, F)
of the slave (B) of the server (A) I ran it from. It might be because A
and B were equal? I ended up running the sync from B and then it was
fine, so my conclusion was that most likely pt-table-sync doesn't recurse
through slave B to go and find C through F. I don't have the output
anymore and the setup has changed (this was during a migration), but
I'm fairly sure I concluded this after checking the DEBUG output
of pt-table-sync.
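
To make the reported behaviour concrete, here is a small Python sketch of the difference between a one-level and a fully recursive slave search over the A, B, C through F layout described above. It is an illustration only and not how pt-table-sync is implemented: the TOPOLOGY dictionary stands in for whatever each server reports about its attached slaves, and find_replicas and max_depth are hypothetical names.

```python
from collections import deque

# Hypothetical topology: each host maps to the slaves attached to it.
TOPOLOGY = {
    "A": ["B"],
    "B": ["C", "D", "E", "F"],
}


def find_replicas(start, topology, max_depth=None):
    """Breadth-first walk returning every slave reachable from start.

    max_depth=None means unlimited recursion; max_depth=1 only looks at
    the slaves attached directly to start.
    """
    found = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        host, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue  # do not look below this level
        for replica in topology.get(host, []):
            if replica not in seen:
                seen.add(replica)
                found.append(replica)
                queue.append((replica, depth + 1))
    return found


print(find_replicas("A", TOPOLOGY, max_depth=1))
print(find_replicas("A", TOPOLOGY))
```

With max_depth=1 the search from A stops at B, which matches what was observed: B had no differences, so nothing was synced, while C through F were never examined. Starting the search from B, or recursing without a depth limit, reaches all four.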

On Wed, Nov 23, 2011 at 16:08, Baron Schwartz <email address hidden> wrote:
> Walter, can you state your feature request / bug more clearly?  pt-
> table-sync does support finding and recursing to all slaves, but I am
> not sure how you want that to behave differently.

tags: added: pt-table-sync slave-recursion
Changed in percona-toolkit:
status: New → Triaged