Comment 2 for bug 965987

Revision history for this message
Walter Heck (walterheck) wrote : Re: [Bug 965987] Re: pt-table-checksum gets stuck in "Waiting to check replicas for differences: 0% 00:00 remain"

I'm pretty sure that it's a bug since I have the same without the 1
hour param. Also, it was running like this for a long time, since it
increases the sleep time by 0.25 seconds every iteration. The log was
just an excerpt. I'm relatively sure I let it run for more then an
hour too.
Besides that, the cluster was quiet at that time, not doing a lot of load.

Do you have any ideas of what I could do to gather more info?

Thanks!

On Tue, Mar 27, 2012 at 21:27, Baron Schwartz <email address hidden> wrote:
> Infinite loop, or one-hour loop? You've set the tool to tolerate a max
> replication lag of an hour, and it looks to me like we're just waiting
> for the checksums to actually appear on the replicas.  I can't see
> evidence that it's a bug. Perhaps we should report replication lag in
> the message as well so we have more information on what's happening.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/965987
>
> Title:
>  pt-table-checksum gets stuck in "Waiting to check replicas for
>  differences:   0% 00:00 remain"
>
> Status in Percona Toolkit:
>  New
>
> Bug description:
>  pt-table-checksum ends up in an infinite loop after a bunch of tables:
>
>  <pre>
>  Waiting to check replicas for differences:   0% 00:00 remain
>  # pt_table_checksum:7367 18166 Sleep 44 waiting for chunks
>  # pt_table_checksum:7340 18166 server01 max chunk: undef
>  # pt_table_checksum:7340 18166 server02 max chunk: undef
>  # pt_table_checksum:7340 18166 server03 max chunk: undef
>  # pt_table_checksum:7340 18166 server04 max chunk: undef
>  Waiting to check replicas for differences:   0% 00:00 remain
>  # pt_table_checksum:7367 18166 Sleep 44.25 waiting for chunks
>  # pt_table_checksum:7340 18166 server01 max chunk: undef
>  # pt_table_checksum:7340 18166 server02 max chunk: undef
>  # pt_table_checksum:7340 18166 server03 max chunk: undef
>  # pt_table_checksum:7340 18166 server04 max chunk: undef
>  Waiting to check replicas for differences:   0% 00:00 remain
>  # pt_table_checksum:7367 18166 Sleep 44.5 waiting for chunks
>  # pt_table_checksum:7340 18166 server01 max chunk: undef
>  # pt_table_checksum:7340 18166 server02 max chunk: undef
>  # pt_table_checksum:7340 18166 server03 max chunk: undef
>  # pt_table_checksum:7340 18166 server04 max chunk: undef
>  Waiting to check replicas for differences:   0% 00:00 remain
>  # pt_table_checksum:7367 18166 Sleep 44.75 waiting for chunks
>  # pt_table_checksum:7340 18166 server01 max chunk: undef
>  # pt_table_checksum:7340 18166 server02 max chunk: undef
>  # pt_table_checksum:7340 18166 server03 max chunk: undef
>  # pt_table_checksum:7340 18166 server04 max chunk: undef
>  Waiting to check replicas for differences:   0% 00:00 remain
>  # pt_table_checksum:7367 18166 Sleep 45 waiting for chunks
>  # pt_table_checksum:7340 18166 server01 max chunk: undef
>  # pt_table_checksum:7340 18166 server02 max chunk: undef
>  # pt_table_checksum:7340 18166 server03 max chunk: undef
>  # pt_table_checksum:7340 18166 server04 max chunk: undef
>  Waiting to check replicas for differences:   0% 00:00 remain
>  # pt_table_checksum:7367 18166 Sleep 45.25 waiting for chunks
>  </pre>
>
>  Then when breaking with CTRL+C:
>  <pre>
>  # Caught SIGINT.
>  # RowChecksum:3483 18166 SELECT CONCAT(db, '.', tbl) AS `table`, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM `test`.`checksum` WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db='mysql' AND tbl='columns_priv')
>  # pt_table_checksum:6667 18166 0 checksum diffs on server01
>  # RowChecksum:3483 18166 SELECT CONCAT(db, '.', tbl) AS `table`, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM `test`.`checksum` WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db='mysql' AND tbl='columns_priv')
>  # pt_table_checksum:6667 18166 0 checksum diffs on server02
>  # RowChecksum:3483 18166 SELECT CONCAT(db, '.', tbl) AS `table`, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM `test`.`checksum` WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db='mysql' AND tbl='columns_priv')
>  # pt_table_checksum:6667 18166 0 checksum diffs on server03
>  # RowChecksum:3483 18166 SELECT CONCAT(db, '.', tbl) AS `table`, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM `test`.`checksum` WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db='mysql' AND tbl='columns_priv')
>  # pt_table_checksum:6667 18166 0 checksum diffs on server04
>  03-27T08:34:15      0      0        0       1       0 8120.737 mysql.columns_priv
>  # OobNibbleIterator:4302 18166 Finish nibble_sth
>  # OobNibbleIterator:4302 18166 Finish explain_nibble_sth
>  # pt_table_checksum:6805 18166 Exit status 1 oktorun 0
>  # Cxn:1514 18166 Disconnecting dbh DBI::db=HASH(0x1b4426a0) undef
>  # Cxn:1514 18166 Disconnecting dbh DBI::db=HASH(0x1b442360) undef
>  # Cxn:1514 18166 Disconnecting dbh DBI::db=HASH(0x1afe0bf0) undef
>  # Cxn:1514 18166 Disconnecting dbh DBI::db=HASH(0x1b3fba50) undef
>  # Cxn:1514 18166 Disconnecting dbh DBI::db=HASH(0x1afe0c10) undef
>  </pre>
>
>  The original command line is:
>  # PTDEBUG=1 pt-table-checksum --empty-replicate-table -umaatkit --ask-pass --replicate=test.checksum --recurse=1 --max-lag=1h --no-check-replication-filters 192.168.1.1
>
>  some more info:
>  <pre>
>  [~] $ uname -a
>  Linux server0 2.6.18-274.17.1.el5 #1 SMP Tue Jan 10 17:25:58 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
>  [~] $ date
>  Tue Mar 27 08:36:38 BST 2012
>  [~] $ lsb_release -a
>  LSB Version:    :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
>  Distributor ID: CentOS
>  Description:    CentOS release 5.8 (Final)
>  Release:        5.8
>  Codename:       Final
>  [~] $ yum list installed | grep perl
>  perl.x86_64                               4:5.8.8-38.el5               installed
>  perl-Algorithm-Diff.noarch                1.1902-1.el5.rf              installed
>  perl-Class-Singleton.noarch               1.4-1.el5.rf                 installed
>  perl-DBD-MySQL.x86_64                     3.0007-2.el5                 installed
>  perl-DBI.x86_64                           1.52-2.el5                   installed
>  perl-Git.x86_64                           1.7.8.2-2.el5.rf             installed
>  perl-Log-Log4perl.noarch                  1.26-1.el5.rf                installed
>  perl-Proc-Daemon.noarch                   0.03-1.2.el5.rf              installed
>  perl-String-CRC32.x86_64                  1.4-2.fc6                    installed
>  perl-TermReadKey.x86_64                   2.30-3.el5.rf                installed
>  [~] $ yum list installed | grep percona
>  percona-release.x86_64                    0.0-1                        installed
>  percona-toolkit.noarch                    2.0.4-1                      installed
>  </pre>
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/percona-toolkit/+bug/965987/+subscriptions

--
Walter Heck

--
follow @walterheck on twitter to see what I'm up to!
--
Check out my new startup: Server Monitoring as a Service @ http://tribily.com
Follow @tribily on Twitter and/or 'Like' our Facebook page at
http://www.facebook.com/tribily