I'm pretty sure it's a bug, since I see the same behavior without the
1-hour param. Also, it had been running like this for a long time: it
increases the sleep time by 0.25 seconds every iteration, and the log was
just an excerpt. I'm fairly sure I let it run for more than an hour,
too.
Besides that, the cluster was quiet at the time and not under much load.
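As a rough back-of-the-envelope check on "more than an hour": if the sleep grows by 0.25 s per iteration, the time spent sleeping is an arithmetic series. The sketch below assumes the first sleep was also 0.25 s. The excerpt only shows values from 44 s upward, so the starting value is a guess. It also counts only the sleeps and not the polling queries, so it is a lower bound on wall-clock time.

```python
# Hypothetical estimate: assumes the sleep started at STEP seconds and grew
# by STEP each iteration (only the 0.25 s increment is visible in the log).
STEP = 0.25


def iterations_to_reach(sleep_s, step=STEP):
    """Iteration number at which the sleep reaches sleep_s (1-based)."""
    return round(sleep_s / step)


def cumulative_sleep(sleep_s, step=STEP):
    """Total seconds slept up to and including the sleep of sleep_s.

    Sum of step * k for k = 1..n, i.e. step * n * (n + 1) / 2.
    """
    n = iterations_to_reach(sleep_s, step)
    return step * n * (n + 1) / 2


if __name__ == "__main__":
    # Sleep values taken from the log excerpt.
    for s in (44.0, 45.25):
        total = cumulative_sleep(s)
        print(f"sleep {s} s -> iteration {iterations_to_reach(s)}, "
              f"~{total / 60:.1f} min spent sleeping so far")
```

Under that assumption, a 44 s sleep is iteration 176, and by then the tool would already have slept about 3894 s (roughly 65 minutes), which fits the claim that it had been waiting for over an hour.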
Do you have any ideas of what I could do to gather more info?
Thanks!
On Tue, Mar 27, 2012 at 21:27, Baron Schwartz <email address hidden> wrote:
> Infinite loop, or one-hour loop? You've set the tool to tolerate a max
> replication lag of an hour, and it looks to me like we're just waiting
> for the checksums to actually appear on the replicas. I can't see
> evidence that it's a bug. Perhaps we should report replication lag in
> the message as well so we have more information on what's happening.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/965987
>
> Title:
> pt-table-checksum gets stuck in "Waiting to check replicas for
> differences: 0% 00:00 remain"
>
> Status in Percona Toolkit:
> New
>
> Bug description:
> pt-table-checksum ends up in an infinite loop after a bunch of tables:
>
> <pre>
> Waiting to check replicas for differences: 0% 00:00 remain
> # pt_table_checksum:7367 18166 Sleep 44 waiting for chunks
> # pt_table_checksum:7340 18166 server01 max chunk: undef
> # pt_table_checksum:7340 18166 server02 max chunk: undef
> # pt_table_checksum:7340 18166 server03 max chunk: undef
> # pt_table_checksum:7340 18166 server04 max chunk: undef
> Waiting to check replicas for differences: 0% 00:00 remain
> # pt_table_checksum:7367 18166 Sleep 44.25 waiting for chunks
> # pt_table_checksum:7340 18166 server01 max chunk: undef
> # pt_table_checksum:7340 18166 server02 max chunk: undef
> # pt_table_checksum:7340 18166 server03 max chunk: undef
> # pt_table_checksum:7340 18166 server04 max chunk: undef
> Waiting to check replicas for differences: 0% 00:00 remain
> # pt_table_checksum:7367 18166 Sleep 44.5 waiting for chunks
> # pt_table_checksum:7340 18166 server01 max chunk: undef
> # pt_table_checksum:7340 18166 server02 max chunk: undef
> # pt_table_checksum:7340 18166 server03 max chunk: undef
> # pt_table_checksum:7340 18166 server04 max chunk: undef
> Waiting to check replicas for differences: 0% 00:00 remain
> # pt_table_checksum:7367 18166 Sleep 44.75 waiting for chunks
> # pt_table_checksum:7340 18166 server01 max chunk: undef
> # pt_table_checksum:7340 18166 server02 max chunk: undef
> # pt_table_checksum:7340 18166 server03 max chunk: undef
> # pt_table_checksum:7340 18166 server04 max chunk: undef
> Waiting to check replicas for differences: 0% 00:00 remain
> # pt_table_checksum:7367 18166 Sleep 45 waiting for chunks
> # pt_table_checksum:7340 18166 server01 max chunk: undef
> # pt_table_checksum:7340 18166 server02 max chunk: undef
> # pt_table_checksum:7340 18166 server03 max chunk: undef
> # pt_table_checksum:7340 18166 server04 max chunk: undef
> Waiting to check replicas for differences: 0% 00:00 remain
> # pt_table_checksum:7367 18166 Sleep 45.25 waiting for chunks
> </pre>
>
> Then when breaking with CTRL+C:
> <pre>
> # Caught SIGINT.
> # RowChecksum:3483 18166 SELECT CONCAT(db, '.', tbl) AS `table`, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM `test`.`checksum` WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db='mysql' AND tbl='columns_priv')
> # pt_table_checksum:6667 18166 0 checksum diffs on server01
> # RowChecksum:3483 18166 SELECT CONCAT(db, '.', tbl) AS `table`, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM `test`.`checksum` WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db='mysql' AND tbl='columns_priv')
> # pt_table_checksum:6667 18166 0 checksum diffs on server02
> # RowChecksum:3483 18166 SELECT CONCAT(db, '.', tbl) AS `table`, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM `test`.`checksum` WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db='mysql' AND tbl='columns_priv')
> # pt_table_checksum:6667 18166 0 checksum diffs on server03
> # RowChecksum:3483 18166 SELECT CONCAT(db, '.', tbl) AS `table`, chunk, chunk_index, lower_boundary, upper_boundary, COALESCE(this_cnt-master_cnt, 0) AS cnt_diff, COALESCE(this_crc <> master_crc OR ISNULL(master_crc) <> ISNULL(this_crc), 0) AS crc_diff, this_cnt, master_cnt, this_crc, master_crc FROM `test`.`checksum` WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc)) AND (db='mysql' AND tbl='columns_priv')
> # pt_table_checksum:6667 18166 0 checksum diffs on server04
> 03-27T08:34:15 0 0 0 1 0 8120.737 mysql.columns_priv
> # OobNibbleIterator:4302 18166 Finish nibble_sth
> # OobNibbleIterator:4302 18166 Finish explain_nibble_sth
> # pt_table_checksum:6805 18166 Exit status 1 oktorun 0
> # Cxn:1514 18166 Disconnecting dbh DBI::db=HASH(0x1b4426a0) undef
> # Cxn:1514 18166 Disconnecting dbh DBI::db=HASH(0x1b442360) undef
> # Cxn:1514 18166 Disconnecting dbh DBI::db=HASH(0x1afe0bf0) undef
> # Cxn:1514 18166 Disconnecting dbh DBI::db=HASH(0x1b3fba50) undef
> # Cxn:1514 18166 Disconnecting dbh DBI::db=HASH(0x1afe0c10) undef
> </pre>
>
> The original command line is:
> # PTDEBUG=1 pt-table-checksum --empty-replicate-table -umaatkit --ask-pass --replicate=test.checksum --recurse=1 --max-lag=1h --no-check-replication-filters 192.168.1.1
>
> some more info:
> <pre>
> [~] $ uname -a
> Linux server0 2.6.18-274.17.1.el5 #1 SMP Tue Jan 10 17:25:58 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
> [~] $ date
> Tue Mar 27 08:36:38 BST 2012
> [~] $ lsb_release -a
> LSB Version: :core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID: CentOS
> Description: CentOS release 5.8 (Final)
> Release: 5.8
> Codename: Final
> [~] $ yum list installed | grep perl
> perl.x86_64 4:5.8.8-38.el5 installed
> perl-Algorithm-Diff.noarch 1.1902-1.el5.rf installed
> perl-Class-Singleton.noarch 1.4-1.el5.rf installed
> perl-DBD-MySQL.x86_64 3.0007-2.el5 installed
> perl-DBI.x86_64 1.52-2.el5 installed
> perl-Git.x86_64 1.7.8.2-2.el5.rf installed
> perl-Log-Log4perl.noarch 1.26-1.el5.rf installed
> perl-Proc-Daemon.noarch 0.03-1.2.el5.rf installed
> perl-String-CRC32.x86_64 1.4-2.fc6 installed
> perl-TermReadKey.x86_64 2.30-3.el5.rf installed
> [~] $ yum list installed | grep percona
> percona-release.x86_64 0.0-1 installed
> percona-toolkit.noarch 2.0.4-1 installed
> </pre>
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/percona-toolkit/+bug/965987/+subscriptions
--
Walter Heck
--
follow @walterheck on twitter to see what I'm up to!
--
Check out my new startup: Server Monitoring as a Service @ http://tribily.com
Follow @tribily on Twitter and/or 'Like' our Facebook page at http://www.facebook.com/tribily