Hi Frank, let me check to be sure which one I saw it, what I remember it
was pt_table_sync but let me take a look, don't remember in this moment
when I check this was 3 or 4 months ago... thanks
2014-07-30 9:22 GMT-07:00 Frank Cizmich <email address hidden>:
> Sorry, I now see you mentioned 2.2.6 right of the bat.
>
> Still, pt-osc, pt-archiver and pt-heartbeat catch that error in that
> version.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1287253
>
> Title:
> pt-table-checksum deadlock
>
> Status in Percona Toolkit:
> Fix Committed
>
> Bug description:
> Percona Toolkik 2.2.6
>
> We currently using pt-table-checksum to check data integrity on all
> the oltp cluster, we have 6 servers there with 8k qps and we start to
> get deadlocks on one of the biggest tables when this tool try to get
> the checksum on some chunks. I already drop chunk time, but it doesn't
> fix completly the problem :-/.
>
> Command Line used:
>
> pt-table-checksum --host=sql1v-ci.ci.prd.********
> --user=pt_checksum --password=******* --nocheck-binlog-format
> --check-interval=1 --check-plan --check-replication-filters
> --check-slave-tables --chunk-size-limit=1 --chunk-time="0.025"
> --create-replicate-table --databases="tewn"
> --tables="landing_page_analytics" --empty-replicate-table --max-
> lag="1s" --max-load="Threads_running=40" --progress="time,30"
> --replicate="pt_data.checksums" --retries=5
>
> Checksumming tewn.landing_page_analytics: 1% 25:28 remain
> Checksumming tewn.landing_page_analytics: 4% 22:42 remain
> Checksumming tewn.landing_page_analytics: 6% 20:59 remain
> Checksumming tewn.landing_page_analytics: 8% 20:35 remain
> Checksumming tewn.landing_page_analytics: 10% 21:59 remain
> Checksumming tewn.landing_page_analytics: 12% 20:42 remain
> Checksumming tewn.landing_page_analytics: 14% 20:39 remain
> Checksumming tewn.landing_page_analytics: 16% 19:50 remain
> Checksumming tewn.landing_page_analytics: 18% 20:04 remain
> Checksumming tewn.landing_page_analytics: 19% 21:14 remain
> Checksumming tewn.landing_page_analytics: 21% 20:21 remain
> Checksumming tewn.landing_page_analytics: 23% 19:27 remain
> Checksumming tewn.landing_page_analytics: 25% 18:53 remain
> Checksumming tewn.landing_page_analytics: 28% 18:15 remain
> Checksumming tewn.landing_page_analytics: 30% 17:41 remain
> Checksumming tewn.landing_page_analytics: 32% 16:52 remain
> Checksumming tewn.landing_page_analytics: 34% 16:30 remain
> Checksumming tewn.landing_page_analytics: 36% 15:42 remain
> Checksumming tewn.landing_page_analytics: 39% 15:29 remain
> Checksumming tewn.landing_page_analytics: 41% 14:53 remain
> Checksumming tewn.landing_page_analytics: 43% 14:15 remain
> Checksumming tewn.landing_page_analytics: 45% 13:33 remain
> Checksumming tewn.landing_page_analytics: 48% 12:50 remain
> Checksumming tewn.landing_page_analytics: 50% 12:26 remain
> Checksumming tewn.landing_page_analytics: 52% 11:43 remain
> Checksumming tewn.landing_page_analytics: 54% 11:15 remain
> Checksumming tewn.landing_page_analytics: 55% 11:04 remain
> Checksumming tewn.landing_page_analytics: 57% 10:53 remain
> 02-28T15:30:44 Error checksumming table tewn.landing_page_analytics:
> Error executing checksum query: DBD::mysql::st execute failed: Deadlock
> found when trying to get lock; try restarting transaction [for Statement
> "REPLACE INTO `pt_data`.`checksums` (db, tbl, chunk, chunk_index,
> lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?,
> ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#',
> `id`, `domain_bitfield`, `page_id`, `category`, `demographic_sex`,
> `demographic_age`, `content_rating`, `num_ctr`, `num_display`, `weight`,
> `weight_vw`, `created`, `visible`, `type`, `key`, `updated` + 0,
> CONCAT(ISNULL(`domain_bitfield`), ISNULL(`page_id`), ISNULL(`num_ctr`),
> ISNULL(`num_display`), ISNULL(`weight`), ISNULL(`created`)))) AS
> UNSIGNED)), 10, 16)), 0) AS crc FROM `tewn`.`landing_page_analytics` FORCE
> INDEX(`PRIMARY`) WHERE ((`id` >= ?)) AND ((`id` <= ?)) /*checksum chunk*/"
> with ParamValues: 0='tewn', 1='landing_page_analytics', 2=16297,
> 3='PRIMARY', 4='159650735', 5='159662859', 6='159650735', 7='159662859'] at
> /usr/bin/pt-table-checksum line 10459.
>
>
> TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
> 02-28T15:30:44 1 0 80558617 16297 0 886.019
> tewn.landing_page_analytics
>
>
> May suggest something, I was checking this script and MySQL suggest when
> you have this kind of deadlock you must to retry to execute again this
> sentence, but in pt-table-checksum script line 10510 we have "fail"
> function which is evaluating kind of error so, retry argument option for
> the tool will retry to execute the last sql statement, like I said MySQL
> suggest to retry to execute, so with this modification this could be fixed.
>
> fail => sub {
> my (%args) = @_;
> my $error = $args{error};
>
> if ( $error =~ m/Lock wait timeout exceeded/
> || $error =~ m/Query execution was interrupted/
> || $error =~ m/Deadlock found when trying to get lock/
> ) {
> return 1;
> }
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/percona-toolkit/+bug/1287253/+subscriptions
>
Hi Frank, let me check to be sure which one I saw it, what I remember it
was pt_table_sync but let me take a look, don't remember in this moment
when I check this was 3 or 4 months ago... thanks
2014-07-30 9:22 GMT-07:00 Frank Cizmich <email address hidden>:
> Sorry, I now see you mentioned 2.2.6 right of the bat. /bugs.launchpad .net/bugs/ 1287253 sql1v-ci. ci.prd. ******* * binlog- format replication- filters slave-tables --chunk- size-limit= 1 --chunk- time="0. 025" replicate- table --databases="tewn" "landing_ page_analytics" --empty- replicate- table --max- "Threads_ running= 40" --progress= "time,30" "pt_data. checksums" --retries=5 page_analytics: 1% 25:28 remain page_analytics: 4% 22:42 remain page_analytics: 6% 20:59 remain page_analytics: 8% 20:35 remain page_analytics: 10% 21:59 remain page_analytics: 12% 20:42 remain page_analytics: 14% 20:39 remain page_analytics: 16% 19:50 remain page_analytics: 18% 20:04 remain page_analytics: 19% 21:14 remain page_analytics: 21% 20:21 remain page_analytics: 23% 19:27 remain page_analytics: 25% 18:53 remain page_analytics: 28% 18:15 remain page_analytics: 30% 17:41 remain page_analytics: 32% 16:52 remain page_analytics: 34% 16:30 remain page_analytics: 36% 15:42 remain page_analytics: 39% 15:29 remain page_analytics: 41% 14:53 remain page_analytics: 43% 14:15 remain page_analytics: 45% 13:33 remain page_analytics: 48% 12:50 remain page_analytics: 50% 12:26 remain page_analytics: 52% 11:43 remain page_analytics: 54% 11:15 remain page_analytics: 55% 11:04 remain page_analytics: 57% 10:53 remain page_analytics: .`checksums` (db, tbl, chunk, chunk_index, LOWER(CONV( BIT_XOR( CAST(CRC32( CONCAT_ WS('#', ISNULL( `domain_ bitfield` ), ISNULL(`page_id`), ISNULL(`num_ctr`), `num_display` ), ISNULL(`weight`), ISNULL( `created` )))) AS `landing_ page_analytics` FORCE page_analytics' , 2=16297, pt-table- checksum line 10459. page_analytics /bugs.launchpad .net/percona- toolkit/ +bug/1287253/ +subscriptions
>
> Still, pt-osc, pt-archiver and pt-heartbeat catch that error in that
> version.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https:/
>
> Title:
> pt-table-checksum deadlock
>
> Status in Percona Toolkit:
> Fix Committed
>
> Bug description:
> Percona Toolkik 2.2.6
>
> We currently using pt-table-checksum to check data integrity on all
> the oltp cluster, we have 6 servers there with 8k qps and we start to
> get deadlocks on one of the biggest tables when this tool try to get
> the checksum on some chunks. I already drop chunk time, but it doesn't
> fix completly the problem :-/.
>
> Command Line used:
>
> pt-table-checksum --host=
> --user=pt_checksum --password=******* --nocheck-
> --check-interval=1 --check-plan --check-
> --check-
> --create-
> --tables=
> lag="1s" --max-load=
> --replicate=
>
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> Checksumming tewn.landing_
> 02-28T15:30:44 Error checksumming table tewn.landing_
> Error executing checksum query: DBD::mysql::st execute failed: Deadlock
> found when trying to get lock; try restarting transaction [for Statement
> "REPLACE INTO `pt_data`
> lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?,
> ?, COUNT(*) AS cnt, COALESCE(
> `id`, `domain_bitfield`, `page_id`, `category`, `demographic_sex`,
> `demographic_age`, `content_rating`, `num_ctr`, `num_display`, `weight`,
> `weight_vw`, `created`, `visible`, `type`, `key`, `updated` + 0,
> CONCAT(
> ISNULL(
> UNSIGNED)), 10, 16)), 0) AS crc FROM `tewn`.
> INDEX(`PRIMARY`) WHERE ((`id` >= ?)) AND ((`id` <= ?)) /*checksum chunk*/"
> with ParamValues: 0='tewn', 1='landing_
> 3='PRIMARY', 4='159650735', 5='159662859', 6='159650735', 7='159662859'] at
> /usr/bin/
>
>
> TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
> 02-28T15:30:44 1 0 80558617 16297 0 886.019
> tewn.landing_
>
>
> May suggest something, I was checking this script and MySQL suggest when
> you have this kind of deadlock you must to retry to execute again this
> sentence, but in pt-table-checksum script line 10510 we have "fail"
> function which is evaluating kind of error so, retry argument option for
> the tool will retry to execute the last sql statement, like I said MySQL
> suggest to retry to execute, so with this modification this could be fixed.
>
> fail => sub {
> my (%args) = @_;
> my $error = $args{error};
>
> if ( $error =~ m/Lock wait timeout exceeded/
> || $error =~ m/Query execution was interrupted/
> || $error =~ m/Deadlock found when trying to get lock/
> ) {
> return 1;
> }
>
> To manage notifications about this bug go to:
> https:/
>