pt-table-checksum doesn't check the size of checksum chunks

Bug #1010232 reported by Daniel Nichter
This bug affects 1 person
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: High
Assigned to: Daniel Nichter

Bug Description

pt-table-checksum 2.1.1 runs EXPLAIN on the next-upper-boundary statement and on single-chunk statements, but it does not EXPLAIN the checksum chunk queries (i.e., the queries used when chunking a table). Usually these are safe, but on occasion (e.g. Percona 23486) MySQL will access too many rows. So all checksum chunk nibbles need to be EXPLAINed too, with --chunk-size-limit applied to them.
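
The check requested above can be sketched as follows. This is an illustrative Python model, not the tool's Perl code: the function name `chunk_is_oversized` and its parameters are hypothetical, and the rule shown (skip a chunk when the EXPLAIN row estimate exceeds the chunk size times the limit, with 0 meaning no limit and 2.0 as the default) is my reading of the --chunk-size-limit documentation.

```python
def chunk_is_oversized(explain_rows, chunk_size, chunk_size_limit=2.0):
    """Return True if the EXPLAIN row estimate is too large for this chunk.

    explain_rows     -- the "rows" estimate from EXPLAIN for the chunk query
    chunk_size       -- the desired number of rows per chunk
    chunk_size_limit -- allowed multiple of chunk_size; 0 disables the check
                        (assumed semantics, see the lead-in)
    """
    if not chunk_size_limit:
        return False
    return explain_rows > chunk_size * chunk_size_limit


# A 200-row chunk whose query would scan the whole 5143-row table is rejected,
# while a chunk estimated at 350 rows is within the 2.0x limit.
print(chunk_is_oversized(5143, 200))
print(chunk_is_oversized(350, 200))
```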

tags: added: safeguard
tags: added: pt-online-schema-change
Changed in percona-toolkit:
status: In Progress → Fix Committed
Baron Schwartz (baron-xaprb) wrote :

I've found additional problems with this when I specify a prefix that's larger than the number of columns in the index, in pt-osc.

$ bin/pt-online-schema-change --alter='engine=innodb' h=127.1,P=3306,D=sakila,t=film_actor,u=root --lock-wait-timeout=50 --dry-run --chunk-index PRIMARY:3
Starting a dry run. `sakila`.`film_actor` will not be altered. Specify --execute instead of --dry-run to alter the table.
Creating new table...
Created new table sakila._film_actor_new OK.
Altering new table...
Altered `sakila`.`_film_actor_new` OK.
Not creating triggers because this is a dry run.
Not dropping triggers because this is a dry run.
Dropping new table...
Dropped new table OK.
Dry run complete. `sakila`.`film_actor` was not altered.
Use of uninitialized value in exists at bin/pt-online-schema-change line 1904.

Changed in percona-toolkit:
status: Fix Committed → In Progress
Baron Schwartz (baron-xaprb) wrote :

Another problem.

$ bin/pt-online-schema-change --alter='engine=innodb' h=127.1,P=3306,D=sakila,t=film_actor,u=root --lock-wait-timeout=50 --chunk-index PRIMARY:1 --execute --chunk-size 200
Altering `sakila`.`film_actor`...
Creating new table...
Created new table sakila._film_actor_new OK.
Altering new table...
Altered `sakila`.`_film_actor_new` OK.
Creating triggers...
Created triggers OK.
Copying approximately 5143 rows...
Dropping triggers...
Dropped triggers OK.
Dropping new table...
Dropped new table OK.
`sakila`.`film_actor` was not altered.
Error copying rows from `sakila`.`film_actor` to `sakila`.`_film_actor_new`: Use of uninitialized value in numeric lt (<) at bin/pt-online-schema-change line 6372.

Baron Schwartz (baron-xaprb) wrote :

The previous comment is reproducible also without --chunk-size 200.

Baron Schwartz (baron-xaprb) wrote :

Running a full prove -rs t:

t/pt-online-schema-change/check_tables............Can't locate object method "new" via package "TableChunker" (perhaps you forgot to load "TableChunker"?) at t/pt-online-schema-change/check_tables.t line 32.

Baron Schwartz (baron-xaprb) wrote :

Now I'm trying out pt-table-checksum instead of pt-online-schema-change, and I find these two problems:

$ bin/pt-table-checksum --tables sakila.film_actor h=127.1,P=3306,D=sakila,t=film_actor,u=root --lock-wait-timeout=50 --chunk-index PRIMARY:3 --chunk-size 200
06-10T22:43:39 Cannot checksum table sakila.film_actor: Use of uninitialized value in exists at bin/pt-table-checksum line 2305.

[baron@localhost stabilize-test-suite]$ bin/pt-table-checksum --tables sakila.film_actor h=127.1,P=3306,D=sakila,t=film_actor,u=root --lock-wait-timeout=50 --chunk-index PRIMARY:1 --chunk-size 200
06-10T22:43:54 Skipping chunk 1 of sakila.film_actor because MySQL used only 2 bytes of the PRIMARY index instead of 4. See the --[no]check-plan documentation for more information.
            TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
06-10T22:43:54 0 0 0 31 31 0.143 sakila.film_actor

The second problem is a bit subtle. I think it's because we might be using the entire index for the first-lower-boundary query?

Baron Schwartz (baron-xaprb) wrote :

Wrong number of tests:

t/pt-table-checksum/chunk_index...................ok 14/14# Looks like you planned 14 tests but ran 1 extra.
t/pt-table-checksum/chunk_index...................dubious
 Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED test 15
 Failed 1/14 tests, 92.86% okay

Changed in percona-toolkit:
status: In Progress → Fix Committed
Baron Schwartz (baron-xaprb) wrote :

Still having trouble with pt-osc:

[baron@localhost stabilize-test-suite]$ bin/pt-online-schema-change --alter='engine=innodb' h=127.1,P=3306,D=sakila,t=film_actor,u=root --lock-wait-timeout=50 --chunk-index-columns 1 --execute
Altering `sakila`.`film_actor`...
Creating new table...
Created new table sakila._film_actor_new OK.
Altering new table...
Altered `sakila`.`_film_actor_new` OK.
Creating triggers...
Created triggers OK.
Copying approximately 5143 rows...
Dropping triggers...
Dropped triggers OK.
Dropping new table...
Dropped new table OK.
`sakila`.`film_actor` was not altered.
Error copying rows from `sakila`.`film_actor` to `sakila`.`_film_actor_new`: Use of uninitialized value in numeric lt (<) at bin/pt-online-schema-change line 6367.

Baron Schwartz (baron-xaprb) wrote :

Still having this pt-table-checksum problem too:

$ bin/pt-table-checksum --tables sakila.film_actor h=127.1,P=3306,D=sakila,t=film_actor,u=root --lock-wait-timeout=50 --chunk-index PRIMARY --chunk-size 200 --chunk-index-columns 1
06-10T23:52:40 Skipping chunk 1 of sakila.film_actor because MySQL used only 2 bytes of the PRIMARY index instead of 4. See the --[no]check-plan documentation for more information.
            TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
06-10T23:52:40 0 0 0 31 31 0.137 sakila.film_actor

Changed in percona-toolkit:
status: Fix Committed → In Progress
Changed in percona-toolkit:
status: In Progress → Fix Committed
status: Fix Committed → In Progress
Changed in percona-toolkit:
status: In Progress → Fix Committed
Baron Schwartz (baron-xaprb) wrote :

Works for me now.

Changed in percona-toolkit:
status: Fix Committed → Fix Released
Marcos Albe (marcos-albe) wrote :

It seems the code related to this bug fix has its own bug?

./pt-online-schema-change --execute
--alter-foreign-keys-method=rebuild_constraints --password=root
--user=root --lock-wait-time=50 --alter="ADD COLUMN
lacolumna TINYINT NOT NULL DEFAULT 1"
D=thedb,t=thetable

I get this:
Error copying rows from `thedb`.`thetable` to
`thedb`.`_thetable_new`: Use of uninitialized value in numeric lt
(<) at ./pt-online-schema-change line 6519.

And code in 6519 is the following:

   # Ensure that MySQL is still using the entire index.
   # https://bugs.launchpad.net/percona-toolkit/+bug/1010232
   if ( !$nibble_iter->one_nibble()
        && $tbl->{key_len}
        && ($expl->{key_len} || 0) < $tbl->{key_len} ) {
      if ( !$tbl->{warned}->{key_len}++
           && $o->get('quiet') < 2 ) {
         die "Error copying rows at chunk " . $nibble_iter->nibble_number()
            . " of $tbl->{db}.$tbl->{tbl} because MySQL used "
            . "only " . ($expl->{key_len} || 0) . " bytes "
            . "of the " . ($expl->{key} || '?') . " index instead of "
            . $tbl->{key_len} . ". See the --[no]check-plan documentation "
            . "for more information.\n";
      }
   }

Brian confirmed 'quiet' has no default set.
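
To make the missing default concrete, here is a Python model of the check quoted above. It is a sketch, not the tool's Perl code: the function `check_index_usage` and the dictionary layout are hypothetical, and the fix shown (treating an unset 'quiet' as 0 before comparing) is one possible remedy, not necessarily the one applied in bug 1022628.

```python
def check_index_usage(tbl, expl, options, one_nibble=False):
    """Return an error message if MySQL used less of the index than expected.

    tbl     -- dict with 'db', 'tbl', and the expected 'key_len' in bytes
    expl    -- dict with the 'key' and 'key_len' reported by EXPLAIN
    options -- dict of command-line options; 'quiet' may be missing or None
    """
    expected = tbl.get("key_len")
    used = expl.get("key_len") or 0
    if one_nibble or not expected or used >= expected:
        return None

    # Count every occurrence, but only report the first one.
    warned = tbl.setdefault("warned", {})
    first_time = not warned.get("key_len")
    warned["key_len"] = warned.get("key_len", 0) + 1

    # Treat an unset 'quiet' as 0 so the comparison is always defined.
    quiet = options.get("quiet") or 0
    if first_time and quiet < 2:
        return (
            f"MySQL used only {used} bytes of the "
            f"{expl.get('key') or '?'} index instead of {expected}"
        )
    return None


tbl = {"db": "sakila", "tbl": "film_actor", "key_len": 4}
expl = {"key": "PRIMARY", "key_len": 2}
print(check_index_usage(tbl, expl, {}))  # 'quiet' unset: still safe to compare
```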

Daniel Nichter (daniel-nichter) wrote :

Marcos: I've made your report into bug 1022628.

Sheeri K. Cabral (awfief) wrote :

In relation to Baron's comment #8, I'm also having this problem:

# time /usr/bin/pt-table-checksum --user checksum --ask-pass --lock-wait-time=50 --chunk-index=PRIMARY --tables user_group_map
Enter MySQL password:
08-28T15:07:45 Skipping chunk 1 of bugs.user_group_map because MySQL used only 6 bytes of the user_group_map_user_id_idx index instead of 8. See the --[no]check-plan documentation for more information.
            TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
08-28T15:07:46 0 0 0 461 461 1.040 bugs.user_group_map

real 0m3.313s
user 0m0.465s
sys 0m0.042s

Here's what the SHOW CREATE TABLE looks like:
       Table: user_group_map
Create Table: CREATE TABLE `user_group_map` (
  `user_id` mediumint(9) NOT NULL,
  `group_id` mediumint(9) NOT NULL,
  `isbless` tinyint(4) NOT NULL DEFAULT '0',
  `grant_type` tinyint(4) NOT NULL DEFAULT '0',
  UNIQUE KEY `user_group_map_user_id_idx` (`user_id`,`group_id`,`grant_type`,`isbless`),
  KEY `fk_user_group_map_group_id_groups_id` (`group_id`),
  CONSTRAINT `fk_user_group_map_group_id_groups_id` FOREIGN KEY (`group_id`) REFERENCES `groups` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT `fk_user_group_map_user_id_profiles_userid` FOREIGN KEY (`user_id`) REFERENCES `profiles` (`userid`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

I thought "--chunk-index=PRIMARY" was supposed to make it use the PRIMARY key, not the other key?
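
The byte counts in the message can be reproduced from the column types in the SHOW CREATE TABLE output. The sketch below is an illustration in Python with a hypothetical helper `index_key_len`; it assumes NOT NULL integer columns only, so no extra NULL-flag byte is counted. Note also that the table definition above lists no PRIMARY KEY, which may be why a different index appears in the message.

```python
# Storage size in bytes of MySQL integer types.
INT_BYTES = {"tinyint": 1, "smallint": 2, "mediumint": 3, "int": 4, "bigint": 8}


def index_key_len(column_types, prefix=None):
    """Sum the byte widths of the first `prefix` index columns (all if None).

    Assumes every column is a NOT NULL integer type.
    """
    cols = column_types if prefix is None else column_types[:prefix]
    return sum(INT_BYTES[t] for t in cols)


# user_group_map_user_id_idx (user_id, group_id, grant_type, isbless)
idx = ["mediumint", "mediumint", "tinyint", "tinyint"]
print(index_key_len(idx))     # 8: the full index
print(index_key_len(idx, 2))  # 6: only user_id and group_id
```

So "6 bytes ... instead of 8" is consistent with MySQL using only the first two columns of the four-column index for that chunk.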

Sheeri K. Cabral (awfief) wrote :

Oh, the version:
# /usr/bin/pt-table-checksum --version
pt-table-checksum 2.1.3

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-318
