pt-online-schema-change fails when using --no-drop-old-table after 10 times

Bug #1526105 reported by Agustín
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Fix Released
Medium
Frank Cizmich

Bug Description

This is somewhat related to https://bugs.launchpad.net/percona-toolkit/+bug/1215587, I guess.

If you run `pt-osc [OPTIONS] --no-drop-old-table` several times over the same DB.table, the names grow indefinitely longer, until the tool errors out due to max length for the table name. Steps to reproduce follow below.

The problem here is that this is done after the new table has been created and the ALTER/copy from the to-be old table has happened already (see attached file pt-osc_t1.out), making the tool fail after it has done everything else.

If this can't be solved, we should at least check this before starting with the ALTER/copy phase.

Steps to reproduce:

1- Create a table

mysql test -e "CREATE TABLE t1 (id int primary key)"

2- Run pt-osc enough times, or use the following to simulate this

for i in $(seq 58); do UNDERSCORE="_"; UNDERSCOREITER="";
  for j in $(seq $i); do UNDERSCOREITER=$UNDERSCORE$UNDERSCOREITER;
  done;
mysql test -e "CREATE TABLE ${UNDERSCOREITER}t1_old (id int primary key)";
done;

3- Run pt-osc after there are enough _.*t1_old tables so that creating one with one more '_' will fail

PTDEBUG=1 pt-online-schema-change h=127.0.0.1,P=3306,u=root,D=test,t=t1 --alter "ADD COLUMN field1 int" --execute --no-drop-old-table > /tmp/pt-osc_t1.out 2> /tmp/pt-osc_t1.out.err

4- Check /tmp/pt-osc_t1.out to see how the tool run everything, but errored out after not being able to rename the table. Example:

[...]
Altering `test`.`t1`...
Creating new table...
Created new table test._t1_new OK.
Altering new table...
Altered `test`.`_t1_new` OK.
2015-12-14T23:40:20 Creating triggers...
2015-12-14T23:40:20 Created triggers OK.
2015-12-14T23:40:20 Copying approximately 1 rows...
2015-12-14T23:40:20 Copied rows OK.
2015-12-14T23:40:20 Analyzing new table...
2015-12-14T23:40:20 Swapping tables...
2015-12-14T23:40:20 Dropping triggers...
2015-12-14T23:40:20 Dropped triggers OK.
Not dropping the new table `test`.`_t1_new` because --swap-tables failed. To drop the new table, execute:
DROP TABLE IF EXISTS `test`.`_t1_new`;
`test`.`t1` was not altered.

Revision history for this message
Agustín (agustin-gallego) wrote :
Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

Confirmed exactly as described:

...
Altering `test`.`t1`...
Creating new table...
Created new table test._t1_new OK.
Altering new table...
Altered `test`.`_t1_new` OK.
2015-12-15T10:20:03 Creating triggers...
2015-12-15T10:20:03 Created triggers OK.
2015-12-15T10:20:03 Copying approximately 1 rows...
2015-12-15T10:20:03 Copied rows OK.
2015-12-15T10:20:03 Analyzing new table...
2015-12-15T10:20:03 Swapping tables...
2015-12-15T10:20:03 Dropping triggers...
2015-12-15T10:20:03 Dropped triggers OK.
Not dropping the new table `test`.`_t1_new` because --swap-tables failed. To drop the new table, execute:
DROP TABLE IF EXISTS `test`.`_t1_new`;
`test`.`t1` was not altered.

Changed in percona-toolkit:
status: New → Confirmed
Revision history for this message
Agustín (agustin-gallego) wrote :
Download full text (4.7 KiB)

After inspecting the code (sub swap_tables) I see that the title is wrong, since there is a length check in the table name:

if ( length($table_name) > 64 ) {
            my $truncated_table_name = substr($table_name, 0, 64);
            PTDEBUG && _d($table_name, 'is > 64 chars, truncating to',
                          $truncated_table_name);
            $table_name = $truncated_table_name;
         }

I'm not sure I understand the underlying problem yet, but we hardcode the number of tries to 10:

   my $name_tries = 10; # don't try forever
...
      while ( $name_tries-- ) {
...
      }

      # This shouldn't happen.
      # Here and in the attempt to find a new table name we probably ought to
      # use --tries (and maybe a Retry object?)
      die ts("Failed to find a unique old table name after "
         . "serveral attempts.\n");
   }

And then, for some reason, the "Try X of 10" instead of showing 1, 2, 3, etc, is always showing 1.

# cat /tmp/pt-osc_t1.out.err | grep "Try code did not succeed" | wc -l
10

# cat /tmp/pt-osc_t1_2.out.err | grep "Try"
# Retry:3745 24763 Try 1 of 10
# Retry:3762 24763 Try code succeeded
# Retry:3745 24763 Try 1 of 10
# Retry:3762 24763 Try code succeeded
# Retry:3745 24763 Try 1 of 10
# Retry:3762 24763 Try code succeeded
# Retry:3745 24763 Try 1 of 10
# Retry:3762 24763 Try code succeeded
# Retry:3745 24763 Try 1 of 10
# Retry:3762 24763 Try code succeeded
# Retry:3745 24763 Try 1 of 10
# Retry:3751 24763 Try code failed: DBD::mysql::db do failed: Table '_t1_old' already exists [for Statement "RENAME TABLE `test`.`t1` TO `test`.`_t1_old`, `test`.`__t1_new` TO `test`.`t1`"] at /usr/bin/pt-online-schema-change line 10134.
# Retry:3767 24763 Try code did not succeed
# Retry:3745 24763 Try 1 of 10
# Retry:3751 24763 Try code failed: DBD::mysql::db do failed: Table '__t1_old' already exists [for Statement "RENAME TABLE `test`.`t1` TO `test`.`__t1_old`, `test`.`__t1_new` TO `test`.`t1`"] at /usr/bin/pt-online-schema-change line 10134.
# Retry:3767 24763 Try code did not succeed
# Retry:3745 24763 Try 1 of 10
# Retry:3751 24763 Try code failed: DBD::mysql::db do failed: Table '___t1_old' already exists [for Statement "RENAME TABLE `test`.`t1` TO `test`.`___t1_old`, `test`.`__t1_new` TO `test`.`t1`"] at /usr/bin/pt-online-schema-change line 10134.
# Retry:3767 24763 Try code did not succeed
# Retry:3745 24763 Try 1 of 10
# Retry:3751 24763 Try code failed: DBD::mysql::db do failed: Table '____t1_old' already exists [for Statement "RENAME TABLE `test`.`t1` TO `test`.`____t1_old`, `test`.`__t1_new` TO `test`.`t1`"] at /usr/bin/pt-online-schema-change line 10134.
# Retry:3767 24763 Try code did not succeed
# Retry:3745 24763 Try 1 of 10
# Retry:3751 24763 Try code failed: DBD::mysql::db do failed: Table '_____t1_old' already exists [for Statement "RENAME TABLE `test`.`t1` TO `test`.`_____t1_old`, `test`.`__t1_new` TO `test`.`t1`"] at /usr/bin/pt-online-schema-change line 10134.
# Retry:3767 24763 Try code did not succeed
# Retry:3745 24763 Try 1 of 10
# Retry:3751 24763 Try code failed: DBD::mysql::db do failed: Table '______t1_old' already exists [for Statement "RENAME TABLE `test`.`t1` TO `test`.`____...

Read more...

Revision history for this message
Frank Cizmich (frank-cizmich) wrote :

The problem is not the table name length, since there actually is code that tests this and truncates the name if it exceeds 64 characters.
The issue arises when there is an accumulation of tables that have already have had their names extended, the code retries ten times to append an underscore, each time finding an old table with that number appended undersocres.

This was the result I got starting with a table with a a name of exactly 64 characters:

show tables:
__________test12345678901234567890123456789012345678901234567890
_________test123456789012345678901234567890123456789012345678901
________test1234567890123456789012345678901234567890123456789012
_______test12345678901234567890123456789012345678901234567890123
______test123456789012345678901234567890123456789012345678901234
_____test1234567890123456789012345678901234567890123456789012345
____test12345678901234567890123456789012345678901234567890123456
___test123456789012345678901234567890123456789012345678901234567
__test1234567890123456789012345678901234567890123456789012345678
_test12345678901234567890123456789012345678901234567890123456789
test123456789012345678901234567890123456789012345678901234567890

It ran correctly 10 times, each time truncating and then appending an underscore. At the tenth run it failed because it assumes something weird is going on when it can't find a viable name after ten variations.

The debug output Agustin posted is indeed wrong in that it always says "Try 1 of 10" , when it should be incrementing "2 of 10", "3 of 10" etc. That should be corrected but it's not a serious problem.

So the question might be, is it reasonable for the tool to fail after 10 tries in this scenario?

Or maybe, should the tool create a quasi-unique random identifier, such as "4G6N8O_test" ?

Changed in percona-toolkit:
assignee: nobody → Frank Cizmich (frank-cizmich)
Revision history for this message
Agustín (agustin-gallego) wrote :

Hi Frank!

My take on this is that what is important here is that the tool does not fail at this point, so we should either:

- check before the "alter" is done (and lock the table name in some way for using it at this stage), or
- try 10 times with underscores, and if this fails, generate a quasi-unique random name (try as many times as needed)

Changed in percona-toolkit:
status: Confirmed → In Progress
milestone: none → 2.2.17
Revision history for this message
Frank Cizmich (frank-cizmich) wrote :

Agustín.

Ok. We're going for the second option. Try 10 times and if still no unique name use a random prefix.
This has the advantage of mostly preserving old behavior while still addressing this edge case.

PD: I changed the title of the issue to more accurately explain the problem.

summary: - pt-online-schema-change does not check maximum name length when creating
- tables
+ pt-online-schema-change fails when using --no-drop-old-table after 10
+ times
tags: added: pt-online-schema-change
Changed in percona-toolkit:
status: In Progress → Fix Committed
importance: Undecided → Medium
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-698

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.