pt-online-schema-change blocks on metadata locks

Reported by Ovais Tariq on 2013-02-02
This bug affects 6 people
Affects: Percona Toolkit
Importance: High
Assigned to: Daniel Nichter

Bug Description

# pt-online-schema-change --version
pt-online-schema-change 2.1.8

When run against a MySQL 5.5 server, pt-online-schema-change can block indefinitely on metadata locks under certain conditions, rendering the table inaccessible. The metadata locks need to be acquired at various stages of the online schema change, such as when creating triggers or when renaming tables.

Please see the test cases below; they all involve the following table:
mysql [localhost] {msandbox} (test) > show create table test\G
*************************** 1. row ***************************
       Table: test
Create Table: CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL,
  `x` char(3) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `x` (`x`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

-- Test case for pt-online-schema-change blocking during trigger creation:
session1 > start transaction;
Query OK, 0 rows affected (0.00 sec)

session1 > select * from test;
+----+------+
| id | x    |
+----+------+
|  2 | bar  |
|  1 | foo  |
+----+------+
2 rows in set (0.00 sec)

# pt-online-schema-change --alter "add column c char(32) default 'dummy_test'" --execute h=localhost,u=msandbox,p=msandbox,S=/tmp/mysql_sandbox5529.sock,D=test,t=test > pt-osc.out 2> pt-osc.err
# /usr/bin/perl 5.010001
# Linux ovaistariq-net 2.6.32-279.2.1.el6.x86_64 #1 SMP Fri Jul 20 01:55:29 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
# Arguments: _[--alter]_ _[add column c char(32) default 'dummy_test']_ _[--execute]_ _[h=localhost,u=msandbox,p=msandbox,S=/tmp/mysql_sandbox5529.sock,D=test,t=test]_
Altering `test`.`test`...
Creating new table...
Created new table test._test_new OK.
Altering new table...
Altered `test`.`_test_new` OK.
Creating triggers... <-- blocks here

session2 > show processlist;
+----+----------+-----------+------+---------+------+---------------------------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+----------+-----------+------+---------+------+---------------------------------+------------------------------------------------------------------------------------------------------+
| 20 | msandbox | localhost | test | Sleep | 245 | | NULL |
| 21 | msandbox | localhost | test | Query | 0 | NULL | show processlist |
| 24 | msandbox | localhost | test | Query | 115 | Waiting for table metadata lock | CREATE TRIGGER `pt_osc_test_test_del` AFTER DELETE ON `test`.`test` FOR EACH ROW DELETE IGNORE FROM |
+----+----------+-----------+------+---------+------+---------------------------------+------------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

-- Test case for pt-online-schema-change blocking during the step where tables are swapped:
session1 > select sleep(3);start transaction;select * from test;
+----------+
| sleep(3) |
+----------+
| 0 |
+----------+
1 row in set (3.00 sec)

Query OK, 0 rows affected (0.00 sec)

+----+------+
| id | x    |
+----+------+
|  2 | bar  |
|  1 | foo  |
+----+------+
2 rows in set (0.00 sec)

[root@ovaistariq-net msb_5_5_29]# pt-online-schema-change --alter "add column c char(32) default 'dummy_test'" --execute h=localhost,u=msandbox,p=msandbox,S=/tmp/mysql_sandbox5529.sock,D=test,t=test
Altering `test`.`test`...
Creating new table...
Created new table test._test_new OK.
Altering new table...
Altered `test`.`_test_new` OK.
Creating triggers...
Created triggers OK.
Copying approximately 2 rows...
Copied rows OK.
Swapping tables... <-- blocks here

session1 > show processlist;
+----+----------+-----------+------+---------+------+---------------------------------+---------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----+----------+-----------+------+---------+------+---------------------------------+---------------------------------------------------------------------------------------+
| 20 | msandbox | localhost | test | Query | 0 | NULL | show processlist |
| 30 | msandbox | localhost | test | Query | 209 | Waiting for table metadata lock | RENAME TABLE `test`.`test` TO `test`.`_test_old`, `test`.`_test_new` TO `test`.`test` |
| 31 | msandbox | localhost | test | Query | 4 | Waiting for table metadata lock | select * from test |
+----+----------+-----------+------+---------+------+---------------------------------+---------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

Indefinite metadata locking can be prevented by setting the MySQL session variable lock_wait_timeout to a low value; however, in that case pt-osc simply fails:

# pt-online-schema-change --alter "add column c char(32) default 'dummy_test'" --set-vars "lock_wait_timeout=5" --execute h=localhost,u=msandbox,p=msandbox,S=/tmp/mysql_sandbox5529.sock,D=test,t=test > pt-osc.out 2> pt-osc.err
# /usr/bin/perl 5.010001
# Linux ovaistariq-net 2.6.32-279.2.1.el6.x86_64 #1 SMP Fri Jul 20 01:55:29 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
# Arguments: _[--alter]_ _[add column c char(32) default 'dummy_test']_ _[--set-vars]_ _[lock_wait_timeout=5]_ _[--execute]_ _[h=localhost,u=msandbox,p=msandbox,S=/tmp/mysql_sandbox5529.sock,D=test,t=test]_
Altering `test`.`test`...
Creating new table...
Created new table test._test_new OK.
Altering new table...
Altered `test`.`_test_new` OK.
Creating triggers...
Dropping triggers...
Dropped triggers OK.
Dropping new table...
Dropped new table OK.
`test`.`test` was not altered.

The ideal solution would be to let the user specify lock_wait_timeout via a new option, metadata_lock_wait_timeout, and perhaps let the user specify the number of retries to attempt when metadata locks cannot be acquired immediately via another option, metadata_lock_retries.
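For illustration, the proposed behavior might look like the following sketch (Python, for the idea only; the option names metadata_lock_wait_timeout and metadata_lock_retries do not exist in the tool, the helper below is hypothetical, and pt-osc itself is written in Perl):

```python
import time

class LockWaitTimeout(Exception):
    """Stand-in for MySQL error 1205 (ER_LOCK_WAIT_TIMEOUT)."""

def run_with_mdl_retries(execute, statement, timeout=5, retries=3, wait=1.0):
    # `timeout` plays the role of the proposed metadata_lock_wait_timeout,
    # `retries` the proposed metadata_lock_retries -- both hypothetical.
    execute(f"SET SESSION lock_wait_timeout = {timeout}")
    for attempt in range(1, retries + 1):
        try:
            return execute(statement)
        except LockWaitTimeout:
            if attempt == retries:
                raise          # out of retries: give up, as pt-osc does today
            time.sleep(wait)   # back off briefly before trying again
```

The point is that a short lock_wait_timeout bounds each wait, while the retry loop keeps the overall operation from failing on the first blocked attempt.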

I am also wondering whether the pt-osc option --lock-wait-timeout should actually be called --innodb-lock-wait-timeout, because from MySQL's perspective lock_wait_timeout and innodb_lock_wait_timeout are two different variables, and this can be confusing.

Please see this blog post for details on MDL:
http://www.mysqlperformanceblog.com/2013/02/01/implications-of-metadata-locking-changes-in-mysql-5-5/

Daniel Nichter (daniel-nichter) wrote :

Thanks Ovais. We've been testing on 5.5 and 5.6 for a while, but not specifically for/against metadata locks. We'll look into this new MySQL feature more closely.

tags: added: locking mysql-5.5 pt-online-schema-change
Changed in percona-toolkit:
milestone: none → 2.2.1
importance: Undecided → High
status: New → Triaged
Changed in percona-toolkit:
assignee: nobody → Daniel Nichter (daniel-nichter)
Ovais Tariq (ovais-tariq) wrote :

Thanks Daniel :)

Changed in percona-toolkit:
status: Triaged → In Progress
Daniel Nichter (daniel-nichter) wrote :

Yes, in 2.2 we can add --innodb-lock-wait-timeout and make each option set its respective var, i.e. --innodb-lock-wait-timeout sets innodb_lock_wait_timeout and --lock-wait-timeout would now set lock_wait_timeout.

However, I'm wondering if we should just enhance --set-vars instead. The reason for having a separate --lock-wait-timeout option at present is so that we can set it conditionally, because innodb_lock_wait_timeout is only dynamic with the InnoDB plugin. So there's a big code block around it that checks the current value, and if the current value is <= the requested value (i.e. <= --lock-wait-timeout), it doesn't do anything, because a shorter timeout is ok.

The tradeoff is special parsing and handling of --set-vars vs. simply adding another option, --innodb-lock-wait-timeout. I'll ask other people and think about it more.

summary: - pt-online-schema change blocks indefinitely on metadata locks, rendering
- the target table inaccessible
+ pt-online-schema-change blocks on metadata locks
Daniel Nichter (daniel-nichter) wrote :

We decided to remove --lock-wait-timeout and change how --set-vars works, because we didn't want to introduce another one-off option (i.e. add --innodb-lock-wait-timeout and swap the meanings of the two options to match their corresponding system variables).

--set-vars no longer has a "default: ...", which OptionParser automatically applies to the option if not specified on the command line. We could have done "default: wait_timeout=1000,innodb_lock_wait_timeout=1,lock_wait_timeout=1", but if the user wanted to set another var and did "--set-vars foo=bar", then that would clobber the entire default.

So the tool's defaults for --set-vars are now listed under MAGIC_set_vars, and the tool combines these with whatever, if anything, the user gives on the command line; the user's vars override any tool defaults. Then it sets these vars one by one, which is better than before because "SET foo=1, bar=2" will not set bar if setting foo fails. Now failures are isolated.
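The merge-and-set-individually behavior described above could be sketched like this (illustrative Python, not the tool's actual Perl code; the function names are made up):

```python
def merge_set_vars(defaults, user_vars):
    # User-specified vars override the tool defaults; all other defaults stay.
    merged = dict(defaults)
    merged.update(user_vars)
    return merged

def apply_vars(execute, variables, warn):
    # Set each variable with its own statement so that one failure does not
    # prevent the rest from being set, unlike a combined "SET foo=1, bar=2".
    for name, value in variables.items():
        try:
            execute(f"SET SESSION {name}={value}")
        except Exception as exc:
            warn(f"cannot set {name}={value}: {exc}")
```

For example, a failure to set lock_wait_timeout (which doesn't exist before MySQL 5.5) produces a warning but leaves the other variables set.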

Another change: --lock-wait-timeout (which used to set innodb_lock_wait_timeout) was magical to avoid warnings: if it failed to set the var and the current value was <= the requested value, then there was no warning, because the thinking was that "a lower innodb_lock_wait_timeout is always better". This is no longer the case: the tool always warns if it can't set a var to either the default or the user-specified value, because the user might want a high value for, say, lock_wait_timeout, so that the tool waits a while for metadata locks to be released.

Not favoring smaller lock wait timeout values makes more sense in combination with --retries, because the user may want to wait a while but only retry a few times, i.e. to wait for a total of 30s if --retries=3 and <some lock timeout value>=10. But if that lock timeout value were 5 and the tool quietly accepted it as better, then the user would only get 15s total of waiting before the tool fails, which is not what they wanted.

Finally, I chose to warn instead of die on failure to set a var, because usually the vars being set just affect the tool's resiliency. So it's important to know that the user didn't get what they wanted, but the tool can still continue and will probably still succeed. All the same, if it doesn't, the user may now know why: some vars weren't set as they wanted. Warning rather than dying is also necessary; otherwise pt-osc and pt-table-checksum on MySQL 5.1 would always die failing to set innodb_lock_wait_timeout, and would require a manual set and restart of MySQL, which is no good.

Daniel Nichter (daniel-nichter) wrote :

So the specific fix for this bug has become:

=item --set-vars

type: Array

Set the MySQL variables in this comma-separated list of C<variable=value> pairs.

By default, the tool sets:

=for comment ignore-pt-internal-value
MAGIC_set_vars

   wait_timeout=10000
   innodb_lock_wait_timeout=1
   lock_wait_timeout=60

So pt-osc will set those vars to those values by default (except lock_wait_timeout on MySQL older than 5.5, where it does not exist). If you want a different value for the now-removed --lock-wait-timeout, say 10 instead of 1, then specify:

--set-vars innodb_lock_wait_timeout=10

That will only override the default innodb_lock_wait_timeout=1.

Ovais Tariq (ovais-tariq) wrote :

Hi Daniel,

Using --set-vars is of course much better, and I agree with you; there is no need to introduce yet another option now that I see your point.

I have another request, regarding the --retries option. I see that the documentation currently says it does the following:
"Retry a chunk this many times when there is a nonfatal error. Nonfatal errors are problems such as a lock wait timeout or the query being killed. This option applies to the data copy operation."

However, in the context of this bug report, I believe the end user might want to retry the metadata-locking parts many times as well. For example, suppose a rename fails because of metadata locking and a low value of lock_wait_timeout; the user might then want that retried x number of times. So I was wondering if we could have this retry behavior for more parts of the online schema change?

Daniel Nichter (daniel-nichter) wrote :

Ovais, yes, I'll apply --retries to spots where the tool can meta-block.

tags: added: mysql-5.6
Daniel Nichter (daniel-nichter) wrote :

=item --retries

type: int; default: 10

Retry critical operations and recover from non-fatal errors. The tool
retries these operations:

   Creating triggers
   Dropping triggers
   Copying chunks
   Swapping tables
   Rebuilding foreign key constraints

For creating and dropping triggers, the number of retries applies to each
C<CREATE TRIGGER> and C<DROP TRIGGER> statement for each trigger.
For copying chunks, the number of retries applies to each chunk, not the
entire table. For swapping tables, the number of retries usually applies
once because there is usually only one C<RENAME TABLE> statement.
For rebuilding foreign key constraints, the number of retries applies to
each statement (C<ALTER> statements for the C<rebuild_constraints>
L<"--alter-foreign-keys-method">; other statements for the C<drop_swap>
method).

The tool retries each operation if these errors occur:

   Lock wait timeout (innodb_lock_wait_timeout and lock_wait_timeout)
   Deadlock found
   Query is killed (KILL QUERY <thread_id>)
   Connection is killed (KILL CONNECTION <thread_id>)
   Lost connection to MySQL

In the case of lost and killed connections, the tool will automatically
reconnect.

To alter extremely busy tables, it may be necessary to increase L<"--retries">,
and also C<innodb_lock_wait_timeout> and (for MySQL 5.5 and newer)
C<lock_wait_timeout> by specifying higher values with L<"--set-vars">.

Failures and retries are recorded in the L<"--statistics">.
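The non-fatal errors listed above correspond to well-known MySQL error codes. A sketch of how such classification might look (illustrative Python; the actual Perl implementation may differ, and the code-to-condition mapping here is my reading of the standard MySQL error numbers):

```python
# Illustrative mapping of the retryable conditions to MySQL error codes.
RETRYABLE = {
    1205,  # ER_LOCK_WAIT_TIMEOUT: lock wait timeout exceeded
    1213,  # ER_LOCK_DEADLOCK: deadlock found
    1317,  # ER_QUERY_INTERRUPTED: query killed (KILL QUERY)
    2006,  # CR_SERVER_GONE_ERROR: server gone away (e.g. KILL CONNECTION)
    2013,  # CR_SERVER_LOST: lost connection to MySQL
}

RECONNECT = {2006, 2013}  # errors after which the tool reconnects

def is_retryable(code):
    return code in RETRYABLE

def needs_reconnect(code):
    return code in RECONNECT
```

Anything outside this set (say, a syntax error) is fatal and should not be retried.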

Changed in percona-toolkit:
status: In Progress → Fix Committed
Ovais Tariq (ovais-tariq) wrote :

Hi Daniel,

Is it also possible to configure the delay between retries? For example, if I set the number of retries to 10, the retry is done every second, and there is a blocking operation that completes in 30 seconds, then the retries would not be of much help. If, however, a retry delay could be specified, it would be easy to get around this problem.

Ok, I had to change --retries to --tries ("retries" was misleading, as it wasn't really the number of retries but the total number of tries), and enhance it thusly:

"""
=item --tries

type: array

How many times to try critical operations. If certain operations fail due
to non-fatal, recoverable errors, the tool waits and tries the operation
again. These are the operations that are retried, with their default number
of tries and wait time between tries (in seconds):

=for comment ignore-pt-internal-value
MAGIC_tries

   OPERATION            TRIES  WAIT
   ===================  =====  ====
   create_triggers         10     1
   drop_triggers           10     1
   copy_rows               10  0.25
   swap_tables             10     1
   update_foreign_keys     10     1

To change the defaults, specify the new values like:

   --tries create_triggers:5:0.5,drop_triggers:5:0.5

That makes the tool try C<create_triggers> and C<drop_triggers> 5 times
with a 0.5 second wait between tries. So the format is:

   operation:tries:wait[,operation:tries:wait]

All three values must be specified.

...
"""

sheyda amini (sheyda-a) wrote :

@Daniel: Is this blocking fixed? I experienced the same thing using pt-online-schema-change for an alter table, and the triggers basically locked the table and created an outage for us. The database server is Percona Server 5.5.28.

Sheyda, yes it's fixed in 2.2.1 which will be released sometime this week. Until then, you can download pt-osc 2.2 directly from http://bazaar.launchpad.net/~percona-toolkit-dev/percona-toolkit/2.2/view/head:/bin/pt-online-schema-change

sheyda amini (sheyda-a) wrote :

Thanks Daniel. I downloaded it; however, the latest version on the Percona site is 2.1.9, and what your link takes me to is 2.1.8.

Does this one have some enhancements that 2.1.9 doesn't?

Thanks a lot.

sheyda amini (sheyda-a) wrote :

@Daniel,

Please read my question above and let me know whether the version you are referring me to at launchpad.net/~percona-toolkit-dev/percona-toolkit/2.2/view/head:/bin/pt-online-schema-change is more advanced than the existing 2.1.9.

Sheyda,

The link is correct, the version in the tool just hasn't been updated. The relevant new option is:

http://bazaar.launchpad.net/~percona-toolkit-dev/percona-toolkit/2.2/view/head:/bin/pt-online-schema-change#L10895

Or, just download the 2.2 preview: http://www.percona.com/downloads/TESTING/percona-toolkit/2.2.1/

sheyda amini (sheyda-a) wrote :

@Daniel: Thanks for the quick response.

Just a general question: does Percona have any plans in future releases to make 'alter table' for adding nullable columns a data-dictionary-only change, like Oracle, instead of copying to a temp table? This is a big problem for high-transaction, 24/7 environments.

Thanks a lot!

Changed in percona-toolkit:
status: Fix Committed → Fix Released
sheyda amini (sheyda-a) wrote :

What is the link to the latest version?

sheyda amini (sheyda-a) wrote :

This process still created the metadata lock, and I had to kill the process; I can't get rid of the triggers, which remain.

IT IS OUR VOICE PRODUCTION - CAN YOU HELP PLEASE?

sheyda amini (sheyda-a) wrote :

Basically I had to revert the whole process and wait an hour to drop the triggers, which were hanging on a metadata lock.

It took an hour just to drop the triggers.

This is really bad, as it is the second time we have encountered an outage using this tool for an alter. I had tested this on my test server with no issue, but in production, in a high-transaction environment, this tool doesn't work, at least with the Percona 5.5 version.

I am using Percona Server 5.5.27.

I can provide you the command that I used.

Thanks.

Sheyda, I'm sorry you experienced locking issues; however, a bug report is not the place for general support, especially urgent support. Percona has 24x7x365 support: http://www.percona.com/mysql-support

If you have found and can reproduce a bug, please open a new bug with the command line, MySQL version, method to reproduce the bug, etc. and we'll be happy to fix it.

If you're having (meta)locking issues in general due to a high write workload, then you should contact support so someone can work with you, your system, and the tool in a timely and efficient fashion.

Thanks,

Daniel

sheyda amini (sheyda-a) wrote :

Daniel, thanks. At this point, if I want to reproduce the incident, I would have to create another outage, which management will not go for.

I have rolled back. However, I want to report this as an existing bug, since the fix has not made a difference in the behavior of the (meta)locking issues. We don't have support with Percona, and at this point I am not trying to reproduce the error.

This tool works just fine on my test server, but does not work with high write transactions in production.

sheyda amini (sheyda-a) wrote :

Daniel,

I am going to use pt-online-schema-change to optimize a very large table which has had lots of data deleted from it. I did try it on a test server and it seems to be working, but my worry is that, as in the past, the table will get locked when the triggers are created, and in our environment, which has high QPS, that will generate an outage. Is there anything more I can do with this tool to avoid this scenario? I am running Percona Server 5.5.34.

Thanks!

Sheyda, I think http://www.percona.com/forums/questions-discussions/percona-toolkit would be the best place to ask, because Percona support staff watch the forums, as do other non-Percona people with experience with the tools. The short answer to your question is: there are no special options or anything in the tool to make it magically work on a very busy server. You could try really low lock wait timeouts and really high numbers of retries, but the tradeoff is that the tool may take a very long time to complete because it keeps being the victim of deadlocks and such. Maybe 5.6 online DDL is better in this case, but it's not perfect either; for example, it has no progress indicator, so you're left wondering for a long time whether it's working and when it will complete.
