pt-online-schema-change should set session-level lock_wait_timeout for schema change operations

Reported by Geoffrey Anderson on 2013-01-16
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Toolkit
Undecided
Unassigned

Bug Description

Version: pt-online-schema-change 2.1.2
Mysqld version: Percona-Server-server-55-5.5.28-rel29.2.360.rhel5

Description:
When running pt-online-schema-change against a mysql instance, some of the operations it performs require having a table metadata lock. This cannot be granted until all locks (including read locks) are released on the table. Since operations in pt-online-schema change are affecting the schema of "live" tables, if it gets blocked waiting for a metadata lock, INSERT statements to the same table will be blocked and stack up behind it. If there's a long-running transaction (longer than 60 seconds, for instance), a busy mysql server will quickly stack up connections and potentially blow out its connection limit. I've observed that pt-online-schema-change requires an exclusive metadata lock when doing the following operations:
* creating triggers on source table that is being altered
* renaming the "new" and "old" tables after a schema change is complete

This is further exacerbated by the default setting of "lock_wait_timeout" being set to "31536000" (365 days). Since pt-online-schema-change obeys lock_wait_timeout and the fact that lock_wait_timeout is also a session-level variable, a suggested fix is to have pt-online-schema-change set a session level "lock_wait_timeout" to a relatively low number (say....60 seconds) and additionally have a parameter to change this value on the command-line.

Steps to reproduce:
1) Create a test table:
CREATE DATABASE test;
CREATE TABLE test2 (
  id int(10) unsigned not null auto_increment,
  c int(11) default null,
  primary key (id)
) engine=innodb auto_increment=5102195 default charset=latin1;

2) Start a transaction that will take a read lock on this table:
BEGIN;
USE test;
SELECT * FROM test2 LIMIT 1;

3) In a different window, begin concurrent inserts (these will insert and complete without being blocked by the transaction above):
for i in {1..1000}; do usleep 500; (mysql -S /var/lib/mysql/mysql.sock test -e "insert into test2 (c) VALUES(${i});" &) ; done

4) In a third window, while the concurrent inserts from above are running:
pt-online-schema-change --alter "add index idx_c (c)" --execute h=127.0.0.1,P=3306,u=root,p=testpass,D=test,t=test2 2>&1 | tee pt-online-schema-change.log

5) If you check the processlist on the server, you'll see inserts from step 2 stacking up behind the "CREATE TRIGGER" operation being performed by pt-online-schema-change in step 3.

Yes, the default value of lock_wait_timeout is 365 days.

However, to set any session level variable you can use --set-vars
(http://www.percona.com/doc/percona-toolkit/2.1/pt-online-schema-change.html )

Since lock_wait_timeout is also a session level variable, it
should do.

Daniel Nichter (daniel-nichter) wrote :

Where do you all see that the default for lock_wait_timeout is 365 days?

The tool sets lock wait timeout to 1s by default:

=item --lock-wait-timeout

type: int; default: 1

Set the session value of C<innodb_lock_wait_timeout>.

And for MySQL: http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_lock_wait_timeout

tags: added: pt-online-schema-change
Changed in percona-toolkit:
status: New → Triaged
tags: added: locking

@Raghavendra D Prabhu: Ah, that may actually provide a solution to what I'm seeing, I'll try that out and report back. Whether or not it works, I feel that the tool should perhaps be a little more proactive in this scenario and set the variable automatically since some of the operations the tool does can cause some nasty locking in a highly concurrent environment.

@Daniel Nichter: There are 2 "lock wait timeout" variables in play in the server. I experienced this problem on Percona Server 5.5.13-20 and found the following DEFINE in source is what "lock_wait_timeout" is set to: http://bazaar.launchpad.net/~percona-core/percona-server/5.5/view/head:/Percona-Server/sql/sql_const.h#L225

This variable doesn't seem to appear anywhere in the documentation. The other variable is "innodb_lock_wait_timeout" which is set to 50s by default (as per the documentation). I guess I'm curious if the tool is setting "innodb_lock_wait_timeout" and then watching "lock_wait_timeout" to determine when it should timeout.

I can confirm that setting the lock_wait_timeout to a low value using the '--set-vars' option works the expected behavior. I followed the same steps in the bug description and simply added "--set-vars 'lock_wait_timeout=5'" as a parameter to pt-online-schema-change. Output below:

ganderson@db01:~$ pt-online-schema-change --alter "add index idx_c (c)" --set-vars 'lock_wait_timeout=5' --execute h=127.0.0.1,P=3306,u=root,p=testpass,D=test,t=test2 2>&1| tee pt-online-schema-change.log
Altering `test`.`test2`...
Creating new table...
Created new table test._test2_new OK.
Altering new table...
Altered `test`.`_test2_new` OK.
Creating triggers...
Dropping triggers...
Dropped triggers OK.
Dropping new table...
Dropped new table OK.
`test`.`test2` was not altered.
Error creating triggers: DBD::mysql::db do failed: Lock wait timeout exceeded; try restarting transaction at /usr/bin/pt-online-schema-change line 7020.

I've also confirmed that setting "innodb_lock_wait_timeout" INSTEAD of "lock_wait_timeout" doesn't fix this issue. Is it possible/safe to have the tool default to setting the session level variable "lock_wait_timeout" to a relatively low number (e.g. 5 seconds or less)?

An additional thought -- using this workaround also more readily exposes bug #1093016 (https://bugs.launchpad.net/percona-toolkit/+bug/1093016) since some of these metadata-locking changes won't be retried (e.g. dropping triggers).

Daniel Nichter (daniel-nichter) wrote :

This is fixed in 2.2.1. See bug 1113301.

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers