pt-online-schema-change gives spurious error messages on failure if DROP TRIGGER also fails
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona Toolkit moved to https://jira.percona.com/projects/PT |
New
|
Undecided
|
Unassigned |
Bug Description
If pt-osc fails during a "--alter-
...
Copying `db_prod`
2015-02-15T09:29:30 Copied rows OK.
2015-02-15T09:29:30 Drop-swapping tables...
2015-02-15T09:29:49 Dropping triggers...
2015-02-15T09:30:08 Error dropping trigger: 2015-02-15T09:30:08 DBD::mysql::db do failed: Lock wait timeout exceeded; try restarting transaction at /usr/bin/
2015-02-15T09:30:27 Error dropping trigger: 2015-02-15T09:30:27 DBD::mysql::db do failed: Lock wait timeout exceeded; try restarting transaction at /usr/bin/
2015-02-15T09:30:46 Error dropping trigger: 2015-02-15T09:30:46 DBD::mysql::db do failed: Lock wait timeout exceeded; try restarting transaction at /usr/bin/
2015-02-15T09:30:46 To try dropping the triggers again, execute:
DROP TRIGGER IF EXISTS `db_prod`
...
Not dropping the new table `db_prod`
DROP TRIGGER IF EXISTS `db_prod`
...
These triggers must be dropped before dropping `db_prod`
Altered `db_prod`.`tblname` but there were errors or warnings.
Error updating foreign key constraints: 2015-02-15T09:29:49 DBD::mysql::db do failed: Lock wait timeout exceeded; try restarting transaction at /usr/bin/
The output suggests that pt-osc only failed while trying to drop triggers, but the old and new tables are still in place, and the drop and rename are supposed to happen before the dropping of triggers. Based on the last message about line 10018 one can tell 'DROP TABLE tblname' failed and pt-osc immediately went into cleanup mode by (attempting to) drop triggers.
I was able to repeat a slightly different failure getting a read lock on the old table during the copy process and causing it to time out and the drop triggers to also fail. The log was:
2015-02-16T14:41:29 Copying approximately 2164 rows...
2015-02-16T14:42:01 Dropping triggers...
2015-02-16T14:51:01 Error dropping trigger: 2015-02-16T14:51:01 DBD::mysql::db do failed: Lock wait timeout exceeded; try restarting transaction [for Statement "DROP TRIGGER IF EXISTS `test`.
...
2015-02-16T15:09:01 To try dropping the triggers again, execute:
DROP TRIGGER IF EXISTS `test`.
...
Again it looks like it failed because it was trying to drop triggers and couldn't, but it failed before it could copy data first.
So it looks like the pattern is that the failure happens at the last step before "Dropping triggers...", but this isn't obvious from the output or the final error message.
Is it possible to print an error message at the point of pt-osc failing, or to tell at which step it occurred in the final line's error message, even when the cleanup also fails?
This was found in versions 2.2.7 and 2.2.13.
Original command: pt-online-
Note this is unlikely related to drop_swap method or foreign keys, they just existed in the examples.
description: | updated |
summary: |
- pt-online-schema-change gives spurious error messages on failure + pt-online-schema-change gives spurious error messages on failure if DROP + TRIGGER also fails |
Changed in percona-toolkit: | |
status: | Incomplete → New |
Unable to reproduce the same with 2.2.13.
nilnandan@ desktop: ~$ pt-online- schema- change -uroot -proot --max-load Threads_running=50 --critical-load Threads_running=75 --set-vars lock_wait_timeout=1 --alter "drop column city" --alter- foreign- keys-method= drop_swap --execute D=nil,t=test1 foreign_ keys, 10, 1 desktop: ~$
No slaves found. See --recursion-method if host desktop has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_
Child tables:
`nil`.`test2` (approx. 14 rows)
Will use the drop_swap method to update foreign keys.
Altering `nil`.`test1`...
Creating new table...
Created new table nil._test1_new OK.
Altering new table...
Altered `nil`.`_test1_new` OK.
2015-03-02T09:25:13 Creating triggers...
2015-03-02T09:25:13 Created triggers OK.
2015-03-02T09:25:13 Copying approximately 1047066 rows...
2015-03-02T09:25:17 Copied rows OK.
2015-03-02T09:25:17 Drop-swapping tables...
2015-03-02T09:25:17 Dropped and swapped tables OK.
Not dropping old table because --no-drop-old-table was specified.
2015-03-02T09:25:17 Dropping triggers...
2015-03-02T09:25:17 Dropped triggers OK.
Successfully altered `nil`.`test1`.
nilnandan@
Is it possible for you provide table structures/small data set to reproduce this? Also run it with PTDEBUG=1 and provide us the output.
i.e schema- change -u... -p... --max-load ....
PTDEBUG=1 pt-online-