pt-online-schema-change should reconnect to slaves
Bug #1402051 reported by Frank Cizmich
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Percona Toolkit moved to https://jira.percona.com/projects/PT | Fix Released | Medium | Frank Cizmich | |
Bug Description
pt-online-
This is problematic for very long running schema changes.
Some sort of fault-tolerant behavior, optional or otherwise, would be useful.
It should skip checks for slaves that don't respond and eventually include them again if they become available.
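The requested behavior could look roughly like the sketch below. It is written in Python purely for illustration and is not pt-online-schema-change code: the `ReplicaLagMonitor` class, the per-replica probe callables, and the `retry_interval` parameter are all hypothetical names. The idea is that a replica whose lag probe fails is left out of lag checks for a while, then probed again and included once it answers.

```python
import time
from typing import Callable, Dict, List, Optional


class ReplicaLagMonitor:
    """Hypothetical lag checker that tolerates replicas going away."""

    def __init__(
        self,
        probes: Dict[str, Callable[[], float]],
        retry_interval: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # probes maps a replica name to a callable that returns its lag in
        # seconds, or raises ConnectionError if the replica is unreachable.
        self._probes = probes
        self._retry_interval = retry_interval
        self._clock = clock
        # Replica name -> time before which it is not probed again.
        self._skip_until: Dict[str, float] = {}

    def max_lag(self) -> Optional[float]:
        """Return the highest lag among reachable replicas, or None."""
        now = self._clock()
        lags: List[float] = []
        for name, probe in self._probes.items():
            if self._skip_until.get(name, 0.0) > now:
                continue  # still inside the skip window for this replica
            try:
                lags.append(probe())
            except ConnectionError:
                # Do not abort: skip this replica and retry it later.
                self._skip_until[name] = now + self._retry_interval
            else:
                # The replica answered, so include it in checks again.
                self._skip_until.pop(name, None)
        return max(lags) if lags else None

    def skipped(self) -> List[str]:
        """Names of replicas whose most recent probe failed."""
        return sorted(self._skip_until)
```

With this shape, the row-copy loop would keep throttling on `max_lag()` from the replicas that still respond. Whether the copy should continue when every replica is unreachable (the `None` case) is a policy decision the sketch leaves open.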
Changed in percona-toolkit: | |
status: | New → Triaged |
assignee: | nobody → Frank Cizmich (frank-cizmich) |
importance: | Undecided → Medium |
tags: | added: i49004 pt-online-schema-change |
Changed in percona-toolkit: | |
milestone: | none → 2.2.14 |
Changed in percona-toolkit: | |
milestone: | 2.2.14 → none |
Changed in percona-toolkit: | |
status: | Triaged → In Progress |
milestone: | none → 2.3.1 |
Changed in percona-toolkit: | |
status: | In Progress → Fix Committed |
summary: |
- [Feature] pt-osc fault tolerance if slave disconnects
+ pt-online-schema-change should try to reconnect to slaves |
summary: |
- pt-online-schema-change should try to reconnect to slaves
+ pt-online-schema-change should reconnect to slaves |
Changed in percona-toolkit: | |
milestone: | 2.3.1 → 2.2.16 |
Changed in percona-toolkit: | |
status: | Fix Committed → Fix Released |
I altered a table on the master server with the pt-online-schema-change tool and killed mysqld on slave2 while pt-osc was in progress, to simulate slave network connectivity issues or mysqld disappearing. I found that killing the mysqld process on the slave aborts pt-osc, and as a result the table is not altered anywhere, neither on the master nor on slave2.
root@master:~# ./pt-online-schema-change --execute --nodrop-old-table --alter "ADD COLUMN line_number VARCHAR(10) DEFAULT NULL" u=root,p=p3rc0na123,D=world,t=test &>> ptosc9.log
Found 2 slaves:
slave2
slave1
Will check slave lag on:
slave2
slave1
Operation, tries, wait:
copy_rows, 10, 0.25
create_triggers, 10, 1
drop_triggers, 10, 1
swap_tables, 10, 1
update_foreign_keys, 10, 1
Altering `world`.`test`...
Creating new table...
Created new table world._test_new OK.
Altering new table...
Altered `world`.`_test_new` OK.
2014-12-11T15:59:35 Creating triggers...
2014-12-11T15:59:35 Created triggers OK.
2014-12-11T15:59:35 Copying approximately 58402 rows...
Not dropping triggers because the tool was interrupted. To drop the triggers, execute:
DROP TRIGGER IF EXISTS `world`.`pt_osc_world_test_del`;
DROP TRIGGER IF EXISTS `world`.`pt_osc_world_test_upd`;
DROP TRIGGER IF EXISTS `world`.`pt_osc_world_test_ins`;
Not dropping the new table `world`.`_test_new` because the tool was interrupted. To drop the new table, execute:
DROP TABLE IF EXISTS `world`.`_test_new`;
`world`.`test` was not altered.
(in cleanup) 2014-12-11T15:59:44 Error copying rows from `world`.`test` to `world`.`_test_new`: Lost connection to replica slave2 while attempting to get its lag (DBI connect('world;host=slave2;mysql_read_default_group=client','root',...) failed: Can't connect to MySQL server on 'slave2' (111) at ./pt-online-schema-change line 2261)
Not dropping triggers because the tool was interrupted. To drop the triggers, execute:
DROP TRIGGER IF EXISTS `world`.`pt_osc_world_test_del`;
DROP TRIGGER IF EXISTS `world`.`pt_osc_world_test_upd`;
DROP TRIGGER IF EXISTS `world`.`pt_osc_world_test_ins`;
Not dropping the new table `world`.`_test_new` because the tool was interrupted. To drop the new table, execute:
DROP TABLE IF EXISTS `world`.`_test_new`;
`world`.`test` was not altered.
As you can see from the output, world.test was not altered. This pt-osc behavior is not user friendly: the tool aborted and failed because mysqld on one slave temporarily disappeared.