pt-online-schema-change --alter-foreign-keys-method=drop_swap can indirectly cause slave errors
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona Toolkit moved to https://jira.percona.com/projects/PT |
Triaged
|
Medium
|
Carlos Salguero | ||
2.1 |
Won't Fix
|
Undecided
|
Unassigned | ||
2.2 |
Triaged
|
Medium
|
Carlos Salguero |
Bug Description
pt-online-
DROP TABLE parent;
UPDATE child SET fk_col="foo"
RENAME TABLE _parent_new TO parent
The 2nd query will cause a slave error like:
Last_SQL_Errno: 1452
Last_SQL_Error: Error 'Cannot add or update a child row: a foreign key constraint fails ... FOREIGN KEY (`fk_col`) REFERENCES `parent` (`fk_col`) on query. ... Query: 'UPDATE child SET fk_col="foo"'
Because DROP causes and implicit commit, and because it took awhile, the UPDATE was able to execute between it and the RENAME.
As the docs say about drop_swap:
"""
This method is faster and does not block, but it is riskier for two reasons. First, for a short time between dropping the original table and renaming the temporary table, the table to be altered simply does not exist, and queries against it will result in an error. Secondly, if there is an error and the new table cannot be renamed into the place of the old one, then it is too late to abort, because the old table is gone permanently.
"""
However, the situation is worse with replication and slaves because it can indirectly result in a slave error.
One solution is to lock all child tables before the DROP TABLE then unlock them after the RENAME. Consequently, all child tables are locked for however long the DROP takes. If the DROP takes a long time, then the purpose of drop_swap is defeated and --alter-
We just ran into this yesterday. Quite a mess to clean up manually using pt-table-sync. Do you think this is something that can get fixed within pt-osc?