pt-online-schema-change truncates recipient tables in XtraDB Cluster with Galera

Bug #928961 reported by Patrick Zoblisein
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Invalid
Undecided
Unassigned

Bug Description

Hi,

I have a 3-node Percona XtraDB Cluster with Galera - 5.5.17-22.1-log with wsrep_OSU_method='TOI'.

Was testing pt-online-schema-change (v2.0.3) and it works on the "primary" node (node1) but it truncates / deletes all the data from the table on the recipient nodes (node2 and node3). Schema *IS* changed successfully on all 3 nodes.

From node1:
mysql> select count(*) from n126db1.t1;
+----------+
| count(*) |
+----------+
| 7417938 |
+----------+
1 row in set (1.62 sec)
(This count matches on node2 and node3).

./pt-online-schema-change h=localhost,D=n126db1,t=t1 --alter "ADD COLUMN c10 varchar(10)" --ask-pass --drop-old-table
# 2012-02-08T18:01:02 ./pt-online-schema-change started
Enter password:
# 2012-02-08T18:01:06 USE `n126db1`
# 2012-02-08T18:01:06 Alter table t1 using temporary table __tmp_t1
# 2012-02-08T18:01:06 Checking if table t1 can be altered
# 2012-02-08T18:01:06 SHOW TRIGGERS FROM `n126db1` LIKE 't1'
# 2012-02-08T18:01:06 Table t1 can be altered
# 2012-02-08T18:01:06 Chunk column id, index PRIMARY
# 2012-02-08T18:01:06 Chunked table t1 into 7416 chunks
# 2012-02-08T18:01:06 Starting online schema change
# 2012-02-08T18:01:06 CREATE TABLE `n126db1`.`__tmp_t1` LIKE `n126db1`.`t1`
# 2012-02-08T18:01:06 ALTER TABLE `n126db1`.`__tmp_t1` ADD COLUMN c10 varchar(10)
# 2012-02-08T18:01:06 Shared columns: id, hostname, port, instime, c5
# 2012-02-08T18:01:06 Calling OSCCaptureSync::capture()
# 2012-02-08T18:01:06 CREATE TRIGGER mk_osc_del AFTER DELETE ON `n126db1`.`t1` FOR EACH ROW DELETE IGNORE FROM `n126db1`.`__tmp_t1` WHERE `n126db1`.`__tmp_t1`.id = OLD.id
# 2012-02-08T18:01:06 CREATE TRIGGER mk_osc_upd AFTER UPDATE ON `n126db1`.`t1` FOR EACH ROW REPLACE INTO `n126db1`.`__tmp_t1` (id, hostname, port, instime, c5) VALUES (NEW.id, NEW.hostname, NEW.port, NEW.instime, NEW.c5)
# 2012-02-08T18:01:06 CREATE TRIGGER mk_osc_ins AFTER INSERT ON `n126db1`.`t1` FOR EACH ROW REPLACE INTO `n126db1`.`__tmp_t1` (id, hostname, port, instime, c5) VALUES(NEW.id, NEW.hostname, NEW.port, NEW.instime, NEW.c5)
# 2012-02-08T18:01:07 Calling CopyRowsInsertSelect::copy()
Copying rows: 15% 02:40 remain
Copying rows: 33% 02:00 remain
Copying rows: 51% 01:26 remain
Copying rows: 68% 00:54 remain
Copying rows: 85% 00:24 remain
# 2012-02-08T18:04:01 Calling OSCCaptureSync::sync()
# 2012-02-08T18:04:01 Renaming tables
# 2012-02-08T18:04:01 RENAME TABLE `n126db1`.`t1` TO `n126db1`.`__old_t1`, `n126db1`.`__tmp_t1` TO `n126db1`.`t1`
# 2012-02-08T18:04:01 Original table t1 renamed to __old_t1
# 2012-02-08T18:04:01 Calling CopyRowsInsertSelect::cleanup()
# 2012-02-08T18:04:01 Calling OSCCaptureSync::cleanup()
# 2012-02-08T18:04:01 DROP TRIGGER IF EXISTS `n126db1`.`mk_osc_del`
# 2012-02-08T18:04:01 DROP TRIGGER IF EXISTS `n126db1`.`mk_osc_ins`
# 2012-02-08T18:04:01 DROP TRIGGER IF EXISTS `n126db1`.`mk_osc_upd`
# 2012-02-08T18:04:01 DROP TABLE `n126db1`.`__old_t1`
# 2012-02-08T18:04:02 ./pt-online-schema-change ended, exit status 0

PTDEBUG output file is attached from a second test - row count is 2 million less and column name being added was different than above (c11).
PTDEBUG=1 ./pt-online-schema-change h=localhost,u=root,p-password,D=n126db1,t=t1 --alter "ADD COLUMN c11 varchar(10)" --drop-old-table > pt-osc-debug.out_pzoblisein 2>&1

Thanks
Patrick

Revision history for this message
Patrick Zoblisein (patrickzoblisein) wrote :
tags: added: pt-online-schema-change risk
Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

I don't think it's clear that this tool is the right approach for schema changes in Galera/XtraDB Cluster.

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

pt-online-schema-change is not even tested with basic replication, as noted several times in its documentation. In the RISKS section: "This tool has not been tested with replication; it may break replication."

tags: added: galera xtradb-cluster
Changed in percona-toolkit:
status: New → Triaged
Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

This bug will probably go away after point #1 in http://www.percona.com/docs/wiki/ptdev:blueprints:redesign-pt-online-schema-change is implemented.

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

Patrick, I think this bug was originally reported using pt-osc 2.0 or older. If that's true, have you tried pt-osc 2.1?

Revision history for this message
Patrick Zoblisein (patrickzoblisein) wrote :

Hi Daniel,

Have finally gotten around to looking at this again.

v2.1.2 does indeed seem to have fixed this truncation issue.

Performed a quick test on a 3 node cluster:
(Server version: 5.5.24-23.6-log Percona XtraDB Cluster (GPL) 5.5.24-23.6, Revision 342, wsrep_23.6.r342)

Node1 wsrep_OSU_method = TOI, node2 and node3 wsrep_OSU_method=RSU

Static test system with no Reads or Writes.

Next round of testing will be with a more active system.

Is pt-online-schema-change considered valid against XtraDB Cluster?

Thanks much
Patrick

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

Patrick, thanks for reporting back. Glad to hear that 2.1.x is working so far. We're currently in the process of testing and "validating" all tools with PXC (Percona XtraDB Cluster), but at present pt-osc + PXC is "unknown". So be careful, and please blog your results. :-) I wish I could report more but, like I said, we're still working on pt-* + PXC as well as other technologies. Let's keep this bug open until the ultimate fate of pt-osc + PXC is known.

Changed in percona-toolkit:
status: Triaged → In Progress
assignee: nobody → Daniel Nichter (daniel-nichter)
Changed in percona-toolkit:
assignee: Daniel Nichter (daniel-nichter) → nobody
status: In Progress → Triaged
Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

Patrick, we've increased testing for tools and PXC and indeed pt-osc does work with PXC. However, you may want to watch bug 1079867 because there might be a bug with tables with foreign keys. As for this bug, as Baron mentioned, it was more than likely fixed in pt-osc 2.1, so I'll close this bug. Thanks for your feedback.

tags: added: pxc
removed: galera xtradb-cluster
Changed in percona-toolkit:
status: Triaged → Invalid
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-922

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.