pt-table-sync does not use the character set for the table it is synchronizing
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Percona Toolkit moved to https://jira.percona.com/projects/PT |
Confirmed
|
Undecided
|
Unassigned |
Bug Description
pt-table-sync 2.2.12
Percona Server 5.5.24
pt-table-sync does not pick up the character set from the table definition that it is using and instead uses the character set that is specified by the client. This can actually cause data to be modified on the master to the wrong character set. For example:
1. Create a utf8 table on the master
create table t1 (a int not null primary key, b varchar(200)) character set=utf8;
2. Insert utf8 data:
insert into t1 (a,b) values (1,'ᅢᄈ');
3. Remove the data on the slave only (sql_log_bin is only for master-master):
set sql_log_bin=0; delete from t1 where a=1;
4. Run pt-table-sync on the slave without specifying the character set:
pt-table-sync --print --sync-to-master h=localhost,
REPLACE INTO `test`.`t1`(`a`, `b`) VALUES ('1', '??') /*percona-toolkit src_db:test src_tbl:t1 src_dsn:
This will overwrite the data on the master with ??. Specifying --charset=utf8 fixes the problem for this table:
pt-table-sync --print --sync-to-master h=localhost,
REPLACE INTO `test`.`t1`(`a`, `b`) VALUES ('1', 'ᅢᄈ') /*percona-toolkit src_db:test src_tbl:t1 src_dsn:
Unfortunately that forces the same character set on all tables for that command which makes it harder to run on servers with multiple character sets. Ideally pt-table-sync would figure out what the character set the table in question is and generate the statements using the correct character set. To allow the output to contain multiple character sets the character set for non-default character set strings should be specified like this: _utf8'string' although I can see how this would be difficult for tables with multiple character sets and when output is captured in a file.
Able to reproduce the same with PS 5.6.21 and pt-table-sync 2.2.12
nilnandan@ Dell-XPS: ~$ pt-table-sync --version Dell-XPS: ~$
pt-table-sync 2.2.12
nilnandan@
mysql> create table t1 (a int not null primary key, b varchar(200)) character set=utf8;
Query OK, 0 rows affected (0.07 sec)
mysql> insert into t1 (a,b) values (1,'ᅢᄈ');
Query OK, 1 row affected (0.01 sec)
mysql> Dell-XPS: ~$ pt-table-sync --print --sync-to-master h=localhost, D=test, t=t1,S= /tmp/mysql_ sandbox20887. sock D=test, P=20886, S=/tmp/ mysql_sandbox20 887.sock, h=127.0. 0.1,t=t1 dst_db:test dst_tbl:t1 dst_dsn: D=test, S=/tmp/ mysql_sandbox20 887.sock, h=localhost, t=t1 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:24865 user:nilnandan host:Dell-XPS*/; Dell-XPS: ~$
nilnandan@
REPLACE INTO `test`.`t1`(`a`, `b`) VALUES ('1', '??') /*percona-toolkit src_db:test src_tbl:t1 src_dsn:
nilnandan@
nilnandan@ Dell-XPS: ~$ pt-table-sync --print --sync-to-master h=localhost, D=test, t=t1,S= /tmp/mysql_ sandbox20887. sock --charset=utf8 A=utf8, D=test, P=20886, S=/tmp/ mysql_sandbox20 887.sock, h=127.0. 0.1,t=t1 dst_db:test dst_tbl:t1 dst_dsn: A=utf8, D=test, S=/tmp/ mysql_sandbox20 887.sock, h=localhost, t=t1 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:24880 user:nilnandan host:Dell-XPS*/; Dell-XPS: ~$
REPLACE INTO `test`.`t1`(`a`, `b`) VALUES ('1', 'ᅢᄈ') /*percona-toolkit src_db:test src_tbl:t1 src_dsn:
nilnandan@