pt-table-sync does not use the character set for the table it is synchronizing

Bug #1400472 reported by Pim van der Wal
40
This bug affects 7 people
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Confirmed
Undecided
Unassigned

Bug Description

pt-table-sync 2.2.12
Percona Server 5.5.24

pt-table-sync does not pick up the character set from the table definition that it is using and instead uses the character set that is specified by the client. This can actually cause data to be modified on the master to the wrong character set. For example:

1. Create a utf8 table on the master
create table t1 (a int not null primary key, b varchar(200)) character set=utf8;

2. Insert utf8 data:
insert into t1 (a,b) values (1,'ᅢᄈ');

3. Remove the data on the slave only (sql_log_bin is only for master-master):
set sql_log_bin=0; delete from t1 where a=1;

4. Run pt-table-sync on the slave without specifying the character set:
pt-table-sync --print --sync-to-master h=localhost,D=test,t=t1
REPLACE INTO `test`.`t1`(`a`, `b`) VALUES ('1', '??') /*percona-toolkit src_db:test src_tbl:t1 src_dsn:D=test,P=3306,h=test-host-1,t=t1 dst_db:test dst_tbl:t1 dst_dsn:D=test,h=localhost,t=t1 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:15724 user:root host:test-host-2*/;

This will overwrite the data on the master with ??. Specifying --charset=utf8 fixes the problem for this table:
pt-table-sync --print --sync-to-master h=localhost,D=test,t=t1 --charset=utf8
REPLACE INTO `test`.`t1`(`a`, `b`) VALUES ('1', 'ᅢᄈ') /*percona-toolkit src_db:test src_tbl:t1 src_dsn:A=utf8,D=test,P=3306,h=test-host-1,t=t1 dst_db:test dst_tbl:t1 dst_dsn:A=utf8,D=test,h=localhost,t=t1 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:16095 user:root host:test-host-2*/;

Unfortunately that forces the same character set on all tables for that command which makes it harder to run on servers with multiple character sets. Ideally pt-table-sync would figure out what the character set the table in question is and generate the statements using the correct character set. To allow the output to contain multiple character sets the character set for non-default character set strings should be specified like this: _utf8'string' although I can see how this would be difficult for tables with multiple character sets and when output is captured in a file.

Revision history for this message
Nilnandan Joshi (nilnandan-joshi) wrote :

Able to reproduce the same with PS 5.6.21 and pt-table-sync 2.2.12

nilnandan@Dell-XPS:~$ pt-table-sync --version
pt-table-sync 2.2.12
nilnandan@Dell-XPS:~$

mysql> create table t1 (a int not null primary key, b varchar(200)) character set=utf8;
Query OK, 0 rows affected (0.07 sec)

mysql> insert into t1 (a,b) values (1,'ᅢᄈ');
Query OK, 1 row affected (0.01 sec)

mysql>
nilnandan@Dell-XPS:~$ pt-table-sync --print --sync-to-master h=localhost,D=test,t=t1,S=/tmp/mysql_sandbox20887.sock
REPLACE INTO `test`.`t1`(`a`, `b`) VALUES ('1', '??') /*percona-toolkit src_db:test src_tbl:t1 src_dsn:D=test,P=20886,S=/tmp/mysql_sandbox20887.sock,h=127.0.0.1,t=t1 dst_db:test dst_tbl:t1 dst_dsn:D=test,S=/tmp/mysql_sandbox20887.sock,h=localhost,t=t1 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:24865 user:nilnandan host:Dell-XPS*/;
nilnandan@Dell-XPS:~$

nilnandan@Dell-XPS:~$ pt-table-sync --print --sync-to-master h=localhost,D=test,t=t1,S=/tmp/mysql_sandbox20887.sock --charset=utf8
REPLACE INTO `test`.`t1`(`a`, `b`) VALUES ('1', 'ᅢᄈ') /*percona-toolkit src_db:test src_tbl:t1 src_dsn:A=utf8,D=test,P=20886,S=/tmp/mysql_sandbox20887.sock,h=127.0.0.1,t=t1 dst_db:test dst_tbl:t1 dst_dsn:A=utf8,D=test,S=/tmp/mysql_sandbox20887.sock,h=localhost,t=t1 lock:1 transaction:1 changing_src:1 replicate:0 bidirectional:0 pid:24880 user:nilnandan host:Dell-XPS*/;
nilnandan@Dell-XPS:~$

Changed in percona-toolkit:
status: New → Confirmed
Revision history for this message
Vadim Surzhik (ironreality) wrote :

Can reproduce the bug with pt-table-sync 2.2.20 and Mysql 5.7.17.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-1256

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.