pt-online-schema-change creates temporal tables in Latin1 despite specifying UTF8 charset

Reported by Ivan Marcin on 2013-04-22
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Toolkit
Medium
Daniel Nichter

Bug Description

Attempting to use pt-online-schem-change to alter a table. the temporal table is created with this options:
 ENGINE=InnoDB AUTO_INCREMENT=5024719 DEFAULT CHARSET=latin1

So the data copy fails given the data attempted to be copied is in UTF8.

I'm trying to override the charset in the command line, and setting the defaults on server, specified for the temp table w/o success.

Command executed:
sudo pt-online-schema-change --alter "MODIFY email VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci" D=flexd_production,t=users_benchmark_locale --alter-foreign-keys-method auto --defaults-file "./debian.cnf" --charset utf8 --execute --print

Error:
`flexd_production`.`users_benchmark_locale` was not altered.
 (in cleanup) Error copying rows from `flexd_production`.`users_benchmark_locale` to `flexd_production`.`_users_benchmark_locale_new`: Copying rows caused a MySQL error 1366:
    Level: Warning
     Code: 1366
  Message: Incorrect string value: '\xF1ejo@g...' for column 'email' at row 2498

I believe the fix is to set the temp copy table encoding to UTF8, the question is how if the charset parameter is being ignored.

tags: added: charset pt-online-schema-change
Changed in percona-toolkit:
status: New → Confirmed
milestone: none → 2.2.3
Brian Fraser (fraserbn) wrote :

--charset is, unfortunately, a bit of a misnomer -- it should really be called --connection-charset. I think the solution here is simply changing the --alter statement to

--alter "CHARACTER SET utf8, MODIFY email VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci"

Which will modify the charset of the temp table to utf8.

Brian Fraser (fraserbn) on 2013-05-04
Changed in percona-toolkit:
status: Confirmed → Incomplete
Changed in percona-toolkit:
importance: Undecided → Medium
Changed in percona-toolkit:
status: Incomplete → In Progress
assignee: nobody → Daniel Nichter (daniel-nichter)
Daniel Nichter (daniel-nichter) wrote :

Ivan,

Brian's suggestion works: --alter "CHARACTER SET utf8, MODIFY email VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci". This will make the charset of the new table utf8. I've tested and confirmed this.

From your original report, that the temp (i.e. the new) table is created with latin1 and that you want to do --alter "MODIFY email VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci" leads me to believe that you have a latin1 table (i.e. SHOW CREATE TABLE lists DEFAULT ENGINE=latin1) but you want to make a single column utf8. There are two issues here:

1) pt-osc creates the new table with the original table's charset--latin1 in this case. I've also tested and confirmed this. This is how it should work.

2) Having mixed charsets is complicated, if even possible. If you can make the whole table utf8 as per Brian's suggestion, or --alter "CONVERT TO CHARACTER SET charset_name" should work too (see http://dev.mysql.com/doc/refman/5.1/en/alter-table.html), then --charset=utf8 will also be appropriate. Otherwise, if the table is latin1 but a single column is utf8 (e.g. the email column in this case), then I'm not sure what --charset should be: latin1 or utf8? Any other program or tool using this database will probably have the same problem. I don't know if there's a solution to this; maybe http://mysql.rjweb.org/doc.php/charcoll has some insights.

In any case, I don't see a bug here, just a careful use of mixed or conversion of charsets that you'll need to research. If I'm wrong and you can demonstrate that the tool isn't doing something it should, let me know and we'll re-open this issue.

Changed in percona-toolkit:
status: In Progress → Invalid
milestone: 2.2.4 → none
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers