Percona Toolkit moved to https://jira.percona.com/projects/PT

Quoter (de)serialize UTF8 data fails on CentOS 5.6

Bug #932327 reported by Daniel Nichter on 2012-02-14

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT	Invalid	Medium	Brian Fraser	

Bug Description

CentOS 5.6, Perl 5.8.8:

not ok 46 - Serialize [ニ\,è,a][!!!!__!*`,`\]
# Failed test 'Serialize [ニ\,è,a][!!!!__!*`,`\]'
# in lib/Quoter.t at line 197.
# Structures begin differing at:
# $got->[0] = 'ã\,Ã¨,a'
# $expected->[0] = 'ニ\,è,a'

That's the test with UTF8 data. Maybe it's a false-positive due to how Perl 5.8 or CentOS 5.6 handles UTF8, but in any case the test is failing.
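
The garbled `$got` value above looks like what you get when UTF-8 bytes are read back as Latin-1. The following is a minimal sketch of that reading, not code from the toolkit: it is written in Python purely for illustration, the variable names are hypothetical, and it assumes the invisible C1 control characters were dropped when the test output was displayed.

```python
# Hypothetical reproduction of the mojibake in the failed test output.
# Assumption: the UTF-8 bytes of the expected string were reinterpreted
# as Latin-1 somewhere between the test and the database driver.
expected = "ニ\\,è,a"

# Encode to UTF-8 bytes, as a UTF-8 aware client would send them.
utf8_bytes = expected.encode("utf-8")

# Misread the same bytes as Latin-1: every byte becomes one character.
misread = utf8_bytes.decode("latin-1")

# Drop C1 control characters (U+0080..U+009F), which most terminals
# and web pages do not display.
visible = "".join(ch for ch in misread if not 0x80 <= ord(ch) <= 0x9F)

print(repr(misread))
print(visible)
```

If this reading is right, the three UTF-8 bytes of `ニ` turn into `ã` plus two invisible control characters, and the two bytes of `è` turn into `Ã¨`, which matches the `$got` line in the report.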

Related branches

lp:~percona-toolkit-dev/percona-toolkit/possible-fix-bug-932327

lp:~percona-toolkit-dev/percona-toolkit/possible-fix-925781-932327

On hold for merging into lp:percona-toolkit/2.1

Percona Toolkit developers: Pending requested 2012-08-30

Daniel Nichter (daniel-nichter) on 2012-02-14

tags:

added: charset

Daniel Nichter (daniel-nichter) wrote on 2012-02-14:

DBD::mysql 3.0007
DBI 1.52

Daniel Nichter (daniel-nichter) on 2012-02-21

Changed in percona-toolkit:
status:	New → Confirmed
milestone:	none → 2.0.4

Daniel Nichter (daniel-nichter) on 2012-03-01

tags:

added: all-tools

Daniel Nichter (daniel-nichter) on 2012-03-06

Changed in percona-toolkit:
status:	Confirmed → In Progress

Daniel Nichter (daniel-nichter) wrote on 2012-03-06:

The root problem, iirc, is that DBD::mysql 3.x does not properly encode or set a flag for utf8 data (Brian knows the details). So utf8 goes into MySQL one way and comes out another, hence the failing tests. DBD::mysql 4.x does not have this problem.

A working although not perfect solution is:

- if $DBD::mysql::VERSION ge '4.000' then just quotemeta (the original code), with no encode/decode, which is unnecessary because DBD::mysql 4+ and quotemeta work with utf8
- else (DBD::mysql 3.x): encode if value ($res) is_utf8 and then always decode the $part

So this solution really only applies to DBD::mysql 3.x with utf8 encoded strings. It seems to work because tests show that decoding a string even if it was not encoded and even if it's latin1 did not garble the string. There was debate whether this was reliable. Imo and based on my understanding of utf8, a latin1 string cannot be mistaken for utf8 because of the way utf8 uses leading and trailing bytes with special high-order bits. But, iirc, Brian thinks it is possible that just the right combination of latin1 chars could be mistaken for a utf8 char.
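
The disagreement above can be checked at the byte level. The sketch below is illustrative only: it is in Python rather than Perl, and `decodes_as_utf8` is a hypothetical helper, not anything in the toolkit. It shows that a lone accented Latin-1 character is rejected by a UTF-8 decoder, while certain pairs of Latin-1 characters happen to form a well-formed UTF-8 sequence.

```python
def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if raw is a well-formed UTF-8 byte sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone Latin-1 "è" is the byte 0xE8: a UTF-8 lead byte that is
# missing its continuation bytes, so it is not valid UTF-8.
lone = "è".encode("latin-1")

# Latin-1 "Ã©" is the bytes 0xC3 0xA9, which is also exactly the
# UTF-8 encoding of "é", so a decoder cannot tell the two apart.
pair = "Ã©".encode("latin-1")

print(decodes_as_utf8(lone))   # False
print(decodes_as_utf8(pair))   # True
print(pair.decode("utf-8"))    # é
```

So most Latin-1 text with high-bit characters fails UTF-8 decoding, but a lead-byte character followed by the right continuation-range character does not, which is the kind of combination Brian was worried about.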

In any case, this seems to be the only simple, non-invasive solution for DBD::mysql 3.x and the tests work so I think it's worth trying. Plus, the current code (with only quotemeta) is clearly failing on DBD::mysql 3.x with utf8 strings, so even if this solution isn't perfect, it's slightly better.

tags:

added: dbd-mysql utf8

Daniel Nichter (daniel-nichter) wrote on 2012-03-06:

I'm going to untarget this from 2.0.4 because the issue is too subtle to fix easily. Baron noted: "I think the issue is that people can put binary data into what we think is a latin1 character. A latin1 character can't be mistaken for a utf8 character, but a lot of people put non-characters into their "character" columns."

Changed in percona-toolkit:
milestone:	2.0.4 → none

Daniel Nichter (daniel-nichter) on 2012-04-18

tags:

removed: utf8

Baron Schwartz (baron-xaprb) wrote on 2012-06-04:

See also revision 164 in http://bazaar.launchpad.net/~percona-toolkit-dev/percona-toolkit/stabilize-test-suite/

Daniel Nichter (daniel-nichter) wrote on 2013-03-13:

This is actually a non-issue of sorts: http://www.mysqlperformanceblog.com/2013/02/22/centos-5-users-your-utf-8-data-is-in-peril-with-perl/

Changed in percona-toolkit:
status:	In Progress → Invalid

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-24:

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-472
