CONCAT() and numeric arguments

Bug #1389743 reported by Markus Falb
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Confirmed
Undecided
Unassigned

Bug Description

# rpm -q percona-toolkit
percona-toolkit-2.2.11-1.noarch

pt-table-checksum gives me false diffs and I think it's because of changed behaviour of CONCAT()
I have a master with MySQL 5.1 and a slave with MariaDB 5.5

...
mysql> select CRC32(concat(name)),CRC32(concat(id)),CRC32(concat(adresse)),CRC32(concat(id,name,adresse)) from ro_firma where id=14\G
*************************** 1. row ***************************
           CRC32(concat(name)): 3880936933
             CRC32(concat(id)): 2788221432
        CRC32(concat(adresse)): 563603919
CRC32(concat(id,name,adresse)): 11548701
1 row in set (0.00 sec)
...

...
MariaDB [rolloutdb]> select CRC32(concat(name)),CRC32(concat(id)),CRC32(concat(adresse)),CRC32(concat(id,name,adresse)) from ro_firma where id=14\G
*************************** 1. row ***************************
           CRC32(concat(name)): 3880936933
             CRC32(concat(id)): 2788221432
        CRC32(concat(adresse)): 563603919
CRC32(concat(id,name,adresse)): 2182846900
1 row in set (0.00 sec)
...

please note that the checksum of single fields is the same but the checksum of the 3 fields concatenated differs.
The MySQL manual tells

http://dev.mysql.com/doc/refman/5.5/en/string-functions.html#function_concat
...
Returns the string that results from concatenating the arguments. May have one or more arguments. If all arguments are nonbinary strings, the result is a nonbinary string. If the arguments include any binary strings, the result is a binary string. A numeric argument is converted to its equivalent string form. This is a nonbinary string as of MySQL 5.5.3. Before 5.5.3, it is a binary string; to avoid that and produce a nonbinary string, you can use an explicit type cast
...

Please see attached my attempt for a fix for pt-table-checksum. My idea basically was to enforce a binary string by casting concat_ws separator to binary. Maybe other tools needs a fix too, pt-table-sync for instance

Revision history for this message
Markus Falb (markus-falb) wrote :
Revision history for this message
Peiran Song (peiran-song) wrote :

Markus,

Is that possible for you to provide the data content for this row? Also do you get inconsistent crc32(concat()) values on other rows? I am asking because I didn't reproduce this problem in my test.

Master: PS 5.1.71rel14.9-log
Slave1: PS 5.5.36-34.2-rel34.2
Slave2: 5.5.40-MariaDB

Data:
> select * from t1 where id = 14\G
     id: 14
   name: percona
address: 400 West Main Street, Suite 204, Post Office Box 1126, Durham, North Carolina 27702 USA

Master > select CRC32(concat(name)),CRC32(concat(id)),CRC32(concat(address)),CRC32(concat(id,name,address)) from t1 where id=14\G
*************************** 1. row ***************************
           CRC32(concat(name)): 33349892
             CRC32(concat(id)): 2788221432
        CRC32(concat(address)): 2013723334
CRC32(concat(id,name,address)): 2614596401

Slave1 > select CRC32(concat(name)),CRC32(concat(id)),CRC32(concat(address)),CRC32(concat(id,name,address)) from t1 where id=14\G
*************************** 1. row ***************************
           CRC32(concat(name)): 33349892
             CRC32(concat(id)): 2788221432
        CRC32(concat(address)): 2013723334
CRC32(concat(id,name,address)): 2614596401

Slave2 > select CRC32(concat(name)),CRC32(concat(id)),CRC32(concat(address)),CRC32(concat(id,name,address)) from t1 where id=14\G
*************************** 1. row ***************************
           CRC32(concat(name)): 33349892
             CRC32(concat(id)): 2788221432
        CRC32(concat(address)): 2013723334
CRC32(concat(id,name,address)): 2614596401

Changed in percona-toolkit:
status: New → Incomplete
Revision history for this message
Markus Falb (markus-falb) wrote :

Hi,

create table t1 (id int, name varchar(100) charset latin1, adresse varchar(100)) default charset=utf8;
insert into t1 values(14, binary("LATIN SMALL LETTER U WITH DIAERESIS ü"),"test adress");

mysql> select * from t1\G
*************************** 1. row ***************************
     id: 1
   name: LATIN SMALL LETTER U WITH DIAERESIS ü
adresse: test adress
1 row in set (0.00 sec)

Have a look at the table structure and note that the name field is latin1. LATIN SMALL LETTER U WITH DIAERESIS is encoded as 0xC3BC in UTF-8. With latin1 this is interpreted as two characters ü which is LATIN CAPITAL LETTER A WITH TILDE and VULGAR FRACTION ONE QUARTER. Somehow the application did not handle it's input correctly, or the name field should not be latin1. It's ugly, I know. I did emulate this uglyness with the cast to binary in my INSERT.

now the crc32 sum differs

mysql> select crc32(concat(id,name,adresse)) as MySQL_51 from t1\G
*************************** 1. row ***************************
MySQL_51: 2119000944
1 row in set (0.00 sec)

MariaDB [test]> select crc32(concat(id,name,adresse)) as MariaDB_55 from t1\G
*************************** 1. row ***************************
MariaDB_55: 2051066414
1 row in set (0.00 sec)

Revision history for this message
Peiran Song (peiran-song) wrote :

Markus,

Thanks for the additional detail. Verified as described. The differences of converting numeric type to binary string prior to 5.5.3 vs non-binary string as of 5.5.3 and after indeed causes different results when concat with multi-bytes string values.

my.cnf
-------------
character-set-server=utf8
skip-character-set-client-handshake

In 5.1, concat(id, name) caused the name column to be converted to binary string:

> select concat(id, name) from t1;
+------------------------------------------+
| concat(id, name) |
+------------------------------------------+
| 14LATIN SMALL LETTER U WITH DIAERESIS ü |
+------------------------------------------+

In 5.5, concat(id, name) makes no cast on name column:
> select concat(id, name) from t1;
+--------------------------------------------+
| concat(id, name) |
+--------------------------------------------+
| 14LATIN SMALL LETTER U WITH DIAERESIS ü |
+--------------------------------------------+

Changed in percona-toolkit:
status: Incomplete → Confirmed
Revision history for this message
Steffen Pohlen (s-pohlen) wrote :

I ran into the same problem in a configuration of MySQL 5.1 Master and MySQL 5.5.41 Slave, using pt-table-checksum 2.2.12.
Thanks for reporting the bug, it brought a lot of clarity to a very confusing problem.

I tried a workaround by enforcing the "CAST ... AS char" mentioned here in the script:

< ? "$func(CONCAT_WS('$sep', " . join(', ', @{$cols->{select}}) . '))'
---
> ? "$func(CONCAT_WS('$sep', " . join(', ', 'CAST('.@{$cols->{select}}.' AS char)') . '))'

Use at your own risk. So far it seems to work. It's probably a rather brute force solution, but it allows to continue using pt-table-checksum in a configuration with before and after 5.5.3 master and slave without the false positives.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-1251

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.