pt-table-checksum checksum error when default-character-set = utf8

Reported by Heather Sullivan on 2012-02-02
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Toolkit
Undecided
Unassigned

Bug Description

Received Error executing checksum query messages on several tables. Was working successfully from a remote server but not on localhost.

Failing hosts had "default-character-set = utf8" defined
Once "loose-default-character-set = utf8" was used, toolkit worked successfully on localhost.

-bash-3.2$ pt-table-checksum --version
pt-table-checksum 2.0.2

mysql> select version();
+------------+
| version() |
+------------+
| 5.5.17-log |
+------------+
1 row in set (0.00 sec)

-bash-3.2$ pt-table-checksum h=c1-dbdev103,u=perconaadm --ask-pass --recursion-method=hosts
Enter MySQL password:
            TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE
02-02T13:47:50 0 0 3 1 0 1.018 HMS_CampaignAPI.DATABASECHANGELOG
02-02T13:47:51 0 0 1 1 0 0.279 HMS_CampaignAPI.DATABASECHANGELOGLOCK
02-02T13:47:51 Error checksumming table HMS_CampaignAPI.campaign: Error executing checksum query: DBD::mysql::st execute failed: Incorrect string value: '\xB1T\\xAA\-...' for column 'lower_boundary' at row 96 [for Statement "REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `campaign_buuid`, `marketing_plan_buuid`, `account_id`, `name`, `description`, `input_source`, `is_default`, `current_status`, `created_at`, `updated_at`, `campaign_type`, CONCAT(ISNULL(`name`), ISNULL(`description`), ISNULL(`input_source`), ISNULL(`is_default`), ISNULL(`created_at`), ISNULL(`updated_at`), ISNULL(`campaign_type`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `HMS_CampaignAPI`.`campaign` FORCE INDEX(`PRIMARY`) WHERE ((`campaign_buuid` >= ?)) AND ((`campaign_buuid` <= ?)) /*checksum chunk*/" with ParamValues: 6='..±Tª-K..]â#r.z.', 4='\.\.\±T\ª\-K\.\.\]\â\#r\.z\.', 1='campaign', 3='PRIMARY', 0='HMS_CampaignAPI', 7='.±t.TMh..Ê..+n4', 2=1, 5='\.\±t\\.TMh\.\.\Ê\.\.\+n4'] at /usr/bin/pt-table-checksum line 6830, <STDIN> line 1.

02-02T13:47:51 1 0 0 1 0 0.004 HMS_CampaignAPI.campaign
02-02T13:47:51 Error checksumming table HMS_CampaignAPI.campaign_activity: Error executing checksum query: DBD::mysql::st execute failed: Incorrect string value: '\x8A\\xEB\\xA1\...' for column 'lower_boundary' at row 96 [for Statement "REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `campaign_activity_buuid`, `campaign_buuid`, `account_id`, `name`, `description`, `activity_type`, `document_collection_id`, `assignment_list_id`, `input_source`, `source_name`, `provider_name`, `start_at`, `end_at`, `time_zone`, `is_default`, `current_status`, `created_at`, `updated_at`, CONCAT(ISNULL(`name`), ISNULL(`description`), ISNULL(`document_collection_id`), ISNULL(`assignment_list_id`), ISNULL(`input_source`), ISNULL(`source_name`), ISNULL(`provider_name`), ISNULL(`start_at`), ISNULL(`end_at`), ISNULL(`time_zone`), ISNULL(`is_default`), ISNULL(`created_at`), ISNULL(`updated_at`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `HMS_CampaignAPI`.`campaign_activity` FORCE INDEX(`PRIMARY`) WHERE ((`campaign_activity_buuid` >= ?)) AND ((`campaign_activity_buuid` <= ?)) /*checksum chunk*/" with ParamValues: 6='..ë¡ÒO..æù©cf. ', 4='\.\.\\ë\¡\ÒO\.\.\æ\ù\©cf\.\ ', 1='campaign_activity', 3='PRIMARY', 0='HMS_CampaignAPI', 7='.wyÎKíI..Òþî.. ', 2=1, 5='\.wy\ÎK\íI\.\.\Ò\þ\î\\.\.\ '] at /usr/bin/pt-table-checksum line 6830, <STDIN> line 1.

02-02T13:47:51 1 0 0 1 0 0.003 HMS_CampaignAPI.campaign_activity
02-02T13:47:51 0 0 0 1 0 0.012 HMS_CampaignAPI.campaign_activity_exec_list_ids
02-02T13:47:51 0 0 0 1 0 0.011 HMS_CampaignAPI.campaign_activity_properties
02-02T13:47:51 Error checksumming table HMS_CampaignAPI.campaign_activity_status_history: Error executing checksum query: DBD::mysql::st execute failed: Incorrect string value: '\x8A\\xEB\\xA1\...' for column 'lower_boundary' at row 96 [for Statement "REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `campaign_activity_buuid`, `created_at`, `status_code`, `change_source`, CONCAT(ISNULL(`change_source`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `HMS_CampaignAPI`.`campaign_activity_status_history` FORCE INDEX(`PRIMARY`) WHERE ((`campaign_activity_buuid` > ?) OR (`campaign_activity_buuid` = ? AND `created_at` >= ?)) AND ((`campaign_activity_buuid` < ?) OR (`campaign_activity_buuid` = ? AND `created_at` <= ?)) /*checksum chunk*/" with ParamValues: 6='..ë¡ÒO..æù©cf. ', 11='2011-12-12 15:53:11', 3='PRIMARY', 7='..ë¡ÒO..æù©cf. ', 9='.wyÎKíI..Òþî.. ', 2=1, 8='2011-12-08 17:08:33', 1='campaign_activity_status_history', 4='\.\.\\ë\¡\ÒO\.\.\æ\ù\©cf\.\ ,\.\.\\ë\¡\ÒO\.\.\æ\ù\©cf\.\ ,2011\-12\-08\ 17\:08\:33', 0='HMS_CampaignAPI', 10='.wyÎKíI..Òþî.. ', 5='\.wy\ÎK\íI\.\.\Ò\þ\î\\.\.\ ,\.wy\ÎK\íI\.\.\Ò\þ\î\\.\.\ ,2011\-12\-12\ 15\:53\:11'] at /usr/bin/pt-table-checksum line 6830, <STDIN> line 1.

02-02T13:47:51 1 0 0 1 0 0.003 HMS_CampaignAPI.campaign_activity_status_history
02-02T13:47:51 Error checksumming table HMS_CampaignAPI.campaign_properties: Error executing checksum query: DBD::mysql::st execute failed: Incorrect string value: '\xB1T\\xAA\-...' for column 'lower_boundary' at row 96 [for Statement "REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `campaign_buuid`, `property_name`, `property_value`)) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `HMS_CampaignAPI`.`campaign_properties` FORCE INDEX(`PRIMARY`) WHERE ((`campaign_buuid` > ?) OR (`campaign_buuid` = ? AND `property_name` >= ?)) AND ((`campaign_buuid` < ?) OR (`campaign_buuid` = ? AND `property_name` <= ?)) /*checksum chunk*/" with ParamValues: 6='..±Tª-K..]â#r.z.', 11='TEMPLATE_COLLECTION_ID', 3='PRIMARY', 7='..±Tª-K..]â#r.z.', 9='.±t.TMh..Ê..+n4', 2=1, 8='TEMPLATE_COLLECTION_ID', 1='campaign_properties', 4='\.\.\±T\ª\-K\.\.\]\â\#r\.z\.,\.\.\±T\ª\-K\.\.\]\â\#r\.z\.,TEMPLATE_COLLECTION_ID', 0='HMS_CampaignAPI', 10='.±t.TMh..Ê..+n4', 5='\.\±t\\.TMh\.\.\Ê\.\.\+n4,\.\±t\\.TMh\.\.\Ê\.\.\+n4,TEMPLATE_COLLECTION_ID'] at /usr/bin/pt-table-checksum line 6830, <STDIN> line 1.

02-02T13:47:51 1 0 0 1 0 0.003 HMS_CampaignAPI.campaign_properties
02-02T13:47:51 Error checksumming table HMS_CampaignAPI.campaign_status_history: Error executing checksum query: DBD::mysql::st execute failed: Incorrect string value: '\xB1T\\xAA\-...' for column 'lower_boundary' at row 96 [for Statement "REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT ?, ?, ?, ?, ?, ?, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `campaign_buuid`, `created_at`, `status_code`, `change_source`, CONCAT(ISNULL(`change_source`)))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `HMS_CampaignAPI`.`campaign_status_history` FORCE INDEX(`PRIMARY`) WHERE ((`campaign_buuid` > ?) OR (`campaign_buuid` = ? AND `created_at` >= ?)) AND ((`campaign_buuid` < ?) OR (`campaign_buuid` = ? AND `created_at` <= ?)) /*checksum chunk*/" with ParamValues: 6='..±Tª-K..]â#r.z.', 11='2011-12-13 08:23:05', 3='PRIMARY', 7='..±Tª-K..]â#r.z.', 9='Nü}Á¥Lþ£Ï½pCrÓM', 2=1, 8='2011-12-12 23:38:08', 1='campaign_status_history', 4='\.\.\±T\ª\-K\.\.\]\â\#r\.z\.,\.\.\±T\ª\-K\.\.\]\â\#r\.z\.,2011\-12\-12\ 23\:38\:08', 0='HMS_CampaignAPI', 10='Nü}Á¥Lþ£Ï½pCrÓM', 5='\N\ü\}\Á\¥L\þ\£\Ï\½pCr\ÓM,\N\ü\}\Á\¥L\þ\£\Ï\½pCr\ÓM,2011\-12\-13\ 08\:23\:05'] at /usr/bin/pt-table-checksum line 6830, <STDIN> line 1.

02-02T13:47:51 1 0 0 1 0 0.003 HMS_CampaignAPI.campaign_status_history
02-02T13:47:51 0 0 0 1 0 0.259 HMS_CampaignAPI.marketing_plan
02-02T13:48:56 0 0 0 1 0 0.005 mysql.columns_priv
02-02T13:48:56 0 0 13 1 0 0.004 mysql.db
02-02T13:48:56 0 0 0 1 0 0.006 mysql.event
02-02T13:48:56 0 0 0 1 0 0.007 mysql.func
02-02T13:48:56 0 0 38 1 0 0.005 mysql.help_category
02-02T13:48:56 0 0 453 1 0 0.006 mysql.help_keyword
02-02T13:48:56 0 0 992 1 0 0.006 mysql.help_relation
02-02T13:48:56 0 0 505 1 0 0.009 mysql.help_topic
02-02T13:48:56 0 0 0 1 0 0.005 mysql.host
02-02T13:48:56 0 0 0 1 0 0.005 mysql.ndb_binlog_index
02-02T13:48:56 0 0 0 1 0 0.006 mysql.plugin
02-02T13:48:56 0 0 0 1 0 0.005 mysql.proc
02-02T13:48:56 0 0 0 1 0 0.006 mysql.procs_priv
02-02T13:48:56 0 0 2 1 0 0.007 mysql.proxies_priv
02-02T13:48:56 0 0 0 1 0 0.005 mysql.servers
02-02T13:48:56 0 0 3 1 0 0.005 mysql.tables_priv
02-02T13:48:56 0 0 0 1 0 0.006 mysql.time_zone
02-02T13:48:56 0 0 0 1 0 0.005 mysql.time_zone_leap_second
02-02T13:48:56 0 0 0 1 0 0.005 mysql.time_zone_name
02-02T13:48:56 0 0 0 1 0 0.006 mysql.time_zone_transition
02-02T13:48:56 0 0 0 1 0 0.006 mysql.time_zone_transition_type
02-02T13:48:56 0 0 52 1 0 0.008 mysql.user
-bash-3.2$

Heather Sullivan (hsullivan) wrote :
Heather Sullivan (hsullivan) wrote :

I have also tried adding DNS A=utf8 as mentioned in their documentation, but this parameter returns the same errors whether default-character-set = utf8 or loose-default-character-set = utf8 is used.
http://www.percona.com/doc/percona-toolkit/2.0/dsn_data_source_name_specifications.html

Tests:
Failed: pt-table-checksum h=c1-dbdev103,u=perconaadm --ask-pass --recursion-method=hosts ##with default-character-set = utf8
Failed: pt-table-checksum h=c1-dbdev103,u=perconaadm,A=utf8 --ask-pass --recursion-method=hosts ##with default-character-set = utf8
Failed: pt-table-checksum h=c1-dbdev103,u=perconaadm,A=utf8 --ask-pass --recursion-method=hosts ##with loose-default-character-set = utf8
Succeeded: pt-table-checksum h=c1-dbdev103,u=perconaadm --ask-pass --recursion-method=hosts ##with loose-default-character-set = utf8

tags: added: charset pt-table-checksum wrong-sql
Brian Fraser (fraserbn) wrote :

Hey Heather,

Thank you for your bug report! Without having really dug into it, it strikes me that this might get fixed alongside https://bugs.launchpad.net/percona-toolkit/+bug/932327
The bad news is that we are still discussing how to fix that one, but that bug report does have an attached branch with a tentative fix, if you feel like giving it a spin.

Daniel Nichter (daniel-nichter) wrote :

Which OS and version of DBD::mysql are you using?

Changed in percona-toolkit:
status: New → Triaged
Heather Sullivan (hsullivan) wrote :

Hi Daniel,

I've since upgraded since I encountered my original issue, but I still see the error when default-character-set = utf8, no error with loose-default-character-set = utf8.

Checks tested, both with errors when default-character-set = utf8:
pt-table-checksum u=perconaadm --ask-pass --recursion-method=hosts
pt-table-checksum u=perconaadm,A=utf8 --ask-pass --recursion-method=hosts

OS: CentOS release 5.7
MySQL: 5.5.17
DBD::mysql: perl-DBD-MySQL-3.0007-2.el5

I had been previously running MySQL 5.5.8 on CentOS release 5.5 & 5.6, same DBD::mysql module.

Brian Fraser (fraserbn) on 2012-08-30
Changed in percona-toolkit:
assignee: nobody → Brian Fraser (fraserbn)
Brian Fraser (fraserbn) wrote :

Augh, I wrote a long reply here but then firefox crashed : /

While I've not been able to reproduce this, I've finally figured out what is causing it. The "Incorrect string value" error shows up if you try inserting non-BMP characters into a column with charset utf8, because MySQL's utf8 is stupid and broken (it only covers the BMP, text and varchar columns are truncated on insert at the first astral character, so it's not even a legal UTF-8 encoding). What I don't understand is why the loose- version works; Maybe it disables encoding errors? (I'm told that it shouldn't, but...)
The good news is that there's a couple of workarounds (I think. I've only tried the third):

Since you're on mysql +5.5.3, you can change the default-character-set to utf8mb4, which is the non-broken UTF-8 encoding. I don't think that all the toolkit deals with utf8mb4 yet (pt-archiver won't recognize it as a valid charset for the --file option) but I'll update it soon so that it will. That should be about it;

You could also alter percona.checksum so that the lower_boundary and upper_boundary columns are CHARSET utf8mb4.

Otherwise you could try the code in https://code.launchpad.net/~percona-toolkit-dev/percona-toolkit/possible-fix-925781-932327, which I just worked into dealing with this (and other UTF-8 errors). Do keep in mind that it will invalidate your previous percona.checksums table, since it uses a different serialization for the lower and upper boundary columns.

Changed in percona-toolkit:
status: Triaged → In Progress
Kolbe (kolbe) wrote :

I think the suggestion to use utf8mb4 is misguided. The data going into lower_boundary and upper_boundary never needs to be manipulated as character data. If someone is using a BINARY PK, there will always be sequences that are illegal in both utf8 and utf8mb4.

Why not use BLOB instead of TEXT for these columns? That should resolve the issue completely.

Kolbe (kolbe) wrote :

I see that one of the changes in the proposed patch is to change TEXT to "text CHARACTER SET binary", which is the same thing as BLOB, so it looks like this is the path being chosen despite the comments discussing utf8mb4.

Daniel Nichter (daniel-nichter) wrote :

Heather, Kolbe:

Using A= utf8mb4 and changing the table's default charset to the same should work. But yes, using BLOB for the lower_boundary and upper_boundary columns should also work. So far I can't think of why not to use BLOB for these columns and let the connection and result charsets handle interpreting the data.

Changed in percona-toolkit:
assignee: Brian Fraser (fraserbn) → nobody
tags: added: i37064
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments