locale and charset problem in mysql - default character set should be set to utf8

Bug #34181 reported by nahs
This bug affects 7 people
Affects Status Importance Assigned to Milestone
mysql-dfsg-5.1 (Ubuntu)
Triaged
Wishlist
Unassigned
Declined for Feisty by Mathias Gug
Declined for Karmic by Mathias Gug
Declined for Lucid by Mathias Gug

Bug Description

In Dapper, MySQL has no default charset setting in /etc/mysql/my.cnf, so MySQL selects latin1 as the default charset. This causes problems when inserting CJK text into a database.

MySQL also has its own localization files in /usr/share/mysql/, but the Korean ones are encoded in EUC-KR, so I have to change my terminal encoding to EUC-KR to use them.

Since Ubuntu recommends a UTF-8 environment, I think it would be better for MySQL to support utf8 by default as well.

Tags: trivial
Revision history for this message
atie (atie-at-matrix) wrote :

I have checked mysql-server 4.1 (4.1.15) and 5.0 (5.0.18) source packages in Dapper. Indeed files such as errmsg in sql/share/korean are encoded with euc-kr.

Revision history for this message
ZhengPeng Hou (zhengpeng-hou) wrote :

I agree on using utf8 as the default. People from China have figured this out too, and hope to use utf8.

Revision history for this message
Abel Cheung (abelcheung) wrote :

Agreed, and I don't think it poses much of a problem for existing databases/tables, since they should already be using a specific charset. After changing the default charset, only new tables/databases will use the new charset; old ones are not affected, are they?

Revision history for this message
vince (vido) wrote :

This problem still occurs in 5.0.19.

Revision history for this message
Olivier Cortès (olive) wrote :

Confirming this problem with phpMyAdmin:
phpMyAdmin is UTF-8 and my database was converted from utf-8 at import, but when I use phpMyAdmin to insert values into any table, they are "automagically" converted to latin1.
All default COLLATE values are latin1-swedish in MySQL, but I didn't set this anywhere (I'm using UTF-8 everywhere possible, and I speak French, so in the worst case it would have been latin-1-french, or something like that).

Using a recent Dapper, mysql-server 5.0.21-3.

Revision history for this message
Abel Cheung (abelcheung) wrote :

I'm using the following settings in /etc/my.cnf to force everything in UTF-8 encoding:

[mysqld]
character-set-server=utf8

[mysql.server]
character-set-server=utf8

[mysqld_safe]
character-set-server=utf8

[mysql]
default-character-set=utf8

Though I think adding only the [mysqld] and [mysql] sections is enough.

However, are we, the users, simply going to keep discussing this among ourselves?

Revision history for this message
Sebastian Bertho (sbertho) wrote :

One year and three Ubuntu releases have gone by without any progress. All are "broken", because the shell is set to utf8 and mysql assumes it is latin1, so every call to mysql or mysqldump will result in character conversion errors and broken data.

So what is keeping this bug from being fixed?
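
For illustration, the kind of corruption described in this comment can be reproduced outside MySQL. The following is only a sketch under stated assumptions: Python's "latin-1" codec (ISO-8859-1) stands in for MySQL's latin1, which is actually closer to cp1252, and the helper names are hypothetical, not part of any MySQL tool.

```python
# Sketch only: Python's "latin-1" codec approximates MySQL's latin1 here.
# The function names are hypothetical and exist only for this illustration.

def misread_as_latin1(text):
    """Encode text as UTF-8 (what the shell sends), then decode the
    bytes as latin1 (what the server assumes it received)."""
    return text.encode("utf-8").decode("latin-1")


def fits_in_latin1(text):
    """Return True if every character of text exists in latin1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


original = "café"
garbled = misread_as_latin1(original)
print(garbled)                      # cafÃ©

# The damage is reversible only while the raw bytes are still intact.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered == original)        # True

# CJK text has no latin1 representation at all.
print(fits_in_latin1("café"))       # True
print(fits_in_latin1("한국어"))      # False
```

Each accented character turns into two characters because its two UTF-8 bytes are read as two separate latin1 characters; text outside latin1, such as Korean, cannot be converted at all.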

Revision history for this message
Abel Cheung (abelcheung) wrote :

Sebastian, just calm down. Not worth discussing, let's just fix it in our own servers.

Revision history for this message
Alan Tam (at) wrote :

Since the package is in main, I am not so sure how to notify the Ubuntu Core team of this.
Anyway, I tagged it "trivial" as per https://help.launchpad.net/TaggingLaunchpadBugs

description: updated
Revision history for this message
Alan Tam (at) wrote :

Perhaps the reason is that it was filed against a non-existent package!

description: updated
Revision history for this message
Abel Cheung (abelcheung) wrote :

Alan, you may be correct. The bug was reported such a long time ago that I didn't even notice that the source package is named mysql-dfsg now.

Revision history for this message
Alan Tam (at) wrote :

Perhaps it is unrelated to the length of time. The mysql source package has carried the "-dfsg" suffix since Nov 2004.
    http://packages.qa.debian.org/m/mysql-dfsg-5.0.html
    http://packages.qa.debian.org/m/mysql-dfsg-4.1.html
    http://packages.qa.debian.org/m/mysql-dfsg.html
    http://packages.qa.debian.org/m/mysql.html
I think it is a bug in Launchpad that it fails to recognize a mistyped source package.

Revision history for this message
Alan Tam (at) wrote :

Let me attempt to analyze the impact of the proposed fix on changing the default character set.

* Every database that has already been created stores its own default character set, hence this change does not affect them.

* Since utf8 can represent every character in latin1, every newly created database should be able to store the same kind of information.

* The reason upstream chose latin1 as the default is a real mystery. Perhaps it is for legacy reasons dating from version 4.0 or earlier.
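
The second point can be checked with a small sketch. It assumes Python's "latin-1" codec (ISO-8859-1) as a stand-in for MySQL's latin1, which is closer to cp1252; the extra cp1252 characters also exist in Unicode, so the conclusion should carry over, but that is not verified here.

```python
# Sketch only: ISO-8859-1 stands in for MySQL's latin1 character set.

# All 256 single-byte latin1 characters.
latin1_chars = bytes(range(256)).decode("latin-1")

# Every one of them can be encoded as UTF-8 and read back unchanged.
as_utf8 = latin1_chars.encode("utf-8")
round_trip_ok = as_utf8.decode("utf-8") == latin1_chars
print(round_trip_ok)                      # True

# Storage cost: the 128 ASCII characters stay at 1 byte each,
# the other 128 characters need 2 bytes each in UTF-8.
print(len(latin1_chars), len(as_utf8))    # 256 384
```

So no information is lost, but non-ASCII text takes more bytes in utf8 than in latin1, which may matter for column lengths measured in bytes.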

Let's attempt to ask for inclusion in Feisty.

Revision history for this message
Emmet Hikory (persia) wrote :

That patch will work well for new installs, but /etc/mysql/my.cnf is registered as a conffile, and so will not automatically be updated on upgrade (without user interaction). It is probably additionally worth adding a note to README.Debian or NEWS.Debian indicating that existing users will need to appropriately modify the conffile to take advantage of this fix.

Furthermore, there seem to be two issues in the bug report, firstly that the databases are not created in UTF8, and secondly that the localisation data in /usr/share/mysql/$language/errmsg.sys are not in utf8. Is this automatically fixed by this patch? I'm not able to replicate for ja_JP.UTF-8, as the strings appear not to be translated (or at least all my test errors resulted in English error messages).

Revision history for this message
Alan Tam (at) wrote :

Emmet,

MySQL defines a lot of variables for character sets:
mysql> show variables;
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | latin1 |
| character_set_connection | latin1 |
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_results    | latin1 |
| character_set_server     | latin1 |
| character_set_system     | utf8   |
+--------------------------+--------+

Does it help if you switch all of them to utf8?

I prefer forking this issue into another bug, since it is a localization issue, rather than the internationalization issue that most comments are about.
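
As a rough aid for checking a listing like the one above, the following sketch parses the text output of the mysql client and reports which character set variables are not utf8. The sample text is copied from this comment; the function name and the choice to ignore "binary" are assumptions made for this illustration.

```python
# Sketch only: parses pasted "show variables" output; it does not talk
# to a MySQL server. parse_variables is a hypothetical helper name.

SAMPLE = """\
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_client     | latin1 |
| character_set_connection | latin1 |
| character_set_database   | latin1 |
| character_set_filesystem | binary |
| character_set_results    | latin1 |
| character_set_server     | latin1 |
| character_set_system     | utf8   |
+--------------------------+--------+
"""


def parse_variables(table_text):
    """Turn the two-column table into a dict of name -> value."""
    result = {}
    for line in table_text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Border lines yield one cell; the header row is skipped by name.
        if len(cells) == 2 and cells[0] != "Variable_name":
            result[cells[0]] = cells[1]
    return result


variables = parse_variables(SAMPLE)

# "binary" means no conversion, so it is not counted as a mismatch.
not_utf8 = sorted(
    name for name, value in variables.items()
    if value not in ("utf8", "binary")
)
for name in not_utf8:
    print(name, "=", variables[name])
```

With the sample above, this lists the client, connection, database, results and server variables, which are the ones the proposed my.cnf change is meant to affect.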

Revision history for this message
Emmet Hikory (persia) wrote :

Alan,
    My difficulty with replication is related to the relative lack of translated content in /usr/share/mysql/japanese/errmsg.sys, more than to the encoding of the messages. I agree that these are probably separate bugs. I just didn't want the second issue to get lost, and I don't feel it is appropriate to fork it myself, as I cannot replicate it in my test locale.

Korean Testers,
    Could someone try to replicate the EUC-KR localisation issue for the translated messages, and open a separate bug for that? /usr/share/mysql/korean/errmsg.sys appears to contain more translated strings (although I'm only seeing mojibake).
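
The mojibake mentioned above is what one would expect when EUC-KR bytes are shown in a UTF-8 terminal. A minimal sketch, assuming a made-up Korean string as a stand-in for a real entry in errmsg.sys (the actual file contents are not reproduced here):

```python
# Sketch only: "오류" is a hypothetical stand-in for a translated
# error message; it is not taken from errmsg.sys.

message = "오류"
euc_kr_bytes = message.encode("euc_kr")

# A UTF-8 terminal tries to interpret the EUC-KR bytes as UTF-8.
shown_in_utf8_terminal = euc_kr_bytes.decode("utf-8", errors="replace")

# A terminal switched to EUC-KR interprets the same bytes correctly.
shown_in_euc_kr_terminal = euc_kr_bytes.decode("euc_kr")

print(repr(shown_in_utf8_terminal))
print(shown_in_euc_kr_terminal == message)   # True
```

The bytes are not valid UTF-8, so a UTF-8 terminal can only show replacement characters, which matches the original report that the terminal had to be switched to EUC-KR.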

Revision history for this message
Alan Tam (at) wrote :

From what I read at http://bugs.mysql.com/5484 , it seems that MySQL does not do any output conversion for errmsg.sys. Perhaps it doesn't even know which charset the file is encoded in. If this is the case, it is probably better that we file a bug upstream instead.

Revision history for this message
Emmet Hikory (persia) wrote :

Looking at mysql-dfsg-5.0-5.0.38/sql/share/errmsg.txt, it appears that encodings are hardcoded for translations in a single multiple-encoding file. These strings are then directly mapped into the various errmsg.sys files. Looking more deeply, it appears that there is about 10% coverage in Japanese in the error strings, so I should be able to replicate with a bit more effort. Once done, I'll create new launchpad and upstream bugs for mysql-server to use the default encoding of the server (or better, the client) when presenting error messages.

This bug should only track the default database encoding, so that stored strings round-trip properly from a UTF8 locale.

Mathias Gug (mathiaz)
Changed in mysql-dfsg-5.0:
importance: Medium → Wishlist
status: Confirmed → Triaged
Revision history for this message
®om (rom1v) wrote :

Same problem in ubuntu hardy repositories...

The default charset should be utf8...

Revision history for this message
Jean-Max Reymond (jmreymond-free) wrote :

Very annoying bug, because mysqldump runs as utf-8 (the system language) but the server runs as latin1 (the default for Ubuntu), so mysqldump does a conversion which fails.

Chuck Short (zulcss)
affects: mysql-dfsg-5.0 (Ubuntu) → mysql-dfsg-5.1 (Ubuntu)
Revision history for this message
dernasherbrezon (rodionovamp) wrote :

MySQL on an enterprise server should have the utf8 charset by default. UTF8 is used throughout the system and it is really confusing to have latin1 for mysql clients.

Revision history for this message
Johan Ramm-Ericson (johanre) wrote :

Mathias, do you mind explaining why you keep rejecting this request? From my perspective, and that of several others, it seems quite natural for utf8 to be the default, particularly in view of the fact that complete internationalization has been a longstanding goal for the Ubuntu distribution.
