Support Unicode/UTF8 by default

Bug #496746 reported by Liraz
This bug affects 6 people
Affects: TurnKey Linux
Status: Fix Released
Importance: Low
Assigned to: Alon Swartz
Milestone: (not set)

Bug Description

Issues with character sets seem to be a frequent complaint on the forums:

http://www.turnkeylinux.org/forum/support/20091204/unicode-characters-turns-redmine
http://www.turnkeylinux.org/forum/support/20091211/change-character-set-tracks-appliance

By default MySQL uses latin1 encoding but that's not a very good default. We should be supporting UTF8 wherever possible.
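
As a rough illustration of what switching a database's default away from latin1 looks like, here is a minimal sketch. The helper name `utf8_alter_sql`, the database name `redmine_production` and the choice of `utf8_general_ci` are assumptions made for this example, not something taken from this report. Note that `ALTER DATABASE` only changes the default for tables created afterwards; existing tables and their data keep their current character set.

```shell
#!/bin/sh
# Hypothetical helper: print the SQL that sets a database's default
# character set to utf8. It only prints the statement; nothing is
# executed against a server.
utf8_alter_sql() {
    db="$1"
    # Backticks are literal inside single quotes; they quote the DB name.
    printf 'ALTER DATABASE `%s` CHARACTER SET utf8 COLLATE utf8_general_ci;\n' "$db"
}

# Usage (placeholder database name), piping the statement into the client:
#   utf8_alter_sql redmine_production | mysql -u root -p
```

Printing the statement first makes it easy to review before it is piped into `mysql`.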

Revision history for this message
Liraz (liraz-siri) wrote :

Meanwhile for those suffering from this issue there is a nice bit of information about converting character sets in MySQL here:

http://www.mysqlperformanceblog.com/2009/03/17/converting-character-sets/

Revision history for this message
Raumkraut (raumkraut) wrote :

Here's a direct link (via the first link in the bug description) to a sequence of shell commands which will UTF8-ify the databases (worked for me):
http://www.turnkeylinux.org/forum/support/20091204/unicode-characters-turns-redmine#comment-1761

NB. If your mysql root user has a password (which it should!), you will need to modify all the `mysql` and `mysqldump` commands to add a `-p` argument.
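
One way to avoid editing every command by hand is to build the shared client options in one place. This is only a sketch: the function `mysql_opts`, the variable `MYSQL_ASK_PASS` and the database name `mydb` are made up for illustration and are not part of the linked commands.

```shell
#!/bin/sh
# Hypothetical helper: emit the options shared by mysql and mysqldump.
# With MYSQL_ASK_PASS=yes (the default here) it adds -p, which makes the
# client prompt for the root password on each invocation.
mysql_opts() {
    if [ "${MYSQL_ASK_PASS:-yes}" = "yes" ]; then
        echo "-u root -p"
    else
        echo "-u root"
    fi
}

# Usage (placeholder database name; the unquoted expansion is intentional
# so the options are split into separate arguments):
#   mysqldump $(mysql_opts) mydb > mydb.sql
#   mysql $(mysql_opts) mydb < mydb.sql
```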

Revision history for this message
Jonathan-David Schroder (myselfhimself) wrote :

Hello, I have a similar issue with the Debian Lenny based TurnKey Core appliance.

running "locale" gives :
LANG=
LC_CTYPE="POSIX"
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

and "locale -a" to list available locales gives :
C
POSIX
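
A list containing only C and POSIX means no UTF-8 locale has been generated. A small check along these lines could be scripted as follows; the function name `has_utf8_locale` is an assumption for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the locale list read from stdin
# contains at least one UTF-8 locale. Depending on the system,
# `locale -a` spells the codeset "utf8" or "UTF-8", so match both.
has_utf8_locale() {
    grep -Eiq 'utf-?8'
}

# Usage:
#   locale -a | has_utf8_locale || echo "no UTF-8 locale generated"
```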

To build an OpenStreetMap PostgreSQL database with PostgreSQL 8.4 (no issue with PostgreSQL 8.3, I think), I need to tell PostgreSQL to create the database with UTF8 encoding, but it raises an error saying that the default template database is already in SQL_ASCII. This depends on the locale settings in effect when postgresql-8.4 etc. were installed with apt-get.

According to this blog article : http://bjeanes.com/2010/07/08/fix-encoding-errors-preventing-postgresql-database-creation
I would need to run the following prior to creating any database with encoding different from SQL_ASCII (or something linked to the system's current locale) :
pg_dropcluster --stop 8.4 main
pg_createcluster --start -e UTF-8 8.4 main
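
As a hedged aside, a commonly used alternative to recreating the cluster is to clone `template0` instead of the default `template1` when creating the database. PostgreSQL accepts this only when the requested encoding is compatible with the cluster's locale; the C/POSIX locale is compatible with any encoding. The function name `create_utf8_db`, the `DRY_RUN` switch and the database name `osm` are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical helper: create a UTF8 database from template0, so the
# SQL_ASCII encoding of template1 does not get in the way.
# With DRY_RUN set, the command is printed instead of executed.
create_utf8_db() {
    # Replace the function's arguments with the full command line.
    set -- createdb -E UTF8 -T template0 "$1"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage, as the postgres user (placeholder database name):
#   create_utf8_db osm
```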

There's more info also here :
http://ubuntuforums.org/showthread.php?t=138022 , see post #6
In short, I should edit 2-3 files and run : locale-gen; dpkg-reconfigure locales to set UTF-8 as my default system encoding.
I must mention that, contrary to the /etc/locale.def file shown in that post, my TurnKey Core's (Debian Lenny) /etc/locale.def has lots of similar lines, but all of them are commented out (none is enabled).
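
For reference, enabling a locale on Debian usually means uncommenting its line in the locale list and then running `locale-gen`. The sketch below assumes the standard Debian file `/etc/locale.gen` (the comment above names `/etc/locale.def`), GNU `sed`, and a made-up function name `enable_locale`; it takes the file path as a parameter so it can be tried on a copy first.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the entry for one locale in a
# locale.gen-style file, e.g. turn "# en_US.UTF-8 UTF-8" into
# "en_US.UTF-8 UTF-8". Other lines are left untouched.
enable_locale() {
    locale_name="$1"   # e.g. en_US.UTF-8
    gen_file="$2"      # e.g. /etc/locale.gen (assumed path)
    # sed -i edits in place (GNU sed). The dots in the locale name are
    # treated as regex wildcards, which is harmless for this purpose.
    sed -i "s/^# *\(${locale_name} .*\)\$/\1/" "$gen_file"
}

# Usage (as root), followed by regenerating the locales:
#   enable_locale en_US.UTF-8 /etc/locale.gen && locale-gen
```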

What is the TurnKey Core developers' policy on encoding? Can we have UTF-8 by default ?
I don't know the Debian world well; does the vanilla Debian Lenny distribution ship with UTF-8 or POSIX/C encoding as default ?

Thanks so much for your help. I'm staying tuned for collaboration on this issue.

Revision history for this message
Jonathan-David Schroder (myselfhimself) wrote :

Some additional info on my issue; this was probably my fault.

I got the above issue while working in a chroot of an extracted vanilla TurnKey Core (Debian Lenny) .iso image.
That means that I did not go through the initial TurnKey install/auto boot setup, which might have fixed my encoding issues.

I've also just run the debian lenny Turn Key core in Try out / live cd mode.
On it, "locale" yields :
LANG=en_GB
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=C
I have also tried following this example : http://www.turnkeylinux.org/forum/support/20091224/how-set-locale-japanese
essentially :
locale-gen en_US.UTF-8 => gives no error
dpkg-reconfigure locales
=> enable en_US.UTF-8 in the list
=> set en_US.UTF-8 as the default locale
=> something gets run :
Generating locales (this might take a while)...
  en_US.UTF-8...locale alias file `/usr/share/locale/locale.alias' not found: No such file or directory
 done
Generation complete.
#I land back in bash terminal

Thanks for your help.

Revision history for this message
Alon Swartz (alonswartz) wrote :

Unicode/UTF8 will be set as the default in all upcoming TurnKey appliances

Changed in turnkeylinux:
assignee: nobody → Alon Swartz (alonswartz)
importance: Undecided → Low
status: New → Fix Committed
Revision history for this message
Aleksandar Pavić (acosonic) wrote :

When you install redmine, it offers UTF8 as default, please use it. Thx.

Revision history for this message
Aleksandar Pavić (acosonic) wrote :

I've downloaded the RC11 version, and it's not UTF-8 by default, so I can't use Cyrillic...

Revision history for this message
Alon Swartz (alonswartz) wrote :

Hi Aleksandar,

Thanks for reporting the issue. It seems that our force-utf8 configuration was run after the redmine databases were created, so collation_database was set to a latin1 collation, which in turn set character_set_database to latin1 as well.
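
To see whether a given database ended up with latin1 defaults, the two variables can be inspected from the shell. The following is only a sketch: the function name `non_utf8_settings` and the database name `redmine_production` are assumptions for illustration. It expects the tab-separated output that the `mysql` client produces in batch mode.

```shell
#!/bin/sh
# Hypothetical helper: read "name<TAB>value" lines from stdin and print,
# as name=value, every setting whose value does not start with "utf8".
non_utf8_settings() {
    awk -F '\t' '$2 !~ /^utf8/ { print $1 "=" $2 }'
}

# Usage (placeholder database name; -N drops the header row, -B selects
# tab-separated batch output):
#   mysql -u root -p -N -B -e "SHOW VARIABLES WHERE Variable_name IN
#     ('character_set_database','collation_database')" redmine_production \
#     | non_utf8_settings
```

An empty result means both settings already report a utf8 value for that database.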

I've committed a fix for this issue which will fix redmine and any other builds that suffered the same fate. It will be included in the upcoming 11.0 stable release.

Changed in turnkeylinux:
status: Fix Committed → Fix Released