Support for contractions between non-ASCII characters and Croatian collation

Bug #488040 reported by Ante Karamatić
Affects: MariaDB
Status: Fix Released
Importance: Medium
Assigned to: Michael Widenius
Milestone: (none given)

Bug Description

From Neven Jacmenovic:

The feature we desperately need in MariaDB is proper support for a Croatian utf8 collation based on the Croatian alphabet (http://en.wikipedia.org/wiki/Gajica), so that we can finally sort Croatian words (names etc.) properly. MySQL doesn't support it, and without it we can't consider MySQL server, or MariaDB for that matter, as a choice for e.g. government migration to an open-source platform in the near future. Most, if not all, of those organizations currently use MS SQL instead of open-source solutions.

AFAIK the countries which would benefit from the same implementation (alongside Croatia) are: Bosnia, Serbia (for the Latin script) and Montenegro (for the Latin script).

MySQL already has a built-in latin2 Croatian collation (latin2_croatian_ci) and a CP1250 Croatian collation (cp1250_croatian_ci), but those implementations lack digraph support, i.e. support for single letters that are written as two characters (http://www.collation-charts.org/mysql60/mysql604.latin2_croatian_ci.html), which makes them useless for this purpose. Without proper support for digraphs, we will never be able to use ORDER BY properly (a-b-c-č-ć-d-dž-đ-e-f-g-h-i-j-k-l-lj-m-n-nj-o-p-r-s-š-t-u-v-z-ž).
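
To make the expected ordering concrete, here is a minimal Python sketch of a sort key that treats "dž", "lj" and "nj" as single letters. It only illustrates the ordering listed above; it is not how MySQL or MariaDB implement collations. The names ALPHABET, WEIGHT, DIGRAPHS and croatian_key are hypothetical, and the handling of characters outside the alphabet (sorted after all letters, by code point) is an assumption made for the example.

```python
# Croatian alphabet in the order given in this report.
# The digraphs "dž", "lj" and "nj" each count as ONE letter.
ALPHABET = [
    "a", "b", "c", "č", "ć", "d", "dž", "đ", "e", "f",
    "g", "h", "i", "j", "k", "l", "lj", "m", "n", "nj",
    "o", "p", "r", "s", "š", "t", "u", "v", "z", "ž",
]
# Hypothetical weights: position in the alphabet, starting at 1.
WEIGHT = {letter: i for i, letter in enumerate(ALPHABET, start=1)}
DIGRAPHS = ("dž", "lj", "nj")


def croatian_key(word):
    """Return a tuple of letter weights, case-insensitively."""
    s = word.lower()
    key = []
    i = 0
    while i < len(s):
        pair = s[i:i + 2]
        if pair in DIGRAPHS:
            # Contraction: two characters map to a single weight.
            key.append(WEIGHT[pair])
            i += 2
        else:
            ch = s[i]
            # Assumption: unknown characters sort after the alphabet.
            key.append(WEIGHT.get(ch, len(ALPHABET) + ord(ch)))
            i += 1
    return tuple(key)


words = ["njuška", "nula", "džem", "dom", "đak", "ljeto", "luk", "more"]
print(sorted(words, key=croatian_key))
# → ['dom', 'džem', 'đak', 'luk', 'ljeto', 'more', 'nula', 'njuška']
```

A plain code-point sort would put "ljeto" before "luk" and "njuška" before "nula", which is exactly the digraph problem described here. Note that this sketch treats every "d"+"ž", "l"+"j" and "n"+"j" sequence as a digraph and ignores the single-code-point Unicode forms of these letters.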

The closest built-in collation to Croatian in MySQL is Slovenian (utf8_slovenian_ci), but it also lacks digraphs, so it's not possible to adapt it (http://www.collation-charts.org/mysql60/mysql604.utf8_slovenian_ci.html).

Right now we are forced to use the utf8_general_ci collation, which of course doesn't know how to order the Croatian alphabet properly. I've attached a mysqldump with the Croatian alphabet. The valid ordering should be: a-b-c-č-ć-d-dž-đ-e-f-g-h-i-j-k-l-lj-m-n-nj-o-p-r-s-š-t-u-v-z-ž.
"DŽ", "NJ" and "LJ" are SINGLE letters.

I submitted an S4 feature request to MySQL some time ago, and the MySQL dev team started discussing it, but nothing happened (http://bugs.mysql.com/44523).

Please MariaDB developers, make our native language suck less! :)

Tags: 5.1
Ante Karamatić (ivoks) wrote :
Kurt von Finck (mneptok)
Changed in maria:
status: New → In Progress
importance: Undecided → Medium
Ante Karamatić (ivoks) wrote :

As explained at:

http://www.collation-charts.org/articles/croatian.htm

this patch does more than just add support for a Croatian UTF8 collation. It was based on Alexander's patch for MySQL 5.1 (http://www.collation-charts.org/articles/utf8_croatian_ci.diff), and you could probably get it by pulling from MySQL 6.

Michael Widenius (monty) wrote : re: [Bug 488040] [NEW] Support for contractions between non-ASCII characters and Croatian collation

Hi!

>>>>> "Ante" == Ante Karamatić <Ante> writes:

Ante> Public bug reported:
>> From Neven Jacmenovic:

Ante> The feature we desperately need in MariaDB is proper support for
Ante> Croatian utf8 collation based on Croatian alphabet
Ante> (http://en.wikipedia.org/wiki/Gajica) so we can finally sort croatian
Ante> words (names etc) properly. MySQL don't have support for it, without
Ante> this, we can't consider MySQL server or MariaDB for that matter, a
Ante> choice for eg. government migration to open-source platform in near
Ante> future. Most, if not all of those organizations now use MS SQL instead
Ante> of open source solutions.

<cut>

Croatian character sets are pushed into MariaDB 5.1-merge and should
be in default MariaDB 5.1 tomorrow.

Regards,
Monty

Changed in maria:
status: In Progress → Fix Committed
Michael Widenius (monty) wrote :

Croatian character sets are pushed into MariaDB 5.1-merge and should be in default MariaDB 5.1 tomorrow.

Ante Karamatić (ivoks) wrote :

There's an update for this bug; the patch is attached. It is explained at:

http://www.collation-charts.org/

"Dec 2, 2009. An updated version of the Croatian collation patch for MySQL-5.1 is available. It works a little bit more accurate when optimizing a LIKE query for UCS2 columns, in case of non-ASCII contractions:

  SELECT a FROM t1 WHERE a LIKE 'dž%';

The previous version could potentially lose some rows."

tags: added: 5.1
Changed in maria:
assignee: nobody → Michael Widenius (monty)
Changed in maria:
status: Fix Committed → Fix Released