Bug #1388808 “Request for new language packages for Kurdish Sora...” : Bugs : langpack-locales package : Ubuntu

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2009-02-02:

#9

Kurdish (Iraq) needs its own locale definition, since Kurdish has been an
official language in Iraq since 200x.

Due to:
https://bugs.launchpad.net/ubuntu/+source/langpack-locales/+bug/266975

The locale is defined at: http://www.zkurd.org/aras/ckb_IQ.txt

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2009-02-02:

#10

Created attachment 3704
localedata for Kurdish Sorani (CKB)

the localedata for Kurdish Sorani (CKB)

In Sourceware.org Bugzilla #9809, Erdal Ronahi (erdalronahi) wrote on 2009-02-02:

#11

Comment on attachment 3704
localedata for Kurdish Sorani (CKB)

comment_char %
escape_char /
% Kurdish (Sorani) language locale for Iraq and Iran.
% Contributed by Aras Noori <email address hidden> and
% Erdal Ronahi<email address hidden>.
% Contact: Aras
% Language: ku
% Date: 2009-01-29
% Distribution and use is free, also
% for commercial purposes.
% History:
%

LC_IDENTIFICATION
title "Kurdish language locale for Sorani dialects"
source ""
address ""
contact "Aras"
email "<email address hidden>, <email address hidden>"
tel ""
fax ""
language "Kurdish"
territory "Iraq"
revision "1.0"
date "2009-01-29"
%
category "ckb_IQ:2000";LC_IDENTIFICATION
category "ckb_IQ:2000";LC_CTYPE
category "ckb_IQ:2000";LC_COLLATE
category "ckb_IQ:2000";LC_TIME
category "ckb_IQ:2000";LC_NUMERIC
category "ckb_IQ:2000";LC_MONETARY
category "ckb_IQ:2000";LC_MESSAGES
category "ckb_IQ:2000";LC_PAPER
category "ckb_IQ:2000";LC_NAME
category "ckb_IQ:2000";LC_ADDRESS
category "ckb_IQ:2000";LC_TELEPHONE
category "ku_IQ:2000";LC_MEASUREMENT

END LC_IDENTIFICATION

LC_CTYPE
copy "i18n"
END LC_CTYPE

LC_COLLATE

% Copy the template from ISO/IEC 14651
copy "iso14651_t1"

END LC_COLLATE

LC_MONETARY
% This is the POSIX Locale definition the LC_MONETARY category.
% These are generated based on XML base Locale difintion file
% for IBM Class for Unicode/Java
%
int_curr_symbol "<U0049><U0051><U0044><U0020>"
currency_symbol "<U062F><U002E><U0639><U002E>"
mon_decimal_point "<U002E>"
mon_thousands_sep "<U002C>"
mon_grouping 3
positive_sign ""
negative_sign "<U002D>"
int_frac_digits 3
frac_digits 3
p_cs_precedes 1
p_sep_by_space 1
n_cs_precedes 1
n_sep_by_space 1
p_sign_posn 1
n_sign_posn 2
%
END LC_MONETARY

LC_NUMERIC
% This is the POSIX Locale definition for the LC_NUMERIC category.
%
decimal_point "<U002E>"
thousands_sep "<U002C>"
grouping 3
%
END LC_NUMERIC

LC_TIME
% This is the POSIX Locale definition for the LC_TIME category.
% These are generated based on XML base Locale difintion file
% for IBM Class for Unicode/Java
%
% Abbreviated weekday names (%a)
abday "<U062D>";"<U0646>";/
     "<U062B>";"<U0631>";/
     "<U062E>";"<U062C>";/
     "<U0633>"
%
% Full weekday names (%A)
day "<U06CC><U06D5><U0643><U0634><U06D5><U0645><U0645><U06D5>";/
     "<U062F><U0648><U0648><U0634><U06D5><U0645><U0645><U06D5>";/
     "<U0633><U06CE><U0634><U06D5><U0645><U0645><U06D5>";/
     "<U0686><U0648><U0624><U0631><U0634><U06D5><U0645><U0645><U06D5>";/
     "<U067E><U06CE><U0646><U062C><U0634><U06D5><U0645><U0645><U06D5>";/
     "<U0647><U06D5><U06CC><U0646><U06CC>";/
     "<U0634><U06D5><U0645><U0645><U06D5>";/
%
% Abbreviated month names (%b)
abmon "<U064A><U0646><U0627>";"<U0641><U0628><U0631>";/
     "<U0645><U0627><U0631>";"<U0623><U0628><U0631>";/
     "<U0645><U0627><U064A>";"<U064A><U0648><U0646>";/
     "<U064A><U0648><U0644>";"<U0623><U063A><U0633>";/
     "<U0633><U0628><U062A>";"<U0623><U0643><U062A>";/
     "<U0646><U0648><U0641>";"<U062F><U064A><U0633>"
%
% Full month names (%B)
mon "<U064A><U0646><U0627><U064A><U0631>";/
  ...

Comment on attachment 3704
localedata for Kurdish Sorani (CKB)

escape_char  /
comment_char  %
% Kurdish (Sorani) language locale for Iraq and Iran.
% Contributed by Aras Noori <aras.noori@gmal.com> and
% Erdal Ronahi<erdal.ronahi@gmail.com>.
% Contact: Aras Noori
% Language: ku
% Date: 2009-04-14
% Distribution and use is free, also
% for commercial purposes.
% History:
% January 2009: Defining CKB locale
% March 2009: Adding rule for CKB
%

LC_IDENTIFICATION
title	   "Kurdish language locale for Sorani dialects - Central Kurdish"
source	   ""
address    ""
contact    "Aras"
email	   "aras.noori@gmail.com"
tel	   ""
fax	   ""
language   "Kurdish"
territory  "Iraq"
revision   "1.1"
date	   "2009-04-15"
%

category  "ckb_IQ:2000";LC_IDENTIFICATION
category  "ckb_IQ:2000";LC_CTYPE
category  "ckb_IQ:2000";LC_COLLATE
category  "ckb_IQ:2000";LC_TIME
category  "ckb_IQ:2000";LC_NUMERIC
category  "ckb_IQ:2000";LC_MONETARY
category  "ckb_IQ:2000";LC_MESSAGES
category  "ckb_IQ:2000";LC_PAPER
category  "ckb_IQ:2000";LC_NAME
category  "ckb_IQ:2000";LC_ADDRESS
category  "ckb_IQ:2000";LC_TELEPHONE
category  "ckb_IQ:2000";LC_MEASUREMENT

END LC_IDENTIFICATION

LC_CTYPE
copy "i18n"
END LC_CTYPE

LC_COLLATE
% The Sorani Kurdish dialect is mainly written using a modified Arabic-based
alphabet with 33 letters. 
% Unlike the regular Arabic alphabet, which is an abjad, Sorani is an alphabet
in which vowels are mandatory, making the script easy to read.
%
% The CKB (Sorani) alphabet order is: 
% in Latin: a, b, c, ç, d, e, ê, f, g, h, i, î, j, k, l, ll, m, n, o, p, q,
r, rr, s, sh, t, u, uu, v, w, x, y, z
% ئ، ب، پ، ت، ج، چ، ح، خ، د، ر، ڕ، ز، ژ، س، ش،
ف، ڤ، ق، ع، غ، ك، گ، ل، ڵ، م، ن، و، وو، ۆ، هـ،
ی، ێ
% vowels: A, E, I, O, U, UU
% پیتەبزوێنەكان ئەمانەن: ئ، ا، ە، و، وو، ۆ،
ی، ێ،
%
% Copy the template from ISO/IEC 14651
copy "iso14651_t1"

collating-element <ئا> from <U0626><U0627>
collating-element <وو> from <U0648><U0648>
collating-element <لا> from <U0644><U0627>

collating-symbol <U0628>
collating-symbol <U062C>
collating-symbol <U0631>
collating-symbol <U0632>
collating-symbol <U0641>
collating-symbol <U0643>
collating-symbol <U0644>
collating-symbol <U0648>
collating-symbol <U06CC>

reorder-after <U0628> <U067E>
reorder-after <U062C><U0686>
reorder-after <U0631><U0695>
reorder-after <U0632><U0698>
reorder-after <U0641><U06A4>
reorder-after <U0643><U06AF>
reorder-after <U0644><U06B5>
reorder-after <U0648><U06C6>
reorder-after <U06CC><U06CE>

% Kurdish digits same as Arabic ones: they are the basic forms.
reorder-after <U0660>
<U0660> <0>;<PCL>;<MIN>;IGNORE
<U0661> <1>;<PCL>;<MIN>;IGNORE
<U0662> <2>;<PCL>;<MIN>;IGNORE
<U0663> <3>;<PCL>;<MIN>;IGNORE
<U0664> <4>;<PCL>;<MIN>;IGNORE
<U0665> <5>;<PCL>;<MIN>;IGNORE
<U0666> <6>;<PCL>;<MIN>;IGNORE
<U0667> <7>;<PCL>;<MIN>;IGNORE
<U0668> <8>;<PCL>;<MIN>;IGNORE
<U0669> <9>;<PCL>;<MIN>;IGNORE

reorder-end

END LC_COLLATE

LC_MONETARY
% This is the POSIX Locale definition the LC_MONETARY category.
% These are generated based on XML base Locale difintion file
% for IBM Class for Unicode/Java
%
int_curr_symbol       "<U0049><U0051><U0044><U0020>"
currency_symbol       "<U062F><U002E><U0639><U002E>"
mon_decimal_point     "<U002E>"
mon_thousands_sep     "<U002C>"
mon_grouping	      3
positive_sign	      ""
negative_sign	      "<U002D>"
int_frac_digits       3
frac_digits	      3
p_cs_precedes	      1
p_sep_by_space	      1
n_cs_precedes	      1
n_sep_by_space	      1
p_sign_posn	      1
n_sign_posn	      2
%
END LC_MONETARY

LC_NUMERIC
% This is the POSIX Locale definition for the LC_NUMERIC  category.
%
decimal_point	       "<U002E>"
thousands_sep	       "<U002C>"
grouping	       3
%
END LC_NUMERIC

LC_TIME
% This is the POSIX Locale definition for the LC_TIME category.
% These are generated based on XML base Locale difintion file
%
% Abbreviated weekday names (%a)
abday	    "<U06CC><U06D5><U0626><U0634>";"<U062F><U0648><U0648><U0634>";/
	    "<U0633><U0626><U0634>";"<U0686><U0648><U0631><U0634>";/
	    "<U067E><U0626><U0634>";"<U0647><U0647>";/
	    "<U0634><U06D5><U0645>"
%
% Full weekday names (%A)
day	    "<U06CC><U06D5><U0643><U0634><U06D5><U0645><U0645><U06D5>";/
	    "<U062F><U0648><U0648><U0634><U06D5><U0645><U0645><U06D5>";/
	    "<U0633><U06CE><U0634><U06D5><U0645><U0645><U06D5>";/
	    "<U0686><U0648><U0624><U0631><U0634><U06D5><U0645><U0645><U06D5>";/
	    "<U067E><U06CE><U0646><U062C><U0634><U06D5><U0645><U0645><U06D5>";/
	    "<U0647><U06D5><U06CC><U0646><U06CC>";/
	    "<U0634><U06D5><U0645><U0645><U06D5>";/
%
% Abbreviated month names (%b)
abmon	    "<U064A><U0646><U0627>";"<U0641><U0628><U0631>";/
	    "<U0645><U0627><U0631>";"<U0623><U0628><U0631>";/
	    "<U0645><U0627><U064A>";"<U064A><U0648><U0646>";/
	    "<U064A><U0648><U0644>";"<U0623><U063A><U0633>";/
	    "<U0633><U0628><U062A>";"<U0623><U0643><U062A>";/
	    "<U0646><U0648><U0641>";"<U062F><U064A><U0633>"
%
% Full month names (%B)
mon	    "<U064A><U0646><U0627><U064A><U0631>";/
	    "<U0641><U0628><U0631><U0627><U064A><U0631>";/
	    "<U0645><U0627><U0631><U0633>";/
	    "<U0623><U0628><U0631><U064A><U0644>";/
	    "<U0645><U0627><U064A><U0648>";/
	    "<U064A><U0648><U0646><U064A><U0648>";/
	    "<U064A><U0648><U0644><U064A><U0648>";/
	    "<U0623><U063A><U0633><U0637><U0633>";/
	    "<U0633><U0628><U062A><U0645><U0628><U0631>";/
	    "<U0623><U0643><U062A><U0648><U0628><U0631>";/
	    "<U0646><U0648><U0641><U0645><U0628><U0631>";/
	    "<U062F><U064A><U0633><U0645><U0628><U0631>"
%
% Equivalent of AM PM
am_pm	    "<U0635>";"<U0645>"
%
% Appropriate date and time representation
% %d %b, %Y%Z %I:%M:%S
d_t_fmt     "<U0025><U0064><U0020><U0025><U0062><U002C><U0020><U0025>/
<U0059><U0020><U0025><U005A><U0020><U0025><U0049><U003A><U0025><U004D>/
<U003A><U0025><U0053><U0020><U0025><U0070>"
%
% Appropriate date representation
% %d %b, %Y
d_fmt	    "<U0025><U0064><U0020><U0025><U0062><U002C><U0020><U0025><U0059>"
%
% Appropriate time representation
% %Z %I:%M:%S
t_fmt	    "<U0025><U005A><U0020><U0025><U0049><U003A><U0025><U004D>/
<U003A><U0025><U0053><U0020>"
%
% Appropriate 12 h time representation (%r)
t_fmt_ampm  "<U0025><U005A><U0020><U0025><U0049><U003A><U0025><U004D>/
<U003A><U0025><U0053><U0020><U0025><U0070>"
%
% Appropriate date representation (date(1))   "%a %b %e %H:%M:%S %Z %Y"
date_fmt	"<U0025><U0061><U0020><U0025><U0062><U0020><U0025><U0065>/
<U0020><U0025><U0048><U003A><U0025><U004D><U003A><U0025><U0053><U0020>/
<U0025><U005A><U0020><U0025><U0059>"
%  FIXME: found in CLDR
first_weekday 7
END LC_TIME

LC_MESSAGES
copy "ckb_IQ"
END LC_MESSAGES

LC_PAPER
% This is the ISO_IEC TR14652 Locale definition for the
copy "ckb_IQ"
height	    297
width	    210

END LC_PAPER

LC_NAME
% This is the CKB Locale definition for the
% LC_NAME category.
%
name_fmt    "<U0025><U0069><U0025><U0074><U0025><U0066><U0025><U0074>/
<U0025><U0067>"
name_gen    "<U002D><U0073><U0061><U006E>"
name_mr     "<U0643><U0627><U0643>"
name_mrs    "<U062E><U0627><U062A><U0648>"
name_miss   "<U062E><U0627><U062A><U0648>"
name_ms     "<U062E><U0627><U062A><U0648>"

END LC_NAME

LC_ADDRESS
% This is the IQ CKB Locale definition for the
% LC_ADDRESS
postal_fmt  "<U0025><U007A><U0025><U0063><U0025><U0049><U0025><U0073>/
<U0025><U0062><U0025><U0065><U0025><U0072>"
country_ab2 "<U0049><U0052><U0051>"
country_ab3 "<U0049><U0052><U0051>"
country_post "<U0049><U0052><U0051>"
country_num 364
country_car "<U0049><U0052><U0051>"
% "kurd<U00EE>"
lang_name   "<U0643><U0648><U0631><U062F><U06CC>"
lang_ab     "<U0643><U0648>"
lang_term   "<U0066><U0061><U0073>"
lang_lib    "<U006B><U0075><U0072>"

END LC_ADDRESS

LC_TELEPHONE
% This is the IQ CKB Locale definition for the
%
tel_int_fmt "<U002B><U0025><U0063><U0020><U003B><U0025><U0061><U0020>/
<U003B><U0025><U006C>"
int_prefix  "<U0039><U0036><U0034>"

END LC_TELEPHONE

LC_MEASUREMENT
% This is the ISO_IEC TR14652  Locale definition for the
% 
measurement 1

END LC_MEASUREMENT

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2009-04-16:

#16

I updated the file. Please check whether it still has a bad format.

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2009-06-19:

#17

Subject: Re: Please add Kurdish locale for Kurdish
Sorani (CKB)

Hi
I updated the locale a couple of weeks ago; would you please check it again?

http://www.sourceware.org/ml/libc-locales/2009-q2/msg00021.html

It would be my pleasure to hear feedback from you.

Regards
Aras

On Sat, Feb 7, 2009 at 5:53 AM, drepper at redhat dot
com<email address hidden> wrote:
>
> ------- Additional Comments From drepper at redhat dot com 2009-02-07 03:53 -------
> The file is ill-formed. The file is named ckb_IQ and you use it in "copy" in
> various categories? You also have copy and definitions in categories like
> LC_PAPER. You have to fix it up.
>
> --
> What |Removed |Added
> ----------------------------------------------------------------------------
> Status|NEW |WAITING
>
>
> http://sourceware.org/bugzilla/show_bug.cgi?id=9809
>
>
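
The two problems named in the quoted review (a locale file that copies itself, and a category that combines copy with its own keywords) can be checked mechanically before running localedef. The following is a minimal Python sketch, not a glibc tool: the function name lint_copy_usage is hypothetical, it assumes comment_char % as in the posted file, and it exempts LC_COLLATE on the assumption that copy followed by reorder rules is the accepted pattern there.

```python
import re

# Excerpt in the shape of the posted file (LC_MESSAGES and LC_PAPER sections).
SAMPLE = '''LC_MESSAGES
copy "ckb_IQ"
END LC_MESSAGES

LC_PAPER
% This is the ISO_IEC TR14652 Locale definition for the
copy "ckb_IQ"
height 297
width 210

END LC_PAPER
'''


def lint_copy_usage(locale_name, source):
    """Return a list of problems with `copy` statements in a locale source text."""
    problems = []
    category = None   # category currently being read, or None between categories
    copies = []       # copy targets seen in the current category
    definitions = 0   # other keyword lines seen in the current category
    for raw in source.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # blank line or comment (assumes comment_char %)
        if category is None:
            if re.fullmatch(r"LC_[A-Z]+", line):
                category, copies, definitions = line, [], 0
            continue
        if line == f"END {category}":
            for target in copies:
                if target == locale_name:
                    problems.append(f"{category}: copies itself ({target})")
            # Assumption: only LC_COLLATE may follow a copy with further rules.
            if copies and definitions and category != "LC_COLLATE":
                problems.append(f"{category}: mixes copy with its own definitions")
            category = None
            continue
        match = re.fullmatch(r'copy\s+"([^"]+)"', line)
        if match:
            copies.append(match.group(1))
        else:
            definitions += 1
    return problems


for problem in lint_copy_usage("ckb_IQ", SAMPLE):
    print(problem)
```

Run against the excerpt, the sketch reports the self-copy in both categories and the mixed LC_PAPER section, which are the issues the review asks to be fixed.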

In Sourceware.org Bugzilla #9809, Martin Pitt (pitti) wrote on 2009-06-22:

#18

Can you please attach the current version instead of adding it as a comment? The
latter destroys all the non-ASCII characters.

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2009-06-22:

#19

Created attachment 4013
new Locale info for CKB

Here is the new locale info for CKB.

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2009-09-03:

#20

Created attachment 4168
new fixed Locale info for CKB

fixed bugs in Message category

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2009-09-24:

#21

Created attachment 4228
Locale info for CKB

In Sourceware.org Bugzilla #9809, Martin Pitt (pitti) wrote on 2009-10-06:

#22

Info is provided, changing back to NEW

In Sourceware.org Bugzilla #9809, Martin Pitt (pitti) wrote on 2009-10-06:

#23

I get a lot of errors when I try to build this locale:

locales/ckb_IQ:62: LC_COLLATE: syntax error
locales/ckb_IQ:63: LC_COLLATE: syntax error
locales/ckb_IQ:64: LC_COLLATE: syntax error
locales/ckb_IQ:66: LC_COLLATE: syntax error
locales/ckb_IQ:67: LC_COLLATE: syntax error
locales/ckb_IQ:68: LC_COLLATE: syntax error
locales/ckb_IQ:69: LC_COLLATE: syntax error
locales/ckb_IQ:70: LC_COLLATE: syntax error
locales/ckb_IQ:71: LC_COLLATE: syntax error
locales/ckb_IQ:72: LC_COLLATE: syntax error
locales/ckb_IQ:73: LC_COLLATE: syntax error
locales/ckb_IQ:74: LC_COLLATE: syntax error
locales/ckb_IQ:76: trailing garbage at end of line
locales/ckb_IQ:77: trailing garbage at end of line
locales/ckb_IQ:78: trailing garbage at end of line
locales/ckb_IQ:79: trailing garbage at end of line
locales/ckb_IQ:80: trailing garbage at end of line
locales/ckb_IQ:81: trailing garbage at end of line
locales/ckb_IQ:82: trailing garbage at end of line
locales/ckb_IQ:83: trailing garbage at end of line
locales/ckb_IQ:84: LC_COLLATE: cannot reorder after U000006CC: symbol not known
locales/ckb_IQ:155: extra trailing semicolon
LC_NAME: invalid escape sequence in field `name_fmt'
LC_ADDRESS: invalid escape `%I' sequence in field `postal_fmt'
LC_ADDRESS: `lang_ab' value does not match `lang_term' value
LC_ADDRESS: `lang_lib' value does not match `lang_term' value
LC_ADDRESS: `country_ab2' value does not match `country_num' value
LC_ADDRESS: `country_ab3' value does not match `country_num' value
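
A long localedef log like the one above is easier to act on when identical messages are grouped. The following Python sketch is only an illustration: the helper name summarize and the regular expression are assumptions about the log shape seen here (an optional file:line prefix, an optional LC_* category, then the message), not a documented localedef output format.

```python
import re
from collections import Counter

# A few lines copied from the log above.
LOG = """\
locales/ckb_IQ:62: LC_COLLATE: syntax error
locales/ckb_IQ:63: LC_COLLATE: syntax error
locales/ckb_IQ:76: trailing garbage at end of line
locales/ckb_IQ:155: extra trailing semicolon
LC_ADDRESS: invalid escape `%I' sequence in field `postal_fmt'
"""

# Optional "file:line: " prefix, optional "LC_xxx: " category, then the message.
LINE_RE = re.compile(
    r"^(?:(?P<file>[^:\s]+):(?P<line>\d+): )?"
    r"(?:(?P<cat>LC_[A-Z]+): )?"
    r"(?P<msg>.+)$"
)


def summarize(log):
    """Count log lines per (category, message); '-' stands for 'no category'."""
    counts = Counter()
    for raw in log.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            counts[(match.group("cat") or "-", match.group("msg"))] += 1
    return counts


for (category, message), count in summarize(LOG).most_common():
    print(f"{count:3d}  {category:12s} {message}")
```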

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2009-10-06:

#24

I am analyzing the errors now and will try to fix them as soon as I can.

Thanks & Regards
Aras

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2009-11-04:

#25

Created attachment 4357
CKB locale - updated

Hi,
I fixed some of the reported errors. How can I test the file myself
before submitting it to Bugzilla?

Regards

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2009-12-10:

#26

Any progress?

(In reply to comment #16)
> Created an attachment (id=4357)
> CKB locale - updated
>
> Hi,
> I fixed some errors due to the occured Errors. How can I test it by myself
> before release to bugzilla?
>
> Regards
>

In Sourceware.org Bugzilla #9809, Petr Baudis (pasky) wrote on 2010-06-01:

#27

Use localedef to compile your locale, and $LOCPATH if you don't want to install
it system-wide in order to test it.
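
As a rough illustration of that advice, the Python sketch below compiles a locale source into a private directory and then runs date with LOCPATH pointing at it. The source path ./ckb_IQ, the scratch directory ./locale-test, and the helper names are hypothetical; only the localedef options -i and -f and the LOCPATH and LC_ALL variables are taken as given. The compile step is skipped when localedef or the source file is missing.

```python
import os
import shutil
import subprocess


def localedef_command(source, charmap, outdir):
    """Build the localedef argument list; outdir is a path, so nothing is installed system-wide."""
    return ["localedef", "-i", source, "-f", charmap, outdir]


def locale_env(locpath, locale_name):
    """Return a copy of the environment that makes glibc look in locpath for locale_name."""
    env = dict(os.environ)
    env["LOCPATH"] = locpath
    env["LC_ALL"] = locale_name
    return env


if __name__ == "__main__":
    source = "./ckb_IQ"                          # hypothetical locale source file
    locpath = os.path.abspath("./locale-test")   # hypothetical scratch directory
    name = "ckb_IQ.UTF-8"
    if shutil.which("localedef") and os.path.exists(source):
        os.makedirs(locpath, exist_ok=True)
        # localedef prints its diagnostics itself; check=False keeps going on warnings.
        subprocess.run(localedef_command(source, "UTF-8", os.path.join(locpath, name)), check=False)
        # Any locale-aware program will do; date shows the LC_TIME strings.
        subprocess.run(["date"], env=locale_env(locpath, name), check=False)
    else:
        print("localedef or the locale source is not available; nothing compiled")
```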

In Sourceware.org Bugzilla #9809, Drepper-fsp (drepper-fsp) wrote on 2011-05-15:

#28

The file isn't usable as-is, there are many problems when compiling it.

First, it must be in UTF-8.

Second, the collation rules seem all pretty bogus since there already are rules for all the characters defined. If needed, you have to redefine the relocation.

Third, there are many syntax errors.

Fourth, all the values for the fields must use the <U....> notation, not real strings.

Fifth, the values for some fields are plain wrong. localedef will tell you.

I did add the language code to localedef now.

Just run localedef like

localedef -i ./YOURFILE -f UTF-8 ./SOMEDIR
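
The fourth point, writing every field value in <U....> notation, is tedious by hand. A minimal Python sketch of the conversion in both directions follows; the helper names are hypothetical, and the choice of four hex digits for BMP characters and eight digits beyond that is an assumption about the accepted forms.

```python
import re


def to_ucs_notation(text):
    """Encode each character as <Uxxxx> (eight hex digits outside the BMP)."""
    parts = []
    for ch in text:
        code_point = ord(ch)
        width = 4 if code_point <= 0xFFFF else 8
        parts.append(f"<U{code_point:0{width}X}>")
    return "".join(parts)


def from_ucs_notation(value):
    """Decode a run of <Uxxxx> symbols back into a plain string."""
    return "".join(chr(int(digits, 16)) for digits in re.findall(r"<U([0-9A-Fa-f]{4,8})>", value))


# The int_curr_symbol value from the posted file.
print(to_ucs_notation("IQD "))
# The name_gen value from the posted file, decoded for review.
print(from_ucs_notation("<U002D><U0073><U0061><U006E>"))
```

Decoding is useful when reviewing someone else's file, since a value such as name_gen can be read directly instead of looked up code point by code point.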

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2011-05-16:

#29

Created attachment 5727
CKB-IQ locale info (Kurdish Sorani)

CKB-IQ locale info (Kurdish Sorani)

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2011-05-16:

#30

Hi,
I updated the file. It was full of syntax errors; I repaired most of them and hope it works now.
I noticed that many locale files use a slash (/) as the escape character while others use a backslash (\), and that the compiler distinguishes between them; I learned that it should not be so. The file is saved as UTF-8, with Unix line endings.

best regards
Aras

In Sourceware.org Bugzilla #9809, Drepper-fsp (drepper-fsp) wrote on 2011-05-16:

#31

You haven't fixed the collation information. You have to use the generic collation data (include it, do not copy it) and then, if necessary at all, define modifications using reorder_after etc.

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2011-05-16:

#32

(In reply to comment #22)
> You haven't fixed the collation information. You have to use the generic
> collation data (include it, do not copy it) and then, if necessary at all,
> define modifications using reorder_after etc.

Did you mean
% collating-element <LAM WITH SMALL V-ALEF> from <U06B5><U0627>

They are already commented out.

regards
Aras

In Sourceware.org Bugzilla #9809, Erdal Ronahi (erdalronahi) wrote on 2011-05-18:

#33

Hello Aras,

good to see that you are working on the file again.

Do you know what "collation" is and what it is good for? It is about the
order used for alphabetical sorting.

Best regards
Erdal

On 16 May 2011 17:24, aras.noori at gmail dot com <
<email address hidden>> wrote:

> http://sourceware.org/bugzilla/show_bug.cgi?id=9809
>
> --- Comment #23 from Aras Noori <aras.noori at gmail dot com> 2011-05-16
> 15:23:50 UTC ---
> (In reply to comment #22)
> > You haven't fixed the collation information. You have to use the generic
> > collation data (include it, do not copy it) and then, if necessary at
> all,
> > define modifications using reorder_after etc.
>
> did you mean
> % collating-element <LAM WITH SMALL V-ALEF> from <U06B5><U0627>
>
> they are already commented.
>
> regards
> Aras
>

In Sourceware.org Bugzilla #9809, Erdal Ronahi (erdalronahi) wrote on 2011-05-18:

#34

Sorry for posting in German here, wasn't aware that replies go directly to
the bugmail.

In Sourceware.org Bugzilla #9809, Drepper-fsp (drepper-fsp) wrote on 2011-05-28:

#35

(In reply to comment #23)
> (In reply to comment #22)
> > You haven't fixed the collation information. You have to use the generic
> > collation data (include it, do not copy it) and then, if necessary at all,
> > define modifications using reorder_after etc.
>
> did you mean
> % collating-element <LAM WITH SMALL V-ALEF> from <U06B5><U0627>
>
> they are already commented.

No. Look at the other files. There are some broken ones but those I caught in time are using

copy "iso14651_t1"

and then use if necessary reorder_after.

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2011-06-28:

#36

Yes, Erdal, I also defined three collations; they had syntax errors, as Mr. Ulrich Drepper says. I will upload the new version later. Thanks for your efforts.

In Sourceware.org Bugzilla #9809, Petr Baudis (pasky) wrote on 2012-04-04:

#37

WAITING for almost a year now. Please reopen the bug when you have a new patch.

In Sourceware.org Bugzilla #9809, Petr Baudis (pasky) wrote on 2012-04-04:

#38

Sorry for the clicko; this was not fixed.

In Sourceware.org Bugzilla #9809, Jwtiyar Nariman (jwtiyar) wrote on 2012-11-04:

#39

Can you post the problems?

Aras (aras-noori) wrote on 2014-11-03:

#2

ckb_iq.dat (10.9 KiB, application/x-ns-proxy-autoconfig)

description:

updated

Gunnar Hjalmarsson (gunnarhj) wrote on 2014-11-03:

#3

Hi Aras, and thanks for the locale definition file. I could successfully compile it.

However, just like last time (https://launchpad.net/bugs/266975) we would like you to submit it upstream too before we add it to Ubuntu. So can you please file an upstream bug report and post the URL to it here.

Changed in langpack-locales (Ubuntu):
status:	New → Incomplete

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2014-11-03:

#41

Created attachment 7887
CKB-IQ locale info (Kurdish Sorani)

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2014-11-03:

#42

Hi All,
I attached the new version, bug free.

I also renewed the Bug on Launchpad at:
https://bugs.launchpad.net/ubuntu/+source/langpack-locales/+bug/1388808

Many thanks for all your efforts in the past.

Regards
Aras

Aras (aras-noori) wrote on 2014-11-03:

#4

Hello Gunnar,
Sure, I updated the bug (https://sourceware.org/bugzilla/show_bug.cgi?id=9809) and attached the new file.

Thank you
Aras

In Sourceware.org Bugzilla #9809, Gunnar Hjalmarsson (gunnarhj) wrote on 2014-11-04:

#43

Please see comment #33 by Aras Noori.

Gunnar Hjalmarsson (gunnarhj) wrote on 2014-11-04:

#5

add-ckb_IQ-locale.patch (14.0 KiB, text/plain)

tags:	added: patch
Changed in langpack-locales (Ubuntu):
assignee:	nobody → Gunnar Hjalmarsson (gunnarhj)
importance:	Undecided → Medium
status:	Incomplete → In Progress

Gunnar Hjalmarsson (gunnarhj) on 2014-11-04

Changed in langpack-o-matic:
assignee:	nobody → Gunnar Hjalmarsson (gunnarhj)
status:	New → In Progress

Martin Pitt (pitti) wrote on 2014-11-04:

#6

Uploaded the locale, thanks!

Changed in langpack-locales (Ubuntu):
status:	In Progress → Fix Committed

Launchpad Janitor (janitor) wrote on 2014-11-04:

#7

This bug was fixed in the package langpack-locales - 2.13+git20120306-18

---------------
langpack-locales (2.13+git20120306-18) vivid; urgency=low

* debian/patches/ubuntu-ckb_IQ-new_locale.patch:
Addition of the ckb_IQ locale (LP: #1388808).
-- Gunnar Hjalmarsson <email address hidden> Tue, 04 Nov 2014 02:45:00 +0100

Changed in langpack-locales (Ubuntu):
status:	Fix Committed → Fix Released

Martin Pitt (pitti) wrote on 2014-11-04:

#8

Merged and rolled out langpack-o-matic. Thanks!

Changed in langpack-o-matic:
status:	In Progress → Fix Released

In Sourceware.org Bugzilla #9809, Mike Frysinger (vapier) wrote on 2016-06-07:

#44

first, please update the header of the file to match all the current locales. the first ~10 lines should be the same (e.g. as en_US). you should sync to latest git as the last release is out of date already.

> title "Kurdish language locale based on Arabic letters"

this should be:
Central Kurdish language locale for Iraq

> tel "+49 17629857380"

leave this field blank

> language "Kurdish"

change to "Central Kurdish"

> territory "Iraq, Iran"

drop Iran. this locale is only for Iraq.

> category "ckb_IQ:2000";LC_IDENTIFICATION

you'll need to fix all these category fields. copy them from en_US for their correct values.

> LC_COLLATE

please rebase this to start with:
copy "iso14651_t1"

> % This is the POSIX Locale definition for the LC_NUMERIC category.

delete these old comments from the LC_NUMERIC, LC_TIME, LC_NAME, and LC_ADDRESS categories

> LC_TIME

are you sure about the day/abday/mon/abmon translations ? CLDR says they're different.

make sure day/abday start on Sunday

> am_pm

does Iraq really use am/pm notation ? if not, leave these fields blank.

> first_workday 7

change this to 2 and add this line:
week 7;19971130;1

> yesexpr "<U0628><U06D5><U06B5><U06CE>"
> noexpr "<U0646><U06D5><U062E><U06CE><U0631>"

these need to be updated. these should be regular expressions to match a yes/no answer. see the current en_US value as an example.

please also provide yesstr/nostr translations

> LC_PAPER
> LC_MEASUREMENT

change both of these categories to simply:
copy "ar_IQ"

> name_gen "<U002D><U0073><U0061><U006E>"

this is "-san". is that correct ?

> country_car "<U0049><U0051>"

shouldn't this be "IRQ" instead of "IQ" ?

> LC_ADDRESS

please define country_name (localized translation for Iraq)

> tel_int_fmt "+%c ;%a ;%l"

pretty sure this should be:
+%c %a%t%l

> tel_dom_fmt "<U202A><U0025><U0041><U2012><U0025><U006C><U202C>"

are you sure this is correct ?
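
To illustrate the yesexpr/noexpr point from this review: the posted values are plain words, so a one-letter reply never matches them, whereas a bracket expression anchored at the start accepts any reply beginning with an allowed character. The Python sketch below uses the re module as a stand-in for POSIX regular expressions, which is adequate for patterns this simple. The two SKETCH_ patterns are hypothetical examples modelled on the commonly seen en_US form "^[+1yY]" and are not the values adopted for ckb_IQ.

```python
import re

# The words posted in the draft (yes and no), written with escapes.
ORIGINAL_YESEXPR = "\u0628\u06d5\u06b5\u06ce"
ORIGINAL_NOEXPR = "\u0646\u06d5\u062e\u06ce\u0631"

# Hypothetical patterns: Latin fallbacks plus the first letter of each word.
SKETCH_YESEXPR = "^[+1yY\u0628]"
SKETCH_NOEXPR = "^[-0nN\u0646]"


def matches(reply, pattern):
    """Return True if the reply matches the pattern anywhere, as regexec would report."""
    return re.search(pattern, reply) is not None


for reply in ("y", "\u0628", ORIGINAL_YESEXPR, ORIGINAL_NOEXPR):
    print(
        repr(reply),
        "word pattern:", matches(reply, ORIGINAL_YESEXPR),
        "bracket pattern:", matches(reply, SKETCH_YESEXPR),
    )
```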

Bug Watch Updater (bug-watch-updater) on 2017-10-21

Changed in glibc:
importance:	Unknown → Wishlist
status:	Unknown → Incomplete

In Sourceware.org Bugzilla #9809, Jwtiyar Nariman (jwtiyar) wrote on 2020-01-07:

#45

Created attachment 12173
Localedata file for ckb_IQ

Here is a new version of the localedata file for ckb. Thanks.

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2020-01-07:

#46

(In reply to Mike Frysinger from comment #35)
> first, please update the header of the file to match all the current
> locales. the first ~10 lines should be the same (e.g. as en_US). you
> should sync to latest git as the last release is out of date already.
>
> > title "Kurdish language locale based on Arabic letters"
>
> this should be:
> Central Kurdish language locale for Iraq
>
> > tel "+49 17629857380"
>
> leave this field blank
>
> > language "Kurdish"
>
> change to "Central Kurdish"
>
> > territory "Iraq, Iran"
>
> drop Iran. this locale is only for Iraq.
>
> > category "ckb_IQ:2000";LC_IDENTIFICATION
>
> you'll need to fix all these category fields. copy them from en_US for
> their correct values.
>
> > LC_COLLATE
>
> please rebase this to start with:
> copy "iso14651_t1"
>
> > % This is the POSIX Locale definition for the LC_NUMERIC category.
>
> delete these old comments from the LC_NUMERIC, LC_TIME, LC_NAME, and
> LC_ADDRESS categories
>
> > LC_TIME
>
> are you sure about the day/abday/mon/abmon translations ? CLDR says they're
> different.
>
> make sure day/abday start on Sunday
>
> > am_pm
>
> does Iraq really use am/pm notation ? if not, leave these fields blank.
>
> > first_workday 7
>
> change this to 2 and add this line:
> week 7;19971130;1
>
> > yesexpr "<U0628><U06D5><U06B5><U06CE>"
> > noexpr "<U0646><U06D5><U062E><U06CE><U0631>"
>
> these need to be updated. these should be regular expressions to match a
> yes/no answer. see the current en_US value as an example.
>
> please also provide yesstr/nostr translations
>
> > LC_PAPER
> > LC_MEASUREMENT
>
> change both of these categories to simply:
> copy "ar_IQ"
>
> > name_gen "<U002D><U0073><U0061><U006E>"
>
> this is "-san". is that correct ?
>
> > country_car "<U0049><U0051>"
>
> shouldn't this be "IRQ" instead of "IQ" ?
>
> > LC_ADDRESS
>
> please define country_name (localized translation for Iraq)
>
> > tel_int_fmt "+%c ;%a ;%l"
>
> pretty sure this should be:
> +%c %a%t%l
>
> > tel_dom_fmt "<U202A><U0025><U0041><U2012><U0025><U006C><U202C>"
>
> are you sure this is correct ?

Thank you for your tips; @Jwtiayr and I fixed the bugs.
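
One of the quoted review points, the line week 7;19971130;1, is easier to follow once the reference date is decoded. The Python sketch below splits the value and checks the weekday. The reading of the three fields (days per week, a reference date marking the start of the week, minimum days in the first week of the year) is an assumption based on the usual glibc convention, and parse_week is a hypothetical helper.

```python
import datetime


def parse_week(value):
    """Split a 'week' value such as '7;19971130;1' into (days, reference date, min days)."""
    days, reference, min_days = value.split(";")
    ref_date = datetime.datetime.strptime(reference, "%Y%m%d").date()
    return int(days), ref_date, int(min_days)


days, ref_date, min_days = parse_week("7;19971130;1")
# isoweekday() numbers Monday as 1 and Sunday as 7.
print(days, ref_date.isoformat(), ref_date.isoweekday(), min_days)  # prints: 7 1997-11-30 7 1
```

Since 1997-11-30 is a Sunday, day numbers such as first_weekday and first_workday are counted from Sunday under this reading.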

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2020-01-07:

#47

(In reply to Jwtiayr Nariman from comment #36)
> Created attachment 12173 [details]
> Localedata file for ckb_IQ
>
> Here is new version of localedata file for ckb.thanks

Great Work.

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-08:

#48

> LC_COLLATE
> % The Kurdish Sorani, Bahdini, and others dialects is mainly written using a modified (Arabic-based alphabet) with 33 letters.
> % Unlike the regular Arabic alphabet, which is an abjad, kurdish is an alphabet in which vowels are mandatory, making the script easy to read.
> %
> % The kurdish alphabet order is:
> % in Latin: a, b, c, ç, d, e, ê, f, g, h, i, î, j, k, l, ll, m, n, o, p, q, r, rr, s, sh, t, u, uu, v, w, x, y, z
> % vowels: A, E, I, O, U, UU
> %
> % Copy the template from ISO/IEC 14651
>
> order_start forward; forward
> %
> % Kurdish numeric characters.
> %
> <U0660> <U0660>

You still did not base the collation on iso14651_t1.

Your LC_COLLATE section should start like this:

LC_COLLATE
copy "iso14651_t1"

and then you should only reorder the characters which are not correctly
ordered already, i.e. you should only make modifications to the default
collation order coming from "iso14651_t1", *not* write everything from
scratch.

I can try to help you with that and try to rewrite your LC_COLLATE.

Revision history for this message

In Sourceware.org Bugzilla #9809, Jwtiyar Nariman (jwtiyar) wrote on 2020-01-08:

#49

(In reply to Mike FABIAN from comment #39)
> > LC_COLLATE
> > % The Kurdish Sorani, Bahdini, and others dialects is mainly written using a modified (Arabic-based alphabet) with 33 letters.
> > % Unlike the regular Arabic alphabet, which is an abjad, kurdish is an alphabet in which vowels are mandatory, making the script easy to read.
> > %
> > % The kurdish alphabet order is:
> > % in Latin: a, b, c, ç, d, e, ê, f, g, h, i, î, j, k, l, ll, m, n, o, p, q, r, rr, s, sh, t, u, uu, v, w, x, y, z
> > % vowels: A, E, I, O, U, UU
> > %
> > % Copy the template from ISO/IEC 14651
> >
> > order_start forward; forward
> > %
> > % Kurdish numeric characters.
> > %
> > <U0660> <U0660>
>
> You still did not base the collation on iso14651_t1.
>
> Your LC_COLLATE section should start like this:
>
> LC_COLLATE
> copy "iso14651_t1"
>
> and then you should only reorder the characters which are not correctly
> ordered already, i.e. you should only do modifications to the default
> collation order comming from "iso14651_t1", *not* write everything from
> scratch.
>
> I can try to help you with that and try to rewrite your LC_COLLATE.

Thank you, Mike. It is a little complicated, and I think I don't understand your point.
But if you can do it, that would be really appreciated.

Revision history for this message

In Sourceware.org Bugzilla #9809, Jwtiyar Nariman (jwtiyar) wrote on 2020-01-12:

#50

Created attachment 12190
attachment-64689-0.html

What do we do now, dear Mike?


Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#51

You have

<U0640> IGNORE

in your sort order.

U+0640 ARABIC TATWEEL

Why IGNORE?

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#52

  %
  %
  % Other control characters etc. upto order_end
  %

Why do you sort control characters? These have nothing to do with
the Kurdish Sorani language.

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#53

Created attachment 12192
0001-Add-ckb_IQ-locale.patch

That is your original locale file as a patch

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#54

Created attachment 12193
0002-Fix-ckb_IQ-Add-ckb_IQ-to-SUPPORTED-file-Add-ckb_IQ.U.patch

My suggested changes.

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#55

    LC_MONETARY
   -int_curr_symbol "<U0049><U0051><U0044><U0020>"
   +int_curr_symbol "IQD "
    currency_symbol "<U062F><U002E><U0639>"
   -mon_decimal_point "<U002E>"
   -mon_thousands_sep "<U002C>"
   +mon_decimal_point "."
   +mon_thousands_sep ","
    mon_grouping 3
    positive_sign ""
   -negative_sign "<U002D>"
   +negative_sign "-"
    int_frac_digits 3
    frac_digits 3
    p_cs_precedes 1

For everything which is ASCII, it is allowed (and preferred) to write
the ASCII directly and not the code points.

I.e. it is better (because more readable) to write "-" instead of "<U002D>".

I hope in future this will be allowed also for non-ASCII characters,
at the moment it is only allowed for ASCII.
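
As a rough illustration of the rule above, the following Python sketch rewrites a value made only of <Uxxxx> code points as plain ASCII. The helper name `simplify` and the set of characters treated as unsafe are assumptions made for this example and are not part of glibc. Values containing any non-ASCII code point are left untouched, which matches how currency_symbol is left alone in the diff.

```python
import re

# Matches one <Uxxxx> code point as used in glibc locale sources.
UCODE = re.compile(r"<U([0-9A-Fa-f]{4,8})>")

# Assumption: characters that are better left as code points because they
# are special in a locale source (quote, angle brackets, comment_char %,
# escape_char /).
UNSAFE = set('"<>%/')


def simplify(value):
    """Return plain ASCII for an all-ASCII <Uxxxx> sequence, else the input."""
    codes = UCODE.findall(value)
    if not codes or UCODE.sub("", value) != "":
        return value  # not a pure sequence of <Uxxxx> code points
    chars = [chr(int(code, 16)) for code in codes]
    if all(0x20 <= ord(ch) < 0x7F and ch not in UNSAFE for ch in chars):
        return "".join(chars)
    return value  # contains non-ASCII or unsafe characters


print(simplify("<U0049><U0051><U0044><U0020>"))
print(simplify("<U002D>"))
print(simplify("<U062F><U002E><U0639>"))
```

The sketch is deliberately conservative: a mixed value is kept exactly as written rather than partially converted.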

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#56

     LC_MESSAGES
    -yesexpr "<U0628><U06D5><U06B5><U06CE>"
    -noexpr "<U0646><U06D5><U062E><U06CE><U0631>"
    +yesexpr "^[+1yY<U0628>]"
    +noexpr "^[-0nN<U0646>]"
     yesstr "<U0628><U06D5><U06B5><U06CE>"
     nostr "<U0646><U06D5><U062E><U06CE><U0631>"
     END LC_MESSAGES

"yesstr" and "nostr" are the words for "yes" and "no" in your language.

"yesexpr" should *not* be the same as "yesstr".

"yesexpr" should be a regular expression matching single letters
which could be typed as the response for "yes" when you get a prompt asking something like:

"Do you want ...? (y/n)"

and when you type "y" in English, this means yes.

In *all* glibc locales we include +1yY in the "yesexpr" as long as this does not conflict with the language of that locale.
If "y" would suggest "no" in that language, we cannot add it to "yesexpr", but in all other cases we add it.

Similarly for "noexpr".
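
To illustrate how a program might apply these two patterns to an answer typed at a prompt, here is a minimal Python sketch. The `classify` helper is hypothetical, and Python's `re` module stands in for the POSIX regular expression matching a C program would normally use. Only the two patterns are taken from the diff above.

```python
import re

# Patterns from the diff above; <U0628> and <U0646> are written as escapes.
YESEXPR = re.compile("^[+1yY\u0628]")
NOEXPR = re.compile("^[-0nN\u0646]")


def classify(answer):
    """Return "yes", "no", or None for an answer typed at a (y/n) prompt."""
    if YESEXPR.search(answer):
        return "yes"
    if NOEXPR.search(answer):
        return "no"
    return None


print(classify("y"))      # first character matches yesexpr
print(classify("0"))      # first character matches noexpr
print(classify("maybe"))  # matches neither pattern
```

Because only the first character is tested, the full words given in yesstr and nostr are also accepted, since they start with the letters in the patterns.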

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#57

     LC_ADDRESS
     postal_fmt "%z%c%T%s%b%e%r"
    -country_name "Iraq"
    -country_ab2 "<U0049><U0051>"
    -country_ab3 "<U0049><U0052><U0051>"
    -country_post "<U0049><U0052><U0051>"
    +country_name "<U0639><U06CE><U0631><U0627><U0642>"
    +country_ab2 "IQ"
    +country_ab3 "IRQ"
    +country_post "IRQ"
     country_num 368
    -country_car "<U0049><U0051>"
    +country_car "IQ"
    +lang_name "<U06A9><U0648><U0631><U062F><U06CC><U06CC> <U0646><U0627><U0648><U06D5><U0646><U
    +lang_term "ckb"
    +lang_lib "ckb"
     %
     END LC_ADDRESS

country_name should be the name of the country in your language (Sorani), *not* in English.

The English name is already in:

territory "Iraq"

lang_name should be the name of your language in your language.

The English name is already in:

language "Central Kurdish"

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#58

I rewrote the LC_COLLATE section to contain only the absolutely necessary stuff. Now it looks like this:

   LC_COLLATE
   % The Kurdish Sorani, Bahdini, and others dialects is mainly written using a modified (Arabic-based alphabet) with 33 letters.
   % Unlike the regular Arabic alphabet, which is an abjad, kurdish is an alphabet in which vowels are mandatory, making the script easy to read.
   %
   % The kurdish alphabet order is:
   % in Latin: a, b, c, ç, d, e, ê, f, g, h, i, î, j, k, l, ll, m, n, o, p, q, r, rr, s, sh, t, u, uu, v, w, x, y, z
   % vowels: A, E, I, O, U, UU
   %

   % Copy the template from ISO/IEC 14651
   copy "iso14651_t1"

   reorder-after <S0631> % ر
   <S0695> % ڕ

   reorder-after <S0646> % ن
   <S0648> % و
   <S06C6> % ۆ

   END LC_COLLATE

I.e. this sorts U+0695, U+0648, and U+06C6 differently from the default sort order.

The default sort order comes from

copy "iso14651_t1"

You use this line to copy the default sort order and then add changes needed for your language.

According to what you wrote in your locale, the 3 characters U+0695, U+0648, and U+06C6 sort
differently than the default sort order for Arabic characters; all the rest sort the same
as in the default sort order.

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#59

If you do *not* use

copy "iso14651_t1"

this is bad because then almost all Unicode characters which you do not cover by your own sort order will sort incorrectly. You want a reasonable default and apply the changes for your language to that default.

Of course your locale should sort Kurdish Sorani correctly, but it should not sort other characters (Cyrillic, Devanagari, ... whatever) completely silly.

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#60

Your locale also sorted many control characters and ASCII punctuation characters.

I think there is no reason to deviate from the default for these characters, therefore I removed them.

If you have a good reason why some of these need to be sorted differently for Kurdish, please tell me.

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#61

Your locale sorted the Kurdish numbers at the top, i.e. before the
Western numbers. The default order (as you can see in the ckb_IQ.UTF-8.in sorting test file in my patch) sorts these in between the Western numbers. Like this:

    0
    ٠
    1
    ١
    2
    ٢
    3
    ٣
    4
    ٤
    5
    ٥
    6
    ٦
    7
    ٧
    8
    ٨
    9
    ٩

That is reasonably good, isn’t it?
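
The interleaving can be pictured with a toy Python sketch that orders digits by numeric value first and by code point second. This is only a model of the visible result and not how glibc computes collation weights. The name `digit_key` is made up for this example.

```python
import unicodedata


def digit_key(ch):
    # Primary: the digit's numeric value. Secondary: the code point, so the
    # Western digit comes before the Arabic-Indic digit of the same value.
    return (unicodedata.digit(ch), ord(ch))


western = [chr(0x30 + i) for i in range(10)]        # 0..9
arabic_indic = [chr(0x660 + i) for i in range(10)]  # U+0660..U+0669

ordered = sorted(arabic_indic + western, key=digit_key)
print(" ".join(ordered))
```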

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#62

Your locale also resorted all the ASCII letters to make upper case letters come first.

I.e.

A
a

instead of

a
A

Lower case first is what comes from

copy "iso14651_t1"

When using CLDR for sorting, one can use an option
[caseFirst upper], see for example:

https://github.com/unicode-org/cldr/blob/master/common/collation/da.xml

glibc has no easy option to do that at the moment.

It is *possible* to sort A-Za-z differently in your locale, *but*
if you do that you will get a weird order for all Latin characters you forget.
I.e. if you do not include äÄ in your sort order as well, they would still sort
lower case first. It is a lot of work to do this correctly for *all* Latin characters without a convenient option like CLDR’s [caseFirst upper], so
I would recommend not doing that unless it is absolutely required.
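
The inconsistency can be sketched with a toy model in Python. The two key functions below are hypothetical and far simpler than real collation weights: one sorts lower case first everywhere, the other flips the case order only for ASCII letters, which is what happens when only A-Za-z are reordered.

```python
import unicodedata


def base_and_accent(ch):
    # Split a letter into its lower-cased base letter and an "accented" flag.
    decomposed = unicodedata.normalize("NFD", ch)
    return decomposed[0].lower(), len(decomposed) > 1


def lower_first(ch):
    base, accented = base_and_accent(ch)
    return (base, accented, ch.isupper())


def upper_first_ascii_only(ch):
    base, accented = base_and_accent(ch)
    if ch.isascii():
        return (base, accented, not ch.isupper())  # upper case first
    return (base, accented, ch.isupper())          # forgotten: lower first


letters = ["\u00c4", "a", "\u00e4", "A"]  # A-umlaut, a, a-umlaut, A
print(sorted(letters, key=lower_first))
print(sorted(letters, key=upper_first_ascii_only))
```

In the second result the plain letters sort upper case first while the umlauted pair still sorts lower case first, which is the kind of mixed order described above.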

Revision history for this message

In Sourceware.org Bugzilla #9809, Aras (aras-noori) wrote on 2020-01-13:

#63

(In reply to Mike FABIAN from comment #53)
> Your locale also resorted all the ASCII letters to make upper case letters
> come first.
>
> I.e.
>
> A
> a
>
> instead of
>
> a
> A
>
> Lower case first is what comes from
>
> copy "iso14651_t1"
>
> When using CLDR for sorting, one can use an option
> [caseFirst upper], see for example:
>
> https://github.com/unicode-org/cldr/blob/master/common/collation/da.xml
>
> glibc has no easy option to do that at the moment.
>
> It is *possible* do sort A-Za-z differently in your locale *but*
> if you do that you will get a weird order for all Latin characters you
> forget.
> I.e. if you do not include äÄ in your sort order as well, they would still
> sort
> lower case first. It is a lot of work to do this correctly for *all* Latin
> characters without a convenient option like CLDR’s [caseFirst upper],
> I would recommend not doing that if it is not absolutely required.

Hello Fabian,
thanks for your suggestions and notes. You are right about sorting (aA) as well as about the numbers; this should be modified.
The Kurdish alphabet order is:

U+0626  ئ
U+0627  ا
U+0628  ب
U+067E  پ
U+062A  ت
U+062C  ج
U+0686  چ
U+062D  ح
U+062E  خ
U+062F  د
U+0631  ر
U+0695  ڕ
U+0632  ز
U+0698  ژ
U+0633  س
U+0634  ش
U+0639  ع
U+063A  غ
U+0641  ف
U+06A4  ڤ
U+0642  ق
U+06A9  ک
U+06AF  گ
U+0644  ل
U+06B5  ڵ
U+0645  م
U+0646  ن
U+0648  و
U+06C6  ۆ
U+0647  ھ
U+06D5  ە
U+06CC  ی
U+06CE  ێ

Revision history for this message

In Sourceware.org Bugzilla #9809, Jwtiyar Nariman (jwtiyar) wrote on 2020-01-13:

#64

Thank you, Mike, your help is really appreciated. I have listed my answers to your questions and suggestions about our locale as follows:

1. For the positive and negative sign, I agree with you: let them be + and -.
2. For the regular expressions, I did not know how to type them in my language; I hope you can help me solve this.
We have "ب" for Y in English and "ن" for N in English.
3. You are right, we write Iraq in Kurdish (Sorani); this is now changed.
4. The Kurdish alphabet is as Aras Noori wrote before my reply. I looked at iso14651_t1, and all characters used in Kurdish exist there; the characters you added are from the Arabic language, not Kurdish.

Can you send the .dat file with your latest changes?

Best Regards

Revision history for this message

In Sourceware.org Bugzilla #9809, Jwtiyar Nariman (jwtiyar) wrote on 2020-01-13:

#65

Thank you, Mike, your help is really appreciated.
I have listed my answers to your questions and suggestions about our locale as follows:

1. For the positive and negative sign, I agree with you: let them be + and -.
2. For the regular expressions, I did not know how to type them in my language; I hope you can help me solve this.
We have "ب" for Y in English and "ن" for N in English.
3. You are right, we write Iraq in Kurdish (Sorani); this is now changed.
4. The Kurdish alphabet is as Aras Noori wrote before my reply. I looked at iso14651_t1, and all characters used in Kurdish exist there; the characters you added are from the Arabic language, not Kurdish.

Can you send the .dat file with your latest changes?

Best Regards

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#66

> thanks to your suggestions and notice. You are right with sorting (aA) as
> well with Numbers, this should be modified.

So sorting

a
A

and

0
٠
1
١
...

is OK? I hope so ...

> The kurdish alphabet order is:

To achieve that order, this is enough:

copy "iso14651_t1"

reorder-after <S0631> % ر
<S0695> % ڕ

   reorder-after <S0646> % ن
   <S0648> % و
   <S06C6> % ۆ

I added the test file ckb_IQ.UTF-8.in in my patch, this file is sorted
using the rules of my patched ckb_IQ locale, the sorted result should
be the same as the original file, otherwise the test fails.

As the test passes, the above collation rules work and achieve the
order as in the ckb_IQ.UTF-8.in test file.

I’ll paste this test file here again for your easy reference:

0
٠
1
١
2
٢
3
٣
4
٤
5
٥
6
٦
7
٧
8
٨
9
٩
a
A
b
B
c
C
d
D
e
E
f
F
g
G
h
H
i
I
j
J
k
K
l
L
m
M
n
N
o
O
p
P
q
Q
r
R
s
S
t
T
u
U
v
V
w
W
x
X
y
Y
z
Z
ئ
ا
ب
پ
ت
ج
چ
ح
خ
د
ر
ڕ
ز
ژ
س
ش
ع
غ
ف
ڤ
ق
ک
گ
ل
ڵ
م
ن
و
ۆ
ه
ە
ی
ێ

Other characters not in this test file are sorted according to the defaults from

copy "iso14651_t1"

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#67

(In reply to Jwtiyar Nariman from comment #56)
> Thank you mike you your help is really appreciated
> I have pointed all my answers according to your question and suggestion to
> our locale as follow:
>
> 1. For positive sign and negative i agree with you let it be + and - .

Your original locale had the positive sign empty.
Probably a mistake. So I’ll make it + now.

> 2. For regular expression i didn't know how to type it in my language hope
> to hekp me solve this.
> we have "ب" for Y in English and "ن" for N in English .

That is what I used:

yesexpr "^[+1yY<U0628>]"
noexpr "^[-0nN<U0646>]"

So these regular expressions accept +, 1, y, Y, and ب as a yes answer,
and -, 0, n, N, and ن as a no answer.

> 3.You right we type Iraq in Kurdish(Sorani) now changed.
> 4.We have Kurdish alphabet as Aras Noori wrote before my reply and i look at
> iso14651_t1 now all characters which is used in Kurdish are exist, these
> characters that you did add them are from Arabic language not Kurdish.

I don’t understand. Most of these characters are used both in Arabic *and* Kurdish.

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#68

Created attachment 12194
ckb_IQ

> Can you send the .dat file with your last changes?

Here is the latest file with the changes I made.
I just added the + as the positive_sign.

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#69

Created attachment 12195
0001-Add-ckb_IQ-locale.patch

Updated patch.

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#70

Created attachment 12196
0002-Fix-ckb_IQ-Add-ckb_IQ-to-SUPPORTED-file-Add-ckb_IQ.U.patch

Updated patch.

Revision history for this message

In Sourceware.org Bugzilla #9809, Jwtiyar Nariman (jwtiyar) wrote on 2020-01-13:

#71

(In reply to Mike FABIAN from comment #57)
> > thanks to your suggestions and notice. You are right with sorting (aA) as
> > well with Numbers, this should be modified.
>
> So sorting
>
> a
> A
>
> and
>
> 0
> ٠
> 1
> ١
> ...
>
> is OK? I hope so ...
>
> > The kurdish alphabet order is:
>
> To achieve that order, this is enough:
>
> copy "iso14651_t1"
>
> reorder-after <S0631> % ر
> <S0695> % ڕ
>
> reorder-after <S0646> % ن
> <S0648> % و
> <S06C6> % ۆ
>
> I added the test file ckb_IQ.UTF-8.in in my patch, this file is sorted
> using the rules of my patched ckb_IQ locale, the sorted result should
> be the same as the original file, otherwise the test fails.
>
> As the test passes, the above collation rules work and achieve the
> order as in the ckb_IQ.UTF-8.in test file.
>
> I’ll paste this test file here again for your easy refererence:
>
> 0
> ٠
> 1
> ١
> 2
> ٢
> 3
> ٣
> 4
> ٤
> 5
> ٥
> 6
> ٦
> 7
> ٧
> 8
> ٨
> 9
> ٩
> a
> A
> b
> B
> c
> C
> d
> D
> e
> E
> f
> F
> g
> G
> h
> H
> i
> I
> j
> J
> k
> K
> l
> L
> m
> M
> n
> N
> o
> O
> p
> P
> q
> Q
> r
> R
> s
> S
> t
> T
> u
> U
> v
> V
> w
> W
> x
> X
> y
> Y
> z
> Z
> ئ
> ا
> ب
> پ
> ت
> ج
> چ
> ح
> خ
> د
> ر
> ڕ
> ز
> ژ
> س
> ش
> ع
> غ
> ف
> ڤ
> ق
> ک
> گ
> ل
> ڵ
> م
> ن
> و
> ۆ
> ه
> ە
> ی
> ێ
>
> Other characters not in this test file are sorted according to the defaults
> from
>
> copy "iso14651_t1"

Sorting is good now, but I do not understand these added rules:

reorder-after <S0631> % ر
<S0695> % ڕ

reorder-after <S0646> % ن
<S0648> % و
<S06C6> % ۆ

For example, how is " <S0695> % ڕ " ordered?

Revision history for this message

In Sourceware.org Bugzilla #9809, Mike FABIAN (mike-fabian) wrote on 2020-01-13:

#72

(In reply to Jwtiyar Nariman from comment #62)

> > Other characters not in this test file are sorted according to the defaults
> > from
> >
> > copy "iso14651_t1"
>
> Sorting is good now, but adding these
> reorder-after <S0631> % ر
> > <S0695> % ڕ
> >
> > reorder-after <S0646> % ن
> > <S0648> % و
> > <S06C6> % ۆ
> iam not understanding because for example this " <S0695> % ڕ " how you
> order it?

copy "iso14651_t1"

contains

copy "iso14651_t1_common"

and some modifications which affect only Chinese and Japanese.

So we look into the iso14651_t1_common file to see what the default sort order is.

We find for example:

...
<S0631> % ARABIC LETTER REH
<S0632> % ARABIC LETTER ZAIN
<S0691> % ARABIC LETTER RREH
<S0692> % ARABIC LETTER REH WITH SMALL V
<S0693> % ARABIC LETTER REH WITH RING
<S0694> % ARABIC LETTER REH WITH DOT BELOW
<S0695> % ARABIC LETTER REH WITH SMALL V BELOW
<S0696> % ARABIC LETTER REH WITH DOT BELOW AND DOT ABOVE
...

Looking at this you see that ڕ U+0695 ARABIC LETTER REH WITH SMALL V BELOW
is sorted right after ڔ U+0694 ARABIC LETTER REH WITH DOT BELOW by default.
That is not what you want for Kurdish. For Kurdish, you want
ڕ U+0695 ARABIC LETTER REH WITH SMALL V BELOW to be sorted right after
ر U+0631 ARABIC LETTER REH.

This is achieved by the rule:

reorder-after <S0631> % ر
<S0695> % ڕ

Which removes U+0695 from its default position in the sort order
and inserts it again after U+0631.

reorder-after <S0646> % ن
<S0648> % و
<S06C6> % ۆ

does a similar thing to change the sorting of U+0648 and U+06C6.
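
The effect of such a rule can be modelled in Python with a plain list. This is only a simplified picture, since glibc works with multi-level collation weights rather than a flat list, and the function name `reorder_after` is chosen here just to mirror the keyword.

```python
def reorder_after(order, anchor, symbols):
    """Return a copy of `order` with `symbols` moved to just after `anchor`."""
    remaining = [s for s in order if s not in symbols]
    position = remaining.index(anchor) + 1
    return remaining[:position] + list(symbols) + remaining[position:]


# Fragment of the default order quoted from iso14651_t1_common above.
default_order = [
    "S0631", "S0632", "S0691", "S0692",
    "S0693", "S0694", "S0695", "S0696",
]

kurdish_order = reorder_after(default_order, "S0631", ["S0695"])
print(kurdish_order)
```

In this model S0695 leaves its old slot between S0694 and S0696 and reappears directly after S0631, while every other symbol keeps its relative position.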

To find out which of these rules I need, I created the ckb_IQ.UTF-8.in
test file first and wrote the Kurdish characters in the order you wanted
into that file.

Then I ran a test sort using a ckb_IQ locale which had *only*

LC_COLLATE
copy "iso14651_t1"
END LC_COLLATE

and *nothing* else.

The test sort showed that only U+0695, U+0648, and U+06C6 were sorted incorrectly.
All other characters from your list of Kurdish characters were sorted correctly
already. So I needed only to add rules to fix the sort order for these 3 characters.

You can see the same by reading iso14651_t1_common and checking which
of the Kurdish characters are already in the correct order in that file and which are not.
You have to do nothing for the characters which are already in the correct order.
For the characters which are in the wrong position in iso14651_t1_common, you add
rules like

reorder-after <... collating-symbol after which to reorder ...>
<... the collating-symbol which should be reordered ...>

I found writing the test file and checking which characters are sorted
wrongly by default easier than staring at iso14651_t1_common. And it
is a good idea to have the test file anyway to make sure that the
Kurdish sort order always stays correct when something is changed in
glibc. If we have the test file, we will notice when some change causes a problem.
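
The idea behind such a test file can be sketched in a few lines of Python: sort the lines and report every position where the result differs from the file. The helper `sort_mismatches` and the two rank tables are hypothetical stand-ins for this example. With an installed ckb_IQ.UTF-8 locale one could instead call `locale.setlocale(locale.LC_COLLATE, "ckb_IQ.UTF-8")` and pass `locale.strxfrm` as the key.

```python
def sort_mismatches(expected_lines, key):
    """Sort the lines with `key` and report the positions that changed."""
    actual = sorted(expected_lines, key=key)
    return [
        (index, want, got)
        for index, (want, got) in enumerate(zip(expected_lines, actual))
        if want != got
    ]


# Toy stand-ins for the default and the tailored order (fragment only).
default_rank = {"\u0631": 0, "\u0632": 1, "\u0695": 2}
tailored_rank = {"\u0631": 0, "\u0695": 1, "\u0632": 2}

# Desired order: reh, reh with small v below, zain.
expected = ["\u0631", "\u0695", "\u0632"]

print(sort_mismatches(expected, default_rank.get))   # two positions differ
print(sort_mismatches(expected, tailored_rank.get))  # empty list: test passes
```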

Bug Watch Updater (bug-watch-updater) on 2020-01-14

Changed in glibc:
status:	Incomplete → In Progress

Revision history for this message

In Sourceware.org Bugzilla #9809, Jwtiyar Nariman (jwtiyar) wrote on 2020-01-14:

#73

Thank you very much, dear Mike. I got it; you did a great job, thanks again.
So now everything is ready to be accepted in glibc.

Best Regards

(In reply to Mike FABIAN from comment #63)
> (In reply to Jwtiyar Nariman from comment #62)
>
> > > Other characters not in this test file are sorted according to the defaults
> > > from
> > >
> > > copy "iso14651_t1"
> >
> > Sorting is good now, but adding these
> > reorder-after <S0631> % ر
> > > <S0695> % ڕ
> > >
> > > reorder-after <S0646> % ن
> > > <S0648> % و
> > > <S06C6> % ۆ
> > iam not understanding because for example this " <S0695> % ڕ " how you
> > order it?
>
> copy "iso14651_t1"
>
> contains
>
> copy "iso14651_t1_common"
>
> and some modifications which affect only Chinese and Japanese.
>
> So we look into the iso14651_t1_common file to see what the default sort
> order is.
>
> We find for example:
>
> ...
> <S0631> % ARABIC LETTER REH
> <S0632> % ARABIC LETTER ZAIN
> <S0691> % ARABIC LETTER RREH
> <S0692> % ARABIC LETTER REH WITH SMALL V
> <S0693> % ARABIC LETTER REH WITH RING
> <S0694> % ARABIC LETTER REH WITH DOT BELOW
> <S0695> % ARABIC LETTER REH WITH SMALL V BELOW
> <S0696> % ARABIC LETTER REH WITH DOT BELOW AND DOT ABOVE
> ...
>
> Looking at this you see that ڕ U+0695 ARABIC LETTER REH WITH SMALL V BELOW
> is sorted right after ڔ U+0694 ARABIC LETTER REH WITH DOT BELOW by default.
> That is not what you want for Kurdish. For Kurdish, you want
> ڕ U+0695 ARABIC LETTER REH WITH SMALL V BELOW to be sorted right after
> ر U+0631 ARABIC LETTER REH.
>
> This is achieved by the rule:
>
> reorder-after <S0631> % ر
> <S0695> % ڕ
>
> Which removes U+0695 from its default position in the sort order
> and inserts it again after U+0631.
>
> reorder-after <S0646> % ن
> <S0648> % و
> <S06C6> % ۆ
>
> does a similar thing to change the sorting of U+0648 and U+06C6.
>
> To find out which of these rules I need, I created the ckb_IQ.UTF-8.in
> test file first and wrote the Kurdish characters in the order you wanted
> into that file.
>
> Then I ran a test sort using a ckb_IQ locale which had *only*
>
> LC_COLLATE
> copy "iso14651_t1"
> END LC_COLLATE
>
> and *nothing* else.
>
> The test sort showed that only U+0695, U+0648, and U+06C6 were sorted
> incorrectly.
> All other characters from your list of Kurdish characters were sorted
> correctly
> already. So I needed only to add rules to fix the sort order for these 3
> characters.
>
> You can see the same by just reading the iso14651_t1_common and find out
> which
> of the Kurdish characters are already in the correct order in that file and
> which are not.
> You have to do nothing for the characters which are already in correct order.
> For the characters which are in a wrong position in iso14651_t1_common, you
> add
> rules like
>
> reorder-after <... collating-symbol after which to reorder ...>
> <... the collating-symbol which should be reordered ...>
>
> I found writing the test file and checking which characters are sorted
> wrongly by default easier than staring at iso14651_t1_common. And it
> is a good idea to have the test file anyway to make sure that the
> Kurdish sort order always stays c...


Affects	Status	Importance	Assigned to
GLibC	Fix Released	Wishlist	sourceware-bugs #9809
langpack-o-matic	Fix Released	Undecided	Gunnar Hjalmarsson
langpack-locales (Ubuntu)	Fix Released	Medium	Gunnar Hjalmarsson
