Min Nan Chinese not available in Language Support

Bug #236028 reported by Kaihsu Tai on 2008-05-30
This bug affects 1 person
Affects                     Status        Importance  Assigned to  Milestone
langpack-locales (Ubuntu)   Fix Released  Undecided   Martin Pitt
language-selector (Ubuntu)  Fix Released  Undecided   Arne Goetje

Bug Description

The Min Nan Chinese translation effort (https://launchpad.net/~ubuntu-l10n-nan) started back in Gutsy and continued in Hardy at https://translations.launchpad.net/ubuntu/hardy/+lang/nan

But today I checked System -> Administration -> Language Support in Hardy and could not find an entry for Min Nan Chinese.

By the way, the alternative names Bân-lâm-gú and Hō-ló-oē should also be listed.

Dereck Wonnacott wrote:
> ** Changed in: language-selector (Ubuntu)
> Sourcepackagename: None => language-selector
>
There is no option to use Minnan because no locale exists for Minnan
yet.

Furthermore, as Minnan can be written both in Latin script and in Hant
(traditional Han) script, we should discuss whether it would be better
to create two locales and two sets of translations and let the users
choose.

Kaihsu Tai (kaihsu) wrote :

Arne:
> There is no option to use Minnan, because there doesn't exist any locale for Minnan.
Are there any instructions on how to create a locale?

> Further more, as Minnan can be used with Latin script and also with Hant script, we should discuss if it would be better to create two locales and two sets of translations and let the users choose.

That is fine, but so far the translation effort has focused on the Latin-script variant only, and I would be reluctant to distract my colleagues with yet another script. By the way, perhaps these two variants could be coded as nan-Latn and nan-Hant (per ISO standards; adjust as appropriate). (Hant = traditional Hàn (Chinese) script.)

Arne Goetje (arnegoetje) on 2008-06-13
Changed in language-selector:
assignee: nobody → arnegoetje
status: New → Confirmed
Arne Goetje (arnegoetje) wrote :

Kaihsu Tai wrote:
> Arne:
>> There is no option to use Minnan, because there doesn't exist any locale for Minnan.
> Are there any instructions on how to create a locale?

It's not that complicated, but the locale would need to be submitted
upstream (glibc).

I can create one for you, if you help me to translate some terms for it:

 * abday: Sun, Mon, Tue, Wed, Thu, Fri, Sat
 * day: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday
 * abmon: Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
 * mon: January, February, March, April, May, June, July, August,
September, October, November, December
 * am_pm: am, pm
 * yesexpr: ^[yY].*
 * noexpr: ^[nN].*
 * yesstr: Yes
 * nostr: No
 * name_miss: Miss.
 * name_mr: Mr.
 * name_mrs: Mrs.
 * name_ms: Ms.
 * country_name: Taiwan

Furthermore, I need to know the collation order of the alphabet:
I believe it's a-z plus o͘, where o͘ sorts after o.

>> Further more, as Minnan can be used with Latin script and also with
> Hant script, we should discuss if it would be better to create two
> locales and two sets of translations and let the users choose.
>
> That is fine, but so far the translation effort has been focused on the
> Latin script variant only, and I would be reluctant to distract the
> colleagues with yet another script. By the way, perhaps these two
> variants can be coded as nan-Latn and nan-Hant (per ISO standards;
> adjust as appropriate). (Hant = traditional Hàn (Chinese) script.)

Agreed. So, if possible, please translate the above strings into Minnan
using both the POJ and Hant scripts. Then we can have two locales,
nan_TW@Latin and nan_TW@Hant.
Do we need to create a locale for China? If yes, which script do they
use? I believe they don't use POJ (AFAIK POJ was introduced by a
church when it baptized Taiwanese Minnan speakers, but I'm not sure
whether it made its way back to Southern China...), but some Han
script, although I'm not sure whether they use Hant or Hans...

Arne Goetje, 2008-06-13 03:51:55-0000:
> I can create one for you, if you help me to translate some terms for it:
>
> * abday: Sun, Mon, Tue, Wed, Thu, Fri, Sat
> * day: Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday
> * abmon: Jan. Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec
> * mon: January, February, March, April, May, June, July, August,
> September, October, November, December
> * am_pm: am, pm
> * yesexpr: ^[yY].*
> * noexpr: ^[nN].*
> * yesstr: Yes
> * nostr: No
> * name_miss: Miss.
> * name_mr: Mr.
> * name_mrs: Mrs.
> * name_ms: Ms.
> * country_name: Taiwan

I will get onto this when I have time. Or perhaps some of my
colleagues can help too (they have been added to watch this
bug). Please post contributions here (to this bug).

> Further more I need to know the collation of the alphabet:
> I believe that it's a-z plus o͘, where o͘ gets sorted after o.

That is one acceptable solution, but it is not the
traditional sorting order, which is something like:
a b (c) ch chh (d) e (f) g h ...

> Agree. So, if possible please translate the above strings
> into Minnan using both, POJ and Hant scripts. Then we can
> have two locales, nan_TW@Latin and nan_TW@Hant.

I still think we should concentrate on nan_TW@Latn (‘Latn’
per ISO 15924, not ‘Latin’) first.

> Do we need to create a locale for China? If yes, which
> script do they use? I believe they don't use POJ (AFAIK
> POJ has been introduced by some church when they baptized
> Taiwanese Minnan speakers, but I'm not sure if it had made
> it's way back to Southern China...), but some Han script,
> although I'm not sure if they use Hant or Hans...

Again, I think we should do nan_TW@Latn first. As they say,
‘premature optimization (categorization) is the root of all
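evil’. (未成年就這麼優,是一切邪惡的根源。 /
未熟調教是一切邪惡的根源。 Roughly: ‘being this excellent while
still underage is the root of all evil’ / ‘immature tuning is the
root of all evil’.) ☺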

Kaihsu Tai (kaihsu) wrote :

 * abday: lé-pài, pài-1, pài-2, pài-3, pài-4, pài-5, pài-6
 * day: lé-pài-ji̍t, pài-it, pài-jī, pài-saⁿ, pài-sì, pài-gō͘, pài-la̍k
 * abmon: 1-goe̍h, 2-goe̍h, 3-goe̍h, 4-goe̍h, 5-goe̍h, 6-goe̍h, 7-goe̍h, 8-goe̍h, 9-goe̍h, 10-goe̍h, 11-goe̍h, 12-goe̍h
 * mon: chiaⁿ-goe̍h, jī-goe̍h, saⁿ-goe̍h, sì-goe̍h, gō͘-goe̍h, la̍k-goe̍h, chhit-goe̍h, peh-goe̍h, káu-goe̍h, cha̍p-goe̍h, cha̍p-it-goe̍h, cha̍p-jī-goe̍h
 * am_pm: téng-po·, ē-po·
 * yesexpr: ^[sS].*
 * noexpr: ^[mM].*
 * yesstr: Sī
 * nostr: M̄-sī
 * name_miss: ko͘-niû
 * name_mr: sian-siⁿ
 * name_mrs: lú-sū
 * name_ms: sió-chiá
 * country_name: Tâi-oân

Arne Goetje (arnegoetje) wrote :

Kaihsu Tai wrote:
>> Further more I need to know the collation of the alphabet:
>> I believe that it's a-z plus o͘, where o͘ gets sorted after o.
>
> That is one acceptable solution, but it is not the
> traditional sorting order, which is something like:
> a b (c) ch chh (d) e (f) g h ...

ch and chh already sort after c under the a-z rule, so there is no need
to list them explicitly.
As far as I can see, the only additional characters are o͘ and ⁿ. So,
should we sort them after o and n respectively?

>> Agree. So, if possible please translate the above strings
>> into Minnan using both, POJ and Hant scripts. Then we can
>> have two locales, nan_TW@Latin and nan_TW@Hant.
>
> I still think we should concentrate on nan_TW@Latn (‘Latn’
> per ISO 15924, not ‘Latin’) first.

Other locales use @Latin; they don't follow ISO 15924.
(We could also use @POJ, for that matter, to leave room for other
Latin-based transliterations... ;) )

Kaihsu Tai wrote:
> * am_pm: téng-po·, ē-po·
                 ^^     ^^
I assume that is supposed to be 006F,0358 instead of 006F,00B7 ? Or is
it supposed to be a tone mark?

Arne Goetje, 2008-06-13 16:27:06-0000:
> I see the only additional characters are o͘ and ⁿ. So, sort
> them after o and n respectively?

I think ‘ⁿ’ should be sorted after ‘z’ rather than between
‘n’ and ‘o’, but I have not thought this through. Let’s sort
it after ‘z’ and see what happens.
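A quick way to eyeball the order discussed here (a-z, o͘ right after o, ⁿ after z) is a plain byte-wise sort. For toneless POJ it happens to produce exactly that order, because o͘ is o followed by U+0358 (UTF-8 bytes CD 98) and ⁿ is U+207F (bytes E2 81 BF), and both compare higher than any ASCII letter. This is only a rough sketch, not an LC_COLLATE definition. It assumes input without tone marks: a precomposed vowel such as é (bytes C3 A9) would land after z instead of next to e. The helper name poj_sort is made up for illustration.

```sh
#!/bin/sh
# Sketch: preview the proposed POJ letter order with a byte-wise sort.
# Assumption: toneless input; tone-marked vowels would NOT sort correctly.
O_DOT=$(printf 'o\315\230')   # o + U+0358 COMBINING DOT ABOVE RIGHT
NN=$(printf '\342\201\277')   # U+207F SUPERSCRIPT LATIN SMALL LETTER N

poj_sort() {
    # LC_ALL=C makes sort compare the raw UTF-8 bytes.
    LC_ALL=C sort
}

printf '%s\n' saz "$O_DOT" p "sa$NN" oz sa | poj_sort
```

This prints oz, o͘, p, sa, saz, saⁿ, one per line: o͘ follows every o-word and ⁿ follows z.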

> >> Agree. So, if possible please translate the above
> >> strings into Minnan using both, POJ and Hant scripts.
> >> Then we can have two locales, nan_TW@Latin and
> >> nan_TW@Hant.
> >
> > I still think we should concentrate on nan_TW@Latn
> > (‘Latn’ per ISO 15924, not ‘Latin’) first.
>
> Other locales use @Latin, they don't follow ISO 15924.

Let’s follow ISO 15924 for now and use ‘nan_TW@Latn’.

Arne Goetje, 2008-06-13 16:30:19-0000:
> Kaihsu Tai wrote:
> > * am_pm: téng-po·, ē-po·
> ^^ ^^
> I assume that is supposed to be 006F,0358 instead of 006F,00B7 ? Or is
> it supposed to be a tone mark?

Argh, the evil U+00B7 crept in. Of course they should be
U+0358, so
* am_pm: téng-po͘, ē-po͘

By the way, in formulating these answers, I referred to
http://zh-min-nan.wikipedia.org/wiki/Special:AllMessages
where some evil U+00B7 fossils are still lingering.
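Stray U+00B7 characters like these can be found and replaced mechanically. Here is a minimal sketch. It assumes a UTF-8 text stream and treats only a middle dot directly after o or O as a mistyped POJ dot. fix_poj_dot is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: replace U+00B7 MIDDLE DOT after o/O with U+0358
# COMBINING DOT ABOVE RIGHT, the character POJ uses for o͘.
MIDDLE_DOT=$(printf '\302\267')   # U+00B7 as UTF-8 bytes C2 B7
DOT_RIGHT=$(printf '\315\230')    # U+0358 as UTF-8 bytes CD 98

fix_poj_dot() {
    # Byte-wise sed (LC_ALL=C) so the multibyte sequences match literally;
    # a middle dot that does not follow o/O is left untouched.
    LC_ALL=C sed "s/\([oO]\)$MIDDLE_DOT/\1$DOT_RIGHT/g"
}
```

For example, `echo 'téng-po·, ē-po·' | fix_poj_dot | od -An -tx1` should show the bytes cd 98 where c2 b7 used to be.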

Arne Goetje (arnegoetje) on 2008-06-17
Changed in langpack-locales:
status: Confirmed → In Progress
Arne Goetje (arnegoetje) wrote :

OK, I have one problem with the current data:

The abday and abmon fields: if you look at two typical applications that use them (cal on the command line and the calendar applet in GNOME), you will find that the calendar applet uses only the first three letters as an abbreviation, while cal uses only the first two.

What's more, the default date format is: weekday month day.

So, with the current data, cal would display the weekdays as: lé pà pà pà pà pà pà
and the calendar applet in GNOME would show "pài 8-g 28" for today's date.

One possibility would be to use:
 abday "ji̍t, it, jī, saⁿ, sì, gō͘, la̍k" or "lé, it, jī, saⁿ, sì, gō͘, la̍k"
 abmon "01g, 02g, 03g, 04g, 05g, 06g, 07g, 08g, 09g, 10g, 11g, 12g"

Other suggestions?

Apart from this, I have the locale ready for upload.

Kaihsu Tai (kaihsu) wrote :

Thanks for looking into this.

Arne Goetje, 2008-08-28 01:20:06-0000:
> One possibility would be to use:
> abday "ji̍t, it, jī, saⁿ, sì, gō͘, la̍k" or "lé, it, jī, saⁿ, sì, gō͘, la̍k"
> abmon "01g, 02g, 03g, 04g, 05g, 06g, 07g, 08g, 09g, 10g, 11g, 12g"
> Other suggestions?

I suggest:
abday "l-p, p-1, p-2, p-3, p-4, p-5, p-6"
abmon "1g, 2g, 3g, 4g, 5g, 6g, 7g, 8g, 9g, 10g, 11g, 12g"
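Arne's observation that cal keeps only the first two characters of each abday entry can be turned into a quick collision check. This is a sketch, and the collisions helper is hypothetical. Note that cut -c counts bytes rather than characters in many implementations, so the check is only reliable for ASCII abbreviations like these.

```sh
#!/bin/sh
# Sketch: list abbreviations that become identical after truncation.
# $1: semicolon-separated abbreviations, $2: width kept by the application.
collisions() {
    list=$1 width=$2
    printf '%s\n' "$list" | tr ';' '\n' | cut -c1-"$width" |
        LC_ALL=C sort | uniq -d   # print each duplicated prefix once
}

collisions 'l-p;p-1;p-2;p-3;p-4;p-5;p-6' 2   # prints: p-
collisions 'lp;p1;p2;p3;p4;p5;p6' 2          # prints nothing
```

Under a two-character cut, every p-N entry in the hyphenated form becomes p-, so six weekdays would look the same. A hyphen-free form such as lp;p1;...;p6 keeps them distinct.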

> Despite of this, I have the locale ready for upload.

Again, many thanks.

--
1948-12-10 Universal Declaration of Human Rights http://everyhumanhasrights.org

Kaihsu Tai (kaihsu) wrote :

Arne Goetje wrote on 2008-08-28:
> Despite of this, I have the locale ready for upload.

Hi Arne, would you please upload the locale? Cheers.

Arne Goetje (arnegoetje) wrote :

I have a new package in my PPA.

Please go to System -> Administration -> Software Sources and, under Third-Party Software, add the following line:

deb http://ppa.launchpad.net/arnegoetje/ubuntu intrepid main

Then please update the locales package and help testing:

sudo locale-gen nan_TW@Latn
LANG=nan_TW@Latn locale -ck LC_ADDRESS

Repeat the last step for every LC_* category (except LC_ALL).
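The per-category check can be scripted. This minimal sketch assumes glibc's locale utility. It sets LC_ALL to the locale under test rather than LANG, so LC_* variables already in the environment cannot override it. dump_locale is a hypothetical name.

```sh
#!/bin/sh
# Sketch: print every LC_* category (LC_ALL excluded) for one locale.
CATEGORIES='LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT LC_IDENTIFICATION'

dump_locale() {
    # $1: locale name; unquoted $CATEGORIES is split on spaces/newlines.
    for category in $CATEGORIES; do
        LC_ALL=$1 locale -ck "$category"
    done
}
```

For example, run `dump_locale nan_TW@Latn` after `sudo locale-gen nan_TW@Latn`.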

Please report any issues back here, so that I can fix them.

For reference, the specification that describes the locale data (the ISO/IEC 14652 draft) is here:
http://anubis.dkuug.dk/jtc1/sc22/wg20/docs/n897-14652w25.pdf

kiatgak (kiatgak) wrote :

Bravo!

After

--------------
sudo locale-gen nan_TW@Latn
--------------

I tested LC_TIME with the following commands:

--------------
export LC_TIME=nan_TW@Latn [Enter]
--------------

(1) locale command
--------------
locale -ck LC_TIME [Enter]

abday="lp;p1;p2;p3;p4;p5;p6"
day="lé-pài-ji̍t;pài-it;pài-jī;pài-saⁿ;pài-sì;pài-gō͘;pài-la̍k"
abmon="1g;2go;3go;4go;5go;6go;7go;8go;9go;10g;11g;12g"
mon="chiaⁿ-goe̍h;jī-goe̍h;saⁿ-goe̍h;sì-goe̍h;gō͘-goe̍h;la̍k-goe̍h;chhit-goe̍h;peh-goe̍h;káu-goe̍h;cha̍p-goe̍h;cha̍p-it-goe̍h;cha̍p-jī-goe̍h"
am_pm="téng-po͘;ē-po͘"
d_t_fmt="%a %d %b %Y %r %Z"
d_fmt="%m/%d/%y"
t_fmt="%r"
t_fmt_ampm="%I:%M:%S %p"
era=
era_year=""
era_d_fmt=""
alt_digits=
era_d_t_fmt=""
era_t_fmt=""
time-era-num-entries=0
time-era-entries="l"
week-ndays=7
week-1stday=19971130
week-1stweek=0
first_weekday=1
first_workday=1
cal_direction=1
timezone=""
date_fmt="%a %b %e %H:%M:%S %Z %Y"
time-codeset="UTF-8"
-------------------
in which:

abmon="1g;2go;3go;4go;5go;6go;7go;8go;9go;10g;11g;12g"

should be:

abmon "1g;2g;3g;4g;5g;6g;7g;8g;9g;10g;11g;12g"

or

abmon "01g;02g;03g;04g;05g;06g;07g;08g;09g;10g;11g;12g"

(2) date command:

---------------
date [Enter]

p4 10g 23 21:49:52 CST 2008
----------------

the output (date_fmt?) should be

2008 10g 23 p4 21:49:52 TST

or something like

2008 10g 23 (p4) 21:49:52 TST

or something like

2008-10-23 (4) 21:49:52 TST

in which the time zone (LC_TIME timezone?) should be TST (Taiwan Standard Time; please refer to http://www.cwb.gov.tw/) rather than CST.

(3) date +%x command:

---------------
date +%x [Enter]

10/23/08
----------------

the output (d_fmt?) should be

2008-10-23

or

2008 10g 23

(4) date +%c command:

---------------
date +%c [Enter]

p4 23 10g 2008 10:09:51 ē-po͘ CST
----------------

Similar problem to the date command.

the output (d_t_fmt?) should be something like

2008 10g 23 (p4) 22:09:51 TST

Thanks!

kiatgak wrote:
>
> abmon="1g;2go;3go;4go;5go;6go;7go;8go;9go;10g;11g;12g"
>
> should be:
>
> abmon "1g;2g;3g;4g;5g;6g;7g;8g;9g;10g;11g;12g"

fixed

>
> the output (date_fmt?) should be
>
> 2008 10g 23 p4 21:49:52 TST
>
> or something like
>
> 2008 10g 23 (p4) 21:49:52 TST

fixed

> in which, the timezone (LC_TIME timezone?) should be TST (Taiwan
> Standard Time; please refer to http://www.cwb.gov.tw/) rather than CST.

That's not so easy. The time zone information seems to come from the
tzdata package... I will investigate that. For now it has to stay CST;
there is no other way.
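This matches how %Z works: date prints whatever abbreviation the TZ rule in effect supplies, not a field of LC_TIME. The small sketch below uses POSIX TZ strings, where NAME-8 means "abbreviation NAME, 8 hours ahead of UTC", and assumes GNU date. show_tz is a hypothetical helper. Real systems normally use a tzdata zone such as Asia/Taipei rather than a hand-written rule.

```sh
#!/bin/sh
# Sketch: the %Z abbreviation is taken from TZ, independent of the locale.
EPOCH=1224769792   # 2008-10-23 13:49:52 UTC

show_tz() {
    # $1: POSIX TZ string; no tzdata file is needed for this form.
    LC_ALL=C TZ=$1 date -d "@$EPOCH" '+%H:%M:%S %Z'
}

show_tz CST-8   # 21:49:52 CST
show_tz TST-8   # 21:49:52 TST
```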

> ---------------
> date +%x [Enter]
>
> 10/23/08
> ----------------
>
> the output (d_fmt?) should be
>
> 2008-10-23

fixed

> ---------------
> date +%c [Enter]
>
> p4 23 10g 2008 10:09:51 ē-po͘ CST
> ----------------
>
> similiar problem like date command
>
> the output (d_t_fmt?) should be something like
>
> 2008 10g 23 (p4) 22:09:51 TST

fixed (except TST/CST as commented above)

New package is in my PPA (2.7.9-6~ppa03).

Please test if you find more errors. Thanks.

Kaihsu Tai (kaihsu) wrote :

I am still using Hardy Heron, but will test once Intrepid Ibex gets released and I upgrade. Thanks, Arne!

kiatgak (kiatgak) wrote :

Thanks, Arne!

One more question about "mon" in LC_TIME.

in locale nan_TW@Latn:
-------------------
cal -3 [Enter]

   káu-goe̍h 2008       cha̍p-goe̍h 2008      cha̍p-it-goe̍h 2008
lp p1 p2 p3 p4 p5 p6  lp p1 p2 p3 p4 p5 p6  lp p1 p2 p3 p4 p5 p6
    1  2  3  4  5  6            1  2  3  4                     1
 7  8  9 10 11 12 13   5  6  7  8  9 10 11   2  3  4  5  6  7  8
14 15 16 17 18 19 20  12 13 14 15 16 17 18   9 10 11 12 13 14 15
21 22 23 24 25 26 27  19 20 21 22 23 24 25  16 17 18 19 20 21 22
28 29 30              26 27 28 29 30 31     23 24 25 26 27 28 29
                                            30
---------------------

comparing to locale ja_JP.utf8:
--------------------
cal -3 [Enter]

      9月 2008             10月 2008             11月 2008
日 月 火 水 木 金 土  日 月 火 水 木 金 土  日 月 火 水 木 金 土
    1  2  3  4  5  6            1  2  3  4                     1
 7  8  9 10 11 12 13   5  6  7  8  9 10 11   2  3  4  5  6  7  8
14 15 16 17 18 19 20  12 13 14 15 16 17 18   9 10 11 12 13 14 15
21 22 23 24 25 26 27  19 20 21 22 23 24 25  16 17 18 19 20 21 22
28 29 30              26 27 28 29 30 31     23 24 25 26 27 28 29
                                            30
-----------------------
In the Japanese locale, "mon" uses digits (e.g. 11) rather than spelling the number out in kana (じゅういち, "jūichi"). That seems more concise.

In locale nan_TW@Latn, would

mon="1goe̍h;2goe̍h;3goe̍h;4goe̍h;5goe̍h;6goe̍h;7goe̍h;8goe̍h;9goe̍h;10goe̍h;11goe̍h;12goe̍h"

be more appropriate than

mon="chiaⁿ-goe̍h;jī-goe̍h;saⁿ-goe̍h;sì-goe̍h;gō͘-goe̍h;la̍k-goe̍h;chhit-goe̍h;peh-goe̍h;káu-goe̍h;cha̍p-goe̍h;cha̍p-it-goe̍h;cha̍p-jī-goe̍h"

?

This problem needs Kaihsu's confirmation (and Arne's technical review?). Thanks!

In message <email address hidden> Bug 236028
<email address hidden> writes:
> In locale nan_TW@Latn, would
>
> mon="1goe̍h;2goe̍h;3goe̍h;4goe̍h;5goe̍h;6goe̍h;7goe̍h;8goe̍h;9goe̍h;10goe̍h;11goe̍h;12goe̍h"
>
> be more appropriate than
>
> mon="chiaⁿ-goe̍h;jī-goe̍h;saⁿ-goe̍h;sì-goe̍h;gō͘-goe̍h;la̍k-goe̍h;chhit-goe̍h;peh-goe̍h;káu-goe̍h;cha̍p-goe̍h;cha̍p-it-goe̍h;cha̍p-jī-goe̍h"
>
> ?
>
> This problem needs Kaihsu's confirm. (And Arne's technical review?)

Sure.

Kaihsu Tai wrote:

>> This problem needs Kaihsu's confirm. (And Arne's technical review?)
>
> Sure.
>

done.

Version 2.7.9-6~ppa04 in my PPA.

kiatgak (kiatgak) wrote :

One problem about LC_MESSAGES:
--------------
locale -ck LC_MESSAGES [Enter]

LC_MESSAGES
yesexpr="^[>sS].*"
noexpr="^[mM].*"
yesstr="Sī"
nostr="M̄-Sī"
messages-codeset="UTF-8"
--------------

In "yesexpr", there is an extra character ">".
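You can check the effect of the stray > by matching sample replies against the patterns. The patterns use only ^, [...] and .*, which mean the same in basic and extended regular expressions, so grep -E stands in here for the regex engine of whatever program reads yesexpr. answers_yes is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: does reply $2 match the yesexpr/noexpr pattern $1?
answers_yes() {
    printf '%s\n' "$2" | LC_ALL=C grep -Eq -- "$1"
}

answers_yes '^[sS].*'  'Sī' && echo 'Sī accepted'
answers_yes '^[>sS].*' '>'  && echo '">" wrongly accepted by the buggy pattern'
answers_yes '^[sS].*'  '>'  || echo '">" rejected by the fixed pattern'
```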

kiatgak (kiatgak) wrote :

One question about LC_IDENTIFICATION:
-------------------
locale -ck LC_IDENTIFICATION [Enter]

LC_IDENTIFICATION
title="Minnan language locale for Taiwan"
source=""
address=""
contact="Arne Goetje"
<email address hidden>"
tel=""
fax=""
language="Minnan"
territory="Taiwan"
audience=""
application=""
abbreviation=""
revision="0.1"
date="2008-06-16"
category="trv_TW:2000;UTF-8;;;;;;;;;;;"
identification-codeset="UTF-8"
-------------------

In "category", should "trv_TW" be "nan_TW"? Please check, thanks!

kiatgak, 2008-10-27 06:39:27-0000:
> One question about LC_IDENTIFICATION:
> [...]
> category="trv_TW:2000;UTF-8;;;;;;;;;;;"
> [...]
> In "category", should "trv_TW" be "nan_TW"? Please check, thanks!

Indeed it should be ‘nan_TW’.
http://www.ethnologue.com/show_language.asp?code=nan

The code ‘trv_TW’ is for Truku (= Taroko), it appears.
http://www.ethnologue.com/show_language.asp?code=trv
http://www.sherin.in/index.php/2007/12/31/enrico-zini-happy-new-year/
Great that work is done in this language also!

Pêng-an. (Peace.)

--
2008-10-24/30 Disarmament Week http://www.un.org/disarmament/

Kaihsu Tai (kaihsu) wrote :

Off topic: I just found this page. Great! http://wiki.debian.org/I18n/TaiwanIndigenousLanguages

Kaihsu Tai wrote:
> Off topic: I just found this page. Great!
> http://wiki.debian.org/I18n/TaiwanIndigenousLanguages
>
yep, you just found my page. :) I'm in the process of collecting data
for their locales and registering them upstream. The goal is to support
all actively spoken local languages in Taiwan.

Kaihsu Tai wrote:
> kiatgak, 2008-10-27 06:39:27-0000:
>> One question about LC_IDENTIFICATION:
>> [...]
>> category="trv_TW:2000;UTF-8;;;;;;;;;;;"
>> [...]
>> In "category", should "trv_TW" be "nan_TW"? Please check, thanks!
>
> Indeed it should be ‘nan_TW’.

Yes, my mistake. I was editing both the trv and nan locales at the same
time and just copied the metadata over... I will fix it soon.
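A mismatch like this can be caught automatically by reading the category field from the `locale -ck LC_IDENTIFICATION` output. This is a sketch. category_matches is a hypothetical helper, and it compares only the part of the field before the first underscore.

```sh
#!/bin/sh
# Sketch: succeed only if LC_IDENTIFICATION's category names language $1.
# stdin: the text printed by `locale -ck LC_IDENTIFICATION`.
category_matches() {
    awk -F'"' -v want="$1" '
        $1 == "category=" { found = 1; split($2, part, "_"); ok = (part[1] == want) }
        END { exit !(found && ok) }'
}
```

For example: `LC_ALL=nan_TW@Latn locale -ck LC_IDENTIFICATION | category_matches nan || echo 'wrong category'`.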

> http://www.ethnologue.com/show_language.asp?code=nan
>
> The code ‘trv_TW’ is for Truku (= Taroko), it appears.
> http://www.ethnologue.com/show_language.asp?code=trv
> http://www.sherin.in/index.php/2007/12/31/enrico-zini-happy-new-year/
> Great that work is done in this language also!

While we are at it:
1. Ethnologue is not the authoritative source for language codes;
ISO 639-3 is.
2. A proposal has been filed with ISO 639-3 to split the nan code
into dzu and xim, reflecting the Chaozhou and Xiamen dialects. As it
groups the Taiwan dialect with Xiamen, we should probably use
xim_TW in future... ? (
http://www.sil.org/iso639-3/chg_detail.asp?id=2008-083&lang=nan )

Kaihsu Tai (kaihsu) wrote :

Hi Arne,

Excellent!!

I did some work on these languages back in the 1990s. See
for example my old paper
http://web.archive.org/web/19970808073150/http://www.taiwanese.com/tp/tpsurvey/tpsurvey.pdf
(Scary to think that it was more than 10 years ago!)

Please do talk with the Presbyterian Church in Taiwan and
the Bible Society in Taiwan, for the sake of the sustainable
viability of this work.

Well done!!

Arne Goetje, 2008-10-27 14:18:55-0000:
> Kaihsu Tai wrote:
> > Off topic: I just found this page. Great!
> > http://wiki.debian.org/I18n/TaiwanIndigenousLanguages
>
> yep, you just found my page. :) I'm in the process of
> collecting data for their locales and registering them
> upstream. The goal is to support all actively spoken local
> languages in Taiwan.

Kaihsu Tai (kaihsu) wrote :

Arne Goetje, 2008-10-27 14:27:23-0000:
> Kaihsu Tai wrote:
> > http://www.ethnologue.com/show_language.asp?code=nan
>
> While we are at it:
> 1. ethnologue is not the authoritative data regarding language codes.
> That's ISO 639-3.

Sorry, my fault. But the web version of Ethnologue does show the
ISO 639-3 code at the top of the language page.

> 2. There has been a proposal filed at ISO 639-3 to split the nan code
> into dzu and xim, reflecting the Chaozhou and Xiamen dialects. As they
> group the Taiwan dialect into the Xiamen group, we should probably use
> xim_TW in future... ? (
> http://www.sil.org/iso639-3/chg_detail.asp?id=2008-083&lang=nan )

Indeed the language as spoken in Taiwan is in the Xiamen
(Amoy = Ē-mn̂g) group. But we should only change the code
from ‘nan’ to whatever when it becomes officially approved.

Cheers!

Kaihsu Tai (kaihsu) wrote :

Arne Goetje, 2008-10-27 14:27:23-0000:
> 2. There has been a proposal filed at ISO 639-3 to split
> the nan code into dzu and xim, reflecting the Chaozhou and
> Xiamen dialects. As they group the Taiwan dialect into the
> Xiamen group, we should probably use xim_TW in future... ?
> (
> http://www.sil.org/iso639-3/chg_detail.asp?id=2008-083&lang=nan
> )

Thanks for alerting me to this. I do not support the new
code ‘xim’ and found that my name has been misused in the
proposal. I have submitted a correction request.

Arne Goetje (arnegoetje) wrote :

thanks for the link... btw: new package is in my PPA.

Arne Goetje (arnegoetje) on 2009-01-19
Changed in language-selector:
assignee: nobody → arnegoetje
Martin Pitt (pitti) wrote :

This does not have anything to do with the language selector, I believe. Please reopen if it does.

It seems you have a half-way working locale now? Could you please send that to

  http://sourceware.org/bugzilla/buglist.cgi?product=glibc&component=localedata

for upstream review/inclusion? Thank you!

Changed in language-selector:
status: New → Invalid
Arne Goetje (arnegoetje) wrote :

Locale committed upstream as nan_TW@latin. The convention is that @ modifiers must be lower case and spelled out in full.

Changed in langpack-locales:
status: In Progress → Fix Committed
Martin Pitt (pitti) on 2009-06-17
Changed in langpack-locales (Ubuntu):
assignee: Arne Goetje (arnegoetje) → Martin Pitt (pitti)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package langpack-locales - 2.9+git20090617-1

---------------
langpack-locales (2.9+git20090617-1) karmic; urgency=low

  * Update to current upstream glibc git head localedata:
    - Adds nan_TW@latin. (LP: #236028)
  * Drop patches accepted upstream:
    - es_CO-papersize.patch
    - iso14651_kra_sorting.patch
    - mt_MT-Awwissu-spelling.patch
  * Move locale.alias.5 to debian/local/, since it's not from upstream.
  * Add debian/patches/debian-*: Import localedata patches from Debian's
    eglibc source package, as far as they apply. This is easier to maintain
    than applying them inline, since the primary code base is now [e]glibc
    upstream.
  * Rename our Ubuntu patches to ubuntu-*.
  * debian/rules: Install a temporary localedata -> . symlink, so that the
    Debian patches apply.
  * Add ubuntu-wal_ET-SUPPORTED.patch: Add missing wal_ET entry to SUPPORTED.
    (LP: #362726) (The other missing entries got fixed by applying the Debian
    patches properly now)
  * Don't directly install SUPPORTED in debian/install any more, but in
    debian/rules, and apply some seddery to convert the format to the one we
    are currently using.
  * debian/local/test-locales: Update to get along with upstream format of
    SUPPORTED.
  * Bump Standards-Version to 3.8.1 and debhelper compat to 5.
  * debian/control: Add missing ${misc:Depends}.

 -- Martin Pitt <email address hidden> Wed, 17 Jun 2009 11:05:25 +0200

Changed in langpack-locales (Ubuntu):
status: Fix Committed → Fix Released
iōngchun (iongchun) wrote :

Was this locale generated correctly?
I tried it in Karmic; gdm sets GDM_LANG and LANG to <email address hidden>, which seems correct.
But every application that uses the locale complains, for example:

iongchun@ubuntu64:~$ perl
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = "<email address hidden>",
        LC_ALL = (unset),
        LANG = "<email address hidden>"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

And output of 'locale-gen' reads:
iongchun@ubuntu64:~$ sudo locale-gen
Generating locales...
Generating locales...
  cs_CZ.UTF-8... up-to-date
  de_AT.UTF-8... up-to-date
...
  nan_TW.UTF-8@latin... up-to-date
...
  zh_TW.UTF-8... up-to-date
Generation complete.

The locale code nan_TW.UTF-8@latin does not look correct to me.

Martin Pitt (pitti) wrote :

Right, it isn't. This is because you set LANGUAGE, not LANG. Use LANG and not LANGUAGE, and all is well (see "man 3 gettext" for details).

iōngchun (iongchun) wrote :

Not sure what I should do with LANG or LANGUAGE, but this is what I see:

iongchun@ubuntu64:~$ unset LANGUAGE LANG LC_ALL
iongchun@ubuntu64:~$ date
Wed Oct 21 14:35:02 CST 2009
iongchun@ubuntu64:~$ <email address hidden> date
Wed Oct 21 14:35:15 CST 2009
iongchun@ubuntu64:~$ LANG=nan_TW.UTF-8@latin date
2009 10g 21 (p3) 14:35:23 CST

Martin Pitt (pitti) wrote :

> 2009 10g 21 (p3) 14:35:23 CST

Does this actually make any sense?

Kaihsu Tai (kaihsu) wrote :

> > 2009 10g 21 (p3) 14:35:23 CST

> Does this actually make any sense?

Yes, this is an appropriate short form of date in this language, except that CST probably should be TAIST.

iōngchun (iongchun) wrote :

Okay, now I understand that nan_TW.UTF-8@latin is the correct format,
so it is GDM that incorrectly sets GDM_LANG, LANG and LANGUAGE to <email address hidden>, right?
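Locale names follow the order language[_territory][.codeset][@modifier], so the codeset comes before the modifier, as in nan_TW.UTF-8@latin. Below is a rough sketch of a check for that order and for the lower-case modifier convention mentioned earlier in this thread. The regular expression is a simplification, not glibc's parser, and valid_locale_name is a hypothetical helper.

```sh
#!/bin/sh
# Sketch: accept language[_territory][.codeset][@modifier] with a
# lower-case modifier; anything else is reported as bad.
valid_locale_name() {
    printf '%s\n' "$1" |
        LC_ALL=C grep -Eq '^[a-z]{2,3}(_[A-Z]{2})?(\.[A-Za-z0-9-]+)?(@[a-z]+)?$'
}

for name in nan_TW.UTF-8@latin nan_TW@latin.UTF-8 sr_RS@latin nan_TW@Latn; do
    if valid_locale_name "$name"; then echo "ok:  $name"; else echo "bad: $name"; fi
done
```

The second name puts the modifier before the codeset and the fourth uses a mixed-case modifier, so both are reported as bad.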

iongchun [2009-10-24 2:43 -0000]:
> Okay, now I understand nan_TW.UTF-8@latin should be the correct format,
> so it is GDM that incorrectly set GDM_LANG, LANG and LANGUAGE to <email address hidden>, right?

I don't think it's gdm that sets it. It should be set at install
time in /etc/default/locale. What's in this file for you?

iōngchun (iongchun) wrote :

The file (which I guess should be /etc/defaults/locale) does not exist for me.

I upgraded this system from Jaunty to Karmic and installed "Chinese, Min Nan" in "Language Support".
But after that I didn't see the language appear in the GDM login screen.
So I used "Language Support" again to set "Chinese, Min Nan" as "for my menus and windows",
which led to the situation here.

As a workaround I set GDM_LANG to nan_TW.UTF-8@latin in my $HOME/.profile, and it works,
so I guess it is a problem with GDM.

Martin Pitt (pitti) wrote :

iongchun [2009-10-25 1:43 -0000]:
> The file (which I guest it should be /etc/defaults/locale) does not
No, /etc/default/locale

> exist for me.

Hm, the graphical installer writes it. How did you install Jaunty back
then?

iōngchun (iongchun) wrote :

Right, it is /etc/default/locale; sorry, I had logged into the wrong VM to check it.
The content is:

LANG="en_US.UTF-8"
LANGUAGE="en_US:en"

iōngchun (iongchun) wrote :

A freshly installed Karmic system has these problems too:
1. locale-gen has to be run manually after adding the language in "Language Support"
2. GDM_LANG, LANG and LANGUAGE are all set to <email address hidden>
3. Another language that also has an @latin variant, sr_RS, seems to work fine, but GDM_LANG, LANG and LANGUAGE are set without the encoding, i.e. sr_RS@latin instead of sr_RS.UTF-8@latin
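One way to tell whether the value written into LANG names a locale that has actually been generated is to compare it with `locale -a`, which lists codesets in glibc's normalized form (UTF-8 becomes utf8). This is a sketch. normalize_locale and is_generated are hypothetical helpers, and the normalization is simplified: glibc also prefixes all-digit codesets with iso, which is not handled here.

```sh
#!/bin/sh
# Sketch: rewrite the codeset the way `locale -a` prints it
# (lower case, punctuation dropped), keeping territory and modifier.
normalize_locale() {
    name=$1
    case $name in
        *.*)
            lang=${name%%.*}          # e.g. nan_TW
            rest=${name#*.}           # e.g. UTF-8@latin
            codeset=${rest%%@*}       # e.g. UTF-8
            modifier=
            case $rest in *@*) modifier="@${rest#*@}" ;; esac
            codeset=$(printf '%s' "$codeset" | LC_ALL=C tr 'A-Z' 'a-z' | LC_ALL=C tr -cd 'a-z0-9')
            name="$lang.$codeset$modifier"
            ;;
    esac
    printf '%s\n' "$name"
}

# Succeeds if the (normalized) name appears in the generated-locale list.
is_generated() {
    locale -a | grep -Fxq -- "$(normalize_locale "$1")"
}
```

For example: `is_generated nan_TW.UTF-8@latin || echo 'not generated yet: run locale-gen'`.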

Martin Pitt (pitti) wrote :

Arne, does the language selector write the wrong locale there?

Changed in language-selector (Ubuntu):
status: Invalid → Confirmed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package language-selector - 0.5.0

---------------
language-selector (0.5.0) lucid; urgency=low

  * new UI to allow separate language and locale settings (LP: #40669)
    (LP: #210776) (LP: #226155)
  * fixed locale generation code (LP: #236028)
 -- Arne Goetje <email address hidden> Thu, 25 Feb 2010 05:51:33 +0800

Changed in language-selector (Ubuntu):
status: Confirmed → Fix Released