There seems to be a consensus that the encoding part of locale names assigned to the LANG or LC_* environment variables should be .UTF-8 rather than .utf8.

I'm currently working on language-selector and GDM with other language/locale related matters, so I can include the necessary changes in a couple of merge proposals that are in the pipeline. Before I do so, and since I don't have a clear idea of the extent to which the changes would create new issues, I'd like someone to triage the bug with respect to language-selector and gdm (ubuntu). I'd also need help drawing a conclusion from the reasoning below.

On 2011-01-25 12:54, Colin Watson wrote:
> ... software that parses locale strings in ways that only handle
> particular spellings of them tend to be buggy in other ways. For
> example, such buggy software can easily fail to handle LANG=en_IN as
> a UTF-8 locale, even though it's defined as such in
> /usr/share/i18n/SUPPORTED (the .UTF-8 suffix is mainly for dealing
> with locales that previously had a non-UTF-8 version, and some newer
> locales just went UTF-8 from the start).

Doesn't that point towards simply appending .UTF-8 to e.g. en_IN, irrespective of the name according to /usr/share/i18n/SUPPORTED? I did this test:

[gunnar@gunnar-laptop ~/sandbox]$ sh
$ cat mytest.po
msgid "hello"
msgstr "hello from India"
$ dir=/usr/share/locale/en_IN/LC_MESSAGES
$ sudo mkdir -p $dir
$ sudo msgfmt mytest.po -o $dir/mytest.mo
$ LANGUAGE=''
$ LC_MESSAGES=en_IN
$ echo $( gettext -d mytest hello )
hello from India
$ LC_MESSAGES=en_IN.utf8
$ echo $( gettext -d mytest hello )
hello from India
$ LC_MESSAGES=en_IN.UTF-8
$ echo $( gettext -d mytest hello )
hello from India
$ exit
[gunnar@gunnar-laptop ~/sandbox]$

No complaints, and gettext found the Indian 'translation' in all three cases, so en_IN.UTF-8 seems to work. Or would that name cause other apps to fail?
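
For what it's worth, the two suffixed spellings in the test differ only in how the codeset is written. As far as I know, glibc normalizes the codeset part of a locale name (lowercased, punctuation dropped) before looking the locale up, which would explain why .utf8 and .UTF-8 behave the same. A rough sketch of that kind of comparison in sh follows. The name codeset_of is made up for illustration, and this is only an approximation, not glibc's actual code:

```sh
# Hypothetical helper: print the codeset part of a locale name in a
# comparable form (lowercase, alphanumerics only), so that ".UTF-8" and
# ".utf8" compare equal. Prints nothing if the name has no codeset part.
codeset_of() {
    case $1 in
        *.*) ;;              # the name carries a codeset
        *) return 0 ;;       # e.g. en_IN: nothing to report
    esac
    cs=${1#*.}               # drop everything up to the first dot
    cs=${cs%%@*}             # drop an @modifier, if any
    printf '%s\n' "$cs" | tr -d -c '[:alnum:]\n' | tr '[:upper:]' '[:lower:]'
}
```

With this, 'codeset_of en_IN.UTF-8' and 'codeset_of en_IN.utf8' print the same string, while 'codeset_of en_IN' prints nothing. The last case is the one Colin describes: the name alone does not say that en_IN is a UTF-8 locale.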
> Getting back to the original patch, the general idea seems OK to me,
> but I think it would be helpful for it to take a slightly different
> approach to implementation. Rather than just appending .UTF-8, I
> suggest searching /usr/share/i18n/SUPPORTED for a suitable match for
> the language, country, and variant which has "UTF-8" as the second
> column. That way, language-selector will always select the canonical
> user-visible name for the locale, even if it's one of the
> interesting cases such as en_IN where the canonical name doesn't have
> an encoding suffix.

Even if we went for the canonical names, I don't think it's necessary to parse /usr/share/i18n/SUPPORTED.

[gunnar@gunnar-laptop ~]$ locale -a | grep -F en_IN
en_IN
en_IN.utf8
[gunnar@gunnar-laptop ~]$

As you can see, the special case en_IN is represented by two items in the 'locale -a' output. We ought to be able to make use of that info. This example shows how the English locale names might be grabbed:

[gunnar@gunnar-laptop ~]$ sh
$ tmp=$( locale -a | grep -xvE C\|POSIX )
$ no_enc=$( echo "$tmp" | grep -vF .utf8 )
$ for locale in $( echo "$tmp" | grep -F .utf8 | sed 's/\.utf8//' )
> do
>   if ! expr $locale : en > /dev/null ; then
>     continue
>   elif expr "$no_enc" : .*$locale > /dev/null ; then
>     echo $locale
>   else
>     echo $( echo $locale | sed -r 's/([^@]+)/\1.UTF-8/' )
>   fi
> done
en_AG
en_AU.UTF-8
en_BW.UTF-8
en_CA.UTF-8
en_DK.UTF-8
en_GB.UTF-8
en_HK.UTF-8
en_IE.UTF-8
en_IN
en_NG
en_NZ.UTF-8
en_PH.UTF-8
en_SG.UTF-8
en_US.UTF-8
en_ZA.UTF-8
en_ZW.UTF-8
$ exit
[gunnar@gunnar-laptop ~]$

As you can see, the English locale names for Antigua/Barbuda and Nigeria are the same kind of special case as en_IN.
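
The loop above could also be wrapped into a small filter that reads 'locale -a' style output on stdin, which makes it easy to try against a fixed list of names. This is only a sketch: canonical_utf8_names is a hypothetical name, and it assumes that every UTF-8 locale shows up with a .utf8 suffix in the 'locale -a' output.

```sh
# Hypothetical filter: reads locale names (one per line, as printed by
# 'locale -a') and prints one name per UTF-8 locale. If the same name
# also exists without an encoding suffix (e.g. en_IN), the bare name is
# printed; otherwise ".utf8" is rewritten to ".UTF-8".
canonical_utf8_names() {
    awk '
        $0 == "C" || $0 == "POSIX" { next }   # skip the built-in locales
        { names[$0] = 1 }
        END {
            for (n in names) {
                if (n !~ /\.utf8/) continue    # only .utf8 entries produce output
                bare = n
                sub(/\.utf8/, "", bare)        # en_IN.utf8 becomes en_IN
                if (bare in names) {
                    print bare                 # bare spelling is installed too
                } else {
                    full = n
                    sub(/\.utf8/, ".UTF-8", full)
                    print full                 # any @modifier stays after the suffix
                }
            }
        }
    ' | LC_ALL=C sort
}
```

The English names could then be listed with:

locale -a | canonical_utf8_names | grep '^en'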