Localization does not respect LC_MESSAGES

Bug #650910 reported by Mikko Rantalainen on 2010-09-29
This bug affects 3 people
Affects: OpenShot Video Editor
Importance: Low
Assigned to: Andy Finch

Bug Description

OpenShot Video Editor incorrectly uses the Finnish user interface localization in the following environment:

LC_ADDRESS=fi_FI.UTF-8
LC_COLLATE=fi_FI.UTF-8
LC_CTYPE=fi_FI.UTF-8
LC_MEASUREMENT=fi_FI.UTF-8
LC_MESSAGES=en_DK.UTF-8
LC_MONETARY=fi_FI.UTF-8
LC_NAME=fi_FI.UTF-8
LC_NUMERIC=en_DK.UTF-8
LC_PAPER=fi_FI.UTF-8
LC_TELEPHONE=fi_FI.UTF-8
LC_TIME=en_DK.UTF-8

As far as I know, LC_MESSAGES should specify the user interface text language.

Does it work correctly when you launch OpenShot with the following command?

$ LANG=fi_FI openshot

Also, I noticed that Finnish has not been completely translated. If you are
feeling generous and have some extra time, please help improve our
translations:
https://translations.launchpad.net/openshot/trunk/+pots/openshot/fi/+translate
Thanks!

Mikko Rantalainen (mira) wrote :

$ LANG=fi_FI openshot

(process:20523): Gtk-WARNING **: Locale not supported by C library.
 Using the fallback 'C' locale.
--------------------------------
   OpenShot (version 1.2.2)
--------------------------------
Process no longer exists: 20245. Creating new pid lock file.
Traceback (most recent call last):
  File "/usr/bin/openshot", line 57, in <module>
    main()
  File "/usr/lib/pymodules/python2.6/openshot/openshot.py", line 52, in main
    locale.setlocale(locale.LC_ALL, '')
  File "/usr/lib/python2.6/locale.py", line 513, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

$

Mikko Rantalainen (mira) wrote :

$ locale -a
C
en_AG
en_AU.utf8
en_BW.utf8
en_CA.utf8
en_DK.utf8
en_GB.utf8
en_HK.utf8
en_IE.utf8
en_IN
en_NG
en_NZ.utf8
en_PH.utf8
en_SG.utf8
en_US.utf8
en_ZA.utf8
en_ZW.utf8
fi_FI.utf8
POSIX

Mikko Rantalainen (mira) wrote :

Note that I'm trying to get user interface messages in English but the following in Finnish:
- [snail mail] addresses
- sorting order
- terminal character codes (CTYPE)
- measurement (SI standard instead of imperial brain damage)
- money (euro)
- Finnish (person?) names
- A4 paper
- Finnish telephone number grouping

Note that I want the following in en_DK:
- messages (user interface, should be identical to en_GB and sometimes identical to en_US)
- numbers (decimal separator should be "." instead of the traditional Finnish ",")
- time (ISO 8601 instead of traditional Finnish rendering)

I have LANG set to "en_DK.utf8" and LC_ALL is not defined.

Olivier Girard (eolinwen) wrote :

Hi,
Strange. Have you thought about reconfiguring your locale?
If you want, do this:
sudo gedit /var/lib/locales/supported.d/local
Add the line (in your case) fi_FI ISO-8859-1, then save the changes.
Then reconfigure the locales with:
sudo dpkg-reconfigure locales
Launch OpenShot normally and then (if it still doesn't work) with LANG=fi_FI openshot
and tell us what happens.
Thanks.

Alfred Carlsson (codac) wrote :

I think I have the same or a closely related problem.
My problem arises when I set the system language to English but the currency/date/etc. to Swedish.

In Ubuntu 10.04 I change these settings through the System menu option: "System -> Administration -> Language Support"

In the "Language & Text" window that pops up, I set "Language for menus and windows" to English (United Kingdom).
On the next tab, Text, I set the option "Display numbers, dates, currency amounts in the usual format for:" to "Svenska (Sverige)", which is Swedish for "Swedish (Sweden)".

I also apply this system wide using the button in the setting window.

The result is that some text in Openshot is in English and some in Swedish, as this image shows.
http://yfrog.com/n8openshotlanguagemixedp

The more serious effect is that the plugins do not show up and it is not possible to add them through the menu options.
Like in this image: http://yfrog.com/4yopenshot122notransitionp

If the system settings are all English or all Swedish, the plugins are there and work just fine. When I set the system language to English and the locale(?) to Swedish as described above, the plugins are no longer displayed in OpenShot.

(also described in this post: http://openshotusers.com/forum/viewtopic.php?p=2684#p2684)

Mikko Rantalainen (mira) wrote :

cenwen: The problem is not that
LANG=fi_FI openshot
does not work. The problem is that openshot does not respect LC_MESSAGES. The only way I can get US English messages is to launch
LC_ALL=en_US.UTF-8 openshot
It should be possible to use just
LC_MESSAGES=en_US.UTF-8 openshot
to get UI in US English with everything else kept as I've otherwise specified (e.g. Finnish alphabetical sorting order among other things).

Every other piece of software does this except for openshot. For example,
LC_MESSAGES=fi_FI.UTF-8 pico
will launch pico with Finnish user interface and
LC_MESSAGES=en_GB.UTF-8 pico
will use UK English user interface without changing anything else.

(Also note that I'm fine with the fact that I don't have a locale called "fi_FI". Instead, I have a locale called "fi_FI.UTF-8", which is fine for me and I've set my LC_* environment variables as I like. See the bug description for details.)

Mikko Rantalainen (mira) wrote :

As far as I know, the environment variables should be interpreted as follows:

LANG - default language for all LC_* variables in case the specific variable is not defined

LC_ADDRESS - snail mail address formatting rules
LC_COLLATE - alphabetical sort order and rules for deciding if two letters are considered identical or not
LC_CTYPE - character classification and case conversions
LC_MEASUREMENT - measurements (e.g. metric vs. imperial)
LC_MESSAGES - USER INTERFACE MESSAGES AND DIAGNOSTICS (this bug!)
LC_MONETARY - display rules for money or currency related stuff
LC_NAME - gender related prefixes such as "Mr" and "Mrs"
LC_NUMERIC - rendering of numerics (e.g. "1,000,000.00" vs "1 000 000,00") other than monetary
LC_PAPER - paper size (e.g. letter vs A4)
LC_TELEPHONE - telephone number formatting rules
LC_TIME - time rendering rules

LC_ALL - override for any LANG and/or LC_* environment settings.
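The precedence described above (LC_ALL, then the category-specific LC_* variable, then LANG) can be sketched as a small pure function. This is a minimal illustration assuming POSIX semantics, where an empty value counts as unset; `effective_locale` is a hypothetical helper, not part of OpenShot or the Python standard library, and it ignores whether the named locale is actually installed.

```python
from typing import Mapping


def effective_locale(category: str, env: Mapping[str, str]) -> str:
    """Return the locale name that governs `category` (e.g. "LC_MESSAGES").

    Hypothetical helper illustrating POSIX precedence:
    LC_ALL first, then the category's own variable, then LANG.
    Empty values count as unset; with nothing set, "C" applies.
    """
    for name in ("LC_ALL", category, "LANG"):
        value = env.get(name, "")
        if value:
            return value
    return "C"


# The reporter's settings (see the `locale` output in this report); LC_ALL is empty.
reporter_env = {
    "LANG": "en_DK.utf8",
    "LC_ALL": "",
    "LC_CTYPE": "fi_FI.UTF-8",
    "LC_MESSAGES": "en_DK.UTF-8",
}

print(effective_locale("LC_MESSAGES", reporter_env))        # en_DK.UTF-8
print(effective_locale("LC_CTYPE", reporter_env))           # fi_FI.UTF-8
print(effective_locale("LC_IDENTIFICATION", reporter_env))  # en_DK.utf8 (from LANG)
```

With these settings the UI language should come out as en_DK, even though character handling follows fi_FI.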

Olivier Girard (eolinwen) wrote :

You confirm my thought and the hypothesis I put forward on another bug about this sort of problem: the shebang is missing from all our files. I'll see if I can make a patch for this.
Thanks a lot for the feedback.

Changed in openshot:
milestone: none → 1.3.0
Olivier Girard (eolinwen) wrote :

I have created a patch to try to fix this.
It does two things.
First, it should fix the problem with the Python path using the only correct method.
Second, I tried to introduce the UTF-8/Unicode encoding declaration, but I am not sure of the result. I am afraid this will not be sufficient, but it will not do any harm. I was baffled by the fact that none of the files have the shebang, and I don't know what to do about it.

Olivier Girard (eolinwen) wrote :
Mikko Rantalainen (mira) wrote :

Your patch seems to have
#!/usr/bin/ env python
when it should have
#!/usr/bin/env python
(note that there's no space between slash and "env").

About "encoding": I'm not sure why your patch adds it to every file. It's only needed if the source code itself contains UTF-8 encoded (non-ASCII) characters. The fact that your program deals with UTF-8 strings does not require UTF-8 encoding for the source code. See http://www.python.org/dev/peps/pep-0263/ for details.

Note that modules and libraries do not require a shebang; only files that have the executable bit set and are intended for direct execution require it.
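A sketch of a hypothetical executable script that needs both lines: the shebang because it is meant to be run directly, and the PEP 263 coding declaration because the source contains a non-ASCII literal. Under Python 2 (OpenShot 1.2.2 ran on Python 2.6, per the traceback in this report) such a file raises a SyntaxError without the declaration; Python 3 already assumes UTF-8 source. The script and its names are illustrative only, and it avoids type annotations so that it also runs under Python 2.

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""Hypothetical launcher script: the shebang above is only useful because
this file is meant to be executed directly (chmod +x)."""

# This literal contains non-ASCII characters; under Python 2 the coding
# declaration above is what allows it. Python 3 assumes UTF-8 by default.
GREETING = "Hyvää päivää"


def main():
    print(GREETING)
    return 0


if __name__ == "__main__":
    main()
```

A plain library module imported by such a script would need neither line, unless it also contains non-ASCII literals.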

About the incorrect localization strings: it seems that the problem is in the following code in openshot/language/Language_Init.py:

                # Setup foreign language support
                langs = []
                lc, encoding = locale.getdefaultlocale()
                ...
                self.lang = gettext.translation("OpenShot", self.project.LOCALE_DIR, languages = langs, fallback = True)
The getdefaultlocale() gives lc="fi_FI" and encoding="UTF8" with my locale settings. However,
$ locale
LANG=en_DK.utf8
LC_CTYPE=fi_FI.UTF-8
LC_NUMERIC=en_DK.UTF-8
LC_TIME=en_DK.UTF-8
LC_COLLATE=fi_FI.UTF-8
LC_MONETARY=fi_FI.UTF-8
LC_MESSAGES=en_DK.UTF-8
LC_PAPER=fi_FI.UTF-8
LC_NAME=fi_FI.UTF-8
LC_ADDRESS=fi_FI.UTF-8
LC_TELEPHONE=fi_FI.UTF-8
LC_MEASUREMENT=fi_FI.UTF-8
LC_IDENTIFICATION="en_DK.utf8"
LC_ALL=

and gettext should get en_DK as specified by LC_MESSAGES.

In short, gettext should not be initialized with the result of getdefaultlocale(). I'm not sure whether getdefaultlocale() is broken, or whether it is simply not meant for message localization and its result is designed for something other than selecting the message language.
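For comparison, Python's gettext module can choose the message language on its own: when `gettext.translation()` is called with `languages=None`, `gettext.find()` searches LANGUAGE, LC_ALL, LC_MESSAGES and LANG, in that order, uses the first non-empty one, and splits it on ':'. LC_CTYPE is never consulted. The sketch below mirrors only that lookup order on a plain dictionary; `message_language_candidates` is a hypothetical name, and the real gettext additionally normalizes and expands each entry and appends "C".

```python
from typing import List, Mapping


def message_language_candidates(env: Mapping[str, str]) -> List[str]:
    """Mirror the environment lookup order used by Python's gettext.find().

    Hypothetical helper: the first non-empty variable among LANGUAGE,
    LC_ALL, LC_MESSAGES and LANG wins, and its value is split on ':'.
    gettext's own normalization/expansion step is omitted here.
    """
    for name in ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG"):
        value = env.get(name, "")
        if value:
            return value.split(":")
    return ["C"]


reporter_env = {
    "LANG": "en_DK.utf8",
    "LC_ALL": "",
    "LC_CTYPE": "fi_FI.UTF-8",
    "LC_MESSAGES": "en_DK.UTF-8",
}
print(message_language_candidates(reporter_env))  # ['en_DK.UTF-8']
```

Assuming OpenShot does not need the explicit `langs` list for anything else (that part of the code is elided above), omitting `languages=` might be another way to let gettext follow LC_MESSAGES.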

Mikko Rantalainen (mira) wrote :

Python issue 813449 (locale.getdefaultlocale doesnt handle all locales gracefully) seems relevant
http://bugs.python.org/issue813449 (duplicate of http://bugs.python.org/issue504219)

According to it, "getdefaultlocale should not be used in new code." "If the intention is to compute the locale's encoding, locale.getpreferredencoding should be used instead." Note that getpreferredencoding() does not return the locale setting, only the encoding.

Mikko Rantalainen (mira) wrote :

I guess that code
  lc, encoding = locale.getdefaultlocale()
should be replaced with something along the lines of
  lc = locale.getlocale(locale.LC_MESSAGES)
  encoding = locale.getpreferredencoding()
Note that locale.getlocale() should not be called before the process has called locale.setlocale(locale.LC_ALL, "") [which, I think, is true in this case]. If the encoding is not needed, the latter call can obviously be dropped.
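Following the note above about calling setlocale() first, here is a minimal sketch that queries the message locale and the preferred encoding as two separate things. It assumes a POSIX system (`locale.LC_MESSAGES` does not exist on Windows), and `message_locale_and_encoding` is a hypothetical helper, not OpenShot code.

```python
import locale
from typing import Optional, Tuple

LocalePair = Tuple[Optional[str], Optional[str]]


def message_locale_and_encoding() -> Tuple[LocalePair, str]:
    """Query the UI-message locale and the preferred encoding separately."""
    try:
        # Adopt the user's LC_* settings; before this call LC_MESSAGES
        # is normally still the "C" locale.
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # Unsupported locale (as in the traceback earlier in this report):
        # keep running in the "C" locale instead of crashing.
        pass
    try:
        messages = locale.getlocale(locale.LC_MESSAGES)  # (language, encoding)
    except ValueError:
        messages = (None, None)  # locale name Python cannot parse
    encoding = locale.getpreferredencoding(False)  # encoding only, no language
    return messages, encoding


print(message_locale_and_encoding())
```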

Olivier Girard (eolinwen) wrote :

> Your patch seems to have
> #!/usr/bin/ env python
> when it should have
> #!/usr/bin/env python
> (note that there's no space between slash and "env").

Indeed, I made a typo there. Oops, sorry.
As for the encoding line, that is the standard method taught at school and in books. There are two variants: -*- coding: utf8 -*- and -*- coding: utf-8 -*-

Personally, I prefer the first one, the one used by Tarek Ziadé (Programmation Python, Conception et Optimisation), a disciple of GVR. I don't know if it is the right method, but it cannot be worse. The main goal is that it works for everybody.

Thanks for helping me move this nasty bug forward so we can resolve it once and for all. I am not an expert, more of a beginner.

I am reading all your answers and the links, and on my side I am digging into the Python gettext module and its documentation. Perhaps the method we used is not the right one.

Olivier Girard (eolinwen) wrote :

Following your answer, I checked my locale and got a surprising result. Look at this:
olivier@Triton:~$ locale
LANG=fr_FR.utf8
LC_CTYPE="fr_FR.utf8"
LC_NUMERIC="fr_FR.utf8"
LC_TIME="fr_FR.utf8"
LC_COLLATE="fr_FR.utf8"
LC_MONETARY="fr_FR.utf8"
LC_MESSAGES="fr_FR.utf8"
LC_PAPER="fr_FR.utf8"
LC_NAME="fr_FR.utf8"
LC_ADDRESS="fr_FR.utf8"
LC_TELEPHONE="fr_FR.utf8"
LC_MEASUREMENT="fr_FR.utf8"
LC_IDENTIFICATION="fr_FR.utf8"
LC_ALL=

Could something be broken on your system?
Can somebody who has these problems confirm this?
I'll keep searching...

Mikko Rantalainen (mira) wrote :

Check your /etc/default/locale - it should not have any comments on any line that defines any of the LC_* variables. I cannot currently find the right bug, but this is a known problem in parsing the file.

For example, if /etc/default/locale contains the line
LC_PAPER=fi_FI.UTF-8 # default to A4 paper
then LC_PAPER will be set to the literal
"fi_FI.UTF-8"
instead of the literal
fi_FI.UTF-8
(Note the extra quotation marks in the literal string.)

Olivier Girard (eolinwen) wrote :

I have checked on another system (Maverick) and I get the same thing:
olivier@mediacenter:~$ locale
LANG=fr_FR.utf8
LC_CTYPE="fr_FR.utf8"
LC_NUMERIC="fr_FR.utf8"
LC_TIME="fr_FR.utf8"
LC_COLLATE="fr_FR.utf8"
LC_MONETARY="fr_FR.utf8"
LC_MESSAGES="fr_FR.utf8"
LC_PAPER="fr_FR.utf8"
LC_NAME="fr_FR.utf8"
LC_ADDRESS="fr_FR.utf8"
LC_TELEPHONE="fr_FR.utf8"
LC_MEASUREMENT="fr_FR.utf8"
LC_IDENTIFICATION="fr_FR.utf8"
LC_ALL=
olivier@mediacenter:~$

and I checked my /etc/default/locale as you said, and it contains only:
LANG="fr_FR.UTF-8"

So I don't understand why you use the default language on your system. Is it not fully translated?

Mikko Rantalainen (mira) wrote :

> So i don't understand why you have the default language on
> your system. Is it not all translated ?

Unfortunately, I cannot understand what you're trying to ask.

I'm living in Finland and my mother tongue is Finnish. However, I prefer English for the UI to better match UI of all the programs I use (some programs do not have Finnish localizations or the localizations are not high quality enough for my taste).

However, I'm trying to get ISO compatible experience as much as possible (including stuff like Monday is the first day of the week, ISO 8601 date formatting, etc. In addition, I don't want "continental decimal separator" for numbers despite the fact that it's the "correct" representation for FI locale and I prefer the Finnish collation order over some other choices). Hence the locale settings I have set.

The problem I'm reporting is that with my locale settings, OpenShot is using Finnish translations for some of its strings despite the fact that I'm requesting English messages (see the LC_MESSAGES environment variable). I guess it's incorrectly using one of the following environment variables to set the localization language for messages: LC_CTYPE, LC_COLLATE, LC_MONETARY, LC_PAPER, LC_NAME, LC_ADDRESS, LC_TELEPHONE, LC_MEASUREMENT. None of these should change the UI language.

Andy Finch (fincha) wrote :

I don't know where it's coming from; the only variables OpenShot explicitly sets are LC_ALL and LC_NUMERIC.

Olivier Girard (eolinwen) wrote :

> I'm living in Finland and my mother tongue is Finnish. However, I prefer
> English for the UI to better match UI of all the programs I use (some
> programs do not have Finnish localizations or the localizations are not
> high quality enough for my taste).
Okay, now I understand your locale settings better.

> The problem I'm reporting is that with my locale settings, OpenShot is
> using Finnish translations for some of its strings despite the fact
> that I'm requesting English messages (see LC_MESSAGES environment
> variable). I guess it's incorrectly using one of the following
> environment variables to set the localization language for messages:
> LC_CTYPE, LC_COLLATE, LC_MONETARY, LC_PAPER, LC_NAME, LC_ADDRESS,
> LC_TELEPHONE, LC_MEASUREMENT. None of these should change the UI
> language.
Have you tried changing your default locale setting (just to see) and checking whether everything in OpenShot is translated? It would help to know whether this is a "general" bug or a "particular" bug linked to your locale preferences.

I agree with you that the getpreferredencoding method should be better than getdefaultlocale in your case. It would automatically fall back to the default locale, i.e. English.
More explanation here:
http://docs.python.org/library/gettext.html
http://docs.python.org/library/locale.html
Have you tried replacing lc, encoding = locale.getdefaultlocale() with lc, encoding = locale.getpreferredencoding()?
Just another thought.

Olivier Girard (eolinwen) wrote :

You could try another method that might be interesting for you. I just came across it while searching for more explanation on using gdb: getfilesystemencoding(), again instead of getdefaultlocale.

Mikko Rantalainen (mira) wrote :

Executing
$ LC_ALL=fi_FI.UTF-8 openshot
gives me UI mostly in Finnish (some parts lack translations, but those seem to be simply untranslated rather than using an incorrect locale). Output to stdout and stderr is a mixture of Finnish and English even with this locale setting.

$ LC_ALL=en_US.UTF-8 openshot
gives me UI fully in English as expected.

$ locale
LANG=en_DK.utf8
LC_CTYPE=fi_FI.UTF-8
LC_NUMERIC=en_DK.UTF-8
LC_TIME=en_DK.UTF-8
LC_COLLATE=fi_FI.UTF-8
LC_MONETARY=fi_FI.UTF-8
LC_MESSAGES=en_DK.UTF-8
LC_PAPER=fi_FI.UTF-8
LC_NAME=fi_FI.UTF-8
LC_ADDRESS=fi_FI.UTF-8
LC_TELEPHONE=fi_FI.UTF-8
LC_MEASUREMENT=fi_FI.UTF-8
LC_IDENTIFICATION="en_DK.utf8"
LC_ALL=
$ LC_CTYPE=en_US.UTF-8 openshot
gives me UI fully in English, which is not expected. It seems that OpenShot is incorrectly using LC_CTYPE for message localizations instead of LC_MESSAGES.

Steps to reproduce:
1) sudo apt-get install language-pack-fi language-pack-en
2) LC_ALL= LC_CTYPE=fi_FI.UTF-8 LC_MESSAGES=en_US.UTF-8 openshot
# note the space after "LC_ALL=": it sets LC_ALL to an empty value, which is treated as unset

This will result in a mixture of Finnish and English. It seems that part of the UI is incorrectly using localization messages selected by LC_CTYPE instead of LC_MESSAGES. LC_CTYPE is supposed to define character classes (which characters are digits, which are letters, etc.) and case conversions (for example, whether the lowercase of "I" is "i" (Finnish) or the dotless "ı" (Turkish)).

The locale.getpreferredencoding() does not return a pair of locale and encoding, just the encoding. Locale should be queried with
    lc = locale.getlocale(locale.LC_MESSAGES)
which seems to return correct values for me (tried with python interpreter, not openshot).

It seems that locale.getdefaultlocale() in reality returns the value of LC_CTYPE. That is fine for deciding the *encoding* for stdout and stderr, but it is definitely not OK to use it to select the language for message localization.

I'll repeat: use LC_CTYPE for output encoding and possible uppercase and lowercase conversions ("how unicode characters should be treated in this environment regardless of language of those characters"). However, use LC_MESSAGES for message localizations ("which unicode character string to output for a given message").

Also see comment 14.
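The observation that getdefaultlocale() follows LC_CTYPE matches the default `envvars` argument of Python's `locale.getdefaultlocale()`, which on POSIX checks LC_ALL, LC_CTYPE, LANG and LANGUAGE (first entry only) and never LC_MESSAGES. The sketch below mirrors just that lookup on a dictionary so the effect can be seen without touching the real environment; `getdefaultlocale_source` is a hypothetical helper, and the real function additionally parses the result into a (language, encoding) pair.

```python
from typing import Mapping, Optional


def getdefaultlocale_source(env: Mapping[str, str]) -> Optional[str]:
    """Return the raw value getdefaultlocale() would parse on POSIX.

    Hypothetical helper mirroring the default envvars order:
    LC_ALL, LC_CTYPE, LANG, LANGUAGE. LC_MESSAGES is never consulted.
    Returns None where getdefaultlocale() would fall back to "C".
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG", "LANGUAGE"):
        value = env.get(name, "")
        if value:
            # For LANGUAGE only the first entry of the ':' list is used.
            return value.split(":")[0] if name == "LANGUAGE" else value
    return None


reporter_env = {
    "LANG": "en_DK.utf8",
    "LC_ALL": "",
    "LC_CTYPE": "fi_FI.UTF-8",
    "LC_MESSAGES": "en_DK.UTF-8",
}
print(getdefaultlocale_source(reporter_env))  # fi_FI.UTF-8
```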

Mikko Rantalainen (mira) wrote :

I rechecked the code, and I think the correct fix is in the file
openshot/language/Language_Init.py:
replace the line
                lc, encoding = locale.getdefaultlocale()
with
                lc, encoding = locale.getlocale(locale.LC_MESSAGES)

Rationale:

It seems that return value of
                locale.getdefaultlocale()
is identical to
                locale.getlocale(locale.LC_CTYPE)
and as such it should not be used for localization messages as I explained earlier.

Note that there's an error in my comment #14 where I claim that locale.getlocale(locale.LC_MESSAGES) would return only locale. Instead, it always returns locale AND encoding, just like locale.getdefaultlocale().
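The proposed one-line change could look roughly like this in context. This is a sketch under assumptions: `init_translation` is a hypothetical stand-in for the setup in Language_Init.py (whose surrounding code is elided in this report), and it relies on `locale.setlocale(locale.LC_ALL, '')` having already run, as openshot.py does at start-up according to the traceback above.

```python
import gettext
import locale


def init_translation(domain: str, localedir: str) -> gettext.NullTranslations:
    """Hypothetical stand-in for the gettext setup in Language_Init.py.

    Assumes locale.setlocale(locale.LC_ALL, "") has already been called.
    """
    try:
        # The proposed fix: read LC_MESSAGES instead of getdefaultlocale(),
        # which follows LC_CTYPE.
        lc, _encoding = locale.getlocale(locale.LC_MESSAGES)
    except ValueError:
        lc = None  # unparseable locale name: use untranslated strings
    langs = [lc] if lc else []

    # fallback=True yields a NullTranslations (messages unchanged) when no
    # catalogue for `langs` exists under `localedir`.
    return gettext.translation(domain, localedir, languages=langs, fallback=True)


translation = init_translation("OpenShot", "/nonexistent/locale")
print(translation.gettext("Hello"))  # no catalogue found, so prints: Hello
```

Keeping `fallback=True`, as the original call does, means a missing catalogue degrades to English strings instead of raising an exception.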

Mikko Rantalainen (mira) wrote :

Also note that getfilesystemencoding() should be used only for conversion from internal unicode representation to filesystem native presentation and this conversion applies only to filenames, nothing else.
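To illustrate that scope in Python 3 terms: the filesystem encoding is applied through `os.fsencode()`/`os.fsdecode()` when converting file names, and has nothing to do with choosing the UI language. A minimal sketch; `filename_bytes` is a hypothetical helper and the file name is made up.

```python
import os
import sys


def filename_bytes(name: str) -> bytes:
    """Convert a file *name* (not file contents or UI text) to native bytes.

    os.fsencode() uses sys.getfilesystemencoding() with the
    'surrogateescape' error handler on POSIX, so undecodable bytes survive
    a round trip through os.fsdecode().
    """
    return os.fsencode(name)


raw = filename_bytes("kesäloma.mp4")
print(sys.getfilesystemencoding(), raw)
```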

moimael (moimael) on 2010-12-20
Changed in openshot:
status: New → Confirmed
importance: Undecided → Low
Changed in openshot:
milestone: 1.3.0 → 1.4.0

Mikko, excellent info and diagnosis!
I can confirm this bug on my system, with a mix of de_AT localization and en_GB language. Also, plugins don't show up, as Alfred Carlsson reported.

LANG=en_GB.utf8
LANGUAGE=en_GB:en
LC_CTYPE=de_AT.UTF-8
LC_NUMERIC=en_GB.UTF-8
LC_TIME=de_DE.UTF-8
LC_COLLATE=de_AT.UTF-8
LC_MONETARY=de_AT.UTF-8
LC_MESSAGES=en_GB.UTF-8
LC_PAPER=de_AT.UTF-8
LC_NAME=de_AT.UTF-8
LC_ADDRESS=de_AT.UTF-8
LC_TELEPHONE=de_AT.UTF-8
LC_MEASUREMENT=de_AT.UTF-8
LC_IDENTIFICATION=de_AT.UTF-8
LC_ALL=

Andy Finch (fincha) on 2011-07-16
Changed in openshot:
status: Confirmed → In Progress
assignee: nobody → Andy Finch (fincha)
Andy Finch (fincha) on 2011-08-08
Changed in openshot:
status: In Progress → Fix Committed
Andy Finch (fincha) on 2011-09-23
Changed in openshot:
status: Fix Committed → Fix Released