Locale names should always include the codeset component

Bug #1646260 reported by Gunnar Hjalmarsson
This bug affects 3 people
Affects                 Status     Importance  Assigned to     Milestone
localechooser (Ubuntu)  Confirmed  High        Łukasz Zemczak
ubiquity (Ubuntu)       Confirmed  Undecided   Unassigned

Bug Description

If you install Ubuntu in English with Tel Aviv as the timezone location, the installer figures out that the applicable locale is en_IL and adds the line

LANG="en_IL"

to /etc/default/locale.

en_IL is a perfectly fine locale name; in fact, according to SUPPORTED, it is *the* correct name of the English/Israel UTF-8 locale. Python, however, does not agree. Python generally seems to presuppose that locale names include the codeset component. It does accept names without a codeset, but only if they are listed in the hard-coded dictionary locale_alias in /usr/lib/python3.5/locale.py. en_IL is a relatively new locale and is not (yet) included in locale_alias:

gunnar@gunnar-ubuntu-current:~$ python3
Python 3.5.2+ (default, Sep 22 2016, 12:18:14)
[GCC 6.2.0 20160927] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> locale.setlocale(locale.LC_CTYPE, 'en_IL')
'en_IL'
>>> mylocale = locale.getlocale(locale.LC_CTYPE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.5/locale.py", line 577, in getlocale
    return _parse_localename(localename)
  File "/usr/lib/python3.5/locale.py", line 486, in _parse_localename
    raise ValueError('unknown locale: %s' % localename)
ValueError: unknown locale: en_IL
>>> quit()

I learned about this issue via <http://askubuntu.com/q/854950>. The problem is not limited to en_IL: new locales in glibc tend to be UTF-8-only, and their names in SUPPORTED omit the codeset. Since Python relies on its own hard-coded locale_alias table, glibc and Python will probably never be fully in sync.

One way to deal with this issue is to always add '.UTF-8' to such locale names. For instance, 'en_IL.UTF-8' is understood by both glibc and Python.

This should probably be fixed in localechooser. Basically, I'd like to see a code snippet along these lines:

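# If $LOCALE has no codeset (no "."), insert ".UTF-8" before any
# "@modifier", e.g. en_IL -> en_IL.UTF-8, sr_RS@latin -> sr_RS.UTF-8@latin
if [ "$LOCALE" = "${LOCALE%.*}" ]; then
    LOCALE=$(printf '%s\n' "$LOCALE" | sed -r 's/([^@]+)/\1.UTF-8/')
fi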
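For anyone who wants to check the rule outside the installer, here is a rough Python rendering of the same logic. It is only a sketch: the function name add_codeset is hypothetical, and it mirrors the shell snippet above by treating any "." as a codeset marker and inserting the codeset before an "@modifier". Like that snippet, it does not special-case names such as C or POSIX.

```python
def add_codeset(name: str, codeset: str = "UTF-8") -> str:
    """Return `name` with `codeset` added if it has no codeset yet.

    Hypothetical helper mirroring the shell/sed snippet: a "." anywhere
    means a codeset is already present; otherwise the codeset is placed
    before any "@modifier" (language_territory.codeset@modifier).
    """
    if "." in name:
        # Codeset already present, e.g. "en_US.UTF-8"; leave it alone.
        return name
    base, sep, modifier = name.partition("@")
    if not base:
        # Empty or degenerate input (nothing before "@"): leave unchanged.
        return name
    return f"{base}.{codeset}{sep}{modifier}"


if __name__ == "__main__":
    for example in ("en_IL", "sr_RS@latin", "en_US.UTF-8"):
        print(example, "->", add_codeset(example))
```

Run as a script, it prints en_IL -> en_IL.UTF-8, sr_RS@latin -> sr_RS.UTF-8@latin and en_US.UTF-8 -> en_US.UTF-8.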

I haven't prepared a patch, since I don't know where exactly it should be inserted without breaking anything else (I don't know how to test it either). I'm still hoping that somebody finds it important enough to fix.

Launchpad Janitor (janitor) wrote:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in localechooser (Ubuntu):
status: New → Confirmed
Changed in ubiquity (Ubuntu):
status: New → Confirmed
Changed in localechooser (Ubuntu):
assignee: nobody → Łukasz Zemczak (sil2100)
Gunnar Hjalmarsson (gunnarhj) wrote:

@Łukasz: This issue keeps causing trouble for users.

https://askubuntu.com/q/1042915

It would be great if you could give it priority soon.

Changed in localechooser (Ubuntu):
importance: Undecided → High