Member names do not support ǧ

Bug #341594 reported by Hendrik Maryns
This bug affects 3 people
Affects: GNU Mailman
Status: Fix Released
Importance: Undecided
Assigned to: Mark Sapiro

Bug Description

In the administrator area, the members page, I wanted to enter the name of a subscriber. Her name is Çelikoǧlu. Note that the Turkish ǧ is not supported by most encodings. It gave an error in that nothing happened: her name wasn’t changed.

?? Doesn’t Mailman work with utf-8 ??

http://mailman.biohostnet.de/mailman/admin/mitarbeiter/members

Revision history for this message
Mark Sapiro (msapiro) wrote :

If the list's preferred language is Turkish, the character set is iso-8859-9. The character set used depends on the language and is defined in the LC_DESCR dictionary, which is initialized at the end of Defaults.py.

You can change the character set for Turkish to utf-8, but you also have to convert messages/tr/LC_MESSAGES/mailman.po and templates/tr/* to utf-8 and rebuild messages/tr/LC_MESSAGES/mailman.mo.
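The re-encoding step itself can be sketched in Python 3 as below. This is only an illustration, assuming the existing files really are iso-8859-9; the helper name recode_bytes is hypothetical and not part of Mailman. A .po file also declares its charset in its own header, which would need to be updated separately, and the .mo file still has to be rebuilt afterwards.

```python
def recode_bytes(data, src="iso-8859-9", dst="utf-8"):
    """Decode bytes from the src charset and re-encode them in dst."""
    return data.decode(src).encode(dst)

# A Turkish sample string as it would be stored in an iso-8859-9 file.
latin5 = "İstanbul".encode("iso-8859-9")
utf8 = recode_bytes(latin5)
```

The dotted capital İ takes one byte in iso-8859-9 and two bytes in utf-8, so the converted data is one byte longer than the original.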

Note however that in spite of any character set issues, you should have been able to enter a name with the ǧ and have it at least display as &#487; (it actually should display as a character, but xss protection goes overboard in escaping the '&').

Revision history for this message
Hendrik Maryns (hamaryns) wrote :

The list’s preferred language is German, since this is in Germany. However, in this globalized world, you just cannot assume that everything is German, especially not names.

Why use iso-8859-9 at all, or any other iso-xxxx for that matter? There is UTF-8 now, no need for it.

Note that I am not a sysadmin on the machine running Mailman, I am only a user administrator who is annoyed that I cannot enter the name of one of my collaborators. So thanks for your suggestions, but I won’t be bothering my web hoster with this stuff, unless you indicate that this indeed is due to a wrong setup on his part.

As I said: I was able to enter the name (of course, in Firefox), but submitting the changes did nothing, not even an error message. I do not understand your last sentence.

Revision history for this message
Mark Sapiro (msapiro) wrote :

The character set in Mailman for German is iso-8859-1. I agree with you that everything should be unicode (real names are currently stored in Mailman as unicode objects) and that character sets should be ones such as utf-8 that can faithfully represent unicode. I think Mailman is moving in this direction, and the newer translations are utf-8 encoded, but many languages still use iso-8859-x or other (euc-jp, euc-kr, koi8-r) encodings. It is primarily the translators who maintain these translations who determine the character set.

>As I said: I was able to enter the name (of course, in Firefox), but submitting the changes did nothing, not even an error message. I do not understand your last sentence.

There does seem to be a bug in that I can enter (on a German Language list) Çeliko-lu and submit changes and what I entered displays as Çeliko-lu. I can enter -elikoǧlu and submit changes and what I entered displays as -eliko&#487;lu, but if I enter Çelikoǧlu and submit changes I get a "Bug in Mailman version 2.1.12". I can work on fixing that, but that doesn't explain why your Mailman doesn't report the error unless your hosting service has somehow modified Mailman to suppress this display.

And what I meant by the last sentence is that the display that I see (-eliko&#487;lu) should really display the browser's rendering of the HTML entity rather than the raw entity (&#487;), but there is code in Mailman to prevent this being used as an XSS attack (say by putting <script ... as part of the real name) and that code changes &#487; into &amp;#487; so you see &#487; instead of the browser's rendering of character 487.
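The effect of that escaping can be sketched in Python 3 with the standard library's html.escape. This is a stand-in chosen for illustration, not Mailman's own escaping code, which may differ in detail.

```python
import html

entity = "&#487;"                       # numeric character reference for ǧ
escaped = html.escape(entity)           # the '&' itself gets escaped
rendered = html.unescape(escaped)       # what a browser shows for the escaped form
script = html.escape("<script>")        # the case the escaping is meant to stop
```

Here escaped is "&amp;#487;", which a browser renders as the literal text &#487; rather than as character 487, while script becomes "&lt;script&gt;" and can no longer open a tag.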

Revision history for this message
Hendrik Maryns (hamaryns) wrote :

> There does seem to be a bug in that I can enter (on a German Language list) Çeliko-lu and submit changes and what I entered displays as Çeliko-lu. I can enter -elikoǧlu and submit changes and what I entered displays as -eliko&#487;lu, but if I enter Çelikoǧlu and submit changes I get a "Bug in Mailman version 2.1.12". I can work on fixing that, but that doesn't explain why your Mailman doesn't report the error unless your hosting service has somehow modified Mailman to suppress this display.

This seems to indicate that it chooses an encoding depending on the characters inside. But since Ç and ǧ do not fit in the same iso-8859-xx encoding, it barfs. Strange, that it doesn’t consider UTF-8, then.

Strange that it doesn’t report anything, yes. Where are these reports to be found? Is there a log file?

I found out it is version 2.1.9, btw.

Revision history for this message
Mark Sapiro (msapiro) wrote :

> This seems to indicate that it chooses an encoding depending on the characters inside. But since Ç and ǧ do not fit in the same iso-8859-xx encoding, it barfs. Strange, that it doesn’t consider UTF-8, then.

Not really. It tries to keep the name internally in unicode. The problem occurs in converting the name from the web browser to unicode when the input from the web browser contains an &#nnn; entity with nnn > 256 as well as other non-ascii characters. The code parses the input into pieces separated by HTML entities which it parses into characters, and then it uses the Python string join() method to put it all back together. join() has to coerce the string to unicode because of the >256 entity which has become a unicode. This coercion occurs using Python's default encoding which is normally 'ascii' and can only be changed in Python's site.py module.
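The failure described above can be modelled in Python 3 roughly as follows. This is a sketch under stated assumptions, not Mailman's code and not the attached patch: the names split_pieces and py2_style_join are hypothetical, and because Python 3 has no implicit coercion, the sketch decodes the byte pieces explicitly to imitate what Python 2's join() did once a unicode piece was present.

```python
import re

# Matches numeric character references such as b"&#487;".
ENTITY = re.compile(rb"&#(\d+);")

def split_pieces(raw):
    """Split form input into raw byte pieces and decoded entity characters."""
    pieces, pos = [], 0
    for m in ENTITY.finditer(raw):
        pieces.append(raw[pos:m.start()])     # undecoded bytes from the browser
        pieces.append(chr(int(m.group(1))))   # entity parsed into a character
        pos = m.end()
    pieces.append(raw[pos:])
    return pieces

def py2_style_join(pieces, codec="ascii"):
    """Imitate Python 2 join(): coerce bytes only if a text piece is present."""
    if not any(isinstance(p, str) for p in pieces):
        return b"".join(pieces)               # all bytes: nothing is coerced
    return "".join(p if isinstance(p, str) else p.decode(codec) for p in pieces)

c_only = py2_style_join(split_pieces("Çeliko-lu".encode("iso-8859-1")))
g_only = py2_style_join(split_pieces(b"-eliko&#487;lu"))

both = "Ç".encode("iso-8859-1") + b"eliko&#487;lu"
try:
    py2_style_join(split_pieces(both))        # coerced with the 'ascii' default
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding the byte pieces with the list's charset avoids the failure.
fixed = py2_style_join(split_pieces(both), codec="iso-8859-1")
```

Each name on its own goes through, matching the behaviour reported earlier in this thread, and only the combination of a non-ascii byte with a high-numbered entity fails. Passing the list's character set instead of relying on the default is one way to avoid the error; whether the actual patch does this is not shown here.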

I am currently testing the attached patch which I think will fix the bug.

>Strange that it doesn’t report anything, yes. Where are these reports to be found? Is there a log file?

It should report an error. The fact that it doesn't implies your hosting service has modified this mailman in some way. The error should also be logged in Mailman's 'error' log which could be anywhere, but in a default install is /usr/local/mailman/logs/error.

Mark Sapiro (msapiro)
Changed in mailman:
assignee: nobody → msapiro
status: New → Fix Committed
Mark Sapiro (msapiro)
Changed in mailman:
milestone: none → 2.1.13rc1
status: Fix Committed → Fix Released
Revision history for this message
Pander (pander) wrote :

In mailman 1:2.1.14-1ubuntu2 I get ? for ž and ë when I do list_members -f.

Revision history for this message
Pander (pander) wrote :

Ah, this is because sys.getdefaultencoding() returns 'ascii'. That should be 'utf-8'; it is 'ascii' even though env | grep LANG returns LANG=en_US.UTF-8.
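A small Python 3 sketch of why the output shows ?, assuming the script encodes names to the default encoding with the 'replace' error handler (an assumption about list_members, not something confirmed in this report):

```python
name = "žë"

as_ascii = name.encode("ascii", "replace")   # unencodable characters become b"?"
as_utf8 = name.encode("utf-8", "replace")    # utf-8 can represent both characters
```

With 'ascii' every character outside the ASCII range is replaced by a question mark, while with 'utf-8' the name survives unchanged.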

Revision history for this message
Mark Sapiro (msapiro) wrote :

This is a Python thing. You can create, for example, /usr/lib/pythonv.v/site-packages/sitecustomize.py containing

import sys
sys.setdefaultencoding('utf8')

and this will set the default unicode encoding to utf8 as long as Python is run without the -S option. The sitecustomize.py module can be anywhere in the sys.path built by site.py.
