Find member does not match name in multibyte characters

Bug #1442298 reported by KOMEDA Shinji
This bug affects 1 person
Affects: GNU Mailman
Status: Fix Released
Importance: Medium
Assigned to: Mark Sapiro
Milestone: 2.1.21rc1

Bug Description

The encoding of the findmember search string does not match the encoding of the member names.

KOMEDA Shinji (komeda-shinji) wrote :
Mark Sapiro (msapiro) wrote :

I see the bug, but the patch doesn't fix the problem in all cases. I think a better patch is, instead of

        regexp = regexp.decode()

to do

        regexp = regexp.decode(Utils.GetCharSet(mlist.preferred_language))

I still need to do more testing, but I would like to know if providing this character set in this way still fixes the bug in your environment.

The problem with the original patch is that decode() without a charset uses Python's default string encoding, which is often ASCII. For a non-ASCII search string this results in a UnicodeDecodeError and no change to regexp. While this doesn't make things worse, it doesn't fix the problem unless the site has changed Python's default string encoding to a charset more appropriate to the installation.
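
The difference between the two calls can be sketched as follows. This is only an illustration, not Mailman code: Mailman 2.1 runs on Python 2, while the sketch is Python 3 and passes 'ascii' explicitly to stand in for Python 2's usual default encoding. The helper decode_search, the sample name, and the choice of euc-jp as the list's charset are all assumptions made for the example.

```python
import re


def decode_search(raw, charset=None):
    """Hypothetical helper: decode a byte search string.

    On failure the input is returned unchanged, mirroring the
    "no change to regexp" outcome described above.
    """
    try:
        # 'ascii' stands in for Python 2's usual default encoding (assumption).
        return raw.decode(charset or 'ascii')
    except UnicodeDecodeError:
        return raw


# Made-up member name (two kanji), written with escapes to stay ASCII-safe.
name = '\u5c71\u7530'
# Bytes as they might arrive from a page served in euc-jp (assumed charset).
raw = name.encode('euc-jp')

no_charset = decode_search(raw)               # decoding fails, bytes come back
with_charset = decode_search(raw, 'euc-jp')   # decoding succeeds

# Only the decoded search string can match the member's name.
matched = re.search(with_charset, name) is not None
```

With no charset the search string stays as undecoded bytes, so it cannot match the stored name; with the list's charset it decodes to the same text as the name.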

Changed in mailman:
assignee: nobody → Mark Sapiro (msapiro)
importance: Undecided → Medium
status: New → Confirmed
KOMEDA Shinji (komeda-shinji) wrote :

I'm using Mailman 2.1.16 on an Ubuntu box. In my environment,

    DEFAULT_SERVER_LANGUAGE = 'ja'

and Ubuntu has the following code in Mailman/Defaults.py

    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')

This code comes from debian/patches/91_utf8.patch.

I changed "regexp.decode()" to "regexp.decode(Utils.GetCharSet(mlist.preferred_language))", and it works fine.

Thank you.

Mark Sapiro (msapiro) wrote :

This bug is 'mostly' fixed. There are potentially very complex cases of lists with English as the preferred language (admin UI language) that have members whose real names contain non-ASCII characters. In these cases, the search string POSTed by the browser may contain HTML entities such as &eacute; and &#233;. Parsing all this is complicated and error-prone, and is not done by this fix.
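
To illustrate why an entity-encoded search string does not match, here is a small Python 3 sketch. It is not part of the fix and not Mailman code; the sample name and variable names are made up, and html.unescape is simply the standard library's entity decoder, used here to show what the posted text stands for.

```python
import html
import re

# Made-up member name containing one non-ASCII character (e with acute accent).
real_name = 'Andr\u00e9'

# Hypothetical search strings as a browser might POST them.
posted_named = 'Andr&eacute;'    # named entity form
posted_numeric = 'Andr&#233;'    # numeric entity form

# Used as-is, the entity text is matched literally and finds nothing.
raw_match = re.search(posted_named, real_name)

# After decoding the entities, both forms are the same text as the name.
decoded_named = html.unescape(posted_named)
decoded_numeric = html.unescape(posted_numeric)
decoded_match = re.search(decoded_named, real_name)
```

Handling this in general means deciding when a literal "&" is an entity and when it is part of the regexp, which is the kind of complexity the fix leaves alone.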

Since the search string is a regexp, one can just use a dot (.) instead of a problem character in these cases.

For lists whose preferred language (admin UI language) is other than English, this bug should be fixed.

Changed in mailman:
milestone: none → 2.1.21
status: Confirmed → Fix Committed
Mark Sapiro (msapiro)
Changed in mailman:
milestone: 2.1.21 → 2.1.21rc1
status: Fix Committed → Fix Released