sync_members crashes for UTF-8 real name
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
GNU Mailman | New | Undecided | Unassigned |
Bug Description
This was reported on mailman-users <http://
Steps to reproduce:
1) Create a text file encoded in UTF-8 including a line such as
Cédríc <email address hidden>
2) Use a list test-list, ensuring that <email address hidden> is not already a member of test-list
3) Run sync_members --no-change --welcome-msg=no --goodbye-msg=no --notifyadmin=no -f testutf8.txt test-list
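The input file from step 1 could be produced with a short script like the one below. This is only a sketch, written in Python 3 syntax; the helper name write_member_file and the address user@example.com are placeholders, since the real address is hidden in this report.

```python
import io


def write_member_file(path, members):
    """Write one 'Real Name <address>' line per member, encoded as UTF-8."""
    with io.open(path, "w", encoding="utf-8", newline="\n") as fh:
        for name, addr in members:
            fh.write(u"%s <%s>\n" % (name, addr))


if __name__ == "__main__":
    # Placeholder address; substitute one that is not yet a member of test-list.
    write_member_file("testutf8.txt", [(u"C\u00e9dr\u00edc", "user@example.com")])
```

The resulting testutf8.txt can then be passed to sync_members with -f as in step 3.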
Expected results:
address is added with
Added : Cédríc <email address hidden>
Actual results:
File "/usr/sbin/
s = email.Utils.
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
Attached patch applies to 2.1 and 2.2 head.
Note that there is also a related issue with list_members -f, where safe() also encodes to 7-bit, resulting in
C??dr??c <email address hidden>
This is on a Debian 6.0.7 system with a UTF-8 locale.
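The doubled question marks in the list_members -f output noted above are what one would expect when UTF-8 bytes are forced through a 7-bit codec with replacement: each accented letter is two bytes, and each byte is replaced separately. The sketch below illustrates that effect in Python 3 syntax. It is not the actual safe() code, and the function name ascii_with_replacement is made up for the illustration.

```python
def ascii_with_replacement(raw):
    """Force a byte string to 7-bit ASCII, replacing anything undecodable."""
    # Each non-ASCII byte becomes U+FFFD on decode, then '?' on encode.
    return raw.decode("ascii", "replace").encode("ascii", "replace")


name_utf8 = u"C\u00e9dr\u00edc".encode("utf-8")  # two bytes per accented letter
print(ascii_with_replacement(name_utf8))
```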
This is complicated. It is not clear that this is a bug, and if it is a bug, it is not clear that the bug is in sync_members.
The problem occurs in the statements
s = email.Utils.formataddr((name, addr)).encode(enc, 'replace')
when name contains non-ascii. The first issue is that the job of email.Utils.formataddr() is to take a name and address pair and return a string (e.g. 'name <addr>') suitable for inclusion in an email message header such as To:, From: or Cc:. Headers are not allowed to contain non-ascii, so it could be argued that if name contains non-ascii, the result returned by email.Utils.formataddr() should be RFC 2047 encoded so it doesn't contain non-ascii.
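For comparison, this is roughly what an RFC 2047 encoded result looks like. The sketch uses Python 3's email.utils.formataddr(), which, unlike the Python 2 email.Utils.formataddr() discussed here, encodes a non-ascii display name by default. The address user@example.com is a placeholder.

```python
from email.header import decode_header
from email.utils import formataddr

# Python 3 only: the charset argument defaults to 'utf-8'.
encoded = formataddr((u"C\u00e9dr\u00edc", "user@example.com"))

# The display name is now an RFC 2047 encoded-word, so this does not raise.
encoded.encode("ascii")

# Decode the name part again to confirm nothing was lost.
name_part = encoded.rsplit(" <", 1)[0]
raw, charset = decode_header(name_part)[0]
decoded_name = raw.decode(charset)
print(encoded)
```

A string in this form is safe to place in a header, but it is not what one wants printed on a terminal, which is part of why the question of where the bug lies is not clear-cut.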
Ignoring that, the next issue is that Python's default encoding is ascii regardless of locale. Thus, when we try to encode() the string returned by email.Utils.formataddr(), Python must first decode it and does this using the ascii codec which throws the exception. Removing the encode() as the suggested patch does avoids this, but is not, I think, the best way to fix this.
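The implicit decode step can be made visible with a small emulation. The sketch below is in Python 3 syntax, where bytes have no encode() method, so the two steps Python 2 performs are spelled out by hand; py2_style_encode is a hypothetical helper, not anything in Mailman, and user@example.com is a placeholder.

```python
def py2_style_encode(raw, enc, default_encoding="ascii"):
    """Emulate Python 2 str.encode(): implicit decode with the default codec, then encode."""
    return raw.decode(default_encoding).encode(enc, "replace")


formatted = u"C\u00e9dr\u00edc <user@example.com>".encode("utf-8")

error = None
try:
    py2_style_encode(formatted, "utf-8")
except UnicodeDecodeError as exc:
    # Same failure as in the report: the ascii codec rejects byte 0xc3.
    error = exc
    print(exc)

# With a UTF-8 default encoding the implicit decode succeeds.
fixed = py2_style_encode(formatted, "utf-8", default_encoding="utf-8")
```

This is why changing the default encoding makes the exception go away without touching the encode() call itself.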
I think the proper fix is to make your Python locale-aware by editing the /usr/lib/pythonv.v/site.py module and changing the first
if 0:
in the definition of setencoding() to
if 1:
This will not only fix this issue with sync_members, it will also fix the garbled output from list_members -f and probably other cases of non-ascii being replaced with '?' in the command line scripts.
Another way to do this is to add
import sys
sys.setdefaultencoding('utf-8')
to the sitecustomize.py module (/etc/pythonv.v/sitecustomize.py on Debian).
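After making either change, a quick check like the following should show whether it took effect. It is a minimal sketch; encoding_report is a made-up name. On an unmodified Python 2 the default is reported as ascii, and it should read utf-8 once the change is active under a UTF-8 locale.

```python
import locale
import sys


def encoding_report():
    """Return the interpreter default encoding and the locale's preferred encoding."""
    return {
        "default": sys.getdefaultencoding(),
        "locale": locale.getpreferredencoding(False),
    }


print(encoding_report())
```

Note that site.py removes sys.setdefaultencoding from the sys module once startup is finished, which is why the call has to go in sitecustomize.py rather than in the scripts themselves.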