Bug getting i18n'ed attachment filenames (RFC2231)

Bug #1060951 reported by Aurélien Bompard on 2012-10-03
This bug affects 1 person
Affects: GNU Mailman
Importance: High
Assigned to: Barry Warsaw

Bug Description

RFC 2231 allows filenames to contain non-ASCII characters. Python's email.message.Message.get_filename() handles this by passing the filename parameter to email.utils.collapse_rfc2231_value() at the end of the method; that function decodes the value and returns the filename as a unicode string.
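The standard-library behaviour can be sketched in Python 3 (an assumption: this report concerns Python 2.7, where the same call returns a unicode object). The message is a trimmed copy of the attachment from Mark Sapiro's session later in this report; get_param() returns the RFC 2231 (charset, language, value) triple, and get_filename() collapses it into the decoded name.

```python
import email

# Trimmed copy of the attachment from this report; the filename* parameter
# is RFC 2231 encoded (charset UTF-8, empty language tag).
RAW = (
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Disposition: attachment;\n"
    " filename*=UTF-8''%74%6F%64%6F%2D%64%C3%A9%6A%65%75%6E%65%72%2E%74%78%74\n"
    "\n"
    "body\n"
)

att = email.message_from_string(RAW)

# get_param() returns the still-undecoded (charset, language, value) triple...
param = att.get_param('filename', header='content-disposition')
# ...and get_filename() collapses it with email.utils.collapse_rfc2231_value().
filename = att.get_filename()

print(param)
print(filename)  # todo-déjeuner.txt
```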

This fails in Mailman because the mailman.email.message.Message class wraps get() and __getitem__() so that they return unicode header values. Since get_filename() reads the header parameters through get(), the filename is already unicode by the time it reaches collapse_rfc2231_value(), which then tries to decode it into unicode a second time. I get the following exception:

  File "/usr/lib/python2.7/email/utils.py", line 319, in collapse_rfc2231_value
    return unicode(rawval, charset, errors)
TypeError: decoding Unicode is not supported
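The failure can be reproduced without Mailman. This is a Python 3 analogue (an assumption; the traceback above is Python 2.7): str(value, charset, errors) plays the role of unicode(rawval, charset, errors), and like it, it only accepts raw bytes. decode_filename() is a hypothetical helper, not a standard-library or Mailman function.

```python
def decode_filename(rawval, charset='utf-8', errors='replace'):
    """Hypothetical helper mirroring unicode(rawval, charset, errors) from the
    Python 2.7 traceback: decode raw octets into text with the given charset."""
    return str(rawval, charset, errors)


# Raw octets, as the standard library expects: decoding works.
print(decode_filename(b'todo-d\xc3\xa9jeuner.txt'))  # todo-déjeuner.txt

# The same octets, but already held in a text string (as happens when a
# header wrapper has converted the value early): decoding is refused.
try:
    decode_filename('todo-d\xc3\xa9jeuner.txt')
except TypeError as exc:
    print('TypeError:', exc)  # TypeError: decoding str is not supported
```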

One possible solution would be to turn Mailman's Message.get_filename() from a simple exception-catching wrapper into a full re-implementation of the original get_filename() method, converting the raw value to str (a byte string) before calling collapse_rfc2231_value().
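A minimal sketch of that proposal, transposed to Python 3 (an assumption: Mailman 3 ran on Python 2 at the time, where "str" meant a byte string, so the conversion here targets bytes). collapse_text_rfc2231_value() and FilenameMessage are hypothetical names, not Mailman or standard-library APIs, and this is not necessarily what was committed for this bug. The re-implemented get_filename() follows the structure of the standard library's method but converts the percent-decoded text back to bytes before decoding it with the declared charset.

```python
import email
import email.message
from email.utils import unquote


def collapse_text_rfc2231_value(value, errors='replace',
                                fallback_charset='us-ascii'):
    """Hypothetical helper: like email.utils.collapse_rfc2231_value(), but
    converts the value back to bytes explicitly before decoding it."""
    if not isinstance(value, tuple):
        # Plain (non-RFC 2231) parameter: just strip surrounding quotes.
        return unquote(value)
    charset, _language, text = value
    text = unquote(text)
    try:
        # latin-1 maps code points 0-255 one-to-one onto byte values, so this
        # recovers the octets that were percent-decoded from the header.
        raw = text.encode('latin-1')
    except UnicodeEncodeError:
        # Code points above 255 mean the text is already decoded; keep it.
        return text
    try:
        return raw.decode(charset or fallback_charset, errors)
    except LookupError:
        # Unknown charset: fall back to the undecoded text.
        return text


class FilenameMessage(email.message.Message):
    """Hypothetical subclass re-implementing get_filename() as proposed."""

    def get_filename(self, failobj=None):
        missing = object()
        filename = self.get_param('filename', missing, 'content-disposition')
        if filename is missing:
            filename = self.get_param('name', missing, 'content-type')
        if filename is missing:
            return failobj
        return collapse_text_rfc2231_value(filename).strip()


RAW = (
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Disposition: attachment;\n"
    " filename*=UTF-8''%74%6F%64%6F%2D%64%C3%A9%6A%65%75%6E%65%72%2E%74%78%74\n"
    "\n"
    "body\n"
)
msg = email.message_from_string(RAW, _class=FilenameMessage)
print(msg.get_filename())  # todo-déjeuner.txt
```

Keeping the text-to-bytes step inside the overridden method means the header wrappers can keep returning text everywhere else; Python 3's own collapse_rfc2231_value() performs a comparable conversion internally.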

Does this make sense? Any other ideas for a solution?

Barry Warsaw (barry) on 2012-10-03
tags: added: mailman3
Aurélien Bompard (abompard) wrote :

See the TestMessageSubclass test case I've added to the attached test suite for a way to reproduce it.
It's actually a little harder than I first thought: encoding the filename in the middle of the method is not enough.

Mark Sapiro (msapiro) wrote :

This works for me with Mailman 2.1.15 and email 4.0.1. Does it fail for you with Mailman 2.1.x? If so, what Mailman and email versions?

[msapiro@MSAPIRO ~]$ python
Python 2.6.5 (r265:79063, Jun 12 2010, 17:07:01)
[GCC 4.3.4 20090804 (release) 1] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import email
>>> email.__version__
'4.0.1'
>>> import sys
>>> sys.path.insert(0, '/cygdrive/f/test-mailman/')
>>> from Mailman import Message
>>> msg = email.message_from_string("""Message-ID: <email address hidden>
... Content-Type: multipart/mixed; boundary="------------050607040206050605060208"
...
... This is a multi-part message in MIME format.
... --------------050607040206050605060208
... Content-Type: text/plain; charset=UTF-8
... Content-Transfer-Encoding: quoted-printable
...
... Test message containing an attachment with an accented filename
...
... --------------050607040206050605060208
... Content-Type: text/plain; charset=UTF-8;
... name="=?UTF-8?B?dG9kby1kw6lqZXVuZXIudHh0?="
... Content-Transfer-Encoding: base64
... Content-Disposition: attachment;
... filename*=UTF-8''%74%6F%64%6F%2D%64%C3%A9%6A%65%75%6E%65%72%2E%74%78%74
...
... VmlhbmRlCk1lbnRoZQpQYWluClZpbgoKQ3Vpc2luZTogcHLDqXBhcmVyIGwnYXDDqXJvLCBj
... b3VwZXIgZXQgZmFpcmUgcmlzc29sZXIgbGVzIHBhdGF0ZXMsIGV0IGZhaXJlIGxlcyBjb29r
... aWVzCg==
... --------------050607040206050605060208--
... """, Message.Message)
>>> msg
From nobody Wed Oct 3 08:43:13 2012
Message-ID: <email address hidden>
Content-Type: multipart/mixed; boundary="------------050607040206050605060208"

This is a multi-part message in MIME format.
--------------050607040206050605060208
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Test message containing an attachment with an accented filename

--------------050607040206050605060208
Content-Type: text/plain; charset=UTF-8;
        name="=?UTF-8?B?dG9kby1kw6lqZXVuZXIudHh0?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
        filename*=UTF-8''%74%6F%64%6F%2D%64%C3%A9%6A%65%75%6E%65%72%2E%74%78%74

VmlhbmRlCk1lbnRoZQpQYWluClZpbgoKQ3Vpc2luZTogcHLDqXBhcmVyIGwnYXDDqXJvLCBj
b3VwZXIgZXQgZmFpcmUgcmlzc29sZXIgbGVzIHBhdGF0ZXMsIGV0IGZhaXJlIGxlcyBjb29r
aWVzCg==
--------------050607040206050605060208--

>>> att = msg.get_payload()[1]
>>> att
From nobody Wed Oct 3 08:43:44 2012
Content-Type: text/plain; charset=UTF-8;
        name="=?UTF-8?B?dG9kby1kw6lqZXVuZXIudHh0?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
        filename*=UTF-8''%74%6F%64%6F%2D%64%C3%A9%6A%65%75%6E%65%72%2E%74%78%74

VmlhbmRlCk1lbnRoZQpQYWluClZpbgoKQ3Vpc2luZTogcHLDqXBhcmVyIGwnYXDDqXJvLCBj
b3VwZXIgZXQgZmFpcmUgcmlzc29sZXIgbGVzIHBhdGF0ZXMsIGV0IGZhaXJlIGxlcyBjb29r
aWVzCg==
>>> att.get_filename()
u'todo-d\xe9jeuner.txt'

Aurélien Bompard (abompard) wrote :

Sorry, I should have mentioned it: this is with Mailman 3 HEAD.

Barry Warsaw (barry) on 2012-10-03
no longer affects: mailman/2.1
no longer affects: mailman/3.0
Barry Warsaw (barry) on 2014-12-09
Changed in mailman:
milestone: none → 3.0.0b5
assignee: nobody → Barry Warsaw (barry)
importance: Undecided → High
status: New → Fix Committed
Barry Warsaw (barry) on 2014-12-30
Changed in mailman:
status: Fix Committed → Fix Released