qrunner crashes on invalid unicode sequence

Bug #1462755 reported by Thijs Kinkhorst on 2015-06-07
This bug affects 2 people
Affects: GNU Mailman | Importance: Low | Assigned to: Mark Sapiro
Affects: mailman (Ubuntu) | Importance: Undecided | Assigned to: Unassigned

Bug Description

When a message contains an invalid unicode sequence in its header, qrunner flat out crashes on that:

May 17 15:32:20 2015 (981) Uncaught runner exception: 'utf8' codec can't decode byte 0xe9 in position 18: invalid continuation byte
May 17 15:32:20 2015 (981) Traceback (most recent call last):
  File "/var/lib/mailman/Mailman/Queue/Runner.py", line 119, in _oneloop
    self._onefile(msg, msgdata)
  File "/var/lib/mailman/Mailman/Queue/Runner.py", line 190, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/var/lib/mailman/Mailman/Queue/IncomingRunner.py", line 130, in _dispose
    more = self._dopipeline(mlist, msg, msgdata, pipeline)
  File "/var/lib/mailman/Mailman/Queue/IncomingRunner.py", line 153, in _dopipeline
    sys.modules[modname].process(mlist, msg, msgdata)
  File "/var/lib/mailman/Mailman/Handlers/CookHeaders.py", line 239, in process
    i18ndesc = uheader(mlist, mlist.description, 'List-Id', maxlinelen=998)
  File "/var/lib/mailman/Mailman/Handlers/CookHeaders.py", line 65, in uheader
    return Header(s, charset, maxlinelen, header_name, continuation_ws)
  File "/usr/lib/python2.7/email/header.py", line 183, in __init__
    self.append(s, charset, errors)
  File "/usr/lib/python2.7/email/header.py", line 267, in append
    ustr = unicode(s, incodec, errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 18: invalid continuation byte

May 17 15:32:20 2015 (981) SHUNTING: 1431869540.389822+156779307d54473d0eb732994bb67eee95733285

A solution for this specific case is to have Mailman/Handlers/CookHeaders.py pass the errors='replace' parameter.

I would say that this is actually a bug in python-email, since I think it doesn't make sense to set errors to "strict" rather than something like "replace" when the intention is to parse something as free-form, underspecified and user-controlled as email. Nonetheless, Mailman already sets errors='replace' in some places, so it might as well add it here.
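
For illustration, a minimal sketch of what the errors argument of email.header.Header changes. It is written for Python 3 so that it runs on a current interpreter (Mailman 2.1 itself runs on Python 2.7, where the byte string would be a plain str). The sample bytes are a made-up placeholder, not data from the affected list.

```python
from email.header import Header

# Hypothetical sample: ISO-8859-1 bytes (0xe9 is "é") labelled as UTF-8.
raw = b'caf\xe9 list'

# Default errors='strict': the undecodable byte raises UnicodeDecodeError.
try:
    Header(raw, 'utf-8', header_name='List-Id')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors='replace': the bad byte becomes U+FFFD and the header is still built.
lenient = Header(raw, 'utf-8', header_name='List-Id', errors='replace')
text = str(lenient)
```

With 'replace' the header survives, but the offending character is lost and shows up as the Unicode replacement character.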

Mark Sapiro (msapiro) wrote :

Actually, the traceback says what's happening is CookHeaders is trying to create the List-Id: header to be added to the message.

It tries to create a header of the form:

List-Id: list description <list.example.com>

And the exception occurs when trying to rfc 2047 encode the list's description in the charset of the list's preferred language. This exception should be occurring on every list post. Is that the case?

Also, what is the list's preferred_language, and what is the raw value of the list's description attribute? You can obtain this info with something like:

$ bin/withlist list1
Loading list list1 (unlocked)
The variable `m' is the list1 MailList instance
>>> m.preferred_language
'en'
>>> m.description
'My List one'
>>>

(Of course, the list name and responses will be different in your case.)

Changed in mailman:
assignee: nobody → Mark Sapiro (msapiro)
importance: Undecided → Medium
milestone: none → 2.1.21
status: New → Incomplete
Thijs Kinkhorst (kink) wrote :

I received this response:

root@barbershop:~# /usr/lib/mailman/bin/withlist caljente
Loading list caljente (unlocked)
The variable `m' is the caljente MailList instance
>>> m.preferred_language
'nl'
>>> m.description
'Lijst voor Caljent\xe9-leden'
>>>

Not sure what encoding that is. I've changed it to "Caljente" for now, which
should be a reasonable workaround.
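
One way to narrow down the encoding is to try decoding the raw bytes with each candidate codec. This is only a sketch, written for Python 3 (where the value shown by withlist would be a bytes literal); try_decode is a hypothetical helper name.

```python
# Raw description as printed by withlist, written as a byte string.
raw = b'Lijst voor Caljent\xe9-leden'

def try_decode(data, codec):
    """Return the decoded text, or None if data is not valid in codec."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None

# 0xe9 at offset 18 starts a multi-byte UTF-8 sequence, but '-' follows it.
as_utf8 = try_decode(raw, 'utf-8')

# ISO-8859-1 maps 0xe9 to "é", which gives a plausible reading here.
as_latin1 = try_decode(raw, 'iso-8859-1')
```

Note that ISO-8859-1 accepts every byte value, so a successful decode alone proves nothing; what matters is that the result reads sensibly while the UTF-8 attempt fails.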

Mark Sapiro (msapiro) wrote :

It appears the underlying issue is someone has changed Mailman's character set for 'nl' (Dutch) from iso-8859-1 to utf-8. Possibly whoever did this did the appropriate things such as recoding the message catalog and templates to utf-8, but in any case, the strings in the attributes of this list weren't recoded. This is one of the major problems that make it difficult to change Mailman's encoding for a language. See the definitions of the recode(), doitem() and convert() functions in Mailman/versions.py in Mailman 2.1.19 or later.

So basically, this issue appears to be a 'shot oneself in the foot' thing and probably could be fixed by setting the list's description to 'Lijst voor Caljent\xc3\xa9-leden', although I would be concerned that there are other iso-8859-1 strings in list attributes.
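
As a rough illustration of that recoding (not the recode() function from Mailman/versions.py), the sketch below converts an iso-8859-1 byte string to utf-8 and leaves values that are already valid utf-8 untouched. It is written for Python 3; recode_to_utf8 is a hypothetical helper name.

```python
def recode_to_utf8(value, old_codec='iso-8859-1'):
    """Hypothetical helper: re-encode a byte string to UTF-8.

    Bytes that already decode as UTF-8 are returned unchanged, so
    applying the helper twice does not double-encode.
    """
    try:
        value.decode('utf-8')
        return value
    except UnicodeDecodeError:
        return value.decode(old_codec).encode('utf-8')

fixed = recode_to_utf8(b'Lijst voor Caljent\xe9-leden')
```

The validity check is a heuristic: an iso-8859-1 string can, in rare cases, also be valid utf-8, so the results for other list attributes would still need a manual look.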

Anyway, I see this as an issue worth fixing. The fix I would propose is, in Mailman/Handlers/CookHeaders.py, to replace the line at the end of the definition of uheader, which is currently

    return Header(s, charset, maxlinelen, header_name, continuation_ws)

with

    try:
        return Header(s, charset, maxlinelen, header_name, continuation_ws)
    except UnicodeError:
        syslog('error', 'list: %s: can\'t decode "%s" as %s', mlist.internal_name(), s, charset)
        return Header('', charset, maxlinelen, header_name, continuation_ws)
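
The same fallback pattern can be tried in isolation. The sketch below is a standalone Python 3 approximation, not the committed Mailman code: safe_header is a hypothetical name, and the standard logging module stands in for Mailman's syslog().

```python
import logging
from email.header import Header

log = logging.getLogger('sketch')  # stand-in for Mailman's syslog()

def safe_header(listname, s, charset, header_name=None, maxlinelen=None):
    """Build a Header; on a decode error, log it and return an empty one."""
    try:
        return Header(s, charset, maxlinelen, header_name)
    except UnicodeError:
        log.error('list: %s: can\'t decode "%s" as %s', listname, s, charset)
        return Header('', charset, maxlinelen, header_name)

# iso-8859-1 bytes labelled utf-8: falls back to an empty header.
bad = safe_header('caljente', b'Lijst voor Caljent\xe9-leden', 'utf-8', 'List-Id')

# Properly recoded bytes: the header is built normally.
good = safe_header('caljente', b'Lijst voor Caljent\xc3\xa9-leden', 'utf-8', 'List-Id')
```

The trade-off is that a badly encoded description is dropped from the header entirely, in exchange for the message being delivered instead of shunted.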

Changed in mailman:
importance: Medium → Low
status: Incomplete → In Progress
Mark Sapiro (msapiro) on 2015-06-09
Changed in mailman:
status: In Progress → Fix Committed
Thijs Kinkhorst (kink) wrote :

Thanks for the fix! Although arguably a misconfiguration, it's good that it doesn't crash the qrunner.

Mark Sapiro (msapiro) wrote :

To be precise, IncomingRunner doesn't actually "crash"; it encounters an unanticipated exception, logs it, and shunts the message. And yes, the underlying issue is definitely a "misconfiguration", but catching the exception and dealing with it more gracefully, without shunting the message, wasn't hard, so I thought it worthwhile.

Mark Sapiro (msapiro) on 2016-02-03
Changed in mailman:
milestone: 2.1.21 → 2.1.21rc1
status: Fix Committed → Fix Released
Mark Sapiro (msapiro) wrote :

For more information on the causes of this issue and the fallout from what turns out to be Debian's changing of the character set for several languages, see the thread "Encoding problem with 2.15 to 2.18 upgrade with Finnish" beginning at <https://mail.python.org/pipermail/mailman-users/2015-December/080221.html> and continuing at <https://mail.python.org/pipermail/mailman-users/2016-January/080275.html>. There is a script mentioned in that thread at <https://www.msapiro.net/scripts/recode_list> (mirrored at <http://fog.ccsf.edu/~msapiro/scripts/recode_list>) that can programmatically recode the strings in a list's configuration to "fix" this issue.

Paul Collins (pjdc) wrote :

I just ran into this problem following an upgrade from 12.04 LTS to 16.04 LTS. recode_list fixed the problem (thank you, Mark!) but this seems like something Ubuntu should detect and offer to correct during the upgrade.
