qrunner crashes on invalid unicode sequence
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
GNU Mailman | | Low | Mark Sapiro |
mailman (Ubuntu) | | Wishlist | Unassigned |
Bug Description
When a message contains an invalid unicode sequence in its header, qrunner flat out crashes on it:
May 17 15:32:20 2015 (981) Uncaught runner exception: 'utf8' codec can't decode byte
0xe9 in position 18: invalid continuation byte
May 17 15:32:20 2015 (981) Traceback (most recent call last):
File "/var/lib/
self.
File "/var/lib/
keepqueued = self._dispose(
File "/var/lib/
more = self._dopipelin
File "/var/lib/
sys.
File "/var/lib/
i18ndesc = uheader(mlist, mlist.description, 'List-Id', maxlinelen=998)
File "/var/lib/
return Header(s, charset, maxlinelen, header_name, continuation_ws)
File "/usr/lib/
self.append(s, charset, errors)
File "/usr/lib/
ustr = unicode(s, incodec, errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 18: invalid
continuation byte
May 17 15:32:20 2015 (981) SHUNTING:
1431869540.
A solution for this specific case is to have Mailman/
I would say that this is actually a bug in python-email, since I think it doesn't make sense to set errors to "strict" rather than something like "replace" when the intention is to parse something as free-form, under-specified and user-controlled as email. Nonetheless, Mailman already sets errors='replace' in some places, so it might as well add it here.
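To illustrate the difference between the two error policies, here is a minimal sketch using the list description bytes quoted later in this report. It is written for Python 3, whereas Mailman 2.1 runs on Python 2; `decode_description` is a hypothetical helper, not part of Mailman or the email package.

```python
# Python 3 sketch. In Python 2 the equivalent call is unicode(s, codec, errors),
# as seen in the traceback above.

# The list description from this report: 0xe9 is "é" in iso-8859-1,
# but it is not a valid UTF-8 sequence when followed by "-".
raw = b'Lijst voor Caljent\xe9-leden'


def decode_description(data, charset, errors='strict'):
    """Hypothetical helper: decode header bytes with a given error policy."""
    return data.decode(charset, errors)


# errors='strict' (the default) raises on the first undecodable byte.
try:
    decode_description(raw, 'utf-8')
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# errors='replace' substitutes U+FFFD for the bad byte and carries on.
replaced = decode_description(raw, 'utf-8', errors='replace')

# Decoding with the charset the bytes were actually written in works cleanly.
as_latin1 = decode_description(raw, 'iso-8859-1')

print(strict_error)
print(replaced)
print(as_latin1)
```

With 'replace' the header is still generated, at the cost of a replacement character where the accented letter should be.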
Mark Sapiro (msapiro) wrote : | #1 |
Changed in mailman: | |
assignee: | nobody → Mark Sapiro (msapiro) |
importance: | Undecided → Medium |
milestone: | none → 2.1.21 |
status: | New → Incomplete |
Thijs Kinkhorst (kink) wrote : | #2 |
I received this response:
root@barbershop:~# /usr/lib/
Loading list caljente (unlocked)
The variable `m' is the caljente MailList instance
>>> m.preferred_
'nl'
>>> m.description
'Lijst voor Caljent\xe9-leden'
>>>
Not sure what encoding that is. I've changed it to "Caljente" for now, which
should be a reasonable workaround.
Mark Sapiro (msapiro) wrote : | #3 |
It appears the underlying issue is someone has changed Mailman's character set for 'nl' (Dutch) from iso-8859-1 to utf-8. Possibly whoever did this did the appropriate things such as recoding the message catalog and templates to utf-8, but in any case, the strings in the attributes of this list weren't recoded. This is one of the major problems that make it difficult to change Mailman's encoding for a language. See the definitions of the recode(), doitem() and convert() functions in Mailman/versions.py in Mailman 2.1.19 or later.
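As a rough illustration of what recoding a stored attribute involves, the following Python 3 sketch converts a byte string from the old charset to the new one only when it is not already valid in the new one. `recode_bytes` is a hypothetical helper written for this example; it is not the code in Mailman/versions.py, and the check it uses is a heuristic.

```python
def recode_bytes(value, old_charset='iso-8859-1', new_charset='utf-8'):
    """Hypothetical helper: re-encode bytes from old_charset to new_charset.

    If the bytes already decode cleanly in new_charset they are returned
    unchanged, so running the function twice does not double-encode.
    """
    try:
        value.decode(new_charset)
        return value
    except UnicodeDecodeError:
        return value.decode(old_charset).encode(new_charset)


old_description = b'Lijst voor Caljent\xe9-leden'   # iso-8859-1 bytes
new_description = recode_bytes(old_description)      # utf-8 bytes
print(new_description)
```

Pure ASCII values pass through untouched, which is why only lists with accented characters in their attributes are affected.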
So basically, this issue appears to be a 'shot oneself in the foot' thing and probably could be fixed by setting the list's description to 'Lijst voor Caljent\
Anyway, I see this as an issue worth fixing. The fix I would propose is in Mailman/
return Header(s, charset, maxlinelen, header_name, continuation_ws)
with
    try:
        return Header(s, charset, maxlinelen, header_name, continuation_ws)
    except UnicodeError:
        return Header('', charset, maxlinelen, header_name, continuation_ws)
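The fallback can be tried outside Mailman with the standard library's email.header.Header. This is a minimal Python 3 sketch (Mailman 2.1 itself runs on Python 2); `safe_header` is a hypothetical stand-in name for the function being patched, and the default argument values are assumptions.

```python
from email.header import Header


def safe_header(s, charset, maxlinelen=None, header_name=None,
                continuation_ws=' '):
    """Hypothetical wrapper: return an empty Header if s cannot be decoded."""
    try:
        return Header(s, charset, maxlinelen, header_name, continuation_ws)
    except UnicodeError:
        # The description bytes do not match the charset: fall back to an
        # empty description instead of letting the exception propagate.
        return Header('', charset, maxlinelen, header_name, continuation_ws)


bad = b'Lijst voor Caljent\xe9-leden'  # iso-8859-1 bytes from this report

# Declared as utf-8, the bytes cannot be decoded, so the fallback is used.
fallback = safe_header(bad, 'utf-8', 998, 'List-Id')

# Declared with the charset they were written in, they decode normally.
correct = safe_header(bad, 'iso-8859-1', 998, 'List-Id')

print(repr(str(fallback)))
print(str(correct))
```

The trade-off is that the description is dropped entirely rather than shown with a replacement character.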
Changed in mailman: | |
importance: | Medium → Low |
status: | Incomplete → In Progress |
Changed in mailman: | |
status: | In Progress → Fix Committed |
Thijs Kinkhorst (kink) wrote : | #4 |
Thanks for the fix! Although arguably a misconfiguration, it's good that it doesn't crash the qrunner.
Mark Sapiro (msapiro) wrote : | #5 |
IncomingRunner doesn't actually "crash"; it encounters an unanticipated exception, logs it and shunts the message. And yes, the underlying issue is definitely a "misconfiguration", but catching the exception and dealing with it more gracefully without shunting the message wasn't hard, so I thought it worthwhile.
Changed in mailman: | |
milestone: | 2.1.21 → 2.1.21rc1 |
status: | Fix Committed → Fix Released |
Mark Sapiro (msapiro) wrote : | #6 |
For more information on the causes of this issue and the fallout from what turns out to be Debian's changing of the character set for several languages, see the thread "Encoding problem with 2.15 to 2.18 upgrade with Finnish" beginning at <https:/
Paul Collins (pjdc) wrote : | #7 |
I just ran into this problem following an upgrade from 12.04 LTS to 16.04 LTS. recode_list fixed the problem (thank you, Mark!) but this seems like something Ubuntu should detect and offer to correct during the upgrade.
Christian Ehrhardt (paelzer) wrote : | #8 |
Setting the task to a wishlist item, since it is essentially a config change that breaks it (as Mark said, 'shot oneself in the foot').
I'm personally not so keen about on-upgrade detection+warning since that (in general) has a history of too many false-positives leading people to config-break their system without a reason.
But then as I read Mark this is due to Debian intentionally changing some encodings, so maybe it should be done ...
Changed in mailman (Ubuntu): | |
status: | New → Confirmed |
importance: | Undecided → Wishlist |
Actually, the traceback says what's happening is CookHeaders is trying to create the List-Id: header to be added to the message.
It tries to create a header of the form:
List-Id: list description <list.example.com>
And the exception occurs when trying to RFC 2047 encode the list's description in the charset of the list's preferred language. This exception should be occurring on every list post. Is that the case?
Also, what is the list's preferred_language and what is the raw value of the list's description attribute? Obtain this info with something like:
$ bin/withlist list1 language
Loading list list1 (unlocked)
The variable `m' is the list1 MailList instance
>>> m.preferred_
'en'
>>> m.description
'My List one'
>>>
(of course the list name and responses will be different in your case.)
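The List-Id construction described above can be sketched with the standard library as follows. This is a Python 3 illustration under stated assumptions, not Mailman's CookHeaders code: `build_list_id` is a hypothetical helper, and the description and list.example.com are sample values.

```python
from email.header import Header, decode_header, make_header


def build_list_id(description, charset, list_id):
    """Hypothetical helper: RFC 2047 encode the description, append <list-id>."""
    encoded = Header(description, charset, header_name='List-Id').encode()
    return '%s <%s>' % (encoded, list_id)


description = 'Lijst voor Caljent\xe9-leden'
value = build_list_id(description, 'utf-8', 'list.example.com')
print('List-Id: ' + value)

# Decoding the encoded word again recovers the original description.
encoded_part = value.rsplit(' <', 1)[0]
round_trip = str(make_header(decode_header(encoded_part)))
print(round_trip)
```

Here the description is already a decoded string, so the encoding step cannot fail; the exception in this report arises one step earlier, when the stored bytes are decoded with a charset they were not written in.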