unicodedecode still kills things

Bug #308152 reported by Bunny
Affects: GNU Mailman
Status: Fix Released
Importance: Undecided
Assigned to: Mark Sapiro

Bug Description

Smells like bug 265976 is incurable...

# pkg_info | egrep "(python|mailman)"
mailman-2.1.11 A mailing list manager (MLM) with a user-friendly web front
python25-2.5.2_3 An interpreted object-oriented programming language
# uname -a
FreeBSD thurkler 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Wed Jan 31 11:14:12 JST 2007

Dec 15 19:35:20 2008 (16881) Uncaught runner exception: 'ascii' codec can't decode byte 0xa1 in position 1: ordinal not in range(128)
Dec 15 19:35:20 2008 (16881) Traceback (most recent call last):
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 120, in _oneloop
    self._onefile(msg, msgdata)
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 191, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/usr/local/mailman/Mailman/Queue/ArchRunner.py", line 73, in _dispose
    mlist.ArchiveMail(msg)
  File "/usr/local/mailman/Mailman/Archiver/Archiver.py", line 216, in ArchiveMail
    h.processUnixMailbox(f)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 580, in processUnixMailbox
    self.add_article(a)
  File "/usr/local/mailman/Mailman/Archiver/pipermail.py", line 621, in add_article
    filename))
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 1120, in write_article
    f.write(article.as_text())
  File "/usr/local/mailman/Mailman/Archiver/HyperArch.py", line 577, in as_text
    atmark = unicode(_(' at '), cset)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa1 in position 1: ordinal not in range(128)

Dec 15 19:35:20 2008 (16881) SHUNTING: 1229337318.9934821+1651c52131b75e42ee253f2a55793e347e32c1b3

Date: Mon, 15 Dec 2008 19:36:04 +0900
From: Don <x>
User-Agent: Thunderbird 2.0.0.18 (Windows/20081105)
MIME-Version: 1.0
To: test@x
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 8bit
Subject: [test] test 16 - shift JIS
...

HTML▒▒ź▒եե▒▒▒▒▒▒▒ݴɤ▒▒ޤ▒▒▒...
(looks like it really was shiftjis...)

Mark Sapiro (msapiro) wrote :

This is only related to (non) Bug 265976 in that the exception is the same.

It is not clear to me if this is a bug or not. The code in HyperArch.py is suspicious. The existing code fragment is

        if mm_cfg.ARCHIVER_OBSCURES_EMAILADDRS:
            otrans = i18n.get_translation()
            try:
                atmark = unicode(_(' at '), cset)
                i18n.set_language(self._lang)
                body = re.sub(r'([-+,.\w]+)@([-+.\w]+)',
                              '\g<1>' + atmark + '\g<2>', body)
            finally:
                i18n.set_translation(otrans)

It seems it should possibly be the following, with i18n.set_language() called before the ' at ' lookup:

        if mm_cfg.ARCHIVER_OBSCURES_EMAILADDRS:
            otrans = i18n.get_translation()
            try:
                i18n.set_language(self._lang)
                atmark = unicode(_(' at '), cset)
                body = re.sub(r'([-+,.\w]+)@([-+.\w]+)',
                              '\g<1>' + atmark + '\g<2>', body)
            finally:
                i18n.set_translation(otrans)

but I'm not sure. I may be misunderstanding it.

However, I am unable to duplicate this exception or any other with the message fragment posted with either English or Japanese as the list language, so I don't know if the above change would have any effect.

It is clear the message is defective in that it claims the body is us-ascii and it isn't, but I don't *think* that should cause this particular error.
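
The mismatch between the declared charset and the actual body can be checked mechanically. The sketch below uses Python 3's standard email package rather than anything from Mailman. The function name declared_charset_matches and the sample body are assumptions made for illustration, with headers modeled on the message quoted in the report.

```python
import email


def declared_charset_matches(raw_message):
    """Return (declared charset, True/False) for a single-part message.

    The flag is True only if the body bytes decode cleanly in the
    charset named by the Content-Type header.
    """
    msg = email.message_from_bytes(raw_message)
    charset = msg.get_content_charset("us-ascii")
    body = msg.get_payload(decode=True)   # raw body bytes for 8bit text
    try:
        body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return charset, False
    return charset, True


# Headers modeled on the report; the body is a made-up Shift-JIS string
# (katakana "tesuto"), not the reporter's actual message.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: text/plain; charset="us-ascii"\r\n'
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"Subject: [test] test 16 - shift JIS\r\n"
    b"\r\n"
) + "\u30c6\u30b9\u30c8\r\n".encode("shift_jis")

print(declared_charset_matches(SAMPLE))
```

A check like this only flags the defective labeling. It says nothing about whether that defect is what triggers the exception in as_text.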

Can you email me a complete message that causes this exception, and also tell me the preferred_language of this list and the mm_cfg.py setting of DEFAULT_SERVER_LANGUAGE if it is set to something other than 'en'?

Changed in mailman:
assignee: nobody → msapiro
status: New → Incomplete
Mark Sapiro (msapiro) wrote :

> Can you email me a complete message that causes this exception

Or attach it here, which would be preferable if it doesn't contain sensitive information.

Bunny (bunny-evans) wrote :
Attachment: a (1.4 KiB, text/plain)

Managed to generate it again today.
Dec 16 17:08:58 2008 (16881) SHUNTING: 1229414937.7681551+51d222de4dc8b86ffa03c615e56e9c46d0afdb2b
Dec 16 17:17:12 2008 (16881) Uncaught runner exception: 'ascii' codec can't decode byte 0xa1 in position 1: ordinal not in range(128)

The attached file is the mail that matches this, extracted from test.mbox. The HTML appears to have been eaten.

There is no DEFAULT_SERVER_LANGUAGE in mm_cfg.py. The only place it appears is Defaults.py, where it is, indeed, 'en'.

Mark Sapiro (msapiro) wrote :

I'll try to duplicate this with the attached message. If you still have it, please attach the qfiles/shunt/1229414937.7681551+51d222de4dc8b86ffa03c615e56e9c46d0afdb2b.pck file or, if you're concerned about sensitive information, email it to me. Please also tell me the list's preferred_language.

Mark Sapiro (msapiro) wrote :

I am still unable to duplicate this. The message you attached appears to have been a text/html message which has been scrubbed for the archive. Presumably, this is a Japanese language list since the description of the scrubbed part is the Japanese translation of "An HTML attachment was scrubbed...". Also, since that appeared in the .mbox, it appears the list's scrub_nondigest setting is Yes.

I don't know why I can't duplicate this. Possibly there is something about the FreeBSD package that is different from my version. I'll keep trying, but the shunted .pck file and any other information you can provide may help.

Mark Sapiro (msapiro) wrote :

I still can't really duplicate the error. I know why it occurs, and I think I know how to fix it, but I don't know how to make it happen.

I'm sure that the error occurs because _(' at ') is translating ' at ' to Japanese while the list's preferred language is in fact English, so "unicode(_(' at '), cset)" is called with 'us-ascii' as cset.
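
This failure mode can be reproduced outside Mailman. The sketch below is Python 3, so to_text stands in for Python 2's unicode(raw, cset). The byte string is a hypothetical translation result (a space, one two-byte EUC-JP character, a space), chosen so that its second byte is 0xa1 as in the log. Whether Mailman's Japanese catalog produces exactly these bytes is an assumption.

```python
def to_text(raw, cset):
    """Python 3 counterpart of Python 2's unicode(raw, cset)."""
    return raw.decode(cset)


# Hypothetical bytes returned by _(' at ') under a Japanese translation.
TRANSLATED_AT = b" \xa1\xf7 "


def decode_error_message(raw, cset):
    """Return the UnicodeDecodeError text, or None if decoding succeeds."""
    try:
        to_text(raw, cset)
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Decoding Japanese bytes with the English list's charset fails ...
print(decode_error_message(TRANSLATED_AT, "us-ascii"))
# ... while the charset that matches the translation decodes cleanly.
print(decode_error_message(TRANSLATED_AT, "euc-jp"))
```

The first call reports the same message as the log, 'ascii' codec can't decode byte 0xa1 in position 1: ordinal not in range(128), and the second returns None.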

The fix should be what I indicate in my initial comment above. The same change is attached here as HyperArch.py.patch.

I am still puzzled as to what causes the problem. I see what may be some holes in setting the language for i18n, but I can't actually duplicate the problem without completely breaking things - i.e., setting the character set for Japanese to 'us-ascii'.

Is there a mixture of Japanese and English lists in this installation or a mixture of list members some with Japanese and some with English as their preferred language?

Mark Sapiro (msapiro) wrote :

The more I think about this, the more I am convinced that the problem is due to the i18n language not being properly set for the list in question.

I am thinking that a message is processed (successfully) for a Japanese language list and leaves the i18n translation in IncomingRunner and ArchRunner set to Japanese. Then a post arrives for an English language list and the translation doesn't get set to English early enough in the processing.

I suspect if you still have the shunted messages in qfiles/shunt, that you could run bin/unshunt and they would likely be processed and archived without error.

Note that if you are going to do this, first examine the entries in qfiles/shunt (with bin/dumpdb or bin/show_qfiles) and remove or move aside any 'other' entries from previous errors that you don't want to try to reprocess.

Mark Sapiro (msapiro) wrote :

The problem turned out to be due to posts to an English language list from a member whose preferred language for the list was Japanese. The fix is as in the previously attached patch.

Changed in mailman:
status: Incomplete → Fix Committed
Mark Sapiro (msapiro)
Changed in mailman:
milestone: none → 2.1.12
status: Fix Committed → Fix Released