ToDigest.py i18n subject

Bug #558039 reported by tkikuchi
This bug affects 1 person
Affects      Status        Importance  Assigned to  Milestone
GNU Mailman  Fix Released  Undecided   Unassigned

Bug Description

ToDigest.py (v2.23) was considerably improved, but
some oddities remain in how digested message subjects
are represented. Specifically, MIME-encoded subjects are not
wrapped properly by lheader().
Please examine the files I am going to attach.

1. Test program to examine the behaviour of lheader()
and of an improved (but lengthy) header_in_a_line(), which
removes excessive CRLFs and adjusts folding white space.
It simulates the subject prefix that will be added in
CookHeaders.py.
2. Result of the test program. You will notice an
incompatibility once the subject has been MIME encoded.
3. New patch, which uses header_in_a_line() and Utils.wrap().
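For readers without the attachments: the unfolding half of what header_in_a_line() is described as doing can be sketched as below. This is not the attached code; the name unfold_header() and its collapse flag are hypothetical. The sketch applies RFC 2822 unfolding (drop a line break that is followed by a space or tab, keep the whitespace) and can optionally squeeze runs of blanks, which is one possible reading of "adjusting folding white space".

```python
import re

# Hypothetical helper, not the header_in_a_line() from the attached patch.
# RFC 2822 unfolding: remove a CRLF (or bare LF) that is immediately
# followed by a space or tab, but keep that whitespace character.
_FOLD = re.compile(r'\r?\n(?=[ \t])')


def unfold_header(value: str, collapse: bool = False) -> str:
    """Return a folded header value as a single line.

    If collapse is True, runs of spaces/tabs are also squeezed to a single
    space (an assumption about what "adjusting" folding white space means).
    """
    # Unfold, then drop any stray line break left at the very end.
    line = _FOLD.sub('', value).rstrip('\r\n')
    if collapse:
        line = re.sub(r'[ \t]+', ' ', line)
    return line
```

For example, unfold_header('Re: a long\r\n subject') gives 'Re: a long subject'. Note that this only handles folding; it does not decode RFC 2047 encoded-words, which is where the incompatibility in the test result shows up.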

tkikuchi (tkikuchi-users) wrote :

The file headtest.py.txt was added: headtest.py ... Code test program

tkikuchi (tkikuchi-users) wrote :

The file testresult.txt was added: testresult.txt ... Code test result

tkikuchi (tkikuchi-users) wrote :

The file ToDigest.py.diff.txt was added: ToDigest.py

tkikuchi (tkikuchi-users) wrote :

Logged In: YES
user_id=67709

Forgot to note: this patch is a revision of #668819, which was
closed and applied differently in recent CVS.

bwarsaw (bwarsaw) wrote :

Logged In: YES
user_id=12800

Tell me what you think of the hial() function in the
attached file.

bwarsaw (bwarsaw) wrote :

The file hial.py was added: None

tkikuchi (tkikuchi-users) wrote :

Logged In: YES
user_id=67709

Looks good for English text (maybe for Western languages too), but
folding white space should be treated as an empty string ('') in
iso-2022-jp encoded Japanese (and, I believe, in other RFC 2047
encoded MIME subjects). u' '.join() must be u''.join() in
these languages. You must always check whether each part of the
header is MIME encoded or not when joining. :-(
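The joining rule described in this comment can be sketched as follows. RFC 2047 (section 6.2) says that whitespace separating two adjacent encoded-words is not part of the displayed text, so two encoded parts are glued with '' while any boundary involving an unencoded part keeps a space. The function name join_header_parts() and its input format, a list of (decoded_text, was_encoded) pairs, are assumptions made for illustration; the patch itself works on email.Header chunks.

```python
from typing import List, Optional, Tuple

# Hypothetical input format: (decoded_text, was_encoded) for each word.
Part = Tuple[str, bool]


def join_header_parts(parts: List[Part]) -> str:
    """Join decoded header words, checking per boundary whether both
    neighbours were MIME (RFC 2047) encoded.

    encoded + encoded   -> joined with ''  (e.g. iso-2022-jp Japanese)
    any other boundary  -> joined with ' '
    """
    out: List[str] = []
    prev_encoded: Optional[bool] = None
    for text, encoded in parts:
        if out:
            out.append('' if (encoded and prev_encoded) else ' ')
        out.append(text)
        prev_encoded = encoded
    return ''.join(out)
```

With this rule, a prefix followed by two encoded Japanese words comes out as '[list] ' plus the words run together, while plain ASCII words keep their single spaces. Later versions of the Python email package apply a comparable rule inside decode_header() and Header.__str__(), so it is worth checking the behaviour of the Python version actually in use.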

tkikuchi (tkikuchi-users) wrote :

Logged In: YES
user_id=67709

OK, Barry. I will compromise: use u''.join(), not u' '.join().
This eliminates the extra space added when joining.
Remember that an all-ASCII header will get a double space after
the prefix for English text, while the space after the
prefix is removed for Japanese text.
I think the __unicode__() joining in the email package should
take care of the difference between RFC 2822 and RFC 2047 headers.

tkikuchi (tkikuchi-users) wrote :

Logged In: YES
user_id=67709

This is my final patch.
Also, I prefer no blank lines in the TOC.

tkikuchi (tkikuchi-users) wrote :

The file ToDigest.py.diff.txt was added: Todigest.py.diff

tkikuchi (tkikuchi-users) wrote :

Logged In: YES
user_id=67709

Updating the patch to fix a new Unicode-related error reported
by Dan Mick.

tkikuchi (tkikuchi-users) wrote :

Logged In: YES
user_id=67709

Sorry, I uploaded an intermediate version. This one is final,
I hope.

tkikuchi (tkikuchi-users) wrote :

The file ToDigest.py.diff2.txt was added: ToDigest.py.diff 2nd

bwarsaw (bwarsaw) wrote :

Logged In: YES
user_id=12800

I'm accepting and applying this patch, with two
differences. First, on the line that says "return
oneline.encode()", I want to pass the argument "replace" so
that more of the header decoding has a chance to actually
happen. Without this, iso-8859-1 encoded Subject: headers
posted to an English list keep their RFC 2047 encodings in
the header, which looks ugly.

The second change is to replace the bare except (bad! :) in
oneline() with a qualified except. I believe only
UnicodeError and LookupError can occur here, although with
the first change above, maybe not even UnicodeError.

Now I want to see if this fixes the problem I've been having
with the spambayes list. Thanks! :)
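The two changes described in this comment (a 'replace' error handler, and a qualified except instead of a bare one) can be sketched in Python 3 roughly as follows. This is not the code that was committed: the signature oneline(value, list_charset) and the fall-back-to-raw-header behaviour are assumptions for illustration. On Python 3, decode_header() can also raise email.errors.HeaderParseError for malformed encoded-words (for example bad base64), so the sketch catches that too.

```python
from email.errors import HeaderParseError
from email.header import decode_header


def oneline(value: str, list_charset: str = 'us-ascii') -> str:
    """Decode an RFC 2047 header onto one line, limited to list_charset.

    Hypothetical signature; mirrors the two changes discussed above.
    """
    try:
        pieces = []
        for chunk, cset in decode_header(value):
            if isinstance(chunk, bytes):
                # 'replace' lets decoding continue past undecodable bytes;
                # an unknown charset name raises LookupError here.
                chunk = chunk.decode(cset or 'us-ascii', 'replace')
            pieces.append(chunk)
        text = ''.join(''.join(pieces).splitlines())
        # Characters the list charset cannot represent become '?'.
        return text.encode(list_charset, 'replace').decode(list_charset)
    except (LookupError, UnicodeError, HeaderParseError):
        # Qualified except: keep the undecoded header, joined onto one line.
        return ''.join(value.splitlines())
```

For an iso-8859-1 subject such as '=?iso-8859-1?q?caf=E9?= au lait', a Latin-1 list gets 'café au lait' and an ASCII list gets 'caf? au lait' instead of the raw RFC 2047 text; a header with an unknown charset such as x-bogus is returned unchanged.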

bwarsaw (bwarsaw) wrote :

Logged In: YES
user_id=12800

Looks like this is working for python.org at least, so I'm
closing the issue.
