Message excerpt corruption on admindb Web UI

Bug #1415406 reported by Yasuhito FUTATSUKI at POEM
Affects: GNU Mailman
Importance: Medium
Assigned to: Mark Sapiro

Bug Description

Some messages held in admindb are not displayed correctly because of a partial Unicode conversion error or
an incomplete multi-byte character at the mm_cfg.ADMINDB_PAGE_TEXT_LIMIT boundary.

Message character corruption occurs under the conditions below.

(1) The message charset/encoding is a multi-byte one.
(2) The message charset/encoding differs from the web display charset/encoding.
(3) The message contains a character that cannot be converted to Unicode by the Python codec.
or
(3') The message body size in bytes after MIME decoding exceeds mm_cfg.ADMINDB_PAGE_TEXT_LIMIT,
     and the body is cut in the middle of a multi-byte character's byte sequence.

Under these conditions, a Unicode error occurs while converting the message charset/encoding, and the
message is left unconverted.
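
Condition (3') can be reproduced outside Mailman. The following is a minimal sketch in Python 3 (Mailman 2.1 itself runs on Python 2), not the admindb code; `LIMIT` is a hypothetical stand-in for mm_cfg.ADMINDB_PAGE_TEXT_LIMIT, and the fallback in the `except` branch only imitates the "left unconverted" behaviour described above.

```python
# Hypothetical stand-in for mm_cfg.ADMINDB_PAGE_TEXT_LIMIT, kept tiny for illustration.
LIMIT = 4

# Every character in this string takes 3 bytes in UTF-8.
body = "日本語のテキスト".encode("utf-8")

# Cutting at a byte offset leaves 1 stray byte of the second character.
excerpt = body[:LIMIT]

try:
    # Strict conversion from the message charset to the web display charset.
    converted = excerpt.decode("utf-8").encode("euc-jp")
except UnicodeError:
    # The conversion failed, so the raw bytes are kept as they were.
    converted = excerpt

# True here: the excerpt is still UTF-8 although the page expects EUC-JP.
still_unconverted = converted == excerpt
```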

The patch attached below solves (3) by using decode/encode with the 'replace' error handling scheme,
and (3') by rounding down to a character boundary so that the excerpt does not exceed the limit in
bytes after the charset/encoding conversion.
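
The idea can be sketched as follows. This is an illustration in Python 3, not the attached patch: the function name `convert_excerpt` and its parameters are hypothetical, and the per-character encoding assumes a stateless target charset such as EUC-JP or UTF-8.

```python
def convert_excerpt(raw, src_charset, dst_charset, limit):
    """Convert raw bytes to dst_charset without exceeding `limit` bytes.

    Hypothetical helper for illustration only. Undecodable or unencodable
    characters are substituted instead of raising an exception.
    """
    # 'replace' turns invalid or truncated byte sequences into U+FFFD.
    text = raw.decode(src_charset, "replace")
    out = bytearray()
    for ch in text:
        # 'replace' turns characters missing from dst_charset into '?'.
        piece = ch.encode(dst_charset, "replace")
        if len(out) + len(piece) > limit:
            # Stop on a character boundary rather than mid-sequence.
            break
        out += piece
    return bytes(out)


sample = "日本語のテキスト".encode("utf-8")
# A truncated input no longer raises; the broken tail becomes '?'.
repaired = convert_excerpt(sample[:4], "utf-8", "euc-jp", 1000)
# A 5-byte limit holds two 2-byte EUC-JP characters, not two and a half.
rounded = convert_excerpt(sample, "utf-8", "euc-jp", 5)
```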

Note: Even if the message charset/encoding is the same as the web display charset/encoding, condition (3')
may produce invalid HTML, but the patch below doesn't fix that.

Revision history for this message
Yasuhito FUTATSUKI at POEM (futatuki) wrote :
Mark Sapiro (msapiro) wrote :

If possible, please provide messages that meet
a) (1), (2) and (3)
b) (1), (2) and (3')
and a message which results in invalid HTML.

I would like to use these for unit tests.

Changed in mailman:
assignee: nobody → Mark Sapiro (msapiro)
importance: Undecided → Medium
milestone: none → 2.1.19
status: New → Triaged
Mark Sapiro (msapiro) wrote :

I have another question. In looking at your patch, it seems you go to some lengths to ensure that you cut the excerpted text at a point as close as possible to ADMINDB_PAGE_TEXT_LIMIT. It seems it would be much simpler and possibly even esthetically better to include all of the last line which goes over the limit. Is there some reason not to do that?

Yasuhito FUTATSUKI at POEM (futatuki) wrote :

It is pretty difficult to make a sample of case a) in my working environment, so I will try to make one after case b).
(One instance of case a) is known as a bogus iso-2022-jp message, which is one of the reasons why Mr. Kikuchi maintained his own local branch, 2.1-japan.)

The attachments below are a sample of case b):
    sample-b-message.eml ... E-Mail message (UTF-8)
    admindb-sample-b.html ... output of admindb Web UI with msgid (EUC-JP)

Yasuhito FUTATSUKI at POEM (futatuki) wrote :

Answer to the question in comment #3:
I'm concerned about the case where ADMINDB_PAGE_TEXT_LIMIT is much smaller than the last line that goes over the limit. I'm not sure whether a 'format=flowed' message is decoded into a single line, but if it is, one line may be much bigger than 1000 octets.
An alternative fix is to change the meaning of ADMINDB_PAGE_TEXT_LIMIT to the message size in (Unicode) characters, not bytes. That is much simpler than my patch.
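
A rough sketch of that alternative, in Python 3 and with a hypothetical function name (`excerpt_by_chars`), assuming the limit is reinterpreted as a character count:

```python
def excerpt_by_chars(raw, src_charset, dst_charset, char_limit):
    """Truncate by Unicode characters, then convert to the display charset.

    Hypothetical helper for illustration only; `char_limit` plays the role
    of a limit counted in characters rather than bytes.
    """
    # Decode first so that slicing can never split a multi-byte sequence.
    text = raw.decode(src_charset, "replace")
    return text[:char_limit].encode(dst_charset, "replace")


sample = "日本語のテキスト".encode("utf-8")
first_three = excerpt_by_chars(sample, "utf-8", "euc-jp", 3)
```

The trade-off is that the output size in bytes is no longer bounded by the limit itself, since each character may take several bytes in the display charset.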

Yasuhito FUTATSUKI at POEM (futatuki) wrote :

Here is a sample of case a).

sample-a-message.eml ... E-Mail message ((bogus) iso-2022-jp)
admindb-sample-a.html ... output of admindb Web UI with msgid (EUC-JP)

Mark Sapiro (msapiro) wrote :

Thank you for your help and for the test case.

I have committed a fix which just accepts all of the body line that reaches or exceeds ADMINDB_PAGE_TEXT_LIMIT. With a normal message, this will not extend the displayed body much as lines in the decoded message body, even with format-flowed, are normally not too long, and even if there is a pathological message with a very long line, this shouldn't raise an exception.

See http://bazaar.launchpad.net/~mailman-coders/mailman/2.1/revision/1524 for the fix.
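
The behaviour described in this comment can be sketched roughly as below. This is an illustration in Python 3 and not the committed code (the revision linked above is the authority); `excerpt_lines` is a hypothetical name, and the sketch assumes a charset in which newline bytes never occur inside a multi-byte sequence, which holds for UTF-8 and EUC-JP.

```python
def excerpt_lines(body, limit):
    """Keep whole lines until the byte count reaches or exceeds `limit`.

    Hypothetical helper for illustration only. `body` is the decoded
    message body as bytes; the line that crosses the limit is kept whole.
    """
    kept = []
    total = 0
    for line in body.splitlines(keepends=True):
        kept.append(line)
        total += len(line)
        if total >= limit:
            # The limit was reached inside this line; keep it and stop.
            break
    return b"".join(kept)


# Three lines of 10 bytes each (3 characters of 3 bytes plus a newline).
sample = "一行目\n二行目\n三行目\n".encode("utf-8")
excerpt = excerpt_lines(sample, 12)
```

Because the cut always falls on a line boundary, the excerpt can exceed the limit by up to one line, but it never ends in the middle of a character.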

Changed in mailman:
status: Triaged → Fix Committed
Mark Sapiro (msapiro)
Changed in mailman:
milestone: 2.1.19 → 2.1.19rc2
status: Fix Committed → Fix Released