Multipart/mixed issues in archives

Bug #265930 reported by Phelim-gervase
Affects      Status  Importance  Assigned to  Milestone
GNU Mailman  New     High        Unassigned

Bug Description

We are having problems with mailing lists whose messages are
not being properly stripped down to text content in the
archives. When a message is multipart/mixed, the archiver
doesn't reduce the nested multipart/alternative sections
to their text portions.

  For example, from content such as the following:

==============================================================================
>From ...
[...]
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary=------------InterScan_NT_MIME_Boundary
[...]

This is a multi-part message in MIME format.

--------------InterScan_NT_MIME_Boundary
Content-Type: multipart/alternative;
        boundary="----_=_NextPart_001_01C336A1.2C7564BC"
Content-Transfer-Encoding: 7bit

------_=_NextPart_001_01C336A1.2C7564BC
Content-Type: text/plain;
 charset=us-ascii
Content-Transfer-Encoding: quoted-printable

Kevin has a pending checkin that addresses the
minss/maxss issue.
=20
[...]
------_=_NextPart_001_01C336A1.2C7564BC
Content-Type: text/html;
 charset=us-ascii
Content-Transfer-Encoding: quoted-printable

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0
Transitional//EN">
<HTML xmlns=3D"http://www.w3.org/TR/REC-html40" xmlns:v
=3D=20
"urn:schemas-microsoft-com:vml" xmlns:o =3D=20
"urn:schemas-microsoft-com:office:office" xmlns:w =3D=20
"urn:schemas-microsoft-com:office:word" xmlns:x =3D=20
"urn:schemas-microsoft-com:office:excel" xmlns:st1 =3D=20
"urn:schemas-microsoft-com:office:smarttags"><HEAD><T
ITLE>Message</TITLE>=

[...]
==============================================================================

  I only get the following:

==============================================================================
[64bit-compiler-analysis] RE: vpr analysis
Syyyy Kyyyyy syyyk at yyy.com
Thu Jun 19 14:27:16 CDT 2003

Previous message: [64bit-compiler-analysis] 06-19-03 MSFT 64-Bit C/C++ compiler improvement discussion
Next message: [64bit-compiler-analysis] RE: vpr analysis
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

--------------------------------------------------------------------------------

Skipped content of type multipart/alternative

--------------------------------------------------------------------------------

Previous message: [64bit-compiler-analysis] 06-19-03 MSFT 64-Bit C/C++ compiler improvement discussion
Next message: [64bit-compiler-analysis] RE: vpr analysis
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

--------------------------------------------------------------------------------
More information about the 64bit-compiler-analysis mailing list
==============================================================================

As you can see, the actual content of the
multipart/alternative portion [text/plain and
text/html] was completely stripped out instead of
being shown as plain text.

[http://sourceforge.net/tracker/index.php?func=detail&aid=759841&group_id=103&atid=100103]

Tags: pipermail
Revision history for this message
Phelim-gervase (phelim-gervase) wrote :

This appears to be within:

def process(mlist, msg, msgdata=None):

at around line 276, but I saw no way of making it recurse
for multipart/[mixed|alternative] sub-MIME parts.

Mrjc (mrjc) wrote :

This is causing me real problems! Are there any known
workarounds?

If I can't fix this I might have to use a different package as
presently all my archives are useless!

Mrjc (mrjc) wrote :

Additionally I think it is appropriate to up the priority on this
bug as it causes key functionality to fail.

Q7joey (q7joey) wrote :

i agree that this should be a high priority issue. a simple
message with just multipart/alternative will show up in the
archive ok, but if there is any other kind of attachment,
then the entire multipart section is skipped and you just
get a link for the extra attachment for download/view
ability. i haven't started to look at the code (and i'm not
a python/mailman person), but i'll report anything i can find.

Mrjc (mrjc) wrote :

This fails for many of my users as they habitually attach a
photo of themselves in their signatures. They are incredulous
at the idea that mailman can't handle it.

Thanks

Q7joey (q7joey) wrote :

i have a few line patch that seems to make it do what is
expected.

i can't see how to attach via sourceforge yet, so i'll paste
it here:

--- /usr/local/src/mailman-2.1.2/Mailman/Handlers/Scrubber.py Fri Feb 7 23:13:50 2003
+++ ./Scrubber.py Sat Sep 27 08:58:46 2003
@@ -286,11 +286,13 @@
         # BAW: Martin's original patch suggested we might want to try
         # generalizing to utf-8, and that's probably a good idea (eventually).
         text = []
-        for part in msg.get_payload():
+        for part in msg.walk():
+            if part.get_main_type() == 'multipart':
+                continue
             # All parts should be scrubbed to text/plain by now.
             partctype = part.get_content_type()
             if partctype <> 'text/plain':
-                text.append(_('Skipped content of type %(partctype)s'))
+                text.append(_('Skipped content of type %(partctype)s\n'))
                 continue
             try:
                 t = part.get_payload(decode=1)
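
To illustrate why the patch swaps get_payload() for walk(), here is a minimal sketch using the standard library email package. It is written in Python 3 syntax and is not Mailman's code; the message contents are made up to mirror the structure in the bug description (a multipart/mixed holding a multipart/alternative plus one attachment).

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Hypothetical message: multipart/mixed wrapping a multipart/alternative
# (plain + HTML) and one extra attachment.
alternative = MIMEMultipart("alternative")
alternative.attach(MIMEText("Kevin has a pending checkin.", "plain"))
alternative.attach(MIMEText("<p>Kevin has a pending checkin.</p>", "html"))

msg = MIMEMultipart("mixed")
msg.attach(alternative)
msg.attach(MIMEApplication(b"\x00\x01"))

# Looping over the top-level payload only sees the two direct children,
# so the nested text parts are never reached.
top_level = [part.get_content_type() for part in msg.get_payload()]

# walk() descends into nested containers; skipping the containers
# themselves leaves only the leaf parts.
leaves = [part.get_content_type() for part in msg.walk() if not part.is_multipart()]

print(top_level)  # ['multipart/alternative', 'application/octet-stream']
print(leaves)     # ['text/plain', 'text/html', 'application/octet-stream']
```

With the top-level loop, the only thing the archiver can say about the first child is "Skipped content of type multipart/alternative", which matches the archive output shown in the report.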

Tokio Kikuchi (tkikuchi) wrote :

The patch by q7joey is merged into my Scrubber.py patch
#866238. I hope Barry can integrate it in 2.1.4.

Q7joey (q7joey) wrote :

i just started working on a 2.1.5 system and discovered that
this bug was still there. from looking in cvs, it appears
to be fixed there (although it seems to reference an
unrelated bugid).

updating this bug to reflect the cvs update would be nice.

Q7joey (q7joey) wrote :

i just looked at the cvs closer and i see that the patch is
on the 2.1 branch, but hasn't made it into the trunk yet.

Rekt (rekt) wrote :

Originator: NO

This bug (or something very similar to it) seems to still be a problem.
Consider the message here:

 http://marc.info/?l=openssh-unix-dev&m=119212056224122&w=2

and in its pipermail archive:

http://lists.mindrot.org/pipermail/openssh-unix-dev/2007-October/025812.html

Mark Sapiro (msapiro) wrote :

Originator: NO

I can't tell for sure, but the message at
<http://marc.info/?l=openssh-unix-dev&m=119212056224122&w=2> appears to be
malformed. If I go to
<http://marc.info/?l=openssh-unix-dev&m=119212056224122&q=raw> to view the
alleged raw message, I see at the beginning:

--===============1431543891==
Content-Type: multipart/signed; boundary="=-=-=";
 micalg=pgp-sha1; protocol="application/pgp-signature"

--=-=-=

On Thu 2007-10-11 11:00:41 -0400, Larry Becke wrote:
...

I expect to see something like:

--===============1431543891==
Content-Type: multipart/signed; boundary="=-=-=";
 micalg=pgp-sha1; protocol="application/pgp-signature"

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--=-=-=
Content-Type: text/plain; charset=...
Content-Transfer-Encoding: ...

On Thu 2007-10-11 11:00:41 -0400, Larry Becke wrote:
...

I.e., I don't see a Content-Type: header for the message body. If it is in
fact missing, that would cause Mailman's behavior in this case, but it is
the message that is at fault, not Mailman.

So the question is whether or not the alleged raw message is in fact a
true representation. If it is, then I think it is the sender's MUA that is
at fault.

Rekt (rekt) wrote :

Originator: NO

Thanks for the response, msapiro. marc.info's raw copy of it looks
basically identical to the version of that message that arrived in my
inbox, so i'd say it's a correct copy. The RFC822 headers for the raw
message were:

Return-Path: <email address hidden>
To: <email address hidden>
Subject: Re: scp -t . - possible idea for additional parameter
From: Daniel Kahn Gillmor <email address hidden>
Date: Thu, 11 Oct 2007 12:34:23 -0400
Message-ID: <email address hidden>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1431543891=="

When i supply the concatenation of those headers, a blank line, and then
the raw message to msglint, the IETF's message validator [0], it outputs:

-----------
OK: found part multipart/mixed line 10
OK: preamble 10:
OK: found part multipart/signed line 15
OK: preamble 15:
OK: found default part text/plain line 18
OK: found part application/pgp-signature line 67
OK: epilogue 86:
WARNING: MIME headers should only be 'Content-*'. No meaning will apply to header 'MIME-Version' at line 89
OK: found part text/plain line 93
-----------

So that validator doesn't have any problem with the message (it assumes
the part starting at line 18, which is the section you're suggesting is
invalid, is text/plain). Is the validator wrong in assuming that? I don't
know the relevant specifications well enough to tell myself. Can you show
me where it's a requirement that each MIME section have a content-type?

Thanks for looking into this.

[0] http://www.apps.ietf.org/msglint.html

Rekt (rekt) wrote :

Originator: NO

Just did a bit of digging. It looks like section 5.2 of RFC 2045 suggests
that missing content-types should be treated as:

  Content-type: text/plain; charset=us-ascii

While i agree that it would be better for the sending MUA to include an
explicit content-type for each mime part (i'm about to file a bug against
the MUA), it seems problematic for pipermail to refuse to render such a
part at all.
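
The RFC 2045 default can be checked against the Python email library with a small sketch like the one below. The message text and the boundary name "outer" are invented for illustration; only the shape (a part with no headers at all) is taken from the message discussed above.

```python
from email import message_from_string

# Hypothetical message whose only part has no headers, just a blank
# line followed by body text.
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="outer"\n'
    "\n"
    "--outer\n"
    "\n"
    "Body of a part that carries no headers at all.\n"
    "--outer--\n"
)

msg = message_from_string(raw)
part = msg.get_payload()[0]

print(part.keys())              # header names present on the part
print(part.get_content_type())  # content type the library assumes for it
```

The part has no Content-Type header, yet get_content_type() reports text/plain, which is the default that section 5.2 describes.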

Mark Sapiro (msapiro) wrote :

Originator: NO

You are correct. I was thinking that without the header, the following
text would be a preamble, but this is not the case.

There does appear to be a problem here, and I will look into it further.
The reconstructed message helps a lot. Thanks for that.

BTW, the problem is not with pipermail. The message is processed by
Mailman/Handlers/Scrubber.py and flattened to plain text before pipermail
ever sees it. I have verified that the underlying Python email library
parses the MIME structure correctly and sees the body as a text/plain
part.

I have some ideas, but I haven't looked closely enough to be sure. I'll
post again when I know more.

Mark Sapiro (msapiro) wrote :

Originator: NO

It turns out this problem was observed and discussed at great length
in December of 2006. See the thread that begins at
<http://mail.python.org/pipermail/mailman-users/2006-December/054904.html>.

A few fixes were discussed in that thread but never implemented. I have
now tested and committed a fix along the lines of that discussion, and it
will be in Mailman 2.1.10 (beta release is imminent).

Rekt (rekt) wrote :

Originator: NO

Thank you very much, Mark!

I'm assuming that this is the commit you're talking about:

http://marc.info/?l=mailman-cvs&m=119440136928253&w=2

I just applied the following diff to a debian lenny installation (mailman
2.1.9-8) i've been experimenting on:

--- Scrubber.py.orig 2007-11-06 21:15:30.000000000 -0500
+++ Scrubber.py 2007-11-06 21:16:07.000000000 -0500
@@ -342,7 +342,8 @@
         text = []
         for part in msg.walk():
             # TK: bug-id 1099138 and multipart
-            if not part or part.is_multipart():
+            # MAS test payload - if part may fail if there are no headers.
+            if not part._payload or part.is_multipart():
                 continue
             # All parts should be scrubbed to text/plain by now.
             partctype = part.get_content_type()

After recompiling Scrubber.py, I then did:

 /var/lib/mailman/bin/arch --wipe testlist

and it fixed a message with a similar formatting issue that had previously
been blank.
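
The reason "if not part" misfires on a headerless part can be reproduced with the standard library alone. This is a sketch in Python 3, not the Scrubber code, and it uses the public get_payload() accessor where the diff above reads the private _payload attribute. It relies on email.message.Message defining __len__ as the number of header fields, so an object with no headers is falsy even when it has a body.

```python
from email.message import Message

# A part with a body but no headers, like the one in the signed message.
part = Message()
part.set_payload("On Thu 2007-10-11 11:00:41 -0400, Larry Becke wrote: ...")

header_count = len(part)                  # counts header fields only
truthy_without_headers = bool(part)       # follows len(), so False here
has_payload = bool(part.get_payload())    # the body is still present

part["Content-Type"] = "text/plain"
truthy_with_header = bool(part)           # one header makes it truthy

print(header_count, truthy_without_headers, has_payload, truthy_with_header)
# 0 False True True
```

Testing the payload instead of the message object keeps such parts from being skipped.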

My only concern is that in the thread you linked to, it's mentioned that
arch --wipe can break external links. This makes me reluctant to use it to
fix older archives with blank messages which might have accumulated
external links. URLs should be stable! Is this really a possible
consequence of arch --wipe?

Mark Sapiro (msapiro) wrote :

Originator: NO

Yes, that is the commit I'm talking about.

And, yes it is possible for bin/arch --wipe to break saved URLs/external
links. See
<http://www.python.org/cgi-bin/faqw-mm.py?req=show&file=faq01.033.htp>.

One way in which this can happen is if the cumulative mailbox file
(archives/private/<listname>.mbox/<listname>.mbox) has unescaped "From "
lines in message bodies. This will only be the case with .mbox files that
are 'old' (maybe older than 2.1.x, but I'm not sure) or imported from
another application. The bin/cleanarch script can help find and fix such
lines.
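
As a rough way to spot candidate lines before running bin/cleanarch, the sketch below flags "From " lines that do not look like mbox separators. The regular expression is an assumption about the usual separator shape ("From address weekday month day time year") and is not taken from cleanarch, whose actual checks may differ; the function name and sample lines are made up.

```python
import re

# Assumed shape of a genuine separator line:
# "From <addr> <weekday> <month> <day> <HH:MM:SS> <year>"
SEPARATOR = re.compile(
    r"^From \S+ +[A-Z][a-z]{2} +[A-Z][a-z]{2} +\d{1,2} +\d{2}:\d{2}:\d{2} +\d{4}\s*$"
)


def suspicious_from_lines(lines):
    """Return (line_number, text) for 'From ' lines that do not look like separators."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("From ") and not SEPARATOR.match(line):
            hits.append((number, line.rstrip("\n")))
    return hits


sample = [
    "From alice@example.com Thu Jun 19 14:27:16 2003\n",
    "Subject: hello\n",
    "\n",
    "From what I can tell, this works.\n",
]
print(suspicious_from_lines(sample))
# [(4, 'From what I can tell, this works.')]
```

A body line that happens to match the separator shape would not be flagged, so this is only a first pass.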

I think there are other ways in which this can happen too, though I'm not
sure what they are. I am confident, however, that if Mailman isn't running
when you do the bin/arch --wipe, the message numbers in the new archive will
be assigned in mbox order, so the issue arises only if the original numbers
were somehow not in mbox order.

One thing you can do is stop Mailman, make a backup copy of the
archives/private/<listname>/ directory, run bin/arch --wipe <listname> and
then quickly check sample messages throughout the new and backup archives
to verify they have the same message numbers. If they do, just start
Mailman. If not, restore the backup.
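
The spot check could also be scripted. The sketch below compares the backup and rebuilt trees page by page, assuming each numbered message page (such as 025812.html) carries its subject in a <title> element; that layout, the helper names, and the directory names are assumptions for illustration, and a matching title is only a hint that the numbering held, not proof.

```python
import re
import tempfile
from pathlib import Path

TITLE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def page_titles(archive_dir):
    """Map each numbered message page (relative path) to its <title> text."""
    root = Path(archive_dir)
    titles = {}
    for page in root.rglob("*.html"):
        if not page.stem.isdigit():
            continue  # skip index pages such as date.html or thread.html
        match = TITLE.search(page.read_text(errors="replace"))
        titles[page.relative_to(root).as_posix()] = match.group(1).strip() if match else None
    return titles


def renumbered_pages(backup_dir, rebuilt_dir):
    """Return pages whose title differs or that exist on only one side."""
    old, new = page_titles(backup_dir), page_titles(rebuilt_dir)
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


# Tiny self-contained demonstration with two throwaway trees.
with tempfile.TemporaryDirectory() as tmp:
    backup, rebuilt = Path(tmp, "backup"), Path(tmp, "rebuilt")
    for root, subject in ((backup, "RE: vpr analysis"), (rebuilt, "some other thread")):
        month = root / "2003-June"
        month.mkdir(parents=True)
        (month / "000001.html").write_text(f"<HTML><HEAD><TITLE>{subject}</TITLE></HEAD></HTML>")
        (month / "date.html").write_text("<HTML><HEAD><TITLE>index</TITLE></HEAD></HTML>")
    changed = renumbered_pages(backup, rebuilt)

print(changed)  # ['2003-June/000001.html']
```

An empty result would suggest the rebuilt archive kept the same numbers; any listed page is a reason to restore the backup.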

Note that the current plan is to redesign the built-in archiver for
Mailman 3.0 to avoid this problem in the future among other things.
