"From" at line start fools archiver, loses/splits message body

Reported by Phuture on 2004-05-13
This bug affects 2 people
Affects: GNU Mailman
Importance: High
Assigned to: Mark Sapiro

Bug Description

Mailman version 2.1.4: there is a problem with the archiver. When the
body of a message contains a blank line followed by a line beginning
with 'From ', the archiver thinks this is a new message and splits it.
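To see why a body line beginning with "From " splits a message, here is a minimal sketch using Python 3's standard `mailbox.mbox` reader. It is a stand-in, not Mailman's archiver (Mailman 2.1's bin/arch used its own Python 2 mailbox classes), and the addresses and text are made up. The only point it shows is that an mbox reader treats every line starting with "From " as the start of a new message.

```python
import mailbox
import os
import tempfile

# Made-up sample mbox: one real message whose body contains a line
# beginning with "From " after a blank line.
UNESCAPED = (
    "From alice@example.org Thu May 13 10:00:00 2004\n"
    "From: alice@example.org\n"
    "Subject: Weekend report\n"
    "\n"
    "We started Sunday by brain-storming.\n"
    "\n"
    "From there, we broke into teams.\n"
)
# The same mailbox with the body line escaped the traditional way.
ESCAPED = UNESCAPED.replace("\nFrom there", "\n>From there")


def count_messages(mbox_text):
    """Write mbox_text to a temporary file and count the messages
    that Python's mbox reader finds in it."""
    fd, path = tempfile.mkstemp(suffix=".mbox")
    with os.fdopen(fd, "w") as f:
        f.write(mbox_text)
    try:
        box = mailbox.mbox(path, create=False)
        try:
            return len(box)
        finally:
            box.close()
    finally:
        os.remove(path)


print(count_messages(UNESCAPED))  # prints 2: the body line starts a "message"
print(count_messages(ESCAPED))    # prints 1
```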

I see there is a cleanarch script which can escape non-header
From lines, but why must I change the original post by adding an
"escape character"? Is it possible to check only the headers of a
message instead of the whole message including the body?

[http://sourceforge.net/tracker/index.php?func=detail&aid=953320&group_id=103&atid=100103]

Desrod-users (desrod-users) wrote :

I can confirm this bug in 2.1.4, and that it is also in
2.1.5c2.

MAJOR BLOCKER! Please assign to someone and fix!

Desrod-users (desrod-users) wrote :

I can confirm that this affects the public/stable release
version of 2.1.5. Running bin/cleanarch on the suspect mbox
DOES NOT fix the problem.

I have a list with messages back in 2002, which have "parts"
showing up at the end of my September 2004 archives. The
messages contain "parts" of the body, and are missing most
of the headers.

Every time arch runs, it reduplicates the trashed/truncated
messages from 2002 onto the end of the September 2004 archive.

This is a MAJOR MAJOR MAJOR MAJOR blocker, and cleanarch is
not a proper solution. There are reasons why valid messages
can start with the word "From" at the beginning of a line,
and not be part of the "From:" header.

Here are two examples:

http://lists.pilot-link.org/pipermail/pilot-link-general/2002-September/016304.html

http://lists.pilot-link.org/pipermail/pilot-link-general/2002-June/016144.html

I haven't yet found a fix for this, and adding the > in
front of the "From" words at the beginning of the lines,
DOES NOT fix the problem. In fact, deleting those messages
(remember, they're from TWO YEARS AGO) from the .mbox,
rm'ing the .html files, and re-running arch on the list,
DOES NOT fix the problem.

Mark Sapiro (msapiro) wrote :

I had this problem on 2.1.4 after importing a large archive
from another service. I then fixed the "^From .*$" lines
that were in the bodies of a few messages and ran "bin/arch
--wipe" to rebuild the archive and that absolutely DID fix
the problem.

Barry Warsaw (barry) wrote :

This bug was fixed a long time ago, so it is no longer valid in the latest versions of Mailman. Old archives could still be affected if you regenerate the archive, but bin/cleanarch should fix up an old mbox file for you.

Changed in mailman:
status: New → Invalid

Bernie Innocenti (codewiz) wrote :

> This bug was fixed a long time ago

When was this bug fixed exactly? I still have it in 2.1.9, and couldn't find any reference to it being fixed in the changelogs of 2.1.10 to 2.1.12.

Mark Sapiro (msapiro) wrote :

The fix was to ensure that Mailman never writes unescaped "From " lines in message bodies.
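In Python's `email` package this escaping is controlled by the `mangle_from_` flag of `email.generator.Generator`, which writes body lines beginning with "From " as ">From " while flattening a message. The sketch below uses Python 3's `email` package, not the email 2.5.x that Mailman 2.1.9 bundled, so it illustrates the flag rather than Mailman's own code path. The message content is made up.

```python
import io
from email.generator import Generator
from email.message import Message


def build_message():
    """A made-up message whose body has a line starting with "From "."""
    msg = Message()
    msg["From"] = "alice@example.org"
    msg["Subject"] = "Weekend report"
    msg.set_payload("We started Sunday.\nFrom there, we broke into teams.\n")
    return msg


def flatten(msg, mangle):
    """Serialize msg, escaping body "From " lines only if mangle is True."""
    buf = io.StringIO()
    Generator(buf, mangle_from_=mangle).flatten(msg)
    return buf.getvalue()


print(flatten(build_message(), mangle=True))   # body line becomes ">From there, ..."
print(flatten(build_message(), mangle=False))  # body line stays "From there, ..."
```

Only the body is affected; the "From:" header line is written unchanged either way.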

The issue still exists in bin/arch if you import a .mbox from elsewhere with unescaped "From " lines, or if you have an old Mailman-generated .mbox with unescaped "From " lines. The solution in these cases is to use bin/cleanarch and/or other tools to escape any "From " lines in message bodies before using the file as input to bin/arch.
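The kind of escaping involved can be pictured with a simplified sketch in the spirit of bin/cleanarch. It is not cleanarch's actual code. A "From " line is kept as a separator only when it starts the file or follows a blank line and also looks like a real From_ line. Here that check is approximated by a hypothetical regex for "From <sender> <asctime-style date>", and every other "From " line gets a ">" prefix. A body line that happens to look exactly like a separator would still slip through, which is an inherent weakness of the mbox format.

```python
import re

# Hypothetical approximation of a real From_ separator line, e.g.
# "From alice@example.org Thu May 13 10:00:00 2004".
FROM_LINE = re.compile(
    r"^From \S+ +\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}$"
)


def escape_body_from_lines(lines):
    """Return a copy of the mbox lines with non-separator "From " lines
    prefixed by ">"."""
    out = []
    prev_blank = True  # the start of the file counts as a boundary
    for line in lines:
        stripped = line.rstrip("\r\n")
        is_separator = prev_blank and FROM_LINE.match(stripped)
        if stripped.startswith("From ") and not is_separator:
            line = ">" + line
        out.append(line)
        prev_blank = stripped == ""
    return out
```

For example, `escape_body_from_lines(open("list.mbox").readlines())` would leave real separators alone and turn a body line "From there, ..." into ">From there, ...".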

Bernie Innocenti (codewiz) wrote :

Yeah, but when was this fix done?

I see this issue in 2.1.9 with email sent to a list normally:
  http://lists.sugarlabs.org/archive/iaep/2009-May/005730.html
  (the original post continued with "From ...")

Mark Sapiro (msapiro) wrote :

I don't know exactly when it was fixed, and the fix may have been to the Python email package rather than to Mailman itself, but it was fixed way before 2.1.9.

I see the issue in your archive, but do the following:

$ cd to the Mailman installation
$ bin/withlist -i
No list name supplied.
Python 2.6.1 (r261:67515, Dec 27 2008, 17:04:27)
[GCC 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)] on cygwin
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import email
>>> email.__version__
'4.0.1'
>>> (control-D to exit)

'$ ' is a shell prompt; '>>> ' is a Python prompt; the rest is output.

What are the Python and email versions?

Also, find the message in the archives/private/iaep.mbox/iaep.mbox file. Is the "From " there unescaped?

Was the "From " escaped in the message you received from the list?

Bernie Innocenti (codewiz) wrote :

The email package bundled with mailman 2.1.9 as packaged by Ubuntu is version 2.5.8. The system module is 4.0.1.
Sounds like a packaging problem to me, and it's an unmodified Debian package. Shall we notify the maintainer?

The relevant part of the body of the email I received from the mailing list looked like this, in raw format:

 We started Sunday by brain-storming about issues and challenges with
 Sugar and Sugar Labs which could benefit from further exploration.
 From there, we broke into teams of 5 or 6 people to collaboratively
 explore subsets of those issues for about an hour.

(I indented the text to prevent Launchpad from reflowing it)

Mark Sapiro (msapiro) wrote :

Mailman 2.1.9 with email 2.5.8 should not have this problem. There is no problem with the Debian/Ubuntu package with respect to packaging email 2.5.8 with Mailman. Prior to Mailman 2.1.12, Mailman installed its own email package because we nominally supported Python back to 2.1.x and the email package in the older Pythons wasn't new enough.

You don't want to use email 4.0.1 with Mailman 2.1.9 because of deprecation warnings and maybe other problems that will result.

Does this problem affect only this one message or do you see it more often?

I'm still interested in what's in the archives/private/iaep.mbox/iaep.mbox file.

I'm also curious because I didn't see the "second message" that would have been the

 From there, we broke into teams of 5 or 6 people to collaboratively
 explore subsets of those issues for about an hour.
 ...

message split off by the archiver anywhere in <http://lists.sugarlabs.org/archive/iaep/2009-May/date.html>.

Bernie Innocenti (codewiz) wrote :

> I'm still interested in what's in the archives/private/iaep.mbox/iaep.mbox file.

It is indeed escaped in the mbox file, like so:

We started Sunday by brain-storming about issues and challenges with
Sugar and Sugar Labs which could benefit from further exploration.
>From there, we broke into teams of 5 or 6 people to collaboratively
explore subsets of those issues for about an hour.

We have seen this problem twice already. I'd guess it happens every time someone sends a message containing a line starting with "From ".

Mark Sapiro (msapiro) wrote :

I am unable to duplicate the problem in current Mailman, and as you have seen, there really aren't any changes between 2.1.9 and current that would affect this. I suspect it may be caused by this Debian patch <http://patch-tracking.debian.net/patch/series/view/mailman/1:2.1.9-7/77_header_folding_in_attachments.patch>, but I haven't tested this theory.

Bernie Innocenti (codewiz) wrote :

Thanks Mark for looking into it.

That patch really smells like it could be causing this bug. I've subscribed its author to this bug to get some insight.

On Wed, May 20, 2009 at 10:35:38PM -0000, Bernie Innocenti wrote:

> That patch really smells like it could be causing this bug. I've
> subscribed its author to this bug to get some insight.

Looks plausible. It probably means that one call to Message.as_string
somewhere _does_ need mangle_from_=True (an argument to be added to
as_string, to be passed to Generator), but I don't know enough about
Mailman internals to know which one. I guesstimated the one in
Mailman/Archiver/pipermail.py, but that does not fix the issue. The
very best way to solve this would be an upstream fix for the bug that
this Debian patch fixes ;-) , that is
http://bugs.debian.org/244673

--
Lionel

Mark Sapiro (msapiro) wrote :

I have attached 77_header_folding_in_attachments_refactored.patch. This is still against the 2.1.9 base as that is where I was doing the testing but porting it forward should be trivial.

The refactored patch does three things:

It adds a Mailman/Generator.py module which is the same as that in the original patch except mangle_from_ defaults to True.

It patches Mailman/Message.py to add an as_string() override method in the Message.Message class as does the original, but it also adds an optional mangle_from_ argument which again defaults to True.

It patches the bulkdeliver function in Mailman/Handlers/SMTPDirect.py to call msg.as_string(mangle_from_=False) to generate the flattened message for mailing without mangling the From_ lines.
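The as_string() part of that refactoring can be pictured with a small sketch. It is written against Python 3's email package, and the class name ArchivableMessage is hypothetical. It is not the code in the attached patch. Mangling is on by default, so archive and digest output get escaped "From " lines, while a delivery path such as bulkdeliver opts out explicitly with mangle_from_=False.

```python
import io
from email.generator import Generator
from email.message import Message


class ArchivableMessage(Message):
    """Hypothetical Message subclass whose as_string() escapes body
    "From " lines unless the caller asks otherwise."""

    def as_string(self, unixfrom=False, maxheaderlen=0, mangle_from_=True):
        fp = io.StringIO()
        # maxheaderlen=0 disables header folding, matching the default
        # of email.message.Message.as_string().
        g = Generator(fp, mangle_from_=mangle_from_, maxheaderlen=maxheaderlen)
        g.flatten(self, unixfrom=unixfrom)
        return fp.getvalue()


msg = ArchivableMessage()
msg["From"] = "alice@example.org"
msg["Subject"] = "Weekend report"
msg.set_payload("We started Sunday.\nFrom there, we broke into teams.\n")

archive_text = msg.as_string()                 # body line escaped as ">From there"
smtp_text = msg.as_string(mangle_from_=False)  # body line left unescaped for delivery
```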

In addition, I left the patch to Mailman/Mailbox.py as is for its potential effect on headers in MIME format digests. As far as embedded From_ lines are concerned, I think they have to be escaped or processing the digest.mbox to produce the digest will fail. Also, failing to escape them may leave them unescaped in the archive LISTNAME.mbox/LISTNAME.mbox which would be bad.

Another issue had me scratching my head for a while. It is that all the mangle_from_ suppression may be futile. The "don't fold headers" part of the patch may be relevant anyway for messages without embedded From_ lines, but MTAs/MDAs may mangle the embedded From_ anyway even if Mailman doesn't. In my case, my test environment uses Exim as an MTA/MDA and Exim escapes From_ lines in the body somewhere between receiving the message and delivering it to a local mailbox, so even though the message left Mailman with the embedded From_ unchanged, it was ultimately escaped in my incoming mailbox.

On Thu, May 21, 2009 at 11:35:05PM -0000, Mark Sapiro wrote:

> I have attached 77_header_folding_in_attachments_refactored.patch. This
> is still against the 2.1.9 base as that is where I was doing the testing
> but porting it forward should be trivial.

Thank you. I tested that it fixes Launchpad bug 266068 and still fixes
265967. Will you apply it upstream, thereby closing 265967?

> Another issue had me scratching my head for a while. It is that all
> the mangle_from_ suppression may be futile.

Yes, I had the same gut feeling. Any signature protocol should escape
From lines in some shape (e.g. ">From" or " - From" or
quoted-printable encode the F or ...) before signing, or it will be
quite fragile. All (known to me or the author of GnuPG)
implementations of OpenPGP do it. Some MUAs do the quoted-printable
trick.
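The quoted-printable trick works because "=46" is the quoted-printable encoding of "F" (0x46). The line therefore no longer starts with "From " on the wire, yet it decodes back to the original text. Below is a minimal sketch using Python's standard quopri module. The helper name qp_escape_from is made up, and real MUAs may implement this differently.

```python
import quopri


def qp_escape_from(body: bytes) -> bytes:
    """Quoted-printable encode body, then also encode the leading "F" of
    any line that starts with "From " so no line begins with "From "."""
    encoded = quopri.encodestring(body)
    lines = encoded.split(b"\n")
    lines = [b"=46" + line[1:] if line.startswith(b"From ") else line
             for line in lines]
    return b"\n".join(lines)


body = b"We started Sunday.\nFrom there, we broke into teams.\n"
wire = qp_escape_from(body)
print(wire.decode())                      # second line begins "=46rom there"
print(quopri.decodestring(wire) == body)  # decoding restores the original body
```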

I originally left this part of the patch in because I was unsure about
S/MIME.

--
Lionel

Mark Sapiro (msapiro) wrote :

Yes, I will apply it upstream

Bernie Innocenti (codewiz) wrote :

On 05/22/09 20:40, Mark Sapiro wrote:
> Yes, I will apply it upstream

Thanks!

--
   // Bernie Innocenti - http://codewiz.org/
 \X/ Sugar Labs - http://sugarlabs.org/

Mark Sapiro (msapiro) on 2009-07-22
Changed in mailman:
assignee: nobody → Mark Sapiro (msapiro)