arch corrupts archives, but only for the last month

Bug #266388 reported by Desrod-users
Affects      Status        Importance  Assigned to  Milestone
GNU Mailman  Fix Released  Critical    Unassigned

Bug Description

I think this problem has been reported before in previous versions, and it's
back again in 2.1.9.

When I regenerate archives for our lists, if ANY message contains a '<'
character in the body, Mailman splits it as a new message, and everything
after that gets corrupted.

This means if someone pastes some XML into the body of a message (which
happens quite often on our lists) or some HTML, or the headers of an email,
Mailman will break it, but *ONLY* for the latest month's messages, even if
the message that started it was sent months or years ago.

If a message sent in April of 2003 includes a '<' as the first character
of a line anywhere in the body of the message, February 2007's archive will
be corrupted.

You can see the results of this over here:

http://lists.plkr.org/pipermail/plucker-list/2007-February/thread.html

And also here:

http://lists.plkr.org/pipermail/plucker-dev/2007-February/thread.html

The raw mbox files are fine, every message is intact.

I don't see this problem on other lists I maintain; it only seems to affect
lists where HTML or XML or mail headers are pasted into the body of the
message.

I'd call this grave, because it's odd how it just dumps itself on the latest
month's archive, when the latest month's messages don't even have the
problem.

[http://sourceforge.net/tracker/index.php?func=detail&aid=1661574&group_id=103&atid=100103]

Mark Sapiro (msapiro) wrote :

Originator: NO

What am I looking for at
<http://lists.plkr.org/pipermail/plucker-list/2007-February/thread.html>?
It looks OK to me.

<http://lists.plkr.org/pipermail/plucker-dev/2007-February/thread.html>
returns a 404.

There is an issue in that if the body of some message in the
archives/private/<listname>.mbox/<listname>.mbox file (or whatever mbox is
input to bin/arch) contains a line that begins with "From ", the archiver
takes that line as an mbox message separator and the message is truncated
at that point, and the rest of the message is seen as a new message without
a date so it is archived with the current date.

It sounds like that may be what you are seeing, but it has nothing to do
with a '<' as the first character of a line. It has to do with 'unescaped'
'From ' lines in the bodies of messages.
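
As a rough illustration of the splitting described above, here is a minimal sketch. It is not Mailman's archiver code; `naive_split` and the sample message are hypothetical and exist only to show how a reader that treats every line starting with "From " as a separator cuts one message in two, leaving a second fragment with no Date header.

```python
def naive_split(mbox_text):
    """Split mbox text on every line that starts with 'From '.

    Hypothetical helper: it mimics a reader that cannot tell a real
    separator from a body line that happens to start with 'From '.
    """
    messages = []
    current = None
    for line in mbox_text.splitlines(keepends=True):
        if line.startswith("From "):
            if current is not None:
                messages.append("".join(current))
            current = [line]
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


# One real message whose body contains an unescaped 'From ' line.
SAMPLE = (
    "From alice@example.com Tue Apr  1 10:00:00 2003\n"
    "Date: Tue, 1 Apr 2003 10:00:00 +0000\n"
    "Subject: pasted headers\n"
    "\n"
    "Here is what I received:\n"
    "From the server came this reply.\n"
    "More text.\n"
)

parts = naive_split(SAMPLE)
has_date = ["\nDate:" in part for part in parts]
print(len(parts))  # 2: the single message was split in two
print(has_date)    # [True, False]: the second fragment has no Date header
```

A fragment without a date is what would end up filed under the current month, which matches the symptom of old messages polluting the latest archive.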

Mailman currently precedes any 'From ' at the beginning of a body line
with a '>' making it '>From ' in the .mbox and avoiding the problem, but
old .mbox files and .mbox files from other sources may have unescaped 'From
' lines.
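
The escaping itself can be sketched in a few lines. This is an illustration of the '>From ' convention only, not the code Mailman uses; `escape_from_lines` is a hypothetical name, and the sketch assumes it is given the message body only, not the separator line or headers.

```python
def escape_from_lines(body):
    """Prefix '>' to every body line that starts with 'From '.

    Hypothetical helper: pass it the message body only, so the real
    separator line at the top of the message is never touched.
    """
    escaped = []
    for line in body.splitlines(keepends=True):
        if line.startswith("From "):
            line = ">" + line
        escaped.append(line)
    return "".join(escaped)


body = "Hi\nFrom the server came this reply.\nfrom lowercase is left alone\n"
print(escape_from_lines(body))
```

Only lines that begin with exactly "From " (capital F, trailing space) are changed, since those are the ones an mbox reader can mistake for a separator.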

There is a bin/cleanarch script distributed with Mailman to help 'fix' old
.mbox files with this problem.
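
Before running any repair on an old .mbox, it can help to list the lines that look suspicious. The sketch below is a hypothetical heuristic, not what bin/cleanarch does: it assumes a genuine separator looks like "From <address> <asctime-style date>" and reports every other line starting with "From ". The pattern `SEPARATOR_RE` and the function `suspicious_from_lines` are invented for this example and will misjudge separators written in other date formats.

```python
import re

# Assumed shape of a genuine separator, e.g.
# "From alice@example.com Tue Apr  1 10:00:00 2003"
SEPARATOR_RE = re.compile(
    r"^From \S+\s+[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d "
    r"\d{2}:\d{2}:\d{2} \d{4}\s*$"
)


def suspicious_from_lines(lines):
    """Return (line_number, text) for 'From ' lines that do not look
    like separators under the assumed pattern above."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("From ") and not SEPARATOR_RE.match(line):
            hits.append((number, line.rstrip("\n")))
    return hits


sample_lines = [
    "From alice@example.com Tue Apr  1 10:00:00 2003\n",
    "Subject: pasted headers\n",
    "\n",
    "From the server came this reply.\n",
]
print(suspicious_from_lines(sample_lines))
```

To check a real file, the same function can be fed an open file object, since iterating over a text file yields its lines.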

Mark Sapiro (msapiro) wrote :

Originator: NO

No response after 6 weeks. I'm setting status to Pending which will
automatically close in 2 more weeks.

Sf-robot (sf-robot) wrote :

Originator: NO

This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).
