Stops bloat in pipermail article databases

Bug #265984 reported by Ppsys
Affects: GNU Mailman
Status: Fix Released
Importance: Medium
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

The standard pipermail archiving code saves the body text of
every article, in HTML format, in the -article database of
each archived list. This bloats these databases. Because they
are pickled data structures that are loaded into memory in
their entirety whenever archiving operations for a list are
handled, the bloat can substantially degrade archiver
performance. In the limit, for lists carrying heavy traffic
and/or receiving large text postings, it can bring archiving
to a grinding halt.

This patch changes HyperArch.py and pipermail.py so that
the data stored in the pipermail database file
$archives/private/<listname>/database/<period>-article
no longer includes the body text, in HTML format, of each
article. This reduces the size of the -article database for
each list. The benefits are most pronounced for high-traffic
lists and those that receive large text postings.
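
As a rough illustration of the idea (not the patch itself), the sketch below shows one way a Python object can leave a large attribute out of its pickled form. The class name Article, the attribute html_body, and the TRANSIENT tuple are assumptions made for this example; the actual names used in HyperArch.py and pipermail.py may differ.

```python
import pickle


class Article:
    """Hypothetical stand-in for a pipermail article record."""

    # Attributes assumed to hold rendered body text; the names are illustrative.
    TRANSIENT = ("html_body",)

    def __init__(self, msgid, subject, html_body):
        self.msgid = msgid
        self.subject = subject
        self.html_body = html_body

    def __getstate__(self):
        # Copy the instance dict so the live object keeps its body text.
        state = self.__dict__.copy()
        for name in self.TRANSIENT:
            state.pop(name, None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Restore an empty placeholder so later attribute access does not fail.
        for name in self.TRANSIENT:
            self.__dict__.setdefault(name, None)


article = Article("<1@example.org>", "Hello", "<p>" + "x" * 10000 + "</p>")
stored = pickle.dumps(article)    # the body text is not part of this pickle
restored = pickle.loads(stored)   # restored.html_body is None
```

The in-memory object still carries its body while it is being rendered to the archive page; only the stored copy is slimmed down.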

The patch also adds a script, $prefix/bin/rb-arch, which
removes any body text, in HTML format, from existing
-article databases. With the patch applied, this junk HTML is
no longer stored when new articles are added, but HTML
already in the databases is not deleted unless this script is
run. The alternative is to run $prefix/bin/arch for a list to
rebuild its archive.
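
The kind of clean-up such a script performs can be sketched as follows. This is not rb-arch: it assumes, purely for illustration, that a database is a single pickled dict mapping message IDs to dict records with an optional html_body key. The function name strip_bodies and that layout are hypothetical, and the real pipermail database format may differ.

```python
import os
import pickle
import tempfile


def strip_bodies(db_path, field="html_body"):
    """Remove `field` from every record in a pickled dict; return the count stripped."""
    with open(db_path, "rb") as fh:
        records = pickle.load(fh)

    stripped = 0
    for record in records.values():
        if record.pop(field, None) is not None:
            stripped += 1

    # Write to a temporary file in the same directory, then rename it over the
    # original so an interrupted run does not leave a truncated database.
    directory = os.path.dirname(os.path.abspath(db_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(records, fh)
        os.replace(tmp_path, db_path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return stripped
```

Running it a second time on the same file strips nothing, so a clean-up of this shape is safe to repeat.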

Apply the patch from within the Mailman build directory
using the command:

    patch -p1 < path-to-patch-file

You use this patch at your own risk. I would appreciate
feedback on whether it works for you, and on any problems
you encounter with it.

[http://sourceforge.net/tracker/index.php?func=detail&aid=835332&group_id=103&atid=100103]

Tags: pipermail
Barry Warsaw (barry) wrote :

Accepted for MM2.1.4.
