Stops bloat in pipermail article databases
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
GNU Mailman | Fix Released | Medium | Unassigned | | |
Bug Description
The standard pipermail archiving code saves the body text of
every article, in HTML format, in the -article database of
each archived list. This bloats these databases. Because
they are pickled data structures that are loaded into memory
in their entirety whenever archiving operations for a list
are handled, the bloat can substantially degrade archiver
performance and, in the worst case, for lists carrying heavy
traffic and/or receiving large text postings, bring archiving
to a grinding halt.
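To make the size effect concrete, here is a minimal sketch that pickles a toy article database with and without stored bodies. The record layout (plain dicts with `msgid`, `subject` and `html_body` keys) and the helper names are hypothetical and chosen only for illustration; they are not pipermail's actual data structures.

```python
import pickle


def build_database(n_articles, keep_body):
    """Build a hypothetical article database: a dict of per-article records."""
    db = {}
    for i in range(n_articles):
        record = {
            "msgid": "<%d@example.org>" % i,
            "subject": "Subject %d" % i,
        }
        if keep_body:
            # Each body is a distinct string, as real postings would be.
            record["html_body"] = "<p>" + ("line %d of text " % i) * 200 + "</p>"
        db[i] = record
    return db


def pickled_size(db):
    """Return the number of bytes the database occupies once pickled."""
    return len(pickle.dumps(db))


size_with = pickled_size(build_database(1000, keep_body=True))
size_without = pickled_size(build_database(1000, keep_body=False))
print("with bodies:   ", size_with, "bytes")
print("without bodies:", size_without, "bytes")
```

Since the whole pickle is read back into memory for every archiving operation, the larger figure is paid each time a message is archived, not just once on disk.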
This patch changes HyperArch.py and pipermail.py so that
the data stored in the pipermail -article database of each
list no longer includes the body text, in HTML format, of each
article. This reduces the size of the -article database for
each list. The benefits are most pronounced for high-traffic
lists and for lists that receive large text postings.
The patch also adds a script, $prefix/bin/rb-arch, which
removes any body text, in HTML format, from existing
-article databases. With the patch applied, this junk HTML is
no longer stored when new articles are added to the
databases, but junk HTML already present is not deleted
unless this script is run. The alternative is to run
$prefix/bin/arch for a list.
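The clean-up step described above can be sketched roughly as follows. This is not the rb-arch script from the patch: it assumes a hypothetical database that is a single pickled dict of per-article dicts with the body stored under an `html_body` key, and it is written in current Python purely for illustration.

```python
import os
import pickle
import tempfile


def strip_bodies(db_path, body_key="html_body"):
    """Remove stored body text from every record in a pickled database.

    Assumes (hypothetically) that the file holds one pickled dict whose
    values are per-article dicts. Returns the number of records changed.
    """
    with open(db_path, "rb") as fp:
        db = pickle.load(fp)

    changed = 0
    for record in db.values():
        # pop() removes the body if present and leaves other fields alone.
        if record.pop(body_key, None) is not None:
            changed += 1

    # Write to a temporary file in the same directory, then rename, so an
    # interrupted run cannot leave a half-written database behind.
    target_dir = os.path.dirname(os.path.abspath(db_path))
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    with os.fdopen(fd, "wb") as fp:
        pickle.dump(db, fp)
    os.replace(tmp_path, db_path)
    return changed
```

Running such a clean-up a second time is harmless: records that no longer carry a body are simply skipped.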
Apply the patch from within the Mailman build directory
using the command:
patch -p1 < path-to-patch-file
You use this patch at your own risk. If you use it, I would
appreciate feedback on whether it works for you and on any
problems you encounter with it.
Accepted for MM2.1.4.