Add SpamAssassin filter to mail pipeline

Bug #266588 reported by James Henstridge
Affects      Status     Importance  Assigned to  Milestone
GNU Mailman  Confirmed  Low         Unassigned

Bug Description

This filter adds support for discarding or holding spam
sent to the mailing list. It contacts a spamd daemon
(from SpamAssassin -- http://spamassassin.taint.org) to
score the message.

If the score is above a certain threshold (default 10),
the message is discarded and an entry is written to the
vette log.

If the score is above another lower threshold (default
5), the message is held for moderation.
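
The two thresholds amount to a small decision function. The following is a minimal sketch, assuming a numeric score has already been obtained from spamd. The function name `classify` and the returned action strings are illustrative and are not taken from the attached SpamAssassin.py; whether the real filter compares with `>` or `>=` is also an assumption.

```python
# Illustrative sketch only: names and return values are assumptions,
# not the attached handler's actual code.
DISCARD_SCORE = 10  # default discard threshold from the description
HOLD_SCORE = 5      # default hold threshold from the description

def classify(score, discard_score=DISCARD_SCORE, hold_score=HOLD_SCORE):
    """Map a SpamAssassin score to 'discard', 'hold' or 'accept'."""
    # Check the higher threshold first so a very spammy message is
    # discarded rather than merely held.
    if score > discard_score:
        return 'discard'
    if score > hold_score:
        return 'hold'
    return 'accept'
```

Checking the discard threshold first matters because any score above it is also above the hold threshold.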

The SpamAssassin.py file should be installed in
Mailman/Handlers/. The LIST_PIPELINE variable in
Mailman/Handlers/HandlerAPI.py should be modified to
include a 'SpamAssassin' item (I put it just after the
existing 'SpamDetect' item).

To change the defaults, the following can be added to
the mm_cfg.py file:
  SPAMASSASSIN_HOST = 'host:port' # how to contact SA
  SPAMASSASSIN_DISCARD_SCORE = 10
  SPAMASSASSIN_HOLD_SCORE = 5

If you don't want to discard messages, then
DISCARD_SCORE can be set to something very high (1000
should do it).
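
One way a handler can pick up these settings while falling back to the defaults is `getattr` with a default value. This is a sketch in modern Python that uses a stand-in object in place of the real `mm_cfg` module; the helper name `load_settings` and the dictionary keys are hypothetical.

```python
from types import SimpleNamespace

def load_settings(cfg):
    """Read the SpamAssassin settings from cfg, applying the defaults."""
    return {
        'host': getattr(cfg, 'SPAMASSASSIN_HOST', 'localhost'),
        'discard_score': getattr(cfg, 'SPAMASSASSIN_DISCARD_SCORE', 10),
        'hold_score': getattr(cfg, 'SPAMASSASSIN_HOLD_SCORE', 5),
    }

# Stand-in for mm_cfg with only one value overridden; a very high
# discard score effectively disables discarding.
fake_cfg = SimpleNamespace(SPAMASSASSIN_DISCARD_SCORE=1000)
settings = load_settings(fake_cfg)
```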

It looks like the MM2.1 filter APIs have changed a bit, so
this filter will need some modifications to work with
that version. When I get round to upgrading, I might
look into updating it.

[http://sourceforge.net/tracker/index.php?func=detail&aid=534577&group_id=103&atid=300103]

James Henstridge (jamesh) wrote :
James Henstridge (jamesh) wrote :

There is a fairly easy optimisation for this filter that I
missed when writing it. It calls str() on the message
object twice. It would be quicker to call str() on the
message once.
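
The optimisation can be pictured as flattening the message a single time and reusing the text. This is a sketch in modern Python; `check_once` and the `scorer` callable are hypothetical names, not part of the attached filter.

```python
from email.message import Message

def check_once(msg, scorer):
    """Flatten the message a single time and reuse the text."""
    text = str(msg)          # potentially expensive for large messages
    score = scorer(text)     # hypothetical callable that talks to spamd
    return score, len(text)  # reuse `text` instead of calling str(msg) again
```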

James Henstridge (jamesh) wrote :

Just attached my updated version of the patch. This version
requires SpamAssassin 2.20 (for the extra commands that the
spamd daemon understands). It now displays a list of which
rules were triggered for held messages, and can give
messages from list members a bonus (defaults to 2), so that
they are less likely to get held as spam.
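
The member bonus can be thought of as a single subtraction applied when any sender is a list member. In this sketch, `members` is a plain set standing in for Mailman's membership lookup, and the function name is hypothetical; only the default bonus of 2 comes from the comment above.

```python
def adjusted_score(score, senders, members, bonus=2):
    """Subtract the bonus once if any sender is a list member."""
    for sender in senders:
        if sender.lower() in members:
            # Apply the bonus a single time, however many senders match.
            return score - bonus
    return score
```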

James Henstridge (jamesh) wrote :

This version is essentially the same as the previous
version, but adds compatibility with python > 1.5.2, which
doesn't like you passing two arguments to socket.connect().
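
For reference, the portable form passes the address to `connect()` as one `(host, port)` tuple. This is a sketch in modern Python; the helper name `connect_to` is hypothetical.

```python
import socket

def connect_to(host, port):
    """Open a TCP connection using the single-tuple form of connect()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # connect() takes one (host, port) tuple; the old two-argument
    # form sock.connect(host, port) is rejected by newer Pythons.
    sock.connect((host, port))
    return sock
```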

Jafo-users (jafo-users) wrote :

FYI: I've been running the 2002-05-14 version of this patch
with spamassassin 2.20 for the last day on our main mailman
box and it seems to be working great.

James Henstridge (jamesh) wrote :

Yet another version. There were some bugs in handling of
certain error conditions when talking to spamd. These would
result in exceptions and the messages staying in the
delivery queue :(

With the new version, the message will be passed through
unchecked under these conditions, and a message will be
added to the error log.
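
This "fail open" behaviour can be sketched as a wrapper that logs the failure and reports the message as unchecked. The names `score_or_pass` and `check` are hypothetical, and the standard `logging` module stands in for Mailman's own error log.

```python
import logging

log = logging.getLogger('spamassassin-sketch')

def score_or_pass(text, check):
    """Return the score from check(text), or None if spamd is unusable."""
    try:
        return check(text)
    except (OSError, ValueError) as exc:
        # Fail open: log the problem and let the caller treat the
        # message as unchecked instead of leaving it stuck in the queue.
        log.error('spamd check failed, passing message through: %s', exc)
        return None
```

A caller would treat a `None` result as "no score available" and continue down the pipeline.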

Jafo-users (jafo-users) wrote :

FYI, I ran the previous version since installation and it
seemed to work fine. I didn't run into any problems, with
probably 500 messages handled. I've updated to the new
version and it seems ok so far, but I've only sent about 10
messages through.

Sean

James Henstridge (jamesh) wrote :

The Mailman installation on mail.gnome.org also uses this
filter. I don't think there are any stability problems with
the filter.

James Henstridge (jamesh) wrote :

Yet another new version that fixes a small typo. With
previous versions, you couldn't approve messages that had
been identified as spam once (they would get identified
again when the queue got processed, instead of being passed
through).

James Henstridge (jamesh) wrote :

remembering to check the "upload file" checkbox this time ...

dann frazier (dannf) wrote :

hey James,
  found a typo. also wanted to point out:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=139942&repeatmerged=yes

--- SpamAssassin.py.orig Sat Aug 17 12:05:41 2002
+++ SpamAssassin.py Sat Aug 17 12:06:13 2002
@@ -35,7 +35,7 @@
     SPAMD_HOST = mm_cfg.SPAMASSASSIN_HOST
     i = string.find(SPAMD_HOST, ':')
     if i >= 0:
-        SPAMD_HOST, SPAMD_PORT = SPAMD_HOST[:i], host[i+1:]
+        SPAMD_HOST, SPAMD_PORT = SPAMD_HOST[:i], SPAMD_HOST[i+1:]
         try: SPAMD_PORT = int(SPAMD_PORT)
         except: SPAMD_PORT = None
 except:

Jafo-users (jafo-users) wrote :

How about changing that chunk of code to:

   SPAMD_HOST = 'localhost'
   SPAMD_PORT = None
   if hasattr(mm_cfg, 'SPAMASSASSIN_HOST):
       SPAMD_HOST = mm_cfg.SPAMASSASSIN_HOST
       try:
           SPAMD_HOST, SPAMD_PORT = string.split(SPAMD_HOST, ':', 1)
           SPAMD_PORT = int(SPAMD_PORT)
       except ValueError:
           SPAMD_PORT = None
   if not SPAMD_PORT: SPAMD_PORT = 783

This gets rid of the "bare except"s, and I think it's a
little clearer than the previous code. The ValueError will
be tripped if the string doesn't have a : in it, or if the
int coercion fails. Though perhaps in that instance you'd
want to log an error or something...

Sean
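
The idea of logging an error when the port is malformed could look roughly like the sketch below. It is written for modern Python (`str.partition`) rather than the Python of the time; the helper name `parse_host_port` is hypothetical and the standard `logging` module stands in for Mailman's syslog.

```python
import logging

log = logging.getLogger('spamassassin-sketch')
DEFAULT_PORT = 783  # the fallback port used in the code above

def parse_host_port(value, default_port=DEFAULT_PORT):
    """Split 'host' or 'host:port' into (host, port)."""
    host, sep, port_text = value.partition(':')
    if not sep:
        return host, default_port       # no port given
    try:
        return host, int(port_text)
    except ValueError:
        # The port was present but not numeric: report it rather than
        # silently falling back.
        log.error('bad port %r in %r, using %d',
                  port_text, value, default_port)
        return host, default_port
```

Separating the "no colon" case from the "bad integer" case means only a genuinely malformed setting produces a log entry.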

Jafo-users (jafo-users) wrote :

That last one had a missing quote. Try this patch:

*** SpamAssassin.py.orig Fri Aug 23 00:28:59 2002
--- SpamAssassin.py Fri Aug 23 00:31:00 2002
***************
*** 30,45 ****
  from Mailman.Logging.Syslog import syslog
  from Hold import hold_for_approval

! SPAMD_PORT = 0
! try:
!     SPAMD_HOST = mm_cfg.SPAMASSASSIN_HOST
!     i = string.find(SPAMD_HOST, ':')
!     if i >= 0:
!         SPAMD_HOST, SPAMD_PORT = SPAMD_HOST[:i], host[i+1:]
!         try: SPAMD_PORT = int(SPAMD_PORT)
!         except: SPAMD_PORT = None
! except:
!     SPAMD_HOST = 'localhost'
  if not SPAMD_PORT: SPAMD_PORT = 783

  try: DISCARD_SCORE = mm_cfg.SPAMASSASSIN_DISCARD_SCORE
--- 30,44 ----
  from Mailman.Logging.Syslog import syslog
  from Hold import hold_for_approval

! SPAMD_HOST = 'localhost'
! SPAMD_PORT = None
! if hasattr(mm_cfg, 'SPAMASSASSIN_HOST'):
!     SPAMD_HOST = mm_cfg.SPAMASSASSIN_HOST
!     try:
!         SPAMD_HOST, SPAMD_PORT = string.split(SPAMD_HOST, ':', 1)
!         SPAMD_PORT = int(SPAMD_PORT)
!     except ValueError:
!         SPAMD_PORT = None
  if not SPAMD_PORT: SPAMD_PORT = 783

  try: DISCARD_SCORE = mm_cfg.SPAMASSASSIN_DISCARD_SCORE

Sean

James Henstridge (jamesh) wrote : spamd.py (17/03/2003)

James Henstridge (jamesh) wrote :

Attached is an updated version of the filter for adding
SpamAssassin support to mailman. This version is targeted
at Mailman 2.1.x.

The code for talking to spamd has been split out into a
separate file, so that it can be updated independently of
the Mailman specific code. It has also been updated to work
with SpamAssassin 2.50 (and should be a lot more robust to
future additions to the spamd protocol).

The filter has also been changed to use the list name as the
username passed to spamd, which means that separate
auto-whitelists and bayes databases can be maintained for
each list.
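
To illustrate how a per-list username might travel to spamd, here is a sketch of building a request. It reflects a general understanding of the spamd wire format (a command line, headers such as `User` and `Content-length`, a blank line, then the message); the function name, the protocol version string and the sample list name are assumptions, so check the protocol documentation shipped with your SpamAssassin release before relying on it.

```python
def build_spamd_request(command, message, user=None, version='1.2'):
    """Build the bytes of a spamd request (sketch; verify against your spamd)."""
    lines = ['%s SPAMC/%s' % (command, version)]
    if user is not None:
        lines.append('User: %s' % user)   # e.g. the list name
    lines.append('Content-length: %d' % len(message))
    header = '\r\n'.join(lines) + '\r\n\r\n'
    return header.encode('ascii') + message

# 'mylist' is a made-up list name used purely for illustration.
request = build_spamd_request('SYMBOLS',
                              b'Subject: test\r\n\r\nhello\r\n',
                              user='mylist')
```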

Installation is trivial. Simply copy spamd.py and
SpamAssassin.py to the Mailman/Handlers directory and add
the following line to Mailman/mm_cfg.py:
  GLOBAL_PIPELINE.insert(1, 'SpamAssassin')
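
The effect of that line can be seen on a shortened, illustrative pipeline. The handler names below are a sample only; the real GLOBAL_PIPELINE in Defaults.py is longer and may differ between Mailman versions.

```python
# Shortened, illustrative pipeline (an assumption, not the full list).
GLOBAL_PIPELINE = ['SpamDetect', 'Approve', 'Replybot', 'Moderate', 'Hold']

# list.insert(1, x) places x at index 1, i.e. directly after the
# first handler and before everything else.
GLOBAL_PIPELINE.insert(1, 'SpamAssassin')
```

With this sample list, the new handler runs immediately after the first one and before any moderation or hold handlers.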

Phelim-gervase (phelim-gervase) wrote :

After installing patch 668685 for the HTDig integration into
Mailman 2.1.1, I started getting the following:

May 02 16:50:34 2003 (23484) Uncaught runner exception: global name 'False' is not defined
May 02 16:50:34 2003 (23484) Traceback (most recent call last):
  File "/var/mailman2/Mailman/Queue/Runner.py", line 105, in _oneloop
    self._onefile(msg, msgdata)
  File "/var/mailman2/Mailman/Queue/Runner.py", line 155, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/var/mailman2/Mailman/Queue/IncomingRunner.py", line 130, in _dispose
    more = self._dopipeline(mlist, msg, msgdata, pipeline)
  File "/var/mailman2/Mailman/Queue/IncomingRunner.py", line 153, in _dopipeline
    sys.modules[modname].process(mlist, msg, msgdata)
  File "/var/mailman2/Mailman/Handlers/SpamAssassin.py", line 75, in process
    score, symbols = check_message(mlist, str(msg))
  File "/var/mailman2/Mailman/Handlers/SpamAssassin.py", line 57, in check_message
    connection = spamd.SpamdConnection(SPAMD_HOST)
  File "/var/mailman2/Mailman/Handlers/spamd.py", line 79, in __init__
    self.request_headers = mimetools.Message(StringIO.StringIO(), seekable=False)
NameError: global name 'False' is not defined

I corrected this by defining "False = 0" in spamd.py. I
don't know what the "real" solution should be though.

James Henstridge (jamesh) wrote : spamd.py (06/05/2003)

James Henstridge (jamesh) wrote :

I have just attached updated versions of the patches (dated
06/05/2003). These versions include a number of bug fixes
that I have been testing locally for a while. I also added
a similar workaround for the True/False usage (the True and
False constants were only added in Python 2.2.1 and 2.3a).

This version also puts the SpamAssassin score in the
"reason" for held messages, which means you can easily see
the scores of messages in the new Mailman 2.1 moderation
overview page.

I have also put together some documentation on the Mailman
setup I use:

http://www.daa.com.au/~james/articles/mailman-spamassassin/

This includes information on how to set up an unprivileged
spamd that maintains separate Bayes databases for each
mailing list.

Kink-users (kink-users) wrote :

Originator: NO

Since MM 2.1.10, the matches_p function has changed behaviour and hence
this handler has stopped working. This simple patch fixes that:

Index: SpamAssassin.py
===================================================================
--- SpamAssassin.py (revision 551)
+++ SpamAssassin.py (working copy)
@@ -78,7 +78,7 @@
     if MEMBER_BONUS != 0:
         for sender in msg.get_senders():
             if mlist.isMember(sender) or \
- matches_p(sender, mlist.accept_these_nonmembers):
+ matches_p(sender, mlist.accept_these_nonmembers,
mlist.internal_name()):
                 score -= MEMBER_BONUS
                 break

Jean.c.h (slug71) wrote :

Marked this bug as 'Invalid' because of its age: nothing further has been added in a long time, and new versions have been released since, along with changes to the underlying OS platform itself.

If this bug still affects you, please change the status back to 'Confirmed'.

Changed in mailman:
status: New → Invalid
Mark Sapiro (msapiro) wrote :

It's not a bug, it's a patch, and it's still relevant.

Changed in mailman:
status: Invalid → Confirmed
Mark Sapiro (msapiro) wrote :

The patch in comment #19 is garbled. Here it is attached as a file.

Mark Sapiro (msapiro) wrote :

Attached is a patch to the spamd.py in comment #17. It is intended to catch the "[Errno 104] Connection reset by peer" exception reported at http://mail.python.org/pipermail/mailman-users/2010-April/069283.html, which apparently occurs because of a SpamAssassin timeout, and to handle it like other spamd communication errors so that the message isn't shunted.
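
The general shape of such a fix is to convert low-level socket failures into the same error the caller already handles. This is a sketch in modern Python, where socket errors are `OSError` subclasses; `SpamdError` and `call_spamd` are hypothetical names and are not taken from the attached patch.

```python
class SpamdError(Exception):
    """Hypothetical error type for any spamd communication failure."""

def call_spamd(talk):
    """Run talk(), converting low-level socket failures to SpamdError."""
    try:
        return talk()
    except OSError as exc:
        # Covers ConnectionResetError ("Connection reset by peer") as
        # well as refused connections and timeouts.
        raise SpamdError('spamd communication failed: %s' % exc) from exc
```

A caller that already passes messages through on `SpamdError` then needs no extra branch for connection resets.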

mark.now (mark-now) wrote :

After installing the SpamAssassin.py patch from comments #19 and #22, I started to get the following error:

Aug 18 16:14:38 2010 (26738) Uncaught runner exception: unsupported operand type(s) for -=: 'float' and 'str'
Aug 18 16:14:39 2010 (26738) Traceback (most recent call last):
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 120, in _oneloop
    self._onefile(msg, msgdata)
  File "/usr/local/mailman/Mailman/Queue/Runner.py", line 191, in _onefile
    keepqueued = self._dispose(mlist, msg, msgdata)
  File "/usr/local/mailman/Mailman/Queue/IncomingRunner.py", line 130, in _dispose
    more = self._dopipeline(mlist, msg, msgdata, pipeline)
  File "/usr/local/mailman/Mailman/Queue/IncomingRunner.py", line 153, in _dopipeline
    sys.modules[modname].process(mlist, msg, msgdata)
  File "/usr/local/mailman/Mailman/Handlers/SpamAssassin.py", line 83, in process
    score -= MEMBER_BONUS
TypeError: unsupported operand type(s) for -=: 'float' and 'str'

I guess the TypeError (unsupported operand type(s) for -=: 'float' and 'str') occurs because the variable types are mixed up.

I changed score -= MEMBER_BONUS into score -= (MEMBER_BONUS) on line 83 and it seems to work ...

Mark Sapiro (msapiro) wrote :

Regarding comment #24:

Have you defined SPAMASSASSIN_MEMBER_BONUS in Defaults.py and/or mm_cfg.py? Did you define it as a string rather than as a float? i.e. something like

SPAMASSASSIN_MEMBER_BONUS = '5'

as opposed to

SPAMASSASSIN_MEMBER_BONUS = 5.0

If MEMBER_BONUS actually has a string value, simply changing score -= MEMBER_BONUS into score -= (MEMBER_BONUS) won't help. Did you mean score -= float(MEMBER_BONUS)?
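
A defensive variant is to coerce the setting to a float once, when it is read. This sketch uses a stand-in object in place of the real `mm_cfg` module and assumes the default bonus of 2 mentioned earlier in the thread.

```python
from types import SimpleNamespace

# Stand-in for mm_cfg where the bonus was mistakenly set as a string.
fake_cfg = SimpleNamespace(SPAMASSASSIN_MEMBER_BONUS='5')

# Coerce once at load time so later arithmetic always sees a float;
# float() raises ValueError on a non-numeric value.
MEMBER_BONUS = float(getattr(fake_cfg, 'SPAMASSASSIN_MEMBER_BONUS', 2))

score = 7.5
score -= MEMBER_BONUS   # float minus float, no TypeError
```

Parentheses alone do not change a value's type, which is why `(MEMBER_BONUS)` is still a string when the setting is one.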

Mark Sapiro (msapiro) wrote :

As of Mailman 2.1.21 the method for checking accept_these_nonmembers has changed again.

For Mailman prior to 2.1.10, use the SpamAssassin.py attached to comment #18.

For Mailman >= 2.1.10 and < 2.1.21 use that SpamAssassin.py patched with the patch attached to comment #22.

For Mailman >= 2.1.21 use the SpamAssassin.py attached here.
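
The version ranges above can be checked mechanically by comparing version tuples rather than strings. This is a sketch; the function name and the returned descriptions are illustrative, and it only handles plain numeric 2.1.x version strings.

```python
def handler_variant(version):
    """Return which SpamAssassin.py to use for a Mailman 2.1.x version."""
    parts = tuple(int(p) for p in version.split('.'))
    if parts[:2] != (2, 1):
        raise ValueError('only Mailman 2.1.x is covered: %r' % version)
    # Tuples compare numerically, so (2, 1, 9) < (2, 1, 10), unlike
    # the strings '2.1.9' and '2.1.10'.
    if parts < (2, 1, 10):
        return 'SpamAssassin.py from comment #18'
    if parts < (2, 1, 21):
        return 'SpamAssassin.py from comment #18 plus patch from comment #22'
    return 'SpamAssassin.py attached for 2.1.21 and later'
```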

Mark Sapiro (msapiro) wrote :

For completeness, this is the spamd.py from comment #17 with the patch from comment #23 applied. It should be applicable to any Mailman 2.1 version.
