Comment 10 for bug 1614403

kaputtnik (franku) wrote :

I couldn't believe that there is no ready-made solution out there... everything I found uses the Akismet service.

Wouldn't it be possible to write a pure Python/Django spam filter? It would run a comment/post/text through several analyzers and give each result a weight...

1. Analyze newlines: How many newlines does the comment contain relative to the number of words in each line?
2. Analyze the text against some phrases: How often are specific phrases (love purchase, buy, ..) used in the text?
3. Analyze numbers: Is there a phone number in the text, and how often does it appear?
4. Analyze whether markup is used intentionally (e.g. double space at end of line).
5. Analyze external links: Maybe using a whitelist containing known image hosts or launchpad.*.
6. ...
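
To make points 2, 3 and 5 a bit more concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration only: the phrase list, the phone number pattern, the allowed hosts, the function names and the scaling to a value between 0 and 1 are all made up and would need tuning against real spam.

```python
import re
from urllib.parse import urlparse

# Hypothetical values, not taken from any existing filter.
SPAM_PHRASES = ("love", "purchase", "buy")
ALLOWED_HOSTS = ("launchpad.net",)

URL_RE = re.compile(r"https?://\S+")
# Very loose pattern: at least 9 characters of digits and separators.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def phrase_weight(text, phrases=SPAM_PHRASES):
    """Count phrase hits; five or more hits give the maximum of 1.0."""
    lowered = text.lower()
    hits = sum(lowered.count(phrase) for phrase in phrases)
    return min(1.0, hits / 5)


def phone_weight(text):
    """Count phone-number-like strings; two or more give 1.0."""
    hits = len(PHONE_RE.findall(text))
    return min(1.0, hits / 2)


def link_weight(text, allowed=ALLOWED_HOSTS):
    """Return the share of links that point to a host not on the whitelist."""
    hosts = [urlparse(url).hostname or "" for url in URL_RE.findall(text)]
    if not hosts:
        return 0.0
    unknown = [
        host for host in hosts
        if not any(host == ok or host.endswith("." + ok) for ok in allowed)
    ]
    return len(unknown) / len(hosts)
```

Each function only looks at one aspect of the text, so new analyzers could be added without touching the existing ones.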

Each analyzer returns a 'weight', a number showing how strongly the analyzer applies. If the combined weight of all analyzers is above a specific threshold, the comment is possibly spam.
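
The combining step could look roughly like this. It is only a sketch: the two sample analyzers, the factors and the threshold of 1.0 are invented placeholders, and the names spam_score and is_probably_spam are hypothetical.

```python
def newline_weight(text):
    """Share of non-empty lines that contain three words or fewer."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    short = sum(1 for line in lines if len(line.split()) <= 3)
    return short / len(lines)


def digit_weight(text):
    """Share of characters in the text that are digits."""
    if not text:
        return 0.0
    return sum(ch.isdigit() for ch in text) / len(text)


# (analyzer, factor) pairs; the factors are arbitrary example values.
ANALYZERS = [(newline_weight, 1.0), (digit_weight, 2.0)]
THRESHOLD = 1.0  # assumed cut-off, would need tuning


def spam_score(text, analyzers=ANALYZERS):
    """Sum the weights of all analyzers, each scaled by its factor."""
    return sum(factor * analyzer(text) for analyzer, factor in analyzers)


def is_probably_spam(text, threshold=THRESHOLD):
    """Flag the text if the combined weight is above the threshold."""
    return spam_score(text) > threshold


print(is_probably_spam("Call\n5551234567\nnow"))
print(is_probably_spam("I could not reproduce this on the current trunk build."))
```

A flagged comment would probably better be held for moderation than deleted, since a simple sum like this will produce false positives.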

Yes, I know, it's always easier said than done... just an idea...