Spam protection possibilities

Bug #1250641 reported by Aaron Wells
Affects    Status     Importance    Assigned to    Milestone
Mahara     Opinion    High          Unassigned

Bug Description

Mahara gets a steady, annoying flow of forum spam. On a typical day it's 5 to 10 messages. This isn't a whole lot, but since the number of messages on mahara.org's forums is fairly low, spam makes up about 40% of the messages each day.

I'm opening this bug to explore options to deal with this issue.

Aaron Wells (u-aaronw) wrote :

I've been told by Kristina that there have been previous attempts to curb the mahara.org forum spam, and the current approach of just manually deleting the messages was the best that could be done. I'll ask her for some more details about why the other approaches didn't work, so I can make sure I don't go down those same blind alleys.

The spam tends to be 1-3 messages posted from a newly registered account. Our account registration system does use Mahara's anti-spam functions, but these only block the simplest of automated attacks and could be circumvented by a slightly more targeted script. Previous investigations into the spamming apparently indicate that the spam is being posted by human beings, though.

Currently we just delete the spam messages and suspend the user accounts as they are found. However, this has the downside that by the time we notice the message, the email about it has already gone out to all forum subscribers.

Aaron Wells (u-aaronw) wrote :

As mentioned in https://bugs.launchpad.net/mahara/+bug/732522 , we're actually not even applying Mahara's current spam filters to the forum! So I think that's the first thing to try.

Some other strategies we discussed on IRC:

1. Moderation queue: Mahara currently has no moderation queue feature. For mahara.org we wouldn't want this because it would slow down discussion too much (e.g. a person in the UK makes a post, and admins in NZ approve it the next day).

2. Bayesian spam detection: Could be useful, but non-trivial to add. There are plenty of existing libraries for this, though. And Son has even implemented his own as part of his PhD thesis.

3. Requiring human approval of user account registration: This feature is already implemented, so we'd just have to switch it on. On the other hand, it could be tricky to pick out the spammers from people who just aren't good at using English (and mahara.org has several non-English forums!). Also, it would frustrate legitimate new users trying to ask a question about their installation problems or fix something on the wiki.

4. Filtered moderation queue: If a message looks spammy, it goes in the moderation queue. This avoids slowing down discussion, but it requires implementing #1 and #2 above.
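
To make option 2 (and the "looks spammy" test in option 4) more concrete, here is a minimal sketch of a naive Bayes text classifier in Python. It is not Mahara code and not any particular library's API; the class name, method names, and the zero threshold are all assumptions made for illustration. A real deployment would need far more training data and probably better tokenization, especially for non-English forums.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Record one message that a human has labelled "spam" or "ham"."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def spam_log_odds(self, text):
        """Return log P(spam|text) - log P(ham|text); positive means spammy."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        score = {}
        for label in ("spam", "ham"):
            # Smoothed class prior, so an untrained class never gives log(0).
            log_prob = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing; the extra +1 reserves mass for unseen words.
                log_prob += math.log((count + 1) / (total_words + len(vocab) + 1))
            score[label] = log_prob
        return score["spam"] - score["ham"]

    def looks_spammy(self, text, threshold=0.0):
        """Decide whether a post should be routed to a moderation queue."""
        return self.spam_log_odds(text) > threshold
```

With a filter like this, posts for which `looks_spammy()` returns True would go to the moderation queue and everything else would be published immediately.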

Aaron Wells (u-aaronw) wrote :

Correction: on further inspection, the mahara.org forum is already checking for blacklisted domains in forum posts. And it appears to work, because the only spam that's getting through has domains that aren't in the Spamhaus blocklist.

We aren't applying the other anti-spam measures, like making sure there's a limited number of URLs in the post. But most of our spam only has 2 or 3 domains per message, and it's possible to spam just as annoyingly with only one URL.

So it would appear our one remaining option is Bayesian filtering.

Aaron Wells (u-aaronw) wrote :

Another suggestion, taken from phpBB: prevent posting of images & URLs until a user reaches 10 normal posts where other users have interacted.
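
As a rough illustration of that rule, the following Python sketch blocks links and images from users who have not yet reached the threshold. The `ForumUser` record, the `posts_with_interaction` counter, and the regular expression are hypothetical and do not reflect Mahara's or phpBB's actual data model; only the threshold of 10 comes from the suggestion above.

```python
import re
from dataclasses import dataclass

PROBATION_POSTS_REQUIRED = 10  # threshold taken from the suggestion above

# Assumption: only explicit http(s) links and HTML <img> tags are restricted.
LINK_OR_IMAGE = re.compile(r"https?://|<img\b", re.IGNORECASE)


@dataclass
class ForumUser:
    """Hypothetical user record; not Mahara's actual data model."""
    username: str
    posts_with_interaction: int = 0  # posts that other users have replied to

    @property
    def on_probation(self):
        return self.posts_with_interaction < PROBATION_POSTS_REQUIRED


def may_post(user, body):
    """Allow the post unless a probationary user includes a link or image."""
    if user.on_probation and LINK_OR_IMAGE.search(body):
        return False
    return True
```

A plain-text question from a new user passes, while the same user posting "http://www.example.com" is rejected until they leave probation.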

Aaron Wells (u-aaronw) wrote :

I've implemented reCAPTCHA support and deployed it to mahara.org. We'll see whether that works. I'll also see about upstreaming the reCAPTCHA option to Mahara core.

Aaron Wells (u-aaronw) wrote :

I've spun off a separate bug, https://bugs.launchpad.net/mahara/+bug/1252098 , to track the reCAPTCHA implementation. It's currently on mahara.org, but hasn't been upstreamed into Gerrit yet.

Unfortunately, the reCAPTCHA didn't stop the spam. It continues at roughly the same rate as before.

So, I've moved on to implementing a new user probation system to prevent newly registered users from posting external links to the forums. I've spun off another bug to track that: https://bugs.launchpad.net/mahara/+bug/1252101

Robert Lyon (robertl-9)
summary: - mahara.org forum spam
+ [ongoing] mahara.org forum spam
Aaron Wells (u-aaronw) wrote : Re: [ongoing] mahara.org forum spam

We had another big chunk of forum spam this past Friday. A user registered and posted around 200 forum posts in Korean, with a domain name sort of spelled out with spaces, like "www example com". This went right past our probation checks, which only limit full URLs like "http://www.example.com", under the assumption that any other kind of message is sort of useless for a spammer. Apparently this spammer did not think it was useless.
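
For illustration, a probation check could also look for domains written with spaces instead of dots. The Python sketch below is an assumption-laden example, not the check Mahara uses: the short TLD list and the requirement of a leading "www" are arbitrary choices, and a determined spammer could sidestep it with a different disguise.

```python
import re

# Assumed, illustrative list of top-level domains; a real list would be longer.
TLDS = ("com", "net", "org", "info", "biz", "kr")

# A separator is either a dot (optionally padded with spaces) or plain whitespace.
SEPARATOR = r"(?:\s*\.\s*|\s+)"

OBFUSCATED_DOMAIN = re.compile(
    r"\bwww" + SEPARATOR + r"[a-z0-9-]+" + SEPARATOR
    + r"(?:" + "|".join(TLDS) + r")\b",
    re.IGNORECASE,
)


def contains_disguised_domain(text):
    """Return True if the text contains 'www <name> <tld>' in dotted or spaced form."""
    return OBFUSCATED_DOMAIN.search(text) is not None
```

This catches "www example com" as well as "www.example.com", but it would miss a domain written without the "www" prefix, so it only narrows the gap rather than closing it.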

So, we need to step up our spam game once again. After discussing it around the office, the latest proposal is to add a captcha for probationary users before they can make a forum post. I've filed a new bug about that here: https://bugs.launchpad.net/mahara/+bug/1432464

Aaron Wells (u-aaronw) wrote :

There's also this bug, about adding a spam-detection heuristic system (and moderation queue) to Mahara: https://bugs.launchpad.net/mahara/+bug/1293272

Aaron Wells (u-aaronw) wrote :

For the time being, we've just turned on manual approval for new user accounts.

Aaron Wells (u-aaronw) wrote :

I was giving this a little thought, and I think there are a few different milestones we can aim for here:

Milestone 1: Preventing a spammer from flooding email with dozens of posts in a single hour.

Milestone 2: Preventing a spammer from flooding email with dozens of posts over the course of a weekend.

Milestone 3: Total elimination of spam emails.

Different techniques would be needed for each of these. Milestone 3, in particular, would require total human moderation of posts by probationary users, and may not be feasible for us.

On a related note, it occurs to me that post rate throttling can't be a full solution to any of these, even Milestone 1. Even if we limited probationary users to, say, 3 posts per day, a spammer could simply register multiple accounts in order to circumvent that. We could try to throttle user registrations per IP or computer, but that's easily circumvented using proxies.

On the other hand, a global post rate throttle could be useful. Like, if there are more than 3 posts by *any* probationary user in a 24-hour period, we start sending all probationary user posts to moderation.
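
A global throttle like that could be sketched as a sliding-window counter shared by all probationary users. The Python below is only an illustration under assumptions: the class name, the "publish"/"moderate" return values, and the choice to keep counting moderated posts toward the window are mine, not part of any existing Mahara feature.

```python
from collections import deque


class GlobalProbationThrottle:
    """Sliding-window counter over posts by all probationary users combined."""

    def __init__(self, max_posts=3, window_seconds=24 * 60 * 60):
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        self._timestamps = deque()  # post times in seconds, oldest first

    def route_post(self, now):
        """Record a probationary post made at `now` and decide where it goes."""
        # Drop posts that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        self._timestamps.append(now)
        # Once the window holds more than max_posts, everything is moderated.
        if len(self._timestamps) > self.max_posts:
            return "moderate"
        return "publish"
```

Because the counter is global, registering extra accounts does not help a spammer: the fourth probationary post within 24 hours goes to moderation no matter which account wrote it.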

Kristina Hoeppner (kris-hoeppner) wrote :

We might also need to restrict with whom probationary users can share their pages. Instead of using forums, they could create pages and make those available to individual users. I guess this could fall a bit into your scenario of preventing spammers from sending many messages at once.

The same would go for sending messages to individual users.

Human moderation could be a possibility for forum posts, but would not work for user account creation. I've been spending about 1-1.5 hours each day dealing with new accounts. If we simply approved them it would take much less time, but since we require users to provide a reason, I can see why they sign up. That lets me tell them which Mahara instance they should be using for their uni work instead, or ask who their lecturer is so I can get in touch with them. It is primarily students who request a login because their lecturer said they should. However, they don't always check carefully which URL they should use, or they are just told "Mahara" and then end up on mahara.org. So I'm not just looking out for spam accounts during the approval; I do a qualified approval process, which in the long run is better for the institutions using Mahara, as they'll actually be able to have their users on their own instances. Since we don't have a global Mahara instance for all, and mahara.org is for discussions but not individual portfolios, I feel I have to do the qualified approval / denial as long as we require account confirmation.

Kristina Hoeppner (kris-hoeppner) wrote :

Using manual approval of new accounts has significantly reduced the forum spam. In Mahara 19.04 it is also possible to moderate any forum posts.

I'm closing this bug report for the time being as we are not going to take any further action at this stage. A number of the ideas that Aaron had for reducing spam on a site are still valid and can be taken up again at a later stage if needed.

Changed in mahara:
assignee: Aaron Wells (u-aaronw) → nobody
status: Confirmed → Opinion
summary: - [ongoing] mahara.org forum spam
+ Spam protection possibilities