Split multiple user activity notifications into chunks

Bug #778254 reported by Richard Mansfield
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mahara
Fix Released
High
Son Nguyen

Bug Description

See bug #716812.

When there are lots of users that need to be notified about a single activity, they need to be chunked and processed over several cron runs to avoid running out of memory.

tags: added: scalability
Revision history for this message
Robert Lyon (robertl-9) wrote :

This problem still exists and would be good to sort out as it means notifications for an activity like a forum posts can crash the system

Changed in mahara:
importance: Low → High
Revision history for this message
Robert Lyon (robertl-9) wrote :

A possible solution to this is we could add a column to the activity queue table to indicate the highest userid that has been processed for a cron run and limit the cron run to do only a certain amount each time so the cron just pocks up from userid x+1

Revision history for this message
Aaron Wells (u-aaronw) wrote :

So here's why this is complicated.

Mahara has a separation between "Activities" and "Notifications". The Activities are the things that cause notifications to be sent out -- posting to a forum, posting to a wall, sending a user message, these all create an Activity record in the "activity_queue" table.

Then, the activity_process_queue() method is called by the cron. It pulls all the existing records from activity_queue, and for each one it deletes it (to avoid duplicate messages going out) and then calls "handle_activity()" which then instantiates the right AcitivtyType class, and calls its ->notify_users() method.

So the problem is that, because we delete the record from activity_queue, everything has to be processed in one go. And in the case of the mahara.org news forum, the 36,000 records to process is too many.

The proposed fix is:

1. Add a "last_processed_user" column to "activity_queue".
2. In the cron, only process users whose usr.id > last_processed_user
3. After processing each user, update the activity_queue.last_processed_user value to that user's id.
4. After processing a certain number of users (1000 is probably good), quit processing users
5. If all the users get processed, then delete the activity_queue record.

It appears that we'll need to make some changes to activity_process_queue(), handle_activity()... and several methods in the ActivityType class, including the ActivityType->__construct() (which is where the list of users to notify is generated), activity_get_users(), and/or ActivityType->notify_users().

Hm... so, that it is a little tricky. None of the standard ActivityType subclasses override the constructor or the notify_users() method, so... that's good. It may be worth rethinking how we do all this ActivityType stuff in order to make it easier to split it into chunks.

Son Nguyen (ngson2000)
Changed in mahara:
assignee: nobody → Son Nguyen (ngson2000)
status: Confirmed → In Progress
Revision history for this message
Aaron Wells (u-aaronw) wrote :

For the specific case of the mahara.org News forum, I think what we should do is turn off auto-subscription. Currently it's set so that every user who registers a mahara.org account gets joined to the Mahara Community group, and subscribed to the News forum.

Or, to put it another way, perhaps we should reserve the News forum for news about *mahara.org* itself (because maybe there will be times when we need to send an email to every user on mahara.org?); and create a separate forum for news about Mahara.

Revision history for this message
Kristina Hoeppner (kris-hoeppner) wrote :

Changing the subscription may solve part of the problem, but not really its cause because you may need a forum to really reach everyone and then you could still not do it.

Revision history for this message
Son Nguyen (ngson2000) wrote :
Revision history for this message
Son Nguyen (ngson2000) wrote :

Note for testing:

1. Add an group and turn on "Auto-add users"
2. Add hundreds users via CSV (attached file)
3. Change the minute of activity_process_queue in the table 'cron' to '*'
4. Change the minute of interaction_forum_new_post in the table 'interaction_cron' to '*'

The chunk size is set to '1000' (Activity::USERCHUNK_SIZE)

You may change it to smaller value to see if it works

Revision history for this message
Mahara Bot (dev-mahara) wrote : A change has been merged

Reviewed: https://reviews.mahara.org/2729
Committed: http://gitorious.org/mahara/mahara/commit/3921ebdd0f5a07403a8188af742c549bca765b2d
Submitter: Son Nguyen (<email address hidden>)
Branch: master

commit 3921ebdd0f5a07403a8188af742c549bca765b2d
Author: Son Nguyen <email address hidden>
Date: Thu Nov 21 15:55:00 2013 +1300

Split multiple user activity notifications into chunks (Bug 778254)

1. Add a new column 'last_processed_userid' into the table
activity_queue
2. Enable the notify_users() method to work with split user chunks
3. Modify the activity_process_queue() function to update the
activity_queue with the new process

Change-Id: Id404aa8e8fa6515cfeb557a4ea11c14c43771fe7
Signed-off-by: Son Nguyen <email address hidden>

Revision history for this message
Aaron Wells (u-aaronw) wrote :

So it turns out the implementation in in patch 2729 still errored out mahara.org. By placing a lot of log_info() statements throughout the activity_process_queue() cron job, I was able to trace the crash to the part where it does activity_get_users().

The reason this is a problem, is because patch 2729 doesn't actually chunk the data pulled from the database; instead, it pulls the full amount of data and then only processes a subset of it.

So, I've submitted a patch which actual alters ActivityInteractionForumNewPost so that only pulls CHUNK_SIZE users from the DB each time. With this patch, the mahara.org News forum actually sends out notifications!

https://reviews.mahara.org/2790

Revision history for this message
Mahara Bot (dev-mahara) wrote :

Reviewed: https://reviews.mahara.org/2790
Committed: http://gitorious.org/mahara/mahara/commit/876305ec4e20d0e81b35d57e1d03716057f713db
Submitter: Robert Lyon (<email address hidden>)
Branch: master

commit 876305ec4e20d0e81b35d57e1d03716057f713db
Author: Aaron Wells <email address hidden>
Date: Wed Dec 11 15:04:39 2013 +1300

Making the notification batching system also batch the data pulled from the DB

Bug 778254

Change-Id: I273b347070266d2229159f838d0f7a061c63250b

Robert Lyon (robertl-9)
Changed in mahara:
status: In Progress → Fix Committed
Aaron Wells (u-aaronw)
Changed in mahara:
milestone: none → 1.9.0
tags: added: nominatedfeature
Robert Lyon (robertl-9)
Changed in mahara:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Bug attachments

Remote bug watches

Bug watches keep track of this bug in other bug trackers.