Cache page access to improve performance

Bug #1469594 reported by Aaron Wells
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Mahara
Confirmed
Wishlist
Unassigned

Bug Description

The view_search() method is quite slow in Mahara sites that have a lot of pages. Likewise, really, any part of Mahara that needs to look up which pages a particular user has access to, tends to be quite slow, and to have quite complicated code.

The reason for this, is because of how granular our access controls are. In total, you can have access control records for:

1. Everyone
2. Logged-in users
4. Individual users
5. Friends of the page owner
6. Members of a particular group
7. People who hold a particular role in a particular group
8. Members of a particular institution
9. Administrators viewing pages marked objectionable
10. Secret URLs

And each of these can have a start time and end time, and there can be an overriding start time and end time. Of course, we don't tend to list the last two types on the "pages you have access to" list, so they don't really matter here.

I wonder if it would be possible, and if it would improve performance, if we used a cache table for all of this. The table would be something along these lines:

view_access_cache:
1. view_access_id NOT NULL
2. viewid NOT NULL
3. public (boolean)
4. loggedin (boolean)
3. usrid

When you're trying to find all the pages a user can view, we'd just search view_access_cache for records where "usrid = that user", and/or where public and/or loggedin is true. The other fields are primarily for convenience and record-keeping

We'd populate this by analyzing the view_access table, and for each row, we'd calculate all the users it applies to, and insert them into the view_access_cache table. The table would have to be initialized at the beginning by analyzing the entire table. After that, we'd update only in the following situations:

1. When we make changes to a page's permissions
2. A cron job would check for view_access records that have crossed their start date and/or stop date since the last cron run, and view's that have crossed their start date and/or stop date since the last cron run, and update their cache records accordingly.
3. Whenever a user becomes or stops being another user's friend
4. Whenever a user's role or membership in a group or institution changes

To keep things synced up, there could also be a less frequent cron job that verifies all the existing view_access_cache records still make sense. (Depending on how long it takes for this to run).

Revision history for this message
Aaron Wells (u-aaronw) wrote :

So with this setup, the query to get the list of pages you have access to would look like this:

select v.* from {view} v where exists (select 1 from {view_access_cache} vac where vac.viewid = v.id and vac.usr = ? or vac.public=1 or vac.loggedin=1);

Revision history for this message
Aaron Wells (u-aaronw) wrote :

The columns in view_access_cache above are just my initial brainstorm. It might make sense to add aditional ones, like maybe groupid to help with the "Pages shared with this group" block, or a column to indicate the last time the record was checked to see if it was still accurate.

Visiting a page, it would still make more sense to use the existing can_view_view() method. But we could update can_view_view() so that it ensures that the view_access_cache record is correct for that user & view.

Revision history for this message
Aaron Wells (u-aaronw) wrote :

I ran this idea past our database specialist and he said it's worth trying. He thinks the main challenge would be in keeping the cache table in sync with view_access.

He also pointed out that we can stop selecting the separate count(*) query in the "latest pages" block. The view_search() block, which is used by several places in the code, is built to support pagination, so it runs the query twice, once with count(*) to get the total, a second time with a LIMIT clause to get the current page. But "latest pages" isn't paginated, so there's no reason to do that separate count(*) query there.

Revision history for this message
Jean-Philippe Gaudreau (jp-gaudreau) wrote :

Hi Aaron,

We're actually experiencing performance issues with the "Latest changes I can view" block on the dashboard (16.04.4). I didn't dig the code yet but do you think it is related to this problem?

Thanks a lot!

Changed in mahara:
assignee: nobody → Cecilia Vela Gurovic (ceciliavg)
Revision history for this message
Cecilia Vela Gurovic (ceciliavg) wrote :

This is still a performance issue, so we'll need to assess the functions get_myviews_data() and view_search() from View class

These functions seem to be taking a long time to process, as we've seen on some Mahara sites, specially those using MySQL that have lots of pages and collections

We need to check to see
- where the bottleneck is (code or SQL queries)
- if we can improve complexity the code/queries so it's easier to understand and maintain

after that we will see if the solution proposed here is the best one fro the current Mahara version or we'll need to find another one

Doris Tam (doristam)
Changed in mahara:
assignee: Cecilia Vela Gurovic (ceciliavg) → nobody
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.