update_hierarchy_path in artefact/lib.php hammers SQL when copying collections

Bug #1724603 reported by Brian Merritt on 2017-10-18
This bug affects 3 people
Affects: Mahara
Importance: High
Assigned to: Unassigned

Bug Description

A teacher asked 40 students to copy a 15-page collection with numerous artefacts on each page, which crippled our MySQL server.

In testing, even copying the collection once caused the web server to time out and drove the SQL load up sharply.

The function "update_hierarchy_path" in artefact/lib.php (line 1423) runs the following SQL query:

    `$sql = "UPDATE {artefact} SET path = ? || SUBSTR(path, ?) WHERE (path = ? OR path LIKE ? )";`

The artefact table in Mahara does not index the `path` column, so while updating one artefact is not a major issue, updating the path column for many artefacts forces the database to scan the whole table for each update.

Indexing the path column (which is 1024 bytes) may not be a good solution long term, but either the query needs to be made more efficient or the column indexed.
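
To make the effect of the statement above concrete, here is a minimal sketch in Python, using an in-memory SQLite database as a stand-in for MySQL. Only the UPDATE text comes from this report. The table layout, the leading slash on each path, the helper name `move_subtree` and the way the SUBSTR offset is computed are assumptions made for illustration, not Mahara's actual code.

```python
import sqlite3


def move_subtree(conn, old_path, new_parent_path):
    """Re-root old_path, and every path beneath it, under new_parent_path."""
    # The artefact's own trailing segment, e.g. "/192" for "/2/5/192".
    own_segment = "/" + old_path.rsplit("/", 1)[-1]
    # SUBSTR is 1-based: this is where "/192" starts inside the old path (assumed formula).
    offset = len(old_path) - len(own_segment) + 1
    conn.execute(
        "UPDATE artefact SET path = ? || SUBSTR(path, ?) "
        "WHERE (path = ? OR path LIKE ?)",
        # Ids are numeric here, so the LIKE pattern needs no escaping in this sketch.
        (new_parent_path, offset, old_path, old_path + "/%"),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE artefact (id INTEGER PRIMARY KEY, path TEXT)")
    conn.executemany(
        "INSERT INTO artefact (id, path) VALUES (?, ?)",
        [
            (2, "/2"), (5, "/2/5"), (7, "/7"),
            (192, "/2/5/192"), (16, "/2/5/192/16"), (412, "/2/5/192/412"),
            (193, "/2/5/193"), (6, "/2/5/193/6"),
        ],
    )
    # Move folder 192 (and everything in it) under artefact 7.
    move_subtree(conn, "/2/5/192", "/7")
    return dict(conn.execute("SELECT id, path FROM artefact"))
```

Calling `demo()` moves artefact 192 and its two children in one statement and leaves the sibling folder 193 alone. With no index on `path`, the WHERE clause has to be checked against every row of the table each time the statement runs.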

Mahara version 17.04_STABLE (updated about a month ago)
Linux RHEL7
MySQL 5.6
Browser is current chromium Version 61.0.3163.100 (but that is not relevant)

Robert Lyon (robertl-9) wrote :

Hi Brian,

Good point about the missing index on the path column.

The first thing I'd try is adding a unique index to the 'path' column. Each path should be unique, because it ends with the id of the artefact itself and that id is unique.

The purpose of the 'path' column (if I remember right) is to handle the hierarchy issue where the child items are older (lower id) than the parent items. E.g. if some files are uploaded, then some folders are made and the files are moved into the folders, we could end up with paths like

2/5/192/16
2/5/192/412
2/5/193/6
2/5/193/77

which were complicated to sort correctly with just the 'id' and 'parent' columns
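
As a rough illustration of that point, the sketch below (Python, with hypothetical helper names and an id/parent mapping invented to match the four paths listed above) builds the path strings from 'id' and 'parent' pairs and then orders artefacts by path segments, so that every parent comes before its children even when a child has the lower id.

```python
# Hypothetical id -> parent mapping consistent with the example paths above.
PARENT_OF = {2: None, 5: 2, 192: 5, 193: 5, 16: 192, 412: 192, 6: 193, 77: 193}


def build_paths(parent_of):
    """Map each id to a path string such as '2/5/192/16'."""
    paths = {}

    def path_for(node):
        if node not in paths:
            parent = parent_of[node]
            prefix = "" if parent is None else path_for(parent) + "/"
            paths[node] = prefix + str(node)
        return paths[node]

    for node in parent_of:
        path_for(node)
    return paths


def hierarchy_order(paths):
    """Ids sorted so that each parent precedes its children."""
    return sorted(paths, key=lambda n: [int(seg) for seg in paths[n].split("/")])


def subtree(paths, root):
    """The root and all of its descendants, found by a simple prefix match."""
    prefix = paths[root] + "/"
    return [n for n in hierarchy_order(paths)
            if n == root or paths[n].startswith(prefix)]
```

Sorting by plain id would put file 16 ahead of its folder 192. Sorting by path segments gives 2, 5, 192, 16, 412, 193, 6, 77, and `subtree(paths, 192)` needs only a prefix comparison rather than a walk up the 'parent' column.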

It might make sense to have an 'artefact_path_structure' table to handle things instead of the 'path' column in the future.

Cheers

Robert

Changed in mahara:
status: New → Confirmed
Robert Lyon (robertl-9) on 2017-10-31
Changed in mahara:
importance: Undecided → High
Changed in mahara:
assignee: nobody → Cecilia Vela Gurovic (ceciliavg)
Changed in mahara:
assignee: Cecilia Vela Gurovic (ceciliavg) → nobody
Changed in mahara:
milestone: none → 18.04.0
issam.taboubi (issam-tab) wrote :

Hi Robert,

I would like to share a solution that I applied in our institution.

We have a collection (66 pages) that takes 130 seconds to copy because the method "update_hierarchy_path" runs its update query against the "artefact" table (500K records in our case), and with no index available to the query, every update is slow.

We came up with the idea of adding the owner column to the update query, because we only update the path hierarchy of the artefacts we have just inserted. Since there is an index on the owner column, this lowered the processing time from 139 seconds to 19 seconds.

Here are the two modified lines of code (line 1429 of artefact/lib.php):

    $params = array($newparent->path, $length, $this->owner, $this->path, db_like_escape("{$this->path}/") . '%');
    $sql = "UPDATE {artefact} SET path = ? || SUBSTR(path, ?) WHERE owner = ? AND (path = ? OR path LIKE ? )";

I want to know if there are any side effects for using the owner column in the query.

Please advise

Thank you.
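
One way to explore the side-effect question is the sketch below (Python with an in-memory SQLite database; the table layout, the index name `artefact_owner_ix` and the sample rows are assumptions). It runs the original and the owner-scoped statement on identical data and also asks SQLite for its query plan. SQLite's planner is not MySQL's, and the plan wording varies between SQLite versions, so the plans are only indicative.

```python
import sqlite3

UNSCOPED = ("UPDATE artefact SET path = ? || SUBSTR(path, ?) "
            "WHERE (path = ? OR path LIKE ?)")
SCOPED = ("UPDATE artefact SET path = ? || SUBSTR(path, ?) "
          "WHERE owner = ? AND (path = ? OR path LIKE ?)")

# (id, owner, path): one subtree owned by user 1, one unrelated row owned by user 2.
ROWS = [(2, 1, "/2"), (5, 1, "/2/5"), (192, 1, "/2/5/192"),
        (16, 1, "/2/5/192/16"), (7, 1, "/7"), (900, 2, "/900")]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE artefact (id INTEGER PRIMARY KEY, owner INTEGER, path TEXT)")
    conn.execute("CREATE INDEX artefact_owner_ix ON artefact (owner)")
    conn.executemany("INSERT INTO artefact VALUES (?, ?, ?)", ROWS)
    return conn


def run_update(sql, params):
    """Apply one statement to a fresh copy of the data and return id -> path."""
    conn = make_db()
    conn.execute(sql, params)
    return dict(conn.execute("SELECT id, path FROM artefact"))


def query_plan(sql, params):
    """Return the human-readable detail column of SQLite's query plan."""
    conn = make_db()
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


UNSCOPED_PARAMS = ("/7", 5, "/2/5/192", "/2/5/192/%")
SCOPED_PARAMS = ("/7", 5, 1, "/2/5/192", "/2/5/192/%")
```

In this sketch `run_update(UNSCOPED, UNSCOPED_PARAMS)` and `run_update(SCOPED, SCOPED_PARAMS)` return the same table, because every row in the moved subtree has the same owner. Comparing `query_plan(UNSCOPED, UNSCOPED_PARAMS)` with `query_plan(SCOPED, SCOPED_PARAMS)` shows whether the planner picks up the owner index for the scoped form. The two statements would differ if part of a subtree belonged to a different owner, or to no owner at all.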

Hello Issam,

Thank you for the suggestion. That's an incredible improvement! We'll add your suggestion to a patch in our review system to facilitate code review.

Cheers
Kristina

Robert Lyon (robertl-9) wrote :

Hi Issam,

I've added a patch with your change https://reviews.mahara.org/#/c/8492/

It has some adjustments to deal with groups/institutions copying things as well.
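
The patch itself is not quoted in this thread, so the following is only a guess at why such an adjustment is needed, sketched in Python with SQLite. It assumes that artefacts belonging to a group or institution have a NULL `owner` and carry a `group` or `institution` value instead; those column names and the helper `scoped_update` are hypothetical. In SQL, `owner = NULL` matches no rows, so an owner-only filter would silently skip such artefacts.

```python
import sqlite3


def scoped_update(conn, column, value, new_parent_path, old_path):
    """Move a subtree, filtering on whichever ownership column applies.

    Returns the number of rows the UPDATE changed.
    """
    if column not in ("owner", "group", "institution"):
        raise ValueError("unexpected ownership column: " + column)
    own_segment = "/" + old_path.rsplit("/", 1)[-1]
    offset = len(old_path) - len(own_segment) + 1  # 1-based SUBSTR offset
    cur = conn.execute(
        'UPDATE artefact SET path = ? || SUBSTR(path, ?) '
        'WHERE "' + column + '" = ? AND (path = ? OR path LIKE ?)',
        (new_parent_path, offset, value, old_path, old_path + "/%"),
    )
    return cur.rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE artefact (id INTEGER PRIMARY KEY, owner INTEGER, '
        '"group" INTEGER, institution TEXT, path TEXT)')
    # Three artefacts owned by group 9, so owner is NULL on each row.
    conn.executemany(
        'INSERT INTO artefact (id, owner, "group", path) VALUES (?, ?, ?, ?)',
        [(30, None, 9, "/30"), (31, None, 9, "/30/31"), (40, None, 9, "/40")],
    )
    by_owner = scoped_update(conn, "owner", None, "/40", "/30")  # owner = NULL
    by_group = scoped_update(conn, "group", 9, "/40", "/30")
    paths = dict(conn.execute("SELECT id, path FROM artefact"))
    return by_owner, by_group, paths
```

Here the owner-filtered call changes no rows, while the group-filtered call moves artefact 30 and its child 31 under artefact 40.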

issam.taboubi (issam-tab) wrote :

Amazing, thank you Robert

Changed in mahara:
status: Confirmed → In Progress

Reviewed: https://reviews.mahara.org/8492
Committed: https://git.mahara.org/mahara/mahara/commit/4b1f57b90374b3cc772b1560e1d3d05184716128
Submitter: Cecilia Vela Gurovic (<email address hidden>)
Branch: master

commit 4b1f57b90374b3cc772b1560e1d3d05184716128
Author: issam.taboubi <email address hidden>
Date: Thu Feb 1 14:07:01 2018 +1300

Bug 1724603: Adding 'owner' column to update for update_hierarchy_path()

To make use of the indexing on that column to speed things up

behatnotneeded

Change-Id: I4503b8c4b600fea28de9ffff854ec1f40ea5a2e0
Signed-off-by: Robert Lyon <email address hidden>

Changed in mahara:
status: In Progress → Fix Committed
Robert Lyon (robertl-9) on 2018-04-05
Changed in mahara:
status: Fix Committed → Fix Released