update_hierarchy_path in artefacts/lib.php hammers sql when copying collections
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Mahara |
Fix Released
|
High
|
Unassigned |
Bug Description
A teacher asked 40 students to copy a 15 page collection with numerous artefacts on each page, which crippled our mysql server.
When testing it, even copying the collection once resulted in the web server timing out and raised the sql load incredibly.
The code "update_
`$sql = "UPDATE {artefact} SET path = ? || SUBSTR(path, ?) WHERE (path = ? OR path LIKE ? )";`
The artefact table in Mahara does not index the `path` column, so whilst updating one artefact is not a major issue, but updating the path column for many artefacts hits the database massively.
Indexing the path column (which is 1024 bytes) may not be a good solution long term, but either the query needs to be made more efficient or the column indexed.
Mahara version 17.04_STABLE (updated about a month ago)
Linux RHEL7
MYSQL 5.6
Browser is current chromium Version 61.0.3163.100 (but that is not relevant)
Changed in mahara: | |
importance: | Undecided → High |
Changed in mahara: | |
assignee: | nobody → Cecilia Vela Gurovic (ceciliavg) |
Changed in mahara: | |
assignee: | Cecilia Vela Gurovic (ceciliavg) → nobody |
Changed in mahara: | |
milestone: | none → 18.04.0 |
Changed in mahara: | |
status: | Confirmed → In Progress |
Changed in mahara: | |
status: | In Progress → Fix Committed |
Changed in mahara: | |
status: | Fix Committed → Fix Released |
Hi Brian,
Good point about the missing index on the path column.
The first thing I'd try is adding a unique index to the 'path' column as each path should be unique as they all end with the id of the artefact itself and that is unique.
The purpose of the 'path' column (if I remember right) is to handle the hierarchy issue where the child items are older (lower id) than the parent items. Eg if some files are uploaded then some folders are made and the files are moved into the folders we could end up with paths like
2/5/192/16
2/5/192/412
2/5/193/6
2/5/193/77
which were complicated to sort correctly with just the 'id' and 'parent' columns
It might make sense to have a 'artefact_ path_structure' table to handle things instead of the 'path' column in the future.
Cheers
Robert