Change "artefact.path" column to use the "nested set" technique for managing hierarchical data

Bug #1427885 reported by Aaron Wells
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mahara
Triaged
Wishlist
Unassigned

Bug Description

Originally, we just had each artefact store its parent ID. This is slow because it requires running multiple queries to find all the descendants of a node.

Then, we added a "path" element to each artefact. This is better, but you can't get a performance improvement by indexing the column, because most of the queries rely on the "LIKE" operator. (See https://bugs.launchpad.net/mahara/+bug/1423700 )

So if we want to squeeze more performance out of this, I think the one remaining thing to look into is the "nested set" technique. This technique results in very fast searches for descendants, with the cost of somewhat slower writes. http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/

There are even existing PHP libraries for using the technique, such as this one: http://www.sideralis.org/baobab/

Tags: performance
Aaron Wells (u-aaronw)
Changed in mahara:
importance: Undecided → Wishlist
milestone: none → 15.10.0
Revision history for this message
Aaron Wells (u-aaronw) wrote :

Another idea would be to sniff the DB, and if they're using Postgres 8.4 or later, we use a recursive query on the artefact.parent column. Which, apparently, will probably be more performant than examining the "path" column or using the nested set technique. http://www.postgresql.org/docs/8.4/static/queries-with.html

Revision history for this message
Robert Lyon (robertl-9) wrote :

There is a emulator procedure that mimics 'WITH RECURSIVE' for mysql

I'll attach it

Revision history for this message
Robert Lyon (robertl-9) wrote :

Actually, looking at the code I have in the migration script, it looks like it's meant to sort out artefact hierarchy - so I'll add it here:

        // need to do special sortby as child artefacts can have a lower id number than their parent
        // due to one being able to move older files into newer folders
        if (is_postgres()) {
            $sourcedata = get_recordset_sql("; WITH RECURSIVE cte AS
                (
                 SELECT *, id AS parentid, 1 AS hierarchy FROM artefact
                 WHERE parent IS NULL
                 UNION ALL
                 SELECT t.*, cte.parentid, hierarchy + 1 FROM artefact t INNER JOIN cte ON t.parent = cte.id
                ) SELECT * FROM cte ORDER BY hierarchy");
        }
        else if(is_mysql()) {
            // we need to do a trick as mysql doesn't have WITH RECURSIVE option
            // 1) we need to add the WITH_EMULATOR proceedure to mysql database
            // See the mysqli_with_recursive_alternative.txt in this dir

            // 2) we call the prepared staement
            $sourcedata = get_recordset_sql('CALL WITH_EMULATOR(\'cte\', \'SELECT *, id AS parentid, 1 AS hierarchy FROM artefact WHERE parent IS NULL\',\'SELECT t.*, cte.parentid, hierarchy + 1 FROM artefact t INNER JOIN cte ON t.parent = cte.id\',\'SELECT * FROM cte ORDER BY hierarchy\', 0, \'\')');
        }

Aaron Wells (u-aaronw)
Changed in mahara:
milestone: 15.10.0 → 16.04.0
Aaron Wells (u-aaronw)
Changed in mahara:
milestone: 16.04.0 → 16.10.0
Robert Lyon (robertl-9)
Changed in mahara:
milestone: 16.10.0 → 16.10.1
Robert Lyon (robertl-9)
Changed in mahara:
milestone: 16.10.1 → 17.04.0
Changed in mahara:
milestone: 17.04.0 → none
milestone: none → 17.10.0
Robert Lyon (robertl-9)
Changed in mahara:
milestone: 17.10.0 → 18.04.0
Revision history for this message
Robert Lyon (robertl-9) wrote :

I see that 'WITH RECURSIVE' looks like it's going to be added to mysql 8.0
https://dev.mysql.com/doc/refman/8.0/en/with.html#common-table-expressions-recursive

But until then a stored procedure would be needed
https://stackoverflow.com/questions/20215744/how-to-create-a-mysql-hierarchical-recursive-query

Revision history for this message
Peter Spicer (peter.spicer) wrote :

Regarding nested sets, there is one problem - it's not just slower on write, it gets progressively slower the more nodes in the tree since in the usual implementation you need to update the left/right values for every single set in the tree... which potentially amounts to every row in the table. There are some options to use fractional numbers rather than integers but this isn't perfect either.

One case we saw on one of our customers is a hierarchical check:

SELECT SUM(aff.size) FROM "artefact" a JOIN "artefact_file_files" aff ON aff.artefact = a.id WHERE a.path LIKE '%/12345/%' LIMIT 2

Without heading to nested sets, there might be a way to minimise this particular pain point - if artefact 1 has a path of /2/3/1/, a simple two column table could be created and kept up to date when artefact paths change, that records (1, 2), (1, 3) and (1, 1) - meaning that querying that aspect of the hierarchy comes quite a bit cheaper as you can say 'which artefacts are in the same hierarchy'. It's not a complete replacement but for just purely simple 'is in the same hierarchy as' checks, it might be a cheap win.

Robert Lyon (robertl-9)
Changed in mahara:
milestone: 18.04.0 → 18.10.0
Revision history for this message
Kristina Hoeppner (kris-hoeppner) wrote :

Note: We'll need to check if MySQL moves nested sets better. Postgres does, but MySQL couldn't so far.

Changed in mahara:
milestone: 18.10.0 → 19.04.0
Changed in mahara:
milestone: 19.04.0 → none
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.