Allow faster indexing of elasticsearch via cli script

Bug #1732565 reported by Robert Lyon on 2017-11-16

Bug Description

Re-indexing a large site can take hours. Although we index via the bulk system, throughput is limited by the number of records we can read into memory per run and by how often cron runs.

One way to speed this up is a fast index CLI script that fires off the next cron run for Elasticsearch indexing immediately after the previous one finishes.

This would save the 'dead time' between runs that is spent waiting for the server clock to tick over to the next minute.
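
The idea above can be sketched as a simple driver loop. This is only an illustration in Python, not the Mahara script itself: `index_next_batch` is a hypothetical stand-in for a single cron-style indexing run that returns how many records it processed, and `make_fake_queue` is an in-memory queue invented for the example.

```python
from typing import Callable


def run_until_drained(index_next_batch: Callable[[int], int],
                      batch_size: int,
                      max_runs: int = 100_000) -> int:
    """Call the batch indexer back to back until it reports no work left.

    index_next_batch is a hypothetical callable: it indexes up to
    batch_size queued records and returns how many it processed.
    """
    total = 0
    for _ in range(max_runs):  # hard cap so a stuck queue cannot loop forever
        processed = index_next_batch(batch_size)
        if processed == 0:     # queue is empty, nothing left to index
            break
        total += processed
    return total


def make_fake_queue(size: int) -> Callable[[int], int]:
    """Build an in-memory stand-in for the indexing queue (illustration only)."""
    remaining = [size]

    def index_next_batch(limit: int) -> int:
        processed = min(limit, remaining[0])
        remaining[0] -= processed
        return processed

    return index_next_batch
```

Because the next run starts as soon as the previous one returns, no time is spent waiting for the next cron tick. The `max_runs` cap is a design choice for the sketch, so that a queue that never drains cannot keep the script running indefinitely.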

Robert Lyon (robertl-9) wrote :

Also, it looks like reducing the number of records indexed per run makes indexing go faster on a large site, as the bulk of the slowness appears to be in loading the information into memory before passing it to the index.

420,000 records at 10,000 records per run took 90 minutes
420,000 records at 5,000 records per run took 45 minutes
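
To make the two timings easier to compare, the short Python calculation below converts them into average elapsed minutes per run. It is a rough reading of just these two data points: the figures may include idle time between runs, and the helper name `per_run_minutes` is made up for this example.

```python
import math


def per_run_minutes(total_records: int, batch_size: int,
                    total_minutes: float) -> float:
    """Average elapsed minutes per run, given the totals from one timing."""
    runs = math.ceil(total_records / batch_size)
    return total_minutes / runs


slow = per_run_minutes(420_000, 10_000, 90)  # 42 runs
fast = per_run_minutes(420_000, 5_000, 45)   # 84 runs
print(f"10,000 per run: {slow:.2f} min/run")
print(f" 5,000 per run: {fast:.2f} min/run")
print(f"ratio: {slow / fast:.1f}x")
```

On these numbers, halving the batch size cut the average time per run to about a quarter, which is consistent with the per-run cost growing faster than the batch size.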

Submitter: Robert Lyon (<email address hidden>)
Branch: master

commit aac8315fc68c68dc7711250c24781cff92ebb742
Author: Robert Lyon <email address hidden>
Date: Thu Nov 16 12:10:53 2017 +1300

Bug 1732565: Allow for faster indexing of large sites into elasticsearch

Currently if you index 386,000 records at 10,000 at a time
- default cron at every 5 mins takes 195 mins (3.25 hours)
- cron set to every minute should take 39 mins (but if a run has not
finished, the following minute may be skipped)

With fast_index.php the next index run starts straight after the last one
finishes, so it runs through at optimal speed
- took only 25 minutes (about 64% of the per-minute cron estimate and far
less than the default cron time)


Change-Id: I65bfb19ab6481a95bafb120d3139d37e0ef28f92
Signed-off-by: Robert Lyon <email address hidden>
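
The cron estimates in the commit message can be reproduced with a small Python calculation. It assumes one batch is indexed per cron tick and that every run finishes within its interval, so the results are lower bounds; `cron_minutes` is a name chosen for this example, and the 25-minute figure is the measured value from the commit message.

```python
import math


def cron_minutes(total_records: int, batch_size: int,
                 interval_minutes: int) -> int:
    """Lower-bound elapsed minutes when one batch is indexed per cron tick."""
    runs = math.ceil(total_records / batch_size)
    return runs * interval_minutes


default_cron = cron_minutes(386_000, 10_000, 5)  # 39 runs at 5-minute intervals
minute_cron = cron_minutes(386_000, 10_000, 1)   # 39 runs at 1-minute intervals
fast_index = 25                                  # measured, from the commit message

# Fraction of elapsed time saved relative to the per-minute cron estimate.
saved = 1 - fast_index / minute_cron
print(default_cron, minute_cron, f"{saved:.0%}")
```

This gives 195 minutes for the default cron and 39 minutes for the per-minute cron, so the measured 25 minutes is roughly 36% less elapsed time than the per-minute estimate.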

Robert Lyon (robertl-9) on 2018-02-01
Changed in mahara:
status: In Progress → Fix Committed
tags: added: nominatedfeature
Robert Lyon (robertl-9) on 2018-04-05
Changed in mahara:
status: Fix Committed → Fix Released