Record attribute indexing slowed by non-index configuration

Bug #1588543 reported by Dan Wells
This bug affects 2 people
Affects Status Importance Assigned to Milestone

Bug Description

EG 2.9+

A huge number of rows (700+) have been added to config.record_attr_definition in recent versions of Evergreen. However, most of these rows do not actually define any indexing at all, but are used for other purposes. The reingest process currently churns over all of these rows to ultimately do nothing.

One potentially easy win is to limit the group of attributes considered at ingest to only those which might index something. Quick testing indicates a substantial speed boost to typical use of metabib.reingest_record_attributes() (over 7x speedup).

WIP Branch is here:;a=shortlog;h=refs/heads/user/dbwells/speed_up_rec_attr_ingest


One reason the above branch makes the difference is does is greatly reducing the number of resulting selects to config.coded_value_map. However, since the complete config rows also do these selects, additional speed may be realized by adding an index on ctype, such as:

CREATE INDEX ON config.coded_value_map (ctype);

This change in early testing cuts processing time by an additional ~15% over the branch above. Other indexes have been tried with diminishing returns, but these are still being investigated.

In addition, we should consider whether some of the newer config rows should in fact have 'filter' turned off in stock. The 'filter' attribute is true by default in the config.record_attr_definition, so intentions of recent commits are not 100% clear. I would advocate we set 'filter' to false for any row which has no indexing information (even though they are now caught by the branch commit above). We should also consider defaulting to false for some of the less common fixed fields, as it may be enough to now have them available for turning on if a site desires to do so.

Dan Wells (dbw2)
Changed in evergreen:
assignee: nobody → Dan Wells (dbw2)
Revision history for this message
Kathy Lussier (klussier) wrote :

Hi Dan,

After Dyrcona loaded this patch on his dev server, I noticed that filtering by format is no longer working. To rule out other possible causes, I installed the branch on a fresh VM and confirmed that the search filter problem occurred there as well.

Basically, if I do a search limited by one of the stock search filters, I immediately get a search results page with zero results.

Revision history for this message
Dan Wells (dbw2) wrote :

Kathy and Jason, thanks for testing! The branch was overlooking composite attributes, as their indexing data is stored in a separate table.

I have pushed another commit to add them back to the set. This obviously gives back some of the measured gains, but so far still a measurable improvement (roughly 2x for attr-only reingest on my system with local configuration, YMMV).

On the positive side, the suggested index (CREATE INDEX ON config.coded_value_map (ctype);) now appears to make a larger positive difference (~33% faster, or 3x total speed over baseline).

This branch remains experimental, and all feedback is welcome.

Revision history for this message
Galen Charlton (gmc) wrote :

I've tested the changes to the stored procedure as well as the suggested index, and they look good to me. I've also written a unit test (and confirmed that it would have caught the problem with Dan's initial patch). I've squashed Dan's patches and put everything together in the collab/gmcharlt/lp1588543_record_attrs branch in the working/Evergreen repository:;a=shortlog;h=refs/heads/collab/gmcharlt/lp1588543_record_attrs

Since the changes so far provide a substantial benefit, I'm putting a pullrequest on this bug now; further refinements can come later.

tags: added: performance pullrequest
Changed in evergreen:
assignee: Dan Wells (dbw2) → Mike Rylander (mrylander)
Revision history for this message
Mike Rylander (mrylander) wrote :

Merged for 2.11, thanks both!

Changed in evergreen:
status: New → Fix Committed
Changed in evergreen:
assignee: Mike Rylander (mrylander) → nobody
milestone: none → 2.11-beta
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers