Evergreen

Bug #1695911
Comment #10

Comment 10 for bug 1695911

Revision history for this message

Bill Erickson (berick) wrote on 2018-10-03:

#10

Expansion of the above proposals:

CREATE UNIQUE INDEX browse_entry_sort_value_value_key
ON metabib.browse_entry USING btree (DIGEST((value || sort_value), 'sha1'));

I was getting md5 collisions during reingests, so I upgraded to sha1. So far the difference in indexing speed has not been noticeable.

ALTER TABLE metabib.browse_entry ADD COLUMN combo_sort_value TEXT;

The value stored here will be SUBSTR((sort_value || value), 1, 2048);

This replaces 'ORDER BY sort_value, value' incantations.

This gives a single field that can be (safely) indexed, but still retains the secondary sort on 'value' except in extreme cases where the value is really long. In those cases, the sorting may not be perfect, but at that point we're sorting paragraphs of text, so I doubt it matters much. Plus this way the values can actually exist in the database.

browse / browse_pivot / bib and auth ingest functions updated to populate and use the new combo_sort_value field.

Since we still need indexes on value and sort_value, may as well add them more carefully:

CREATE INDEX browse_entry_sort_value_idx
ON metabib.browse_entry (SUBSTR(sort_value, 1, 2048));

CREATE INDEX browse_entry_value_idx
ON metabib.browse_entry (SUBSTR(value, 1, 2048));

This will allow extremely long values to get ingested without failure and should (as I understand it) have no impact on entries whose values are less than 2048.