Did you mean: search suggestions exist for deleted records and can result in no hits

Bug #1931626 reported by Michele Morgan
This bug affects 3 people
Affects    Status     Importance  Assigned to  Milestone
Evergreen  Confirmed  Medium      Unassigned

Bug Description

Terms from deleted records are included in search suggestions, which can result in no hits for the suggested term.

Examples from a concerto database:

Search term - Suggestion

casiers - casier
horis - loris

The suggestions yield no hits, and appear to come from deleted record ids 10 and 20.

Mike Rylander (mrylander) wrote :

Hi Michele,

Did you test this on an upgraded database, using the symspell-sideloader.pl script? It looks like we'll need to improve its usage instructions so that they suggest excluding deleted records from the data-gathering step.

NOTE: in a quiescent database, one can truncate the dictionary table and rerun the sideloading steps to refresh the data.

Thanks!

Michele Morgan (mmorgan) wrote :

Hi Mike,

I tested this on a master test system, built after the release of 3.7. I am also seeing the examples from the initial report on demo.evergreencatalog.com.

Mike Rylander (mrylander) wrote :

Thanks, Michele. I'm pretty certain the demo server is upgraded, not fresh, and an upgraded server will have this issue if my theory is correct. Was your master system's database upgraded using the numbered scripts, or a completely fresh database?

Jason Boyer (jboyer) wrote :

Demo.evergreencatalog.com is rebuilt from the ground up every weekend. Since the goal is that nothing persists, there’s no point in upgrading it. :)

While there are triggers to maintain the symspell entries, they're only fired when rows are deleted from the metabib.*_field_entry tables. The aaa_indexing_ingest_or_delete trigger on biblio.record_entry calls biblio.indexing_ingest_or_delete, which does remove a couple of things when records are deleted, but metabib field entries aren't among them, regardless of the (poorly named nowadays) ingest.metarecord_mapping.preserve_on_delete global flag.

from a concerto db:
everpost=# select * from metabib.title_field_entry where source =257;
 id  | source | field |      value       |              index_vector
-----+--------+-------+------------------+----------------------------------------
 765 |    257 |     6 | Sdílení naděje   | 'nadej':4C 'nadeje':2A 'sdileni':1A,3C
 764 |    257 |    53 | Sdílení naděje / | 'nadej':4C 'nadeje':2A 'sdileni':1A,3C
(2 rows)

everpost=# delete from biblio.record_entry where id=257;
DELETE 0
everpost=# select deleted from biblio.record_entry where id=257;
 deleted
---------
 t
(1 row)
everpost=# select * from metabib.title_field_entry where source =257;
 id  | source | field |      value       |              index_vector
-----+--------+-------+------------------+----------------------------------------
 765 |    257 |     6 | Sdílení naděje   | 'nadej':4C 'nadeje':2A 'sdileni':1A,3C
 764 |    257 |    53 | Sdílení naděje / | 'nadej':4C 'nadeje':2A 'sdileni':1A,3C
(2 rows)
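
The leftover rows in the session above can also be looked for in bulk by joining a field entry table against biblio.record_entry. This is a minimal, untested sketch that uses only the tables and columns visible in that session; the aliases fe and bre are arbitrary, and the other metabib.*_field_entry tables are assumed to have the same shape as the title table.

```sql
-- List title field entries that still exist for deleted bib records.
-- Only columns shown in the psql session above are referenced.
SELECT fe.id, fe.source, fe.field, fe.value
  FROM metabib.title_field_entry fe
  JOIN biblio.record_entry bre ON bre.id = fe.source
 WHERE bre.deleted
 ORDER BY fe.source, fe.id;
```

On the concerto database used above, record 257 would be expected to appear in the result once it has been marked deleted.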

Changed in evergreen:
status: New → Confirmed
Jason Boyer (jboyer) wrote :

Oh, I forgot about this: the 3.7 upgrade script includes all records, deleted or not, and removing entries from metabib.*_field_entry will affect their counts. So a straightforward upgrade script that updates biblio.indexing_ingest_or_delete and also removes deleted records' entries from the field entry tables should leave systems in a consistent state. Whether that is preferable to disabling all symspell triggers, taking out the trash, and rebuilding the symspell dictionaries entirely is site-dependent.
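
The data-cleanup half of that idea might look roughly like the following for a single field entry table. This is a hedged sketch, not a tested upgrade script: it assumes the symspell triggers on metabib.*_field_entry are left enabled so that the deletes adjust the dictionary counts, it covers only the title table (the other metabib.*_field_entry tables would need the same statement), and it does not include the change to biblio.indexing_ingest_or_delete. It should be tried on a test system first.

```sql
BEGIN;

-- Remove title field entries that belong to deleted bib records.
-- Per the comments above, deletes on metabib.*_field_entry fire the
-- symspell maintenance triggers.
DELETE FROM metabib.title_field_entry fe
 USING biblio.record_entry bre
 WHERE bre.id = fe.source
   AND bre.deleted;

-- Sanity check: no title field entries should remain for deleted records.
SELECT count(*)
  FROM metabib.title_field_entry fe
  JOIN biblio.record_entry bre ON bre.id = fe.source
 WHERE bre.deleted;

ROLLBACK;  -- switch to COMMIT once the results look right
```

Running this inside a transaction that ends in ROLLBACK allows the effect on the suggestions to be inspected before anything is made permanent.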

Changed in evergreen:
importance: Undecided → High
importance: High → Medium
tags: added: didyoumean