Evergreen

wishlist: Did you mean? Multi word, single class search suggestions

Bug #1997485 reported by Andrea Neiman on 2022-11-22

This bug affects 4 people

Affects    Status        Importance  Assigned to  Milestone
Evergreen  Fix Released  Wishlist    Unassigned   Evergreen 3.11-beta

Bug Description

This work is sponsored by the Evergreen Community Development Initiative, with development work performed by Equinox.

This is the next phase of the larger "Did You Mean" search suggestions project, following on bug 1893997 which implemented single word single class suggestions.

This part of the project implements multi word single class search suggestions in staff and OPAC interfaces.

This includes:
* Bibliographic-based search suggestions for multiword and phrase searches in a single search class
* Search suggestions from authority 4xx fields (variant terms) within a specific search class like author or subject
* Configuration options for each search class, as well as an expansion of configuration options compared to the previous Did You Mean implementation

Full specifications can be seen here: https://yeti.equinoxoli.org/dev/public/techspecs/dym_stage3_20210616.pdf

A community branch will be shared once partner testing is complete.

Terran McCanna (tmccanna) on 2022-11-28

Changed in evergreen:
status:	New → Confirmed

Mike Rylander (mrylander) wrote on 2023-01-12:

The top three commits of the linked branch embody this development:

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp-1997485-multi-term-did-you-mean

From the commit message and release notes:

Expanding on the previous single-class, single-term search suggestion development, this feature provides suggestions for single-class searches with multiple terms.

* The Library Settings that were previously used to control the global behavior of search suggestions have been moved to search class configuration fields. This was done because the data in each search class benefits from different setting values.

* If a patron's search brings back a suggestion that matches an authority variant heading, the system will provide the main heading as a suggestion as well, along with spelling-corrected suggestions.

* Quoted phrases in user input require strict term order and adjacency in the corresponding portion of the generated suggestion, whereas unquoted input (or the portion that is not quoted) does not.
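
As a rough illustration of the quoted-versus-unquoted distinction in the last bullet, here is a minimal Python sketch. It is not Evergreen's implementation (which lives in the database); the helper names `parse_query`, `has_adjacent_run`, and `satisfies` are hypothetical, and tokenizing on whitespace is a simplifying assumption.

```python
import re

def parse_query(query):
    """Split user input into quoted phrases (lists of words) and loose words."""
    phrases = [p.split() for p in re.findall(r'"([^"]+)"', query)]
    loose = re.sub(r'"[^"]*"', " ", query).split()
    return phrases, loose

def has_adjacent_run(tokens, phrase):
    """True if the phrase words occur in tokens in the same order, side by side."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def satisfies(query, text):
    """Check a candidate text against the query (hypothetical matching rule)."""
    tokens = text.lower().split()
    phrases, loose = parse_query(query.lower())
    # Quoted portions: strict order and adjacency.
    # Unquoted portions: each word only has to be present somewhere.
    return (all(has_adjacent_run(tokens, p) for p in phrases)
            and all(w in tokens for w in loose))

quoted_reversed = satisfies('"raccoon city"', "city raccoon")
unquoted_reversed = satisfies("raccoon city", "city raccoon")
mixed = satisfies('"raccoon city" police', "Police of Raccoon City")
```

Under these assumptions the quoted query rejects the reversed word order, while the unquoted query accepts it.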

MARC Search/Facet Class (config.metabib_class) field additions:

* variant_authority_suggestion   Whether this class should attempt variant authority suggestions based on search-class/browse-axis mapping
* symspell_transfer_case         Whether suggestions should retain user-supplied letter case
* symspell_skip_correct          Only supply suggestions to misspelled words
* symspell_suggestion_verbosity  Setting that controls the amount of effort, and therefore time, spent on suggestion generation
* max_phrase_edit_distance       Maximum average per-word edit distance when evaluating suggestions
* suggestion_word_option_count   Maximum alternate suggestions per word
* max_suggestions                Maximum suggestions to present
* low_result_threshold           Maximum hit count beyond which suggestions are not provided
* min_suggestion_use_threshold   Minimum number of times a suggestion must exist in the corpus
* pg_trgm_weight                 Weight of the trigram similarity metric; 0 avoids calculation costs
* soundex_weight                 Weight of the soundex similarity metric; 0 avoids calculation costs
* keyboard_distance_weight       Weight of the keyboard distance similarity metric; 0 avoids calculation costs
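
To make three of these fields concrete, the following Python sketch shows one way low_result_threshold, min_suggestion_use_threshold, and max_suggestions could gate what is shown. This is only an illustration: the `ClassConfig` class, the `select_suggestions` function, the sample values, and the exact boundary handling (`>` versus `>=`) are assumptions, not Evergreen code or defaults.

```python
from dataclasses import dataclass

@dataclass
class ClassConfig:
    # Hypothetical stand-in for three columns of config.metabib_class.
    low_result_threshold: int
    min_suggestion_use_threshold: int
    max_suggestions: int

def select_suggestions(hit_count, candidates, cfg):
    """Filter ranked (text, corpus_count) candidates using the class settings."""
    # Searches with more hits than the threshold get no suggestions at all.
    if hit_count > cfg.low_result_threshold:
        return []
    # Drop candidates that occur too rarely in the corpus.
    kept = [text for text, count in candidates
            if count >= cfg.min_suggestion_use_threshold]
    # Cap the number of suggestions presented.
    return kept[:cfg.max_suggestions]

cfg = ClassConfig(low_result_threshold=0,
                  min_suggestion_use_threshold=2,
                  max_suggestions=2)
candidates = [("modern music", 12), ("modem music", 1),
              ("modern musics", 3), ("moderns music", 4)]

shown = select_suggestions(0, candidates, cfg)
suppressed = select_suggestions(5, candidates, cfg)
```

Because the settings are per search class, a site could in principle keep a stricter threshold for keyword than for author or subject.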

tags:	added: pullrequest
Changed in evergreen:
milestone:	none → 3.11-beta

Jeff Davis (jdavis-sitka) on 2023-01-23

tags:	added: didyoumean

Elizabeth Thomsen (et-8) wrote on 2023-03-01:

Screenshot of Did You Mean suggestion (17.0 KiB, image/png)

Testing on https://bugsquash.mobiusconsortium.org/

Tested multiple misspelled words in keyword, subject, author, title and received appropriate suggestions.

Examples:
moderm mussic --> modern music
amed saladn --> ahmed saladin
obeo basoon concerta --> oboe bassoon concerto
jemisom --> jemisin
racoon citty --> raccoon city
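
For anyone relating these examples to the max_phrase_edit_distance setting ("maximum average per-word edit distance"), here is a small Python sketch that computes that average for the pairs above. It assumes plain Levenshtein distance and a one-to-one word pairing; the metric Evergreen actually uses may differ (a transposition-aware distance would score obeo/oboe as 1 rather than 2), and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]

def average_word_distance(typed, suggested):
    """Average per-word distance; assumes both strings have the same word count."""
    pairs = list(zip(typed.split(), suggested.split()))
    return sum(levenshtein(t, s) for t, s in pairs) / len(pairs)

examples = [
    ("moderm mussic", "modern music"),
    ("amed saladn", "ahmed saladin"),
    ("obeo basoon concerta", "oboe bassoon concerto"),
    ("jemisom", "jemisin"),
    ("racoon citty", "raccoon city"),
]
averages = {typed: average_word_distance(typed, fixed) for typed, fixed in examples}
```

With this metric, most of the pairs average one edit per word, while jemisom averages two, so a lower max_phrase_edit_distance would be expected to filter that one out first.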

Because this is a limited database, lots of misspellings don't generate suggestions, or come up with ones that don't seem useful, because there weren't any records that matched the words I had in mind. For example, there were no suggestions for grenland because there are no records with greenland, so I found it easier to look at records first and then create the misspellings.

I tried testing to see if I could get a match based on an authority record, doing a subject search for Large print books, but didn't get any suggestions. Here's what's in the authority record:
=150 \\$aLarge type books
=450 \\$aLarge print books

This option may not be turned on.

Andrea Neiman (aneiman) wrote on 2023-03-02:

Thanks Elizabeth! Testing on a larger dataset is definitely helpful here if possible. When we did partner testing we used a larger data set in that test environment for that reason.

Some configuration documentation is here to facilitate testing:
https://docs.google.com/document/d/1NKuSqFASsS4GDPRLTIN79j5is-GPv4vBAcnSZpx2Jm0/edit?usp=sharing

Elizabeth Thomsen (et-8) wrote on 2023-03-03:

Thanks, Andrea! I wouldn't have as much confidence in how well this will work in real life if I hadn't spent so much time on this during the partner testing! It's fun to test, even on a dataset like this where you really have to find records with interesting words and work backwards to spell them wrong. I think it's going to make a big difference to all users, including people like me who can spell well but can't type!

I encourage anyone interested in this to look at the configurations in the document you shared to see just how many options there are and how much we can fine tune this based on experience. I look forward to lots of future discussion and conference presentations on how different sites have changed these configurations!

Elizabeth Thomsen (et-8) wrote on 2023-03-03 (last edit on 2023-03-03):

Checked the Metabib Class Configuration on https://bugsquash.mobiusconsortium.org/ and "Perform variant heading authority suggestion cross-reference" is set to Yes for all classes.

There's a subject authority record with a cross reference, linked to bib records:
=150 \\$aFantasy
=450 \\$aDay dreams

When I do a subject search on Day dreams, I get no matching records, and no Did You Mean suggestion for Fantasy.

Ruth Frasur Davis (redavis) wrote on 2023-03-10:

I have gone back to test the searches described by Elizabeth on the partner testing server and all searches returned results as expected. I consent to signing off on it with my name, Ruth Frasur (rfrasur) and email address, <email address hidden>.

tags:	added: signedoff

Mike Rylander (mrylander) wrote on 2023-03-31:

I've pushed a rebased and baseline-schema-reified version of this to:

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp-1997485-multi-term-did-you-mean-rebase-and-reify-baseline

It includes Ruth's sign-off, as well.

Chris Sharp (chrissharp123) wrote on 2023-04-20:

Trying to apply this in PINES, the upgrade script succeeds, but the DO function fails with:

psql:Open-ILS/src/sql/Pg/upgrade/XXXX.schema.DYM-multi-word.sql:777: ERROR: null value in column "low_result_threshold" violates not-null constraint
DETAIL: Failing row contains (keyword, Keyword, f, f, 1.0, 0.4, 0.2, 0.1, f, t, t, f, 2, 2, 5, -1, null, 1, 0, 0, 0).

Upon inspection, we don't have the "opac.did_you_mean.low_result_threshold" setting set in PINES at any level. Indeed, we don't have any of the consulted settings in place. I'm basically unfamiliar with DYM beyond what I've had to do for upgrades, but should we allow for an organization to not have those set in the upgrade script, or should we alert before the script is run, or in the docs?

Galen Charlton (gmc) wrote on 2023-04-20:

Those settings are not meant to be required; updating the IF statements in that DO block to the equivalent of "IF FOUND AND val IS NOT NULL..." should take care of the problem.
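
The effect of that guard can be sketched in Python with an in-memory SQLite table standing in for config.metabib_class. This is only an analogy to the PL/pgSQL DO block, not the upgrade script itself: the `migrate` function, the simplified table, and the column default of 0 are assumptions made for illustration.

```python
import sqlite3

def migrate(conn, old_value, guarded):
    """Copy an old library setting value into the class configuration row.

    old_value is None when the library setting was never set.
    """
    if guarded and old_value is None:
        return  # nothing to move; leave the column's table-level default alone
    conn.execute(
        "UPDATE metabib_class SET low_result_threshold = ? "
        "WHERE name = 'keyword'",
        (old_value,),
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metabib_class ("
    " name TEXT PRIMARY KEY,"
    " low_result_threshold INTEGER NOT NULL DEFAULT 0)"  # default is hypothetical
)
conn.execute("INSERT INTO metabib_class (name) VALUES ('keyword')")

# Unguarded: writing a missing (NULL) setting violates the NOT NULL constraint.
try:
    migrate(conn, None, guarded=False)
    unguarded_failed = False
except sqlite3.IntegrityError:
    unguarded_failed = True

# Guarded: the missing setting is skipped and the default survives.
migrate(conn, None, guarded=True)
value_after_skip = conn.execute(
    "SELECT low_result_threshold FROM metabib_class").fetchone()[0]

# Guarded, with a real old value: the value is moved as intended.
migrate(conn, 5, guarded=True)
value_after_move = conn.execute(
    "SELECT low_result_threshold FROM metabib_class").fetchone()[0]
```

The unguarded call reproduces the same class of failure as the reported error, while the guarded calls either skip the update or carry the old value over.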

Mike Rylander (mrylander) wrote on 2023-04-20:

#10

To confirm explicitly, the DO block is (at most) a best-effort attempt to move old settings to the new locations, but is /not/ required to succeed in order to use the new code/features.

Some of the options we have, in order of my /personal/ preference:

* Remove the DO block and update the release notes to direct the upgrading sysadmin's attention to the new location of the various settings. This is my preference so as to encourage intentional configuration.
* Galen's suggestion of "IF FOUND AND val IS NOT NULL ..." to conditionally change the defaults provided at the table definition level.
* Update the DO block to use COALESCE to force new defaults when (some of) the old settings are not there.

All of those are relatively low-effort, but in the interest of only attacking this once, are there any opinions (strong or otherwise) or other options to consider that would save someone's time?

Thanks, all!

Chris Sharp (chrissharp123) wrote on 2023-04-21:

#11

Thank you both for the quick response. I have no strong opinions about the approach. All things being equal, Galen's idea of moving any existing settings into their new location automatically makes sense to me.

Galen Charlton (gmc) on 2023-04-27

Changed in evergreen:
assignee:	nobody → Galen Charlton (gmc)

Galen Charlton (gmc) wrote on 2023-04-27:

#12

I've pushed a new working branch: user/gmcharlt/lp1997485_dym_tng

This branch:

* Addresses the glitch during update that Chris saw; if the library settings were not set, the upgrade will not throw errors.
* Adjusts how authority-record-derived suggestions are made. The upshot is that 1XX, 4XX, and 5XX fields from authority records are now included in the search suggestion index, meaning that a user search that matches a heading from an authority 4XX or 5XX field can trigger a suggestion based on the main heading, provided that that heading is linked to at least one bib record.
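
The behavior described in the second bullet can be pictured with a short Python sketch, using the authority records quoted earlier in this report. It is a simplified model under stated assumptions: the `AuthorityRecord` class, `build_index`, `suggest_main_headings`, exact-match lookup, and the bib link counts are all hypothetical, and the real index is built in the database.

```python
from dataclasses import dataclass, field

@dataclass
class AuthorityRecord:
    main_heading: str                             # 1XX
    variants: list = field(default_factory=list)  # 4XX
    related: list = field(default_factory=list)   # 5XX
    linked_bib_count: int = 0                     # hypothetical link count

def build_index(records):
    """Map every indexed heading to the main heading(s) it can suggest."""
    index = {}
    for rec in records:
        if rec.linked_bib_count < 1:
            continue  # only headings linked to at least one bib record qualify
        for heading in [rec.main_heading, *rec.variants, *rec.related]:
            index.setdefault(heading.lower(), set()).add(rec.main_heading)
    return index

def suggest_main_headings(index, user_search):
    """Return main headings for a search that matches a 4XX/5XX heading."""
    key = user_search.lower()
    return sorted(h for h in index.get(key, set()) if h.lower() != key)

records = [
    AuthorityRecord("Large type books", ["Large print books"], linked_bib_count=3),
    AuthorityRecord("Fantasy", ["Day dreams"], linked_bib_count=1),
    AuthorityRecord("Unlinked heading", ["Orphan variant"], linked_bib_count=0),
]
index = build_index(records)

day_dreams = suggest_main_headings(index, "Day dreams")
large_print = suggest_main_headings(index, "Large print books")
orphan = suggest_main_headings(index, "Orphan variant")
```

In this model, the "Day dreams" and "Large print books" searches from the earlier test comments would yield "Fantasy" and "Large type books", while a variant whose main heading has no linked bib records yields nothing.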

I think this branch is ready to go and am inclined to push it tonight, but if anybody wants to look at it on Thursday (4/27), please speak up.

Changed in evergreen:
assignee:	Galen Charlton (gmc) → nobody

Galen Charlton (gmc) wrote on 2023-04-28:

#13

Merged for inclusion in 3.11-beta. Thanks, Mike, Ruth, and everybody else!

Changed in evergreen:
status:	Confirmed → Fix Committed

Galen Charlton (gmc) wrote on 2023-04-29:

#14

Pushed a follow-up (though whoops: with the wrong bug number in the commit) to fix a regression related to delayed reification of search dictionary updates.

Evergreen Bug Maintenance (bugmaster) on 2023-05-18