wishlist: Did you mean? Multi word, single class search suggestions

Bug #1997485 reported by Andrea Neiman
This bug affects 4 people
Affects: Evergreen
Status: Fix Released
Importance: Wishlist
Assigned to: Unassigned

Bug Description

This work is sponsored by the Evergreen Community Development Initiative, with development work performed by Equinox.

This is the next phase of the larger "Did You Mean" search suggestions project, following on bug 1893997 which implemented single word single class suggestions.

This part of the project implements multi word single class search suggestions in staff and OPAC interfaces.

This includes:
* Bibliographic-based search suggestions for multiword and phrase searches in a single search class
* Search suggestions from authority 4xx fields (variant terms) within a specific search class like author or subject
* Configuration options for each search class, as well as an expansion of configuration options as compared to the previous Did You Mean implementation

Full specifications can be seen here: https://yeti.equinoxoli.org/dev/public/techspecs/dym_stage3_20210616.pdf

A community branch will be shared once partner testing is complete.

Changed in evergreen:
status: New → Confirmed
Mike Rylander (mrylander) wrote :

The top three commits of the linked branch embody this development:

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp-1997485-multi-term-did-you-mean

From the commit message and release notes:

Expanding on the previous single-class, single-term search suggestion development, this feature provides suggestions for single-class searches with multiple terms.

 * The Library Settings that were previously used to control the global behavior of search suggestions have been moved to search class configuration fields. This was done because the data in each search class benefits from different setting values.

 * If a patron's search brings back a suggestion that matches an authority variant heading, the system will provide the main heading as a suggestion as well, along with spelling-corrected suggestions.

 * Quoted phrases in user input require strict term order and adjacency for the phrase portion of the suggestion generated for the phrase(s), whereas unquoted input (or the portion that is not quoted) does not.
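The quoted-versus-unquoted distinction above can be illustrated with a minimal Python sketch. This is not the Evergreen implementation (which lives in the database layer); the function names `split_query` and `phrase_matches` are hypothetical and exist only to show what "strict term order and adjacency" means for the quoted portion of the input.

```python
import re

def split_query(query):
    """Split user input into quoted phrases and loose (unquoted) terms.

    Hypothetical helper for illustration; not Evergreen code.
    """
    phrases = re.findall(r'"([^"]+)"', query)
    # Remove the quoted parts, then split what remains on whitespace.
    remainder = re.sub(r'"[^"]*"', " ", query)
    loose_terms = remainder.split()
    return [p.split() for p in phrases], loose_terms

def phrase_matches(phrase_words, candidate_words):
    """True if phrase_words occur in candidate_words in order and adjacent."""
    n = len(phrase_words)
    return any(candidate_words[i:i + n] == phrase_words
               for i in range(len(candidate_words) - n + 1))

phrases, loose = split_query('"raccoon city" zombie guide')
# The quoted phrase must stay adjacent and ordered; the loose terms need not.
ok = phrase_matches(phrases[0], ["the", "raccoon", "city", "files"])
reordered = phrase_matches(phrases[0], ["city", "raccoon"])
```

Under these assumptions, a candidate suggestion that reorders or separates the quoted words would be rejected for the phrase portion, while the unquoted terms could match in any position.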

MARC Search/Facet Class (config.metabib_class) field additions:

 * variant_authority_suggestion: Whether this class should attempt variant authority suggestions based on search-class/browse-axis mapping
 * symspell_transfer_case: Whether suggestions should retain user-supplied letter case
 * symspell_skip_correct: Only supply suggestions for misspelled words
 * symspell_suggestion_verbosity: Setting that controls the amount of effort, and therefore time, spent on suggestion generation
 * max_phrase_edit_distance: Maximum average per-word edit distance when evaluating suggestions
 * suggestion_word_option_count: Maximum alternate suggestions per word
 * max_suggestions: Maximum suggestions to present
 * low_result_threshold: Maximum hit count beyond which suggestions are not provided
 * min_suggestion_use_threshold: Minimum number of times a suggestion must exist in the corpus
 * pg_trgm_weight: Weight of the trigram similarity metric; 0 avoids calculation costs
 * soundex_weight: Weight of the soundex similarity metric; 0 avoids calculation costs
 * keyboard_distance_weight: Weight of the keyboard distance similarity metric; 0 avoids calculation costs
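As a rough illustration of how two of these fields might interact, the Python sketch below gates suggestions on low_result_threshold and then filters a candidate by its average per-word edit distance against max_phrase_edit_distance. It is an assumption-laden sketch, not the Evergreen code: the function names are hypothetical, plain Levenshtein distance stands in for whatever distance the real SymSpell-based implementation computes, words are paired by position, and any special threshold values the real code may recognize are ignored.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]

def mean_word_distance(user_words, suggested_words):
    """Average per-word edit distance, pairing words by position (assumption)."""
    pairs = list(zip(user_words, suggested_words))
    if not pairs:
        return 0.0
    return sum(levenshtein(u, s) for u, s in pairs) / len(pairs)

def keep_suggestion(user_input, suggestion, hit_count,
                    low_result_threshold, max_phrase_edit_distance):
    """Hypothetical filter: suggest only for low-hit searches and close phrases."""
    if hit_count > low_result_threshold:
        return False  # the search already returned enough results
    dist = mean_word_distance(user_input.split(), suggestion.split())
    return dist <= max_phrase_edit_distance
```

With these definitions, the tested pair "moderm mussic" and "modern music" has an average per-word distance of 1.0 (one edit in each word), so it would pass a max_phrase_edit_distance of 1 or more when the hit count is at or below the threshold.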

tags: added: pullrequest
Changed in evergreen:
milestone: none → 3.11-beta
tags: added: didyoumean
Elizabeth Thomsen (et-8) wrote :

Testing on https://bugsquash.mobiusconsortium.org/

Tested multiple misspelled words in keyword, subject, author, title and received appropriate suggestions.

Examples:
moderm mussic --> modern music
amed saladn --> ahmed saladin
obeo basoon concerta --> oboe bassoon concerto
jemisom --> jemisin
racoon citty --> raccoon city

Because this is a limited database, lots of misspellings don't generate suggestions, or come up with ones that don't seem useful, because there weren't any records that matched the words I had in mind. For example, there are no suggestions for grenland because there are no records with greenland, so I found it easier to look at records first and then create the misspellings.

I tried testing to see if I could get a match based on an authority record, doing a subject search for Large print books, but didn't get any suggestions. Here's what's in the authority record:
=150 \\$aLarge type books
=450 \\$aLarge print books

This option may not be turned on.

Andrea Neiman (aneiman) wrote :

Thanks Elizabeth! Testing on a larger dataset is definitely helpful here if possible. When we did partner testing we used a larger data set in that test environment for that reason.

Some configuration documentation is here to facilitate testing:
https://docs.google.com/document/d/1NKuSqFASsS4GDPRLTIN79j5is-GPv4vBAcnSZpx2Jm0/edit?usp=sharing

Elizabeth Thomsen (et-8) wrote :

Thanks, Andrea! I wouldn't have as much confidence in how well this will work in real life if I hadn't spent so much time on this during the partner testing! It's fun to test, even on a dataset like this where you really have to find records with interesting words and work backwards to spell them wrong. I think it's going to make a big difference to all users, including people like me who can spell well but can't type!

I encourage anyone interested in this to look at the configurations in the document you shared to see just how many options there are and how much we can fine tune this based on experience. I look forward to lots of future discussion and conference presentations on how different sites have changed these configurations!

Elizabeth Thomsen (et-8) wrote :

Checked the Metabib Class Configuration on https://bugsquash.mobiusconsortium.org/ and "Perform variant heading authority suggestion cross-reference" is set to Yes for all classes.

There's a subject authority record with a cross reference, linked to bib records:
=150 \\$aFantasy
=450 \\$aDay dreams

When I do a subject search on Day dreams, I get no matching records, and no Did You Mean suggestion for Fantasy.

Ruth Frasur Davis (redavis) wrote :

I have gone back to test the searches described by Elizabeth on the partner testing server and all searches returned results as expected. I consent to signing off on it with my name, Ruth Frasur (rfrasur) and email address, <email address hidden>.

tags: added: signedoff
Mike Rylander (mrylander) wrote :

I've pushed a rebased and baseline-schema-reified version of this to:

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/lp-1997485-multi-term-did-you-mean-rebase-and-reify-baseline

It includes Ruth's sign-off, as well.

Chris Sharp (chrissharp123) wrote :

Trying to apply this in PINES, the upgrade script succeeds, but the DO function fails with

psql:Open-ILS/src/sql/Pg/upgrade/XXXX.schema.DYM-multi-word.sql:777: ERROR: null value in column "low_result_threshold" violates not-null constraint
DETAIL: Failing row contains (keyword, Keyword, f, f, 1.0, 0.4, 0.2, 0.1, f, t, t, f, 2, 2, 5, -1, null, 1, 0, 0, 0).

Upon inspection, we don't have the "opac.did_you_mean.low_result_threshold" setting set in PINES at any level. Indeed, we don't have any of the consulted settings in place. I'm basically unfamiliar with DYM beyond what I've had to do for upgrades, but should we allow for an organization to not have those set in the upgrade script, or should we alert before the script is run, or in the docs?

Galen Charlton (gmc) wrote :

Those settings are not meant to be required; updating the IF statements in that DO block to the equivalent of "IF FOUND AND val IS NOT NULL..." should take care of the problem.

Mike Rylander (mrylander) wrote :

To confirm explicitly, the DO block is (at most) a best-effort attempt to move old settings to the new locations, but is /not/ required to succeed in order to use the new code/features.

Some of the options we have, in order of my /personal/ preference:

 * Remove the DO block and update the release notes to direct the upgrading sysadmin's attention to the new location of the various settings. This is my preference so as to encourage intentional configuration.
 * Galen's suggestion of "IF FOUND AND val IS NOT NULL ..." to conditionally change the defaults provided at the table definition level.
 * Update the DO block to use COALESCE to force new defaults when (some of) the old settings are not there.
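The difference between the second and third options can be modeled in a few lines of Python. This is only a sketch of the logic, not the PL/pgSQL in the upgrade script: the function names, the dictionary shapes, and the numeric default and fallback values are all hypothetical. The setting name "opac.did_you_mean.low_result_threshold" and the column name low_result_threshold come from the report above.

```python
def migrate_if_set(column_defaults, old_settings, mapping):
    """Option 2: override a column default only when the old setting exists
    and is not null (the "IF FOUND AND val IS NOT NULL" approach)."""
    row = dict(column_defaults)
    for old_name, column in mapping.items():
        val = old_settings.get(old_name)  # None models both "not found" and NULL
        if val is not None:
            row[column] = val
    return row

def migrate_with_coalesce(old_settings, mapping, fallbacks):
    """Option 3: always write a value, falling back when the old one is missing
    (the COALESCE approach)."""
    row = {}
    for old_name, column in mapping.items():
        val = old_settings.get(old_name)
        row[column] = val if val is not None else fallbacks[column]
    return row

# Hypothetical values, for illustration only.
mapping = {"opac.did_you_mean.low_result_threshold": "low_result_threshold"}
defaults = {"low_result_threshold": 0}

# A site like PINES with none of the old settings keeps the table defaults.
unset_site = migrate_if_set(defaults, {}, mapping)
# A site that had configured the old setting carries its value forward.
set_site = migrate_if_set(
    defaults, {"opac.did_you_mean.low_result_threshold": 5}, mapping)
```

In both variants a missing setting never produces a null in the new column, which is the failure Chris reported; they differ in whether the table-level default or a default chosen inside the migration wins.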

All of those are relatively low-effort, but in the interest of only attacking this once, are there any opinions (strong or otherwise) or other options to consider that would save someone's time?

Thanks, all!

Chris Sharp (chrissharp123) wrote :

Thank you both for the quick response. I have no strong opinions about the approach. All things being equal, Galen's idea of moving any existing settings into their new location automatically makes sense to me.

Galen Charlton (gmc)
Changed in evergreen:
assignee: nobody → Galen Charlton (gmc)
Galen Charlton (gmc) wrote :

I've pushed a new working branch: user/gmcharlt/lp1997485_dym_tng

This branch:

* Addresses the glitch during update that Chris saw; if the library settings were not set, the upgrade will not throw errors.
* Adjusts how authority-record-derived suggestions are made. The upshot is that 1XX, 4XX, and 5XX fields from authority records are now included in the search suggestion index, meaning that a user search that matches a heading from an authority 4XX or 5XX field can trigger a suggestion based on the main heading, provided that that heading is linked to at least one bib record.
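The authority-derived behavior described in the second point can be sketched in Python as follows. This is an illustration of the rule only, not the branch's implementation, which works through the search suggestion index in the database. The function name, the record dictionary shape, and the linked-bib counts are hypothetical; the Fantasy / Day dreams headings are the ones from Elizabeth's test above.

```python
def authority_suggestions(user_heading, authority_records, linked_bib_counts):
    """Suggest main (1XX) headings whose 4XX/5XX headings match the user input,
    but only when the authority record is linked to at least one bib record."""
    key = user_heading.strip().lower()
    suggestions = []
    for rec in authority_records:
        alternates = [h.lower() for h in rec["variants"] + rec["related"]]
        if key in alternates and linked_bib_counts.get(rec["id"], 0) > 0:
            suggestions.append(rec["main"])
    return suggestions

# Hypothetical in-memory stand-in for authority data.
records = [
    {"id": 1, "main": "Fantasy", "variants": ["Day dreams"], "related": []},
]

linked = authority_suggestions("Day dreams", records, {1: 3})
unlinked = authority_suggestions("Day dreams", records, {1: 0})
```

Under these assumptions, a subject search on "Day dreams" yields "Fantasy" as a suggestion when the authority record has linked bibs, and nothing when it has none.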

I think this branch is ready to go and am inclined to push it tonight, but if anybody wants to look at it on Thursday (4/27), please speak up.

Changed in evergreen:
assignee: Galen Charlton (gmc) → nobody
Galen Charlton (gmc) wrote :

Merged for inclusion in 3.11-beta. Thanks, Mike, Ruth, and everybody else!

Changed in evergreen:
status: Confirmed → Fix Committed
Galen Charlton (gmc) wrote :

Pushed a follow-up (though whoops: with the wrong bug number in the commit) to fix a regression related to delayed reification of search dictionary updates.

Changed in evergreen:
status: Fix Committed → Fix Released