Relevance ranking deteriorates when phrases are added to search

Bug #1516707 reported by Kathy Lussier on 2015-11-16
Affects      Importance   Assigned to
Evergreen    Medium       Unassigned
2.8          Medium       Unassigned
2.9          Medium       Unassigned

Bug Description

Evergreen release: 2.8

When a catalog search includes a quoted phrase, relevance ranking deteriorates significantly.

As an example, searching for 'the blue' (without quotation marks) retrieves highly relevant results:

http://bark.cwmars.org/eg/opac/results?query=the+blue&qtype=title&fi%3Asearch_format=&locg=1

The first hit is a record that has "The blue" as the title proper. The metabib.title_field_entry entries for this record are:

field index_vector
4 'a':2A,5C 'blue':1A,4C 'novel':3A,6C
6 'blue':2A,4C 'the':1A,3C
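The index_vector values above are PostgreSQL tsvector text: each 'lexeme' is followed by its positions, each with an optional weight letter (A, B or C; weight D is the default and is not printed). The small Python sketch below is an illustration, not Evergreen code. It assumes simple lexemes and parses the values into positions, then checks whether a phrase's words sit at consecutive positions. It shows that field 6 has 'the' at position 1 and 'blue' at position 2, so this record does satisfy the phrase "the blue". That points to ranking, not matching, as the problem.

```python
import re

# One tsvector entry: 'lexeme' optionally followed by :pos[weight],pos[weight],...
_ENTRY = re.compile(r"'((?:[^']|'')*)'(?::([0-9A-D,]+))?")
_POS = re.compile(r"(\d+)([A-D]?)")


def parse_tsvector(text):
    """Parse tsvector text into {lexeme: [(position, weight), ...]}."""
    result = {}
    for m in _ENTRY.finditer(text):
        lexeme = m.group(1).replace("''", "'")  # undo doubled-quote escaping
        positions = []
        if m.group(2):
            for item in m.group(2).split(","):
                pm = _POS.fullmatch(item)
                if pm is None:
                    raise ValueError(f"unexpected position {item!r}")
                # An empty weight suffix means the default weight D.
                positions.append((int(pm.group(1)), pm.group(2) or "D"))
        result[lexeme] = positions
    return result


def phrase_matches(vector, words):
    """True if the words occur at consecutive positions somewhere in the vector."""
    if not words or not all(w in vector for w in words):
        return False
    pos_sets = [{p for p, _ in vector[w]} for w in words]
    # A match starts at some position of the first word, and word i sits at start + i.
    return any(
        all(start + i in s for i, s in enumerate(pos_sets))
        for start in pos_sets[0]
    )


if __name__ == "__main__":
    field4 = parse_tsvector("'a':2A,5C 'blue':1A,4C 'novel':3A,6C")
    field6 = parse_tsvector("'blue':2A,4C 'the':1A,3C")
    print(phrase_matches(field6, ["the", "blue"]))
    print(phrase_matches(field4, ["the", "blue"]))
```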

However, if I perform the same search as a quoted phrase, this record is effectively unfindable (I paged through 20 pages of results before giving up).

http://bark.cwmars.org/eg/opac/results?query=%22the%20blue%22;qtype=title;locg=1

The first hit is for a record with the following entries in metabib.title_field_entry: http://pastebin.com/0ydEsWxn

The word 'blue' appears several times in a custom index created for the titles in 505 $t (the formatted contents note).

We found several other examples where phrase searching led to worse relevance ranking than a non-phrase search.

the martian:
http://bark.cwmars.org/eg/opac/results?query=the+martian&qtype=title&fi%3Asearch_format=&locg=1&sort=

"the martian":
http://bark.cwmars.org/eg/opac/results?query=%22the+martian%22&qtype=title&fi%3Asearch_format=&locg=1&sort=

martian "a novel":
http://bark.cwmars.org/eg/opac/results?query=martian+%22a+novel%22&qtype=title&fi%3Asearch_format=&locg=1&sort=

the help:
http://bark.cwmars.org/eg/opac/results?query=the+help&qtype=title&fi%3Asearch_format=&locg=1&sort=

"the help":
http://bark.cwmars.org/eg/opac/results?query=%22the+help%22&qtype=title&fi%3Asearch_format=&locg=1&sort=

The system doesn't appear to take cover density into account when ranking results for phrase searches.
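Cover density ranking, the approach behind PostgreSQL's ts_rank_cd, scores a document by its "covers": the shortest spans that contain every query term. Tighter covers score higher. The Python sketch below is a simplified illustration, not PostgreSQL's or Evergreen's exact formula. It ignores the A-D label weights and any length normalization, and it gives each minimal cover a score of 1 / (1 + noise), where noise is the cover length minus the number of query terms. The positions for the second record are hypothetical, loosely modeled on a title field in which 'blue' recurs far from 'the'.

```python
def minimal_covers(positions):
    """Return (start, end) for every minimal span containing each query term.

    positions maps each query term to the list of positions where it occurs.
    """
    need = len(positions)
    events = sorted((p, term) for term, ps in positions.items() for p in ps)
    counts = {}  # occurrences of each term inside the current window
    covers = []
    left = 0
    for pos, term in events:
        counts[term] = counts.get(term, 0) + 1
        # Drop left-hand occurrences that are duplicated later in the window.
        while counts[events[left][1]] > 1:
            counts[events[left][1]] -= 1
            left += 1
        # The window is a minimal cover only if it holds every term and the
        # newly added right-hand term occurs nowhere else inside it.
        if len(counts) == need and counts[term] == 1:
            covers.append((events[left][0], pos))
    return covers


def cover_density_score(positions):
    """Simplified cover density: each minimal cover adds 1 / (1 + noise)."""
    need = len(positions)
    score = 0.0
    for start, end in minimal_covers(positions):
        noise = (end - start + 1) - need  # extra positions beyond one per term
        score += 1.0 / (1 + noise)
    return score


if __name__ == "__main__":
    blue_title = {"the": [1], "blue": [2]}            # "The blue"
    other = {"the": [1], "blue": [10, 20, 30]}        # hypothetical record
    print(cover_density_score(blue_title), cover_density_score(other))
```

With cover density in play, the short "The blue" title scores 1.0 while the hypothetical record scores about 0.11, even though 'blue' occurs there three times. A ranking that ignores term proximity would tend to favor the record with the repeated term instead.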

I've replicated this problem on several other Evergreen catalogs (NOBLE, MVLC, Bibliomation, Georgia PINES).

Mike Rylander (mrylander) wrote:

Kathy,

If you can capture the core queries from the PostgreSQL logs for sample quoted and unquoted searches, it may be straightforward to find the issue.

TIA!

Kathy Lussier (klussier) wrote:

mmorgan++ for grabbing the core queries from the NOBLE server.

The core query for the unquoted search: http://pastebin.com/WqhYabbd

The core query for the quoted search: http://pastebin.com/h7gcB0Gz

Thanks for looking Mike!

Mike Rylander (mrylander) wrote:

Welp, I found the problem. The cover density modifier configuration is not applied when phrase subqueries are turned into an SQL fragment. Looking for a solution...

Kathy Lussier (klussier) wrote:

Thank you for whipping up a patch so quickly, Mike! The fix works great: the relevance ranking for the quoted search is now the same as for the unquoted search. I didn't notice any other regressions in searching with the fix in place.

I merged the changes to master and backported them to 2.9. Jason suggested holding off on the merge to 2.8 until we get the OK from Bill, so I'll leave the 2.8 target in place but won't commit the fix there yet.

Changed in evergreen:
status: New → Fix Committed
milestone: none → 2.next
Changed in evergreen:
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Changed in evergreen:
milestone: 2.next → 2.10-beta
Changed in evergreen:
status: Fix Committed → Fix Released