Relevance ranking deteriorates when phrases are added to search

Bug #1516707 reported by Kathy Lussier
Affects     Status         Importance   Assigned to   Milestone
Evergreen   Fix Released   Medium       Unassigned
2.8         Fix Released   Medium       Unassigned
2.9         Fix Released   Medium       Unassigned

Bug Description

Evergreen release: 2.8

When searching the catalog for a phrase, relevance ranking deteriorates significantly.

As an example, searching for 'the blue' (without quotation marks) retrieves highly relevant results:

http://bark.cwmars.org/eg/opac/results?query=the+blue&qtype=title&fi%3Asearch_format=&locg=1

The first hit is a record that has "The blue" as the title proper. The metabib.title_field_entry entries for this record are:

field  index_vector
4      'a':2A,5C 'blue':1A,4C 'novel':3A,6C
6      'blue':2A,4C 'the':1A,3C
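For readers unfamiliar with the index_vector notation above, here is a minimal Python sketch that parses the text form of a tsvector into lexemes with their positions and weight labels, then checks whether a phrase occurs at adjacent positions. The function names `parse_tsvector` and `has_phrase` are hypothetical helpers for illustration only; they are not part of Evergreen or PostgreSQL, and the parser ignores edge cases such as lexemes containing escaped quotes.

```python
import re

def parse_tsvector(text):
    """Parse tsvector text such as "'blue':2A,4C 'the':1A,3C".

    Returns {lexeme: [(position, weight), ...]}. A position with no
    letter is treated as weight D, the PostgreSQL default label.
    Simplification: lexemes containing quotes are not handled.
    """
    entries = {}
    for lexeme, poslist in re.findall(r"'([^']*)':(\S+)", text):
        positions = []
        for item in poslist.split(","):
            m = re.fullmatch(r"(\d+)([ABCD]?)", item)
            positions.append((int(m.group(1)), m.group(2) or "D"))
        entries[lexeme] = positions
    return entries

def has_phrase(entries, words):
    """Return True if the lexemes in `words` occur at consecutive positions."""
    starts = [pos for pos, _ in entries.get(words[0], [])]
    return any(
        all(
            any(pos == start + offset for pos, _ in entries.get(word, []))
            for offset, word in enumerate(words)
        )
        for start in starts
    )

# The two entries quoted in the bug description.
field_4 = parse_tsvector("'a':2A,5C 'blue':1A,4C 'novel':3A,6C")
field_6 = parse_tsvector("'blue':2A,4C 'the':1A,3C")

print(has_phrase(field_6, ["the", "blue"]))  # phrase present in field 6
print(has_phrase(field_4, ["the", "blue"]))  # 'the' is absent from field 4
```

In field 6, 'the' sits at position 1 and 'blue' at position 2, both with weight A, so the record matches the phrase exactly; this is the record one would expect at the top of a quoted search.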

However, if I perform the same search as a phrase, this record is effectively unfindable (I paged through 20 pages of results before giving up).

http://bark.cwmars.org/eg/opac/results?query=%22the%20blue%22;qtype=title;locg=1

The first hit is for a record with the following entries in metabib.title_field_entry: http://pastebin.com/0ydEsWxn

The word 'blue' is found several times in a custom index created for titles in the 505t.

We found several other examples where phrase searching led to worse relevance ranking than a non-phrase search.

the martian:
http://bark.cwmars.org/eg/opac/results?query=the+martian&qtype=title&fi%3Asearch_format=&locg=1&sort=

"the martian":
http://bark.cwmars.org/eg/opac/results?query=%22the+martian%22&qtype=title&fi%3Asearch_format=&locg=1&sort=

martian "a novel":
http://bark.cwmars.org/eg/opac/results?query=martian+%22a+novel%22&qtype=title&fi%3Asearch_format=&locg=1&sort=

the help:
http://bark.cwmars.org/eg/opac/results?query=the+help&qtype=title&fi%3Asearch_format=&locg=1&sort=

"the help":
http://bark.cwmars.org/eg/opac/results?query=%22the+help%22&qtype=title&fi%3Asearch_format=&locg=1&sort=

The system doesn't appear to be taking cover density into consideration when ranking results for phrase searches.

I've replicated this problem on several other Evergreen catalogs (NOBLE, MVLC, Bibliomation, Georgia PINES).

Mike Rylander (mrylander) wrote :

Kathy,

If you can capture the core queries from the postgres logs for exemplar quoted and unquoted searches, it may be straightforward to find the issue.

TIA!

Kathy Lussier (klussier) wrote :

mmorgan++ for grabbing the core queries from the NOBLE server.

The core query for the unquoted search: http://pastebin.com/WqhYabbd

The core query for the quoted search: http://pastebin.com/h7gcB0Gz

Thanks for looking Mike!

Mike Rylander (mrylander) wrote :

Welp, I found the problem. The cover density modifier configuration is not being used when turning phrase subqueries into an SQL fragment. Looking for a solution ...
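To illustrate why a missing cover density modifier can reorder results this way, here is a toy Python sketch. It assumes the modifiers correspond to the length normalization options documented for PostgreSQL's ts_rank_cd (flag 1 divides the rank by 1 + the logarithm of the document length, flag 2 divides by the document length); that mapping is an assumption, not something confirmed in this report. The raw ranks and lengths are made-up numbers, and `normalize` and `order` are hypothetical helpers, not Evergreen or PostgreSQL code.

```python
import math

# Flag values follow the PostgreSQL documentation for rank normalization.
NORM_LOG_LENGTH = 1   # divide rank by 1 + log(document length)
NORM_LENGTH = 2       # divide rank by document length

def normalize(raw_rank, doc_length, flags):
    """Apply length normalization to a raw rank (toy model, not ts_rank_cd itself)."""
    rank = raw_rank
    if flags & NORM_LOG_LENGTH:
        rank /= 1 + math.log(doc_length)
    if flags & NORM_LENGTH:
        rank /= doc_length
    return rank

# Hypothetical records: (raw rank, length in words).
# A two-word title that matches once, and a long contents-note index
# in which the search term recurs several times.
DOCS = {
    "short_title": (1.0, 2),
    "long_contents": (3.0, 60),
}

def order(flags):
    """Return record names sorted by normalized rank, best first."""
    return sorted(DOCS, key=lambda name: normalize(*DOCS[name], flags), reverse=True)

print(order(0))            # no normalization: the long entry wins
print(order(NORM_LENGTH))  # length normalization: the short title wins
```

With no normalization, the long entry's repeated matches dominate; with length normalization, the short exact title rises to the top. That is consistent with the symptom described above, where a long 505t index outranked the record whose title proper is "The blue".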

Mike Rylander (mrylander) wrote :
tags: added: pullrequest
Kathy Lussier (klussier) wrote :

Thank you for whipping up a patch so quickly Mike! The fix works great! The relevance ranking for the quoted search is now the same as for the unquoted search. I didn't notice any other regression in searching with the fix in place.

I merged the changes to master and backported to 2.9. Jason suggested holding off on the merge to 2.8 until we get the ok from Bill, so I'll leave the target, but won't commit the fix yet.

Changed in evergreen:
status: New → Fix Committed
milestone: none → 2.next
Changed in evergreen:
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Changed in evergreen:
milestone: 2.next → 2.10-beta
Changed in evergreen:
status: Fix Committed → Fix Released