Relevance ranking deteriorates when phrases are added to search
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Evergreen | Fix Released | Medium | Unassigned |
2.8 | Fix Released | Medium | Unassigned |
2.9 | Fix Released | Medium | Unassigned |
Bug Description
Evergreen release: 2.8
When searching the catalog for a phrase, relevance ranking deteriorates significantly.
As an example, searching for 'the blue' (without quotation marks) retrieves highly relevant results:
http://
The first hit is a record that has "The blue" as the title proper. The metabib.
field | index_vector
---|---
4 | 'a':2A,5C 'blue':1A,4C 'novel':3A,6C
6 | 'blue':2A,4C 'the':1A,3C
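For anyone comparing records by hand, here is a minimal sketch of reading the index_vector values above. It assumes the standard PostgreSQL tsvector text form ('lexeme':position+weight, with an unlabelled weight meaning D). The helper names parse_tsvector and has_adjacent are hypothetical and are not part of Evergreen.

```python
import re

def parse_tsvector(text):
    """Parse tsvector text such as "'blue':2A,4C 'the':1A,3C" into
    {lexeme: [(position, weight), ...]}. Hypothetical helper."""
    entries = {}
    for lexeme, poslist in re.findall(r"'([^']*)':(\S+)", text):
        positions = []
        for item in poslist.split(","):
            m = re.fullmatch(r"(\d+)([ABCD]?)", item)
            # An unlabelled position carries the default weight D.
            positions.append((int(m.group(1)), m.group(2) or "D"))
        entries[lexeme] = positions
    return entries

def has_adjacent(entries, first, second):
    """True if `second` occurs at the position immediately after `first`."""
    first_positions = {pos for pos, _ in entries.get(first, [])}
    return any(pos - 1 in first_positions for pos, _ in entries.get(second, []))

# The two index_vector values quoted in this report.
field_4 = parse_tsvector("'a':2A,5C 'blue':1A,4C 'novel':3A,6C")
field_6 = parse_tsvector("'blue':2A,4C 'the':1A,3C")

print(has_adjacent(field_6, "the", "blue"))  # True: 'the' at 1, 'blue' at 2
print(has_adjacent(field_4, "the", "blue"))  # False: this entry has no 'the'
```

Read this way, the field 6 entry contains "the blue" as adjacent lexemes, which is why this record would be expected to rank well for the phrase search too.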
However, if I perform the same search as a phrase, this record is unfindable (I paged through 20 pages of results before giving up).
http://
The first hit is for a record with the following entries in metabib.
The word 'blue' is found several times in a custom index created for titles in the 505t.
We found several other examples where phrase searching led to worse relevance ranking than a non-phrase search.
the martian:
http://
"the martian":
http://
martian "a novel":
http://
the help:
http://
"the help":
http://
The system doesn't appear to take cover density (how closely together the matched terms occur) into account when ranking search results.
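To illustrate why term proximity matters here, the following is a toy sketch only. It is not PostgreSQL's ranking formula and not Evergreen's query; it simply contrasts a score based on how often terms occur with one based on the narrowest window containing every term. The function names and the positions in the second record are made up for illustration.

```python
def frequency_score(positions_by_term):
    """Score by raw occurrence count, ignoring where the terms appear."""
    return sum(len(positions) for positions in positions_by_term.values())

def min_cover_width(positions_by_term):
    """Width of the narrowest span containing every term at least once,
    or None if any term is missing."""
    if any(not positions for positions in positions_by_term.values()):
        return None
    events = sorted(
        (pos, term)
        for term, positions in positions_by_term.items()
        for pos in positions
    )
    needed = len(positions_by_term)
    counts = {}
    best = None
    left = 0
    for right_pos, term in events:
        counts[term] = counts.get(term, 0) + 1
        # Shrink the window from the left while it still covers all terms.
        while len(counts) == needed:
            left_pos, left_term = events[left]
            width = right_pos - left_pos + 1
            if best is None or width < best:
                best = width
            counts[left_term] -= 1
            if counts[left_term] == 0:
                del counts[left_term]
            left += 1
    return best

def proximity_score(positions_by_term):
    """Toy proximity score: tighter covers score higher."""
    width = min_cover_width(positions_by_term)
    return 0.0 if width is None else 1.0 / width

# Title entry from this report: 'the' at 1, 'blue' at 2 (first occurrence).
title_record = {"the": [1], "blue": [2]}
# Hypothetical contents-note entry: 'blue' repeated, far from 'the'.
notes_record = {"the": [1], "blue": [9, 14, 20]}

for name, record in [("title", title_record), ("notes", notes_record)]:
    print(name, frequency_score(record), proximity_score(record))
```

Under these assumptions the frequency-only score prefers the notes record (more occurrences of 'blue'), while the proximity score prefers the title record, which matches the ordering a phrase searcher would expect.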
I've replicated this problem on several other Evergreen catalogs (NOBLE, MVLC, Bibliomation, Georgia PINES).
Changed in evergreen: | |
status: | Fix Committed → Fix Released |
status: | Fix Released → Fix Committed |
Changed in evergreen: | |
milestone: | 2.next → 2.10-beta |
Changed in evergreen: | |
status: | Fix Committed → Fix Released |
Kathy,
If you can capture the core queries from the postgres logs for exemplar quoted and unquoted searches, it may be straightforward to find the issue.
TIA!