Subject links in the catalog inappropriately strip periods from search string

Bug #1623955 reported by Kathy Lussier
This bug affects 2 people
Affects      Status         Importance   Assigned to   Milestone
Evergreen    Fix Released   Medium       Unassigned
2.10         Fix Released   Medium       Unassigned
2.11         Fix Released   Medium       Unassigned

Bug Description

The subject links in the catalog inappropriately remove periods contained in the subject heading from the search string. However, the normalization that occurs when indexing the records replaces the period with a space. This means that clicking a subject link where there are periods may fail to retrieve records containing that subject heading.

Example:
http://evergreen.noblenet.org/eg/opac/record/1122171 has a subject heading for New York (N.Y.) > Office of the Central Park Administrator.

If you click the last piece of that heading, you'll get no results, even though we know that, at a minimum, the catalog should be retrieving the record that contained the original link. Note for testers: in some catalogs, this link will retrieve results because they might also be indexing the 043 as a subject, which contains the term ny. However, I consider a successful retrieval for those systems to be a happy accident.

Another example:
http://bark.cwmars.org/eg/opac/record/1553167 contains a subject heading for Presbyterian Church (U.S.A.) > Sermons. If you click on the Sermons link, you retrieve just one record, which is for an entirely different title that happens to have a subject heading with USA.

The preferred behavior is to maintain the period in the search string. As part of the search, the catalog can then handle the period in whatever way it should for the normalizer configured for that system.
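
To make the mismatch concrete, here is a minimal Python sketch. It is not Evergreen code: `index_normalize`, `link_strip_periods`, `link_keep_periods`, and `matches` are hypothetical names, and the normalizer is assumed to lowercase the text and turn every non-alphanumeric character (including periods) into a space. Only the period-to-space part comes from the description above.

```python
import re

def index_normalize(text: str) -> str:
    """Hypothetical index-time normalizer: lowercase the text and replace
    runs of non-alphanumeric characters (periods included) with one space."""
    return " ".join(re.sub(r"[^0-9a-z]+", " ", text.lower()).split())

def link_strip_periods(heading: str) -> str:
    """Behavior reported in this bug: the link removes periods outright."""
    return heading.replace(".", "")

def link_keep_periods(heading: str) -> str:
    """Preferred behavior: the link passes the heading through unchanged."""
    return heading

def matches(search_string: str, indexed_heading: str) -> bool:
    """Assumed matching rule: every normalized query token must appear
    among the normalized tokens of the indexed heading."""
    query_tokens = index_normalize(search_string).split()
    heading_tokens = set(index_normalize(indexed_heading).split())
    return all(token in heading_tokens for token in query_tokens)

heading = "New York (N.Y.)"
stripped_hit = matches(link_strip_periods(heading), heading)  # query token "ny" vs. indexed "n", "y"
kept_hit = matches(link_keep_periods(heading), heading)       # query normalizes the same way as the index
```

Under these assumptions the stripped link searches for the single token `ny`, while the index holds `n` and `y` as separate tokens, so only the link that keeps the periods finds the record.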

Kathy Lussier (klussier)
Changed in evergreen:
status: New → In Progress
Kathy Lussier (klussier) wrote :
Changed in evergreen:
assignee: Kathy Lussier (klussier) → nobody
status: In Progress → Triaged
tags: added: pullrequest
Dan Wells (dbw2) wrote :

Kathy, thanks for your work on this. Two quick comments:

1) The backslash before the '.' was there to escape the '.'. We don't need it anymore, and it should probably be removed to prevent confusion.
2) We had a similar fix on our local system, but had replaced '.' with space instead of leaving them be. It could be that our fix was out of date, but it could also be that there is some case where that helps. I am guessing our fix was just out of date, but thought it was worth mentioning in case it rings a bell with anyone else.

Kathy Lussier (klussier) wrote :

Good catch Dan! I removed the backslash and force-pushed an update to the above branch.

I considered replacing the '.' with a space, but I ultimately decided it was better to maintain the period in the search string. I think it's better for search to take responsibility for handling the period in a way that matches the normalization rules used for that particular index. If, for some reason, a site were to use custom normalization rules for the subject index that stripped the period, these links would continue to work as is.

FWIW, I don't know offhand of any system that normalizes its indexes in a way that strips the periods, but we do have two possible ways of handling apostrophes in our indexes - one that strips the apostrophes and one that replaces them with spaces. Maintaining the apostrophes in the catalog's subject and author links makes it possible for a site to choose which normalization method works best for them without breaking those links.
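
The apostrophe case can be sketched the same way. This is an illustration only: `normalize_strip`, `normalize_space`, and `hit` are hypothetical stand-ins for the two index configurations described above, and the example heading is made up.

```python
def normalize_strip(text: str) -> str:
    """Hypothetical normalizer A: apostrophes are removed."""
    return " ".join(text.lower().replace("'", "").split())

def normalize_space(text: str) -> str:
    """Hypothetical normalizer B: apostrophes become spaces."""
    return " ".join(text.lower().replace("'", " ").split())

def hit(query: str, heading: str, normalizer) -> bool:
    """Assumed matching rule: query and heading go through the same
    normalizer, and every query token must be present in the heading."""
    query_tokens = normalizer(query).split()
    heading_tokens = set(normalizer(heading).split())
    return all(token in heading_tokens for token in query_tokens)

heading = "Children's literature"

# Three ways a link could build its search string from the heading.
link_kept = heading                        # apostrophe left alone
link_stripped = heading.replace("'", "")   # link pre-strips the apostrophe
link_spaced = heading.replace("'", " ")    # link pre-replaces it with a space

results = {
    name: (hit(link, heading, normalize_strip), hit(link, heading, normalize_space))
    for name, link in [("kept", link_kept), ("stripped", link_stripped), ("spaced", link_spaced)]
}
```

In this sketch only the link that leaves the apostrophe untouched matches under both normalizers. A link that makes its own choice matches only when the site's index happens to make the same one.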

Dan Wells (dbw2)
Changed in evergreen:
assignee: nobody → Dan Wells (dbw2)
Dan Wells (dbw2) wrote :

Thanks, Kathy! Pushed to master through 2.10.

For later reference, this block of code was never meant to normalize, per se. It's a workaround to prevent "normal" record content links from accidentally containing search grammar. Its origin can be seen in a58bb07326a5. That said, I don't see that '.' ever had special meaning in our search grammar, so maybe chalk this up to some extra zeal.

At some point we should probably work out a bona fide encoding scheme for special search characters (a la URL encode, though perhaps not that exactly). On the other hand, we got nearly four good years out of this current fix, so why not keep riding :)
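
For readers unfamiliar with the workaround, a rough Python sketch of the idea follows. The character set in `GRAMMAR_CHARS` and the function name `link_search_string` are assumptions made for illustration; they are not copied from the Evergreen template or from commit a58bb07326a5. The point is only that the period is absent from the set, so it survives into the search string.

```python
import re

# Hypothetical set of characters assumed to carry meaning in a search
# grammar. The period is deliberately not included.
GRAMMAR_CHARS = re.compile(r'[#"^$+\-,:;&|\[\]()]')

def link_search_string(heading_part: str) -> str:
    """Remove assumed grammar characters from record content before it is
    used as a search string, then collapse any leftover whitespace."""
    return " ".join(GRAMMAR_CHARS.sub("", heading_part).split())

example = link_search_string("Presbyterian Church (U.S.A.)")
```

With this assumed character set, the parentheses are dropped and `example` is `Presbyterian Church U.S.A.`, leaving the periods for the search-side normalizer to handle.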

Changed in evergreen:
milestone: none → 2.next
assignee: Dan Wells (dbw2) → nobody
status: Triaged → Fix Committed
Changed in evergreen:
milestone: 2.next → 2.12-beta
Changed in evergreen:
status: Fix Committed → Fix Released