Searching the catalog for a phrase match fails when the search term begins or ends with certain characters.

Bug #1373601 reported by Michele Morgan on 2014-09-24
This bug affects 3 people

Bug Description

Evergreen 2.5.1 + (?)

Does not appear to be a problem in a 2.3.5 system, at least not in the one 2.3.5 system I was able to identify (

When searching the catalog for terms in quotes, implying that an exact phrase match is desired, no hits are found if the term begins or ends with punctuation or other non-alphanumeric characters.

Example searches are:

"o, africa!"
"$30 music school"
"$60 a day"
"footballers wive$"
"westward ho!"
"are you being served?"

Eliminating the beginning or ending character within the quotes, or searching these terms without quotes, does yield results. This happens in both the title and keyword indices.

This is true in all Evergreen catalogs tried so far on releases 2.5 and up. For the "o, africa!" search, below are links to several catalogs. The first link is the search with the quotes, the second without the quotes. Note that the last set of links is the 2.3.5 system, and the quoted search is successful:

NOBLE (2.6.2):!%22&qtype=keyword&fg%3Aadvanced=&locg=1!&qtype=keyword&fg%3Aadvanced=&locg=1

MVLC (head):!%22&qtype=keyword&fg%3Aadvanced=&locg=1!&qtype=keyword&fg%3Aadvanced=&locg=1

CWMARS (2.5.5):!%22&qtype=keyword&fg%3Aformat_filters=&locg=1!&qtype=keyword&fg%3Aformat_filters=&locg=1

Bibliomation (head):!%22&qtype=keyword&fi%3Asearch_format=&locg=1!&qtype=keyword&fi%3Asearch_format=&locg=1

Pines (2.5.1):!%22&qtype=keyword&fi%3Aitem_type=&locg=1!&qtype=keyword&fi%3Aitem_type=&locg=1

TADL (2.5.1):!%22&qtype=keyword&locg=22!&qtype=keyword&locg=22

NTLC (2.3.5)!%22&qtype=keyword!&qtype=keyword

Kathy Lussier (klussier) wrote :

Also noting that while a search for "o, africa!" fails, a search for "o, africa! : a novel" is successful, so it definitely seems to only be having trouble with punctuation at the beginning or end.

I think there are times when we want the ending punctuation to be ignored, especially when it is punctuation like a / that is not part of the title, but is part of cataloging practice. However, when the punctuation is part of the title, it should work.

Given the results of the 2.3 testing, I'm wondering if this change occurred as part of the large query parser changes that came with 2.4?

Michele, I removed the word exact from the title of the bug report so that it doesn't get confused with the "Matches Exactly" search option, which works a bit differently.

Changed in evergreen:
status: New → Confirmed
summary: - Searching the catalog for an exact phrase match fails when the search
- term begins or ends with certain characters.
+ Searching the catalog for a phrase match fails when the search term
+ begins or ends with certain characters.
Mike Rylander (mrylander) wrote :

I've got a fix for this coming soon. However, the example above of "footballers wive$" ends in the way that we have defined as "make this a right-anchored phrase search".

Mike Rylander (mrylander) wrote :

As promised:;a=shortlog;h=refs/heads/user/miker/lp1373601-phrase-search-punctuation

To perform unanchored phrase limits, we make sure that the phrase supplied
by the user does not end in the middle of a word by bounding the condition
with word-boundary bracket expressions. However, if the phrase starts
or ends with a non-word character (that is, something other than numbers,
letters, or the underscore) then the word-boundary expression won't match.
The effect of this is to cause phrase searches starting or ending in
punctuation to fail when the user would not expect them to.

To address this, we now test the phrase for word-iness at the front and
back before applying word-boundary bracket expressions.
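
To illustrate the behavior described above, here is a minimal Python sketch. It is not the Evergreen code: the function names are hypothetical, and the lookaround patterns are only stand-ins for the start-of-word and end-of-word bracket expressions used by the real query. It contrasts always bounding the phrase with adding a boundary only on a side that begins or ends with a word character.

```python
import re

# Assumed Python stand-ins for start-of-word and end-of-word assertions.
# "Word character" here means letters, digits, or the underscore (\w).
WORD_START = r"(?<!\w)(?=\w)"
WORD_END = r"(?<=\w)(?!\w)"


def naive_phrase_regex(phrase: str) -> str:
    """Always bound the phrase with word-boundary assertions (old behavior)."""
    return WORD_START + re.escape(phrase) + WORD_END


def fixed_phrase_regex(phrase: str) -> str:
    """Add a boundary only on a side where the phrase has a word character."""
    pattern = re.escape(phrase)
    if re.match(r"\w", phrase):        # phrase starts with a word character
        pattern = WORD_START + pattern
    if re.search(r"\w$", phrase):      # phrase ends with a word character
        pattern = pattern + WORD_END
    return pattern


def matches(regex: str, text: str) -> bool:
    """Case-insensitive search of text for the given pattern."""
    return re.search(regex, text, re.IGNORECASE) is not None


title = "O, Africa! : a novel"
print(matches(naive_phrase_regex("o, africa!"), title))
print(matches(fixed_phrase_regex("o, africa!"), title))
```

With the naive pattern the end-of-word assertion after "!" can never succeed, so the phrase is not found; the longer phrase "o, africa! : a novel" still matches because it ends in a letter. The conditional version keeps the protection against ending in the middle of a word, so a phrase such as "west" would still not match inside "Westward".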

tags: added: pullrequest
Kathy Lussier (klussier) on 2016-04-01
Changed in evergreen:
milestone: none →
Kathy Lussier (klussier) wrote :

Thanks Mike! It looks good to me.

In addition to your note on the right-anchored search problem, I also want to make note of the fact that, with the "61*" example, the asterisk is treated as a wildcard. Users will successfully retrieve the record, but it may get pushed down in search results by lots of other results that match the truncated search term.

I've merged the fix to master and backported it to 2.10, 2.9, and 2.8.

Changed in evergreen:
status: Confirmed → Fix Committed
Changed in evergreen:
status: Fix Committed → Fix Released
Changed in evergreen:
milestone: → 2.11-alpha