tpac: "does not contain" advanced search option doesn't always exclude search terms

Bug #1019360 reported by Kathy Lussier
This bug affects 1 person
Affects: Evergreen
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Evergreen version: 2.2

I tried a search from the advanced search page with "contains" "martin luther" in the first search box and "does not contain" "king" in the second search box. In several 2.2 catalogs, the search did not exclude the term king from the search results.

See:
http://bark.cwmars.org/eg/opac/results?bool=and&qtype=keyword&contains=contains&query=martin+luther&bool=and&qtype=keyword&contains=contains&query=king&bool=and&qtype=keyword&contains=contains&query=&sort=&locg=1&pubdate=is&date1=&date2=&_adv=1

or

http://evergreen.noblenet.org/eg/opac/results?bool=and&qtype=keyword&contains=contains&query=martin+luther&bool=and&qtype=keyword&contains=nocontains&query=king&bool=and&qtype=keyword&contains=contains&query=&sort=&locg=1&pubdate=is&date1=&date2=&_adv=1

However, it doesn't seem to happen in every tpac catalog. http://catalog.mvlc.org/eg/opac/advanced is an example of one catalog where this search works.

After discussing this with bshum in IRC, we think it might just be happening in catalogs where additional indexes have been added to the keyword index. For example, a system may have added a title index with the keyword class to add extra weight to the title in relevancy.

Trying a single search string like martin luther -king works fine, and the above search also works fine when doing it in jspac.

Kathy Lussier (klussier)
description: updated
Kathy Lussier (klussier) wrote :

I was mistaken in one part of my report. I am seeing cases where a -keyword search is also failing. For example, if I try the following search at bark.cwmars.org:

twilight -meyer

I will retrieve records for the Stephenie Meyer book.

Once again, this only seems to happen at sites where additional indexes were added to the keyword search. In the above example, it also is a problem in the jspac catalog, so it looks like this problem isn't isolated to tpac.

Mike Rylander (mrylander) wrote :

This is a consequence of having the ability to group indexed fields into classes. For instance, title proper, uniform title and abbreviated title are all grouped together under the "title" class.

Imagine a record containing:

 Title proper: the x of y and z
 Uniform title: x of y and z
 Abbreviated title: x of y

Now, imagine a search of "title: x y -z". Because that query matches the abbreviated title, which does not contain z, the record is included in the results. There is at least one way we could address this: the 'setweight' functionality [1] available in Postgres' full text search. It is intended for intra-document weighting but can be abused to simply segregate sections of a document. However, that would limit us to four fields (or, more specifically, field chunks) per class.
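To make the example above concrete, here is a minimal Python sketch of the two matching models. It is a simplified illustration only, not Evergreen's actual search code: the function names are hypothetical, and matching is reduced to whole-word membership with no stemming or ranking.

```python
# Simplified model (assumption): a field matches when it contains every
# required word and none of the excluded words.
def field_matches(text, required, excluded):
    words = set(text.lower().split())
    return all(t in words for t in required) and not any(t in words for t in excluded)


# Per-field evaluation: the record is a hit if ANY field in the class matches.
def class_matches_per_field(fields, required, excluded):
    return any(field_matches(value, required, excluded) for value in fields.values())


# Combined evaluation: all fields in the class are treated as one document.
def class_matches_combined(fields, required, excluded):
    return field_matches(" ".join(fields.values()), required, excluded)


record = {
    "title proper": "the x of y and z",
    "uniform title": "x of y and z",
    "abbreviated title": "x of y",
}

# Search "title: x y -z"
print(class_matches_per_field(record, ["x", "y"], ["z"]))  # True: abbreviated title has no z
print(class_matches_combined(record, ["x", "y"], ["z"]))   # False: z appears somewhere in the class
```

Under the per-field model the record is returned even though two of its three title fields contain the excluded word, which is the behavior described in this report.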

A way around this using the current Evergreen mechanisms is to make heavier use of aliases in the search class dropdown. You can specify exactly which fields you want grouped together under an alias, and use that alias in the dropdown, and avoid searching the "extra" indexed fields in a class when the common case does not require them. IOW, you can create your own "subclasses" within a class, using aliases, and simply use those aliases in the search type dropdown /instead of/ the base, builtin classes there.
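As a rough sketch of why an alias helps, the Python below models an alias as a named subset of the fields in a class. The alias name `title_main`, the field names, and the matching logic are all assumptions made for illustration; this is not how Evergreen stores or evaluates aliases.

```python
# Hypothetical mapping from a search-type name to the indexed fields it covers.
# "title" stands for the full built-in class; "title_main" is an invented alias
# that leaves out the abbreviated title.
SEARCH_TYPES = {
    "title": ["title proper", "uniform title", "abbreviated title"],
    "title_main": ["title proper", "uniform title"],
}


def search(record, search_type, required, excluded):
    """Return True if any field covered by search_type satisfies the query."""
    for name in SEARCH_TYPES[search_type]:
        words = set(record.get(name, "").lower().split())
        has_required = all(t in words for t in required)
        has_excluded = any(t in words for t in excluded)
        if has_required and not has_excluded:
            return True
    return False


record = {
    "title proper": "the x of y and z",
    "uniform title": "x of y and z",
    "abbreviated title": "x of y",
}

print(search(record, "title", ["x", "y"], ["z"]))       # True: abbreviated title matches
print(search(record, "title_main", ["x", "y"], ["z"]))  # False: both remaining fields contain z
```

With the narrower alias in the dropdown, the "extra" field that caused the unexpected hit is never consulted.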

Does that make sense, and help some?

[1] http://www.postgresql.org/docs/9.1/interactive/textsearch-controls.html#TEXTSEARCH-PARSING-DOCUMENTS

Kathy Lussier (klussier) wrote :

Thanks Mike! The cause of the problem does make sense; it may take a little more time and experimentation for me to grasp the proposed workaround.

However, I think I may have confused the issue by adding to this ticket this morning. In my earlier testing, there was a definite difference in the search results I retrieved when I used the "does not contain" option on the advanced search screen and when I simply added the excluded search term as -king. See:

http://bark.cwmars.org/eg/opac/results?bool=and&qtype=keyword&contains=contains&query=martin+luther&bool=and&qtype=keyword&contains=nocontains&query=king&bool=and&qtype=keyword&contains=contains&query=&sort=&locg=1&pubdate=is&date1=&date2=&_adv=1

vs.

http://bark.cwmars.org/eg/opac/results?fi%3Aitem_type=&query=martin+luther+-king&qtype=keyword&locg=1

One thing we noticed is that the "does not contain" search added my excluded term to the search box surrounded by quotation marks, as: -"king". Removing those quotation marks led to better results. If the "does not contain" king search could behave more like the -king search, it would be a plus.
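The difference between the two query strings can be sketched as follows. This is a hypothetical Python reconstruction of how advanced search rows might be flattened into one query string; the `build_query` function and its `quote_excluded` switch are assumptions for illustration, not TPAC's actual code. Only the `contains` / `nocontains` values are taken from the URLs above.

```python
def build_query(rows, quote_excluded=True):
    """Flatten (contains, term) rows from the advanced search form into one query string."""
    parts = []
    for contains, term in rows:
        term = term.strip()
        if not term:
            continue  # skip empty search boxes
        if contains == "nocontains":
            if quote_excluded:
                # Behavior described in this report: the excluded term is wrapped in quotes.
                parts.append('-"%s"' % term)
            else:
                # Alternative: negate each word individually, as a user typing -king would.
                parts.extend("-" + word for word in term.split())
        else:
            parts.append(term)
    return " ".join(parts)


rows = [("contains", "martin luther"), ("nocontains", "king"), ("contains", "")]

print(build_query(rows, quote_excluded=True))   # martin luther -"king"
print(build_query(rows, quote_excluded=False))  # martin luther -king
```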

Mike Rylander (mrylander) wrote :

That is, indeed, a TPAC-specific change from the past. I'm honestly not sure of the reason for that, but it says "does not contain exactly" and, given that the "+" operator is equivalent to surrounding a word in quotes (a single-word phrase search), I suspect it's for parity with that. There may be a more practical reason, though.

If the results are better and less surprising for the user without that quoting transformation (and since users can add the quotes themselves), we should remove it. Perhaps a poll on the -general list is in order?

Kathy Lussier (klussier) wrote :

Thanks again Mike! I can send out a message to the -general list if that is helpful.

Kathy

Kathy Lussier (klussier) wrote :

I haven't received any feedback yet on the list, but, if there is support to remove the quotes from the "does not contain" search, I created a branch in working to do so.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/kmlussier/lp1019360

tags: added: pullrequest
Dan Scott (denials) wrote :

Confirmed that the same problem is evident at Conifer.

Changed in evergreen:
status: New → Confirmed
Ben Shum (bshum) wrote :

Anything to be done about this bug now that new QueryParser is on the scene for 2.4? Or maybe just something for 2.2/2.3?

Kathy Lussier (klussier) wrote :

Wow! This is an old bug report. This problem seems to be resolved with the query parser work. Marking this bug invalid.

Changed in evergreen:
status: Confirmed → Invalid