TPAC: Search grammar in record data affects links in surprising ways

Bug #1065383 reported by Dan Scott
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
Evergreen  Fix Released  Undecided   Unassigned

Bug Description

* Evergreen 2.3.0

Building on bug #856811, where we discovered that the # symbol in facets caused searches to fail, I've found that the problem goes deeper. It's not just facets: almost any link we surface in the TPAC that uses search syntax drawn from the MARC data is affected, including author: / title: / negation operator search syntax. I added another commit to the user/dbs/pound_facets branch in the working repository to demonstrate the current problems (search for "Trombone concerto").

I then added a third commit to address some of the low-hanging fruit in the TPAC that should produce fewer negatively surprising results. From the commit log:

"""
Expand the list of filtered characters to cover all of the special characters documented for the Evergreen search grammar (http://evergreen-ils.org/dokuwiki/doku.php?id=documentation:technical:search_grammar) when generating links in the TPAC so as to avoid inadvertently launching filtered searches when a user clicks on something that should just be a display value.

For example, if a title includes "Presenting a subject: tips for consultants", it should _not_ launch a search for "subject" containing "tips for consultants".

This commit addresses most of the link problems in the record display, as well as the author links in the search results table.

Still problematic are the facets (which seem to rely on exact matching, such that filtering out the problematic characters is itself problematic) and autocomplete (which requires modifying the Autocomplete Dojo widget).

In addition, this commit makes the series code actually display, as it was using a non-standard method to attempt to return the results from the BLOCK (and failing). Also, it makes the links for authors in the record details match the MODS32 definition for personal name parts and only use the "acdq" subfields. This enables a click on the link to actually return results; previously, in the case where the author field included (for example) a subfield "g" value, that value would be included in the generated link and would likely lead to 0 hits.

For authors, we substitute with a space rather than just eliding the substituted value. Authors are particularly likely to have dates like 1899-1978; "1899 1978" matches, but "18991978" will not.

Perhaps we should take the same approach with the others, or break down the search/replace logic a little further (for example, we could remove the "-" only if it is preceded by a space or is at the start of the string and is followed immediately by a character, and preserve it if it is surrounded by digits). But this seems to take us pretty far down the road of less negatively surprising results.
"""
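
To make the substitution choices in the commit log concrete, here is a minimal Python sketch (not the TPAC template code itself). The character set and both function names are assumptions for illustration only; the authoritative list of special characters is the one in the search grammar documentation linked above. The second function is one possible reading of the refinement floated in the last paragraph: it strips a hyphen only where it could act as a negation operator and leaves every other hyphen, including those between digits, alone.

```python
import re

# Assumed, illustrative set of characters with meaning in the search grammar.
# The real list is in the Evergreen search grammar documentation.
SYNTAX_CHARS = re.compile(r'[#:"()|&\[\]-]')

def strip_syntax(value, replacement=""):
    """Replace every assumed syntax character, then collapse whitespace."""
    cleaned = SYNTAX_CHARS.sub(replacement, value)
    return " ".join(cleaned.split())

# Refinement: treat "-" as negation only at the start of the string or after
# whitespace, when it is followed immediately by a non-space character.
NEGATION_HYPHEN = re.compile(r'(^|\s)-(?=\S)')
OTHER_SYNTAX = re.compile(r'[#:"()|&\[\]]')

def strip_syntax_keep_inner_hyphens(value):
    """Drop negation-style hyphens but keep hyphens inside words and dates."""
    cleaned = NEGATION_HYPHEN.sub(r'\1', value)
    cleaned = OTHER_SYNTAX.sub(" ", cleaned)
    return " ".join(cleaned.split())

# Title link: the colon no longer turns "subject" into a search class.
print(strip_syntax("Presenting a subject: tips for consultants"))
# Author link: eliding the hyphen fuses the dates, a space keeps them apart.
print(strip_syntax("Smith, John, 1899-1978"))
print(strip_syntax("Smith, John, 1899-1978", " "))
# Refined rule: leading hyphen removed, date range preserved.
print(strip_syntax_keep_inner_hyphens("-draft Smith, 1899-1978"))
```

Whether a preserved hyphen inside a date range still matches as expected depends on how the search back end tokenizes the query, which this sketch does not model.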

So... please see the three commits in http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/dbs/pound_facets for some test data & some fixes that I believe should be merged to at least rel_2_3.

Tags: pullrequest
Bill Erickson (berick)
Changed in evergreen:
assignee: nobody → Bill Erickson (erickson-esilibrary)
status: New → In Progress
Bill Erickson (berick) wrote :

* Confirmed the removal of subfield 'g' for author search
* Confirmed removal of special chars in subject links
* Confirmed "Search for related items by.." is showing up.
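
For readers unfamiliar with the subfield change confirmed in the first item, the idea can be sketched in Python roughly as follows. This is not the TPAC code: the function name, the list-of-pairs input shape, and the sample subfield values are hypothetical. Only the "acdq" subfield choice comes from the commit log.

```python
# Subfields kept when building an author link, per the "acdq" choice
# described in the commit log.
AUTHOR_SUBFIELDS = "acdq"

def author_search_text(subfields):
    """Join the values of the kept subfields, in field order.

    `subfields` is a list of (code, value) pairs; this input shape is an
    assumption made for the sketch.
    """
    return " ".join(value for code, value in subfields if code in AUTHOR_SUBFIELDS)

# Hypothetical author field with a subfield "g" that should not reach the link.
field = [("a", "Smith, John,"), ("d", "1899-1978"), ("g", "(Spirit)")]
print(author_search_text(field))
```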

Signed off and pushed to master, rel_2_3.

Related, I found an issue with series extraction/display:

For "Trombone Concerto", the related series section shows two entries, one for "American Trombone Concertos ;" (440a) and one for "2" (440v). Possible solutions are to change the XPath to '//*[@tag="' _ tag _ '"]' to pick up the 440 as a whole blob, or to limit it to the 440a (or similar). Not sure which is preferred...
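
The two options can be compared with a small Python sketch using the standard library's ElementTree. The MARCXML fragment is a simplified stand-in built from the values quoted above (real MARCXML is namespaced, which this ignores), and the variable names are illustrative rather than taken from the template.

```python
import xml.etree.ElementTree as ET

# Simplified, un-namespaced stand-in for the 440 field described above.
MARCXML = """
<record>
  <datafield tag="440" ind1=" " ind2="0">
    <subfield code="a">American Trombone Concertos ;</subfield>
    <subfield code="v">2</subfield>
  </datafield>
</record>
"""

record = ET.fromstring(MARCXML)
tag = "440"

# Reported behaviour: each subfield becomes its own series entry.
per_subfield = [sf.text for sf in record.findall(f".//*[@tag='{tag}']/*")]

# Option 1: select the field as a whole and join its subfields into one entry.
whole_field = [
    " ".join(sf.text for sf in field.findall("subfield"))
    for field in record.findall(f".//*[@tag='{tag}']")
]

# Option 2: limit the entry to subfield "a".
only_a = [
    sf.text
    for sf in record.findall(f".//*[@tag='{tag}']/subfield[@code='a']")
]

print(per_subfield)
print(whole_field)
print(only_a)
```

Option 1 keeps the volume number attached to the series title in a single entry, while option 2 drops it, which may matter if the entry is also used as link text for a series search.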

In hindsight, I should have waited to push these commits until this was resolved, but got ahead of myself...

Changed in evergreen:
assignee: Bill Erickson (erickson-esilibrary) → nobody
status: In Progress → Confirmed
Changed in evergreen:
milestone: 2.3.1 → 2.4.0-alpha
Dan Scott (denials) wrote :

Wait. Why is the milestone for this now 2.4.0-alpha? The fixes for the bugs that were described when the bug was opened were pushed to master. Bill noted a different bug in passing, but that should be copied/pasted into a new bug and this one should be closed (with a milestone of 2.3.1, I believe).

Even then, the milestone should be 2.3.2; this has been a series of bug fixes, not a new feature.

Dan Scott (denials) wrote :

Opened bug 1083796 to carry on the bug fixery, and am setting the milestone for this to 2.3.1.

no longer affects: evergreen/2.3
Changed in evergreen:
milestone: 2.4.0-alpha → 2.3.2
status: Confirmed → Fix Committed
Jason Stephenson (jstephenson) wrote :

I set the milestone to 2.4.0-alpha because it was not flagged fix committed at the time and 2.3.1 had come and gone. Guess it will get "fix released" when 2.3.2 is out.

Dunno if it should affect 2.3 or not at this point. It probably should, but it doesn't really matter that much.

Ben Shum (bshum)
Changed in evergreen:
status: Fix Committed → Fix Released