Query Parser chokes on some input, generates incorrect data structure on others

Bug #1251359 reported by Mike Rylander
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Medium
Unassigned
2.4
Fix Released
Medium
Unassigned

Bug Description

First, we didn't need to make $last_type local, and it broke explicit
grouping anyway. That's removed, and we now reset that (and a few more
like it) at calls to the top level parse() method.

Second, we are smarter about finding the boundary of atoms. Previous
to this commit, a curly brace could send the parser into a tailspin
from which it would not recover. Now we use alternation instead of
a character class, which is much safer with the default multi-character
float syntax specifier.

Third, as a catch-all, if we can't parse the remainder of a query we
now simply say so (when in debug mode) and go away, instead of risking
an infinite loop. We do this via a final, unqualified "else" clause
in decompose().

Finally, instead of building 10+ regexp objects on each query parse, cache them per
QP subclass and reuse them.

Top 2 commits on:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/QP-boolean-alternation-and-parsing

This branch also resolves bug 1251353.

Tags: pullrequest
Revision history for this message
Liam Whalen (whalen-ld) wrote :

I looked over the changes tonight, and they looked good to me. I tested the various search options on my dev machine and they worked. So, I pushed this commit to Sitka's production because we were having searches being run with {} in them due to a staff clients that are sending accented characters in a different format that normal.

For example:

Cancer du sujet {circ}ag{acute}e

I am not sure why the staff clients are sending the characters like this. I could not reproduce it with my staff clients. I am guessing it has something to do with localization or maybe the type of character sets available on the computers running those staff clients that send accented characters like that.

Either way, it was causing our production loads to run too high, which prompted the push to production.

Everything looks good on our servers, including the original problem with searching three ISBNs.

Revision history for this message
Mike Rylander (mrylander) wrote :

Thanks for testing, Liam! I've squashed these changes into one commit, increased the protective plan depth limit, and added tests for the two types of failing queries to query_tests.pl.

Revision history for this message
Dan Wells (dbw2) wrote :

Based on Liam's input and my own positive experience running this in production for a week or two, I went ahead and pushed this to master through rel_2_4. Thank you to both Liam for testing and Mike for producing the fixes.

Changed in evergreen:
status: New → Fix Committed
importance: Undecided → Medium
Ben Shum (bshum)
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.