Query Parser chokes on some input, generates incorrect data structure on others

Bug #1251359 reported by Mike Rylander on 2013-11-14
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Evergreen
Medium
Unassigned
2.4
Medium
Unassigned

Bug Description

First, we didn't need to make $last_type local, and it broke explicit
grouping anyway. That's removed, and we now reset that (and a few more
like it) at calls to the top level parse() method.

Second, we are smarter about finding the boundary of atoms. Previous
to this commit, a curly brace could send the parser into a tailspin
from which it would not recover. Now we use alternation instead of
a character class, which is much safer with the default multi-character
float syntax specifier.

Third, as a catch-all, if we can't parse the remainder of a query we
now simply say so (when in debug mode) and go away, instead of risking
an infinite loop. We do this via a final, unqualified "else" clause
in decompose().

Finally, instead of building 10+ regexp objects on each query parse, cache them per
QP subclass and reuse them.

Top 2 commits on:

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/miker/QP-boolean-alternation-and-parsing

This branch also resolves bug 1251353.

Liam Whalen (whalen-ld) wrote :

I looked over the changes tonight, and they looked good to me. I tested the various search options on my dev machine and they worked. So, I pushed this commit to Sitka's production because we were having searches being run with {} in them due to a staff clients that are sending accented characters in a different format that normal.

For example:

Cancer du sujet {circ}ag{acute}e

I am not sure why the staff clients are sending the characters like this. I could not reproduce it with my staff clients. I am guessing it has something to do with localization or maybe the type of character sets available on the computers running those staff clients that send accented characters like that.

Either way, it was causing our production loads to run too high, which prompted the push to production.

Everything looks good on our servers, including the original problem with searching three ISBNs.

Mike Rylander (mrylander) wrote :

Thanks for testing, Liam! I've squashed these changes into one commit, increased the protective plan depth limit, and added tests for the two types of failing queries to query_tests.pl.

Dan Wells (dbw2) wrote :

Based on Liam's input and my own positive experience running this in production for a week or two, I went ahead and pushed this to master through rel_2_4. Thank you to both Liam for testing and Mike for producing the fixes.

Changed in evergreen:
status: New → Fix Committed
importance: Undecided → Medium
Ben Shum (bshum) on 2013-12-06
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers