Query Parser chokes on some input, generates incorrect data structure on others
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Evergreen |
Fix Released
|
Medium
|
Unassigned | ||
2.4 |
Fix Released
|
Medium
|
Unassigned |
Bug Description
First, we didn't need to make $last_type local, and it broke explicit
grouping anyway. That's removed, and we now reset that (and a few more
like it) at calls to the top level parse() method.
Second, we are smarter about finding the boundary of atoms. Previous
to this commit, a curly brace could send the parser into a tailspin
from which it would not recover. Now we use alternation instead of
a character class, which is much safer with the default multi-character
float syntax specifier.
Third, as a catch-all, if we can't parse the remainder of a query we
now simply say so (when in debug mode) and go away, instead of risking
an infinite loop. We do this via a final, unqualified "else" clause
in decompose().
Finally, instead of building 10+ regexp objects on each query parse, cache them per
QP subclass and reuse them.
Top 2 commits on:
This branch also resolves bug 1251353.
Changed in evergreen: | |
status: | Fix Committed → Fix Released |
I looked over the changes tonight, and they looked good to me. I tested the various search options on my dev machine and they worked. So, I pushed this commit to Sitka's production because we were having searches being run with {} in them due to a staff clients that are sending accented characters in a different format that normal.
For example:
Cancer du sujet {circ}ag{acute}e
I am not sure why the staff clients are sending the characters like this. I could not reproduce it with my staff clients. I am guessing it has something to do with localization or maybe the type of character sets available on the computers running those staff clients that send accented characters like that.
Either way, it was causing our production loads to run too high, which prompted the push to production.
Everything looks good on our servers, including the original problem with searching three ISBNs.