Multi-char escapes wrongly forbidden in character class
Bug #1022762 reported by Paul J. Lucas
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Zorba | Fix Released | High | Paul J. Lucas |
Bug Description
In a character range, e.g., A-Z, each end-point of the range may be a literal character or a SingleCharEsc, but not a MultiCharEsc. A while ago, a "fix" was made to enforce this, but the "fix" went too far: it forbids a MultiCharEsc anywhere within a charClassExpr, including places where it is valid, e.g., [\s].
Related branches
lp:~paul-lucas/zorba/bug-1022762
- Matthias Brantner: Approve
- Paul J. Lucas: Approve
Diff: 210 lines (+82/-97), 2 files modified
src/util/regex.cpp (+80/-97)
test/rbkt/Queries/CMakeLists.txt (+2/-0)
Changed in zorba:
status: New → In Progress
Changed in zorba:
status: In Progress → Fix Committed
Changed in zorba:
status: Fix Committed → Fix Released
Removing the "fix" code results in the regex_err16.xq test failing. That test is:
fn:matches("a", "[\s-e]")
The charClassExpr is invalid because, in character ranges, only SingleCharEsc are allowed as end-points and \s is a MultiCharEsc. ICU doesn't detect this, so instead of raising an error the test just returns "false".
Adding a proper fix for this would involve adding more state to the regex parser and knowing when we're within a character class *and* within a character range, i.e.:
if ( in_char_class && c == '-' && prev_c_was_an_esc && !prev_c_was_a_single_char_esc )
  throw an exception
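
For illustration only, the condition above could be sketched as a standalone pre-check along the following lines. This is not the code from src/util/regex.cpp: the names check_char_class_ranges, char_class_ranges_ok and is_multi_char_esc are hypothetical, and the sketch makes two assumptions that go beyond the pseudocode: a '-' directly before ']' is treated as a literal dash, and a '-' directly before '[' is treated as character class subtraction, so neither is reported. Category escapes (\p{..}) and a MultiCharEsc on the right-hand side of a range are not handled.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: true if c, following a backslash, is one of the
// multi-character escapes \s \S \i \I \c \C \d \D \w \W.
static bool is_multi_char_esc( char c ) {
  switch ( c ) {
    case 's': case 'S': case 'i': case 'I': case 'c': case 'C':
    case 'd': case 'D': case 'w': case 'W':
      return true;
    default:
      return false;
  }
}

// Hypothetical pre-check: throws if a multi-char escape is immediately
// followed by a range '-' inside a character class, e.g. "[\s-e]".
void check_char_class_ranges( std::string const &re ) {
  int  class_depth = 0;             // > 0 while inside [...]
  bool prev_was_multi_esc = false;  // previous token was a MultiCharEsc

  for ( std::size_t i = 0; i < re.size(); ++i ) {
    char const c = re[i];
    if ( c == '\\' && i + 1 < re.size() ) {
      // Consume the escaped character and remember what kind of escape it was.
      prev_was_multi_esc = is_multi_char_esc( re[ ++i ] );
      continue;
    }
    if ( c == '[' ) {
      ++class_depth;
    } else if ( c == ']' && class_depth > 0 ) {
      --class_depth;
    } else if ( c == '-' && class_depth > 0 && prev_was_multi_esc ) {
      // Assumption: '-' before ']' is a literal dash and '-' before '['
      // starts a class subtraction, so only other cases are ranges.
      char const next = i + 1 < re.size() ? re[ i + 1 ] : ']';
      if ( next != ']' && next != '[' )
        throw std::invalid_argument(
          "multi-char escape used as a character range end-point" );
    }
    prev_was_multi_esc = false;
  }
}

// Convenience wrapper for callers that prefer a bool to an exception.
bool char_class_ranges_ok( std::string const &re ) {
  try {
    check_char_class_ranges( re );
    return true;
  }
  catch ( std::invalid_argument const& ) {
    return false;
  }
}
```

With this sketch, char_class_ranges_ok("[\\s-e]") reports the expression from regex_err16.xq as invalid, while "[\\s]" and "[a-z]" pass, which is the behavior the bug description asks for.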