re00737 fails
Bug #1131990 reported by Paul J. Lucas
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Zorba | Fix Released | Critical | Paul J. Lucas | |
Bug Description
The FOTS test re00737:
(every $s in tokenize('', ',')
satisfies matches($s, '^(?:[^
and
(every $s in tokenize(
satisfies not(matches($s, '^(?:[^
fails. It may be due to the interaction between negative character classes and range subtraction. This may be a limitation of ICU.
Related branches
lp:~paul-lucas/zorba/bug-1131990
- Matthias Brantner: Approve
- Paul J. Lucas: Approve
Diff: 107 lines (+37/-3), 3 files modified
ChangeLog (+2/-0)
src/util/icu_regex.cpp (+35/-1)
test/fots/CMakeLists.txt (+0/-2)
Changed in zorba: | |
importance: | Undecided → Critical |
milestone: | none → 2.9 |
summary: | FOTS: re00737 fails → re00737 fails |
Changed in zorba: | |
status: | Triaged → New |
tags: | removed: regex |
description: | updated |
Changed in zorba: | |
assignee: | Paul J. Lucas (paul-lucas) → Markos Zaharioudakis (markos-za) |
Changed in zorba: | |
status: | New → In Progress |
Changed in zorba: | |
status: | In Progress → Fix Committed |
Changed in zorba: | |
status: | Fix Committed → Fix Released |
It turns out that the failure has nothing to do with regular expressions. The query:
matches("", "^(?:[^cde-[ag]]+)$")
correctly returns false because the empty string "" does not match one or more ('+') characters that are not [cde]. However the query:
every $s in tokenize("", ",") satisfies matches($s, "^(?:[^cde-[ag]]+)$")
incorrectly returns true. Setting a breakpoint in strings_impl.cpp:1521 does nothing because in the latter case, the code never gets there so matches() is never called. Why not?
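One possible explanation, offered as a sketch rather than a confirmed diagnosis: the XPath Functions and Operators specification says fn:tokenize returns the empty sequence for a zero-length input string, and an every ... satisfies expression over an empty sequence is true without ever evaluating its condition. If that is what happens here, the breakpoint is never reached because there is nothing to iterate over. The Python model below illustrates the idea. The helpers xq_tokenize, xq_matches, and traced_matches are hypothetical stand-ins, not Zorba code, and the pattern [^cdeag]+ is an assumed reading of [^cde-[ag]], since Python's re module has no character class subtraction.

```python
import re

def xq_tokenize(text, sep):
    """Hypothetical stand-in for fn:tokenize with a literal separator.

    Assumption: a zero-length input yields the empty sequence,
    unlike Python's "".split(","), which returns [""].
    """
    if text == "":
        return []
    return text.split(sep)

def xq_matches(s):
    # Assumed equivalent of ^(?:[^cde-[ag]]+)$ with the subtraction
    # written out: one or more characters that are none of c, d, e, a, g.
    return re.fullmatch(r"[^cdeag]+", s) is not None

calls = []  # records every string actually passed to the matcher

def traced_matches(s):
    calls.append(s)
    return xq_matches(s)

# Direct call: the empty string cannot satisfy '+', so this is False.
direct = xq_matches("")

# Quantified call: nothing to iterate over, so the result is vacuously
# True and the matcher is never invoked.
quantified = all(traced_matches(s) for s in xq_tokenize("", ","))

print(direct, quantified, calls)  # False True []
```

The empty calls list corresponds to the breakpoint never firing: the quantifier is satisfied before the matcher is consulted.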