Sentence is incorrectly incremented when token characters end without sentence terminator, take 2
Bug #924063 reported by Paul J. Lucas
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Zorba | Fix Released | Medium | Paul J. Lucas |
Bug Description
The original bug (bug #863320) was fixed, but the fix caused other tests to fail (bug #897800), so it was reverted to allow the release to go ahead. This new bug tracks fixing the original bug without causing any other tests to fail.
The original bug was:
The following query:
let $x := <msg>hello world</msg>
return $x contains text "hello" ftand "world" same sentence
incorrectly returns "false" because the tokenizer increments the sentence number when it runs out of characters, even though it has not encountered a sentence-terminating character.
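The intended behaviour can be illustrated with a minimal sketch. This is not Zorba's tokenizer: the names `tokenize` and `same_sentence` are hypothetical, and the terminator set is a deliberately simplified assumption. The point is only that the sentence number advances when a terminator has been seen, never because the input ran out.

```python
import re

# Assumption: a simplified set of sentence-terminating characters.
SENTENCE_TERMINATORS = {".", "!", "?"}


def tokenize(text):
    """Return (token, sentence_number) pairs, numbering sentences from 1."""
    tokens = []
    sentence = 1
    pending_break = False  # True once a terminator follows at least one token
    for match in re.finditer(r"\w+|[.!?]", text):
        piece = match.group()
        if piece in SENTENCE_TERMINATORS:
            # Remember the terminator; the number changes only if more words follow.
            pending_break = bool(tokens)
            continue
        if pending_break:
            sentence += 1
            pending_break = False
        tokens.append((piece.lower(), sentence))
    # Reaching the end of the input never increments the sentence number.
    return tokens


def same_sentence(text, first, second):
    """True if both words occur in at least one common sentence."""
    tokens = tokenize(text)
    first_in = {n for tok, n in tokens if tok == first.lower()}
    second_in = {n for tok, n in tokens if tok == second.lower()}
    return bool(first_in & second_in)


print(same_sentence("hello world", "hello", "world"))  # True
```

Under these assumptions, "hello world" yields two tokens that both carry sentence number 1, which is the result the query above expects.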
Related branches
lp:~paul-lucas/zorba/bug-924063
- Matthias Brantner: Approve
- Paul J. Lucas: Approve
Diff: 106 lines (+17/-13), 7 files modified
ChangeLog (+1/-0)
src/runtime/full_text/icu_tokenizer.cpp (+12/-5)
test/rbkt/Queries/CMakeLists.txt (+0/-4)
test/rbkt/Queries/zorba/fulltext/ft-same-sentence-false-2.xq (+1/-1)
test/rbkt/Queries/zorba/fulltext/ft-same-sentence-true-2.xq (+1/-1)
test/rbkt/Queries/zorba/fulltext/ft-same-sentence-true-3.xq (+1/-1)
test/rbkt/Queries/zorba/fulltext/ft-same-sentence-true-4.xq (+1/-1)
Changed in zorba:
status: New → In Progress
Changed in zorba:
status: In Progress → Fix Committed
Changed in zorba:
status: Fix Committed → Fix Released
It turns out that the original bug fixes were correct. ICU uses more than just sentence-terminating characters (like '.') to decide when a sentence ends: the first letter of the first word after the '.' must also be capitalized. Hence the tests were wrong, e.g., "hello. world". Once that test was changed to "Hello. World", it passed.
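The capitalization rule described in this comment can be sketched as follows. This is a rough approximation for illustration only, not ICU's actual break rules: the function name `count_sentences` is hypothetical, and the sketch assumes ASCII text and considers only '.' as a terminator.

```python
import re


def count_sentences(text):
    """Count sentences, treating '.' as a break only before a capitalized word."""
    if not text.strip():
        return 0
    # Assumption: a break is a '.', some whitespace, then an ASCII uppercase letter.
    breaks = re.findall(r"\.\s+(?=[A-Z])", text)
    return 1 + len(breaks)


print(count_sentences("hello. world"))  # 1: lowercase after '.', so no break
print(count_sentences("Hello. World"))  # 2: capitalized word starts a new sentence
```

With this rule, a "same sentence" test on "hello. world" sees a single sentence, while a "different sentence" test needs input such as "Hello. World".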