fn:matches fails if the string is non-UTF-8
Bug #867159 reported by Daniel Turcanu
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Zorba | Fix Released | Medium | Paul J. Lucas |
Bug Description
I have a query that reads a lot of files and applies fn:matches to each of them.
Some files contain non-UTF-8 byte sequences, yet file:read-text reads them without complaint.
fn:matches calls to_string to convert its input to an ICU string, but that conversion fails, so fn:matches returns false, although I think it should raise an error. In fact, to_string itself should raise the error; otherwise the non-UTF-8 problem goes unnoticed.
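The difference between the reported behaviour and the requested one can be sketched in a few lines. This is only an illustration in Python, not Zorba code: `to_text_lenient`, `matches_silent`, and `matches_strict` are hypothetical names, and Python's `bytes.decode` and `re` stand in for the ICU string conversion and regex engine.

```python
import re
from typing import Optional


def to_text_lenient(raw: bytes) -> Optional[str]:
    """Stand-in for a conversion that swallows failures (hypothetical)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # the caller cannot tell "bad bytes" from anything else


def matches_silent(raw: bytes, pattern: str) -> bool:
    """Reported behaviour: a failed conversion looks like 'no match'."""
    text = to_text_lenient(raw)
    if text is None:
        return False
    return re.search(pattern, text) is not None


def matches_strict(raw: bytes, pattern: str) -> bool:
    """Requested behaviour: a failed conversion raises UnicodeDecodeError."""
    text = raw.decode("utf-8")  # strict decoding is the default
    return re.search(pattern, text) is not None


# b"caf\xe9" is "café" encoded as Latin-1/cp1252; it is not valid UTF-8.
bad = b"caf\xe9"
print(matches_silent(bad, "caf"))  # the encoding problem is hidden
try:
    matches_strict(bad, "caf")
except UnicodeDecodeError as err:
    print("conversion failed:", err.reason)
```

With the silent variant, a file with a bad encoding is indistinguishable from a file that simply does not match the pattern.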
Related branches
lp:~zorba-coders/zorba/feature-transcode_streambuf
- Matthias Brantner: Approve
- Paul J. Lucas: Approve
Diff: 2967 lines (+1874/-555), 37 files modified
ChangeLog (+4/-0)
include/zorba/internal/proxy.h (+48/-0)
include/zorba/pregenerated/diagnostic_list.h (+4/-0)
include/zorba/transcode_stream.h (+213/-0)
modules/ExternalModules.conf (+1/-1)
modules/com/zorba-xquery/www/modules/http-client.xq (+2/-2)
modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.cpp (+337/-338)
modules/com/zorba-xquery/www/modules/http-client.xq.src/curl_stream_buffer.h (+164/-143)
modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.cpp (+71/-21)
modules/com/zorba-xquery/www/modules/http-client.xq.src/http_response_parser.h (+10/-6)
modules/com/zorba-xquery/www/modules/pregenerated/errors.xq (+8/-0)
modules/org/expath/ns/file.xq.src/file.cpp (+25/-10)
modules/org/expath/ns/file.xq.src/file_function.cpp (+0/-5)
modules/org/expath/ns/file.xq.src/file_function.h (+5/-9)
modules/org/expath/ns/file.xq.src/file_module.cpp (+2/-5)
modules/org/expath/ns/file.xq.src/file_module.h (+13/-6)
src/api/CMakeLists.txt (+1/-0)
src/api/transcode_streambuf.cpp (+102/-0)
src/diagnostics/diagnostic_en.xml (+8/-0)
src/diagnostics/pregenerated/diagnostic_list.cpp (+6/-0)
src/diagnostics/pregenerated/dict_en.cpp (+2/-0)
src/unit_tests/CMakeLists.txt (+4/-6)
src/unit_tests/test_icu_streambuf.cpp (+151/-0)
src/unit_tests/unit_test_list.h (+5/-0)
src/unit_tests/unit_tests.cpp (+3/-0)
src/util/CMakeLists.txt (+6/-1)
src/util/icu_streambuf.cpp (+300/-0)
src/util/icu_streambuf.h (+140/-0)
src/util/passthru_streambuf.cpp (+105/-0)
src/util/passthru_streambuf.h (+76/-0)
src/util/transcode_streambuf.h (+47/-0)
test/rbkt/ExpQueryResults/zorba/file/cp1252.xml.res (+1/-0)
test/rbkt/Queries/zorba/file/cp1252.txt (+1/-0)
test/rbkt/Queries/zorba/file/cp1252.xq (+3/-0)
test/rbkt/Queries/zorba/file/invalid_encoding.spec (+1/-0)
test/rbkt/Queries/zorba/file/invalid_encoding.xq (+3/-0)
test/rbkt/Queries/zorba/http-client/send-request/http2-read-svg.xq (+2/-2)
Changed in zorba:
assignee: nobody → Matthias Brantner (matthias-brantner)
milestone: none → 2.2
Changed in zorba:
status: New → Fix Committed
Changed in zorba:
status: Fix Committed → Fix Released
We (meaning several Zorba team members) had a long discussion a while ago about what to do with invalid UTF-8 byte sequences. The consensus was that the validity of a UTF-8 byte sequence should be checked only on entry into Zorba, not after it has been read in. So if the byte sequence is to be checked at all, it should be checked in read-text, and any error should be raised there.
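A minimal sketch of the "check on entry" policy, again in Python rather than Zorba's C++. The `read_text` helper and its `encoding` parameter are assumptions made for illustration. The cp1252 and invalid_encoding test files in the related branch suggest the fix went in a similar direction, but this page does not show the actual file:read-text signature.

```python
import tempfile
from pathlib import Path


def read_text(path, encoding: str = "utf-8") -> str:
    """Read a file and validate (or transcode) its bytes at the boundary.

    Hypothetical helper: everything downstream may assume it received
    valid text, because invalid input is rejected here.
    """
    raw = Path(path).read_bytes()
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as err:
        raise ValueError(
            f"{path}: invalid {encoding} byte sequence at offset {err.start}"
        ) from err


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "cp1252.txt"
        sample.write_bytes(b"caf\xe9")  # "café" in cp1252, invalid as UTF-8

        try:
            read_text(sample)  # rejected on entry, not later in a regex call
        except ValueError as err:
            print("rejected:", err)

        print(read_text(sample, encoding="cp1252"))  # transcoded on entry
```

Checking once at the boundary keeps functions such as fn:matches free of encoding checks, which matches the consensus described above.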