Zorba hangs with invalid utf-8 input
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Zorba | Fix Released | Medium | Paul J. Lucas |
Bug Description
import module namespace file = "http://
file:read-
or
fn:unparsed-
Either of these queries causes Zorba to hang at 100% CPU.
The problem is that the input file (attached) is actually UTF-16 with a BOM. Since we lie and tell Zorba the input is UTF-8 (the same thing happens if we don't specify the encoding, since Zorba assumes UTF-8), the stream is passed through untouched. This causes the implementation of string-length() to busy-loop, specifically in utf8::length(), because the char_length() of the byte \xFE (the first byte in the file) is 0, so the loop never advances past that byte.
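The failure mode described above can be illustrated with a minimal sketch. This is not Zorba's code: `char_length()` and `checked_length()` here are hypothetical stand-ins written for this example, assuming a `char_length()` that returns 0 for any byte that cannot start a UTF-8 sequence. A counting loop that adds that 0 to its position never makes progress. The guarded variant rejects the byte instead.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for char_length(): the number of bytes in the UTF-8
// sequence introduced by `lead`, or 0 if `lead` cannot start a sequence.
inline std::size_t char_length(unsigned char lead) {
    if (lead < 0x80) return 1;                   // ASCII
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 2-byte lead
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 3-byte lead
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  // 4-byte lead
    return 0;  // continuation bytes, 0xC0, 0xC1, 0xF5..0xFF (so 0xFE and 0xFF)
}

// An unguarded loop of the following shape spins forever on "\xFE...",
// because the position is advanced by 0:
//
//     while (i < s.size()) { i += char_length(s[i]); ++count; }

// Guarded variant (an assumption about what a fix could look like): count
// characters, but throw on an invalid lead byte or a truncated sequence.
// It checks lead bytes only; continuation bytes are not validated here.
inline std::size_t checked_length(const std::string& s) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < s.size()) {
        const std::size_t n = char_length(static_cast<unsigned char>(s[i]));
        if (n == 0 || n > s.size() - i) {
            throw std::invalid_argument(
                "invalid UTF-8 at byte offset " + std::to_string(i));
        }
        assert(n >= 1 && n <= 4);  // the loop always advances from here on
        i += n;
        ++count;
    }
    return count;
}
```

With this sketch, a buffer beginning with the UTF-16 big-endian BOM bytes `\xFE\xFF` raises `std::invalid_argument` at offset 0 rather than hanging.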
Related branches
- Matthias Brantner: Approve
- Paul J. Lucas: Approve
Diff: 978 lines (+294/-191), 11 files modified
ChangeLog (+1/-0)
src/api/serialization/serializer.cpp (+6/-2)
src/runtime/strings/strings_impl.cpp (+65/-72)
src/util/unicode_util.cpp (+10/-5)
src/util/unicode_util.h (+10/-0)
src/util/utf8_streambuf.cpp (+6/-2)
src/util/utf8_util.cpp (+82/-54)
src/util/utf8_util.h (+7/-4)
src/util/utf8_util.tcc (+14/-11)
src/util/utf8_util_base.h (+88/-37)
src/util/zorba_regex_engine.cpp (+5/-4)
Changed in zorba: status: New → In Progress
Changed in zorba: status: In Progress → Fix Committed
Changed in zorba: status: Fix Committed → Fix Released
My recollection is that in earlier discussions about Zorba verifying user input, we agreed that it should only do so at the boundaries, i.e., data should be checked for validity when it first enters Zorba, and not as it flows through.
However, we aren't currently performing this input check for streaming data. Paul has written a streaming UTF-8 checker, utf8::streambuf, which verifies the data is valid UTF-8 as it comes in. Unfortunately, it is very challenging and fragile to use correctly, as he explains here: https://code.launchpad.net/~zorba-coders/zorba/bug1073175/+merge/144762/comments/318167
I do not believe this is a viable solution, so I have asked for a more encapsulated solution here: https://bugs.launchpad.net/zorba/+bug/1073175/comments/3
But... given that this bug is no longer causing FOTS failures and is apparently only triggered by providing bogus input, I am currently marking this bug as "Medium" priority without a specific milestone.
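For illustration only, checking at the boundary for streaming data could be sketched as a small incremental validator that is fed chunks as they arrive. This is not utf8::streambuf and not the encapsulated solution requested above. `Utf8Validator`, `feed()` and `finish()` are hypothetical names made up for this example, and the byte ranges follow the well-formed UTF-8 table in the Unicode Standard.

```cpp
#include <cstddef>

// Hypothetical incremental UTF-8 validator: state is kept between calls, so
// a multi-byte sequence may be split across two chunks.
class Utf8Validator {
public:
    // Consume one chunk. Returns false once any invalid byte has been seen;
    // after that the validator stays in the failed state.
    bool feed(const char* data, std::size_t len) {
        for (std::size_t i = 0; i < len && ok_; ++i)
            step(static_cast<unsigned char>(data[i]));
        return ok_;
    }

    // Call at end of input: also fails if a sequence was left unfinished.
    bool finish() const { return ok_ && pending_ == 0; }

private:
    void step(unsigned char b) {
        if (pending_ > 0) {                 // expecting a continuation byte
            if (b < lo_ || b > hi_) { ok_ = false; return; }
            --pending_;
            lo_ = 0x80; hi_ = 0xBF;         // later continuations: full range
            return;
        }
        if (b < 0x80) return;               // ASCII
        lo_ = 0x80; hi_ = 0xBF;
        if (b >= 0xC2 && b <= 0xDF) {
            pending_ = 1;
        } else if (b >= 0xE0 && b <= 0xEF) {
            pending_ = 2;
            if (b == 0xE0) lo_ = 0xA0;      // reject overlong 3-byte forms
            if (b == 0xED) hi_ = 0x9F;      // reject UTF-16 surrogates
        } else if (b >= 0xF0 && b <= 0xF4) {
            pending_ = 3;
            if (b == 0xF0) lo_ = 0x90;      // reject overlong 4-byte forms
            if (b == 0xF4) hi_ = 0x8F;      // reject values above U+10FFFF
        } else {
            ok_ = false;                    // 0x80..0xC1 and 0xF5..0xFF
        }
    }

    bool ok_ = true;
    unsigned pending_ = 0;                  // continuation bytes still expected
    unsigned char lo_ = 0x80, hi_ = 0xBF;   // allowed range for the next one
};
```

A caller would create one validator per input stream, pass each buffer to `feed()` as it is read, and call `finish()` at end of stream. The UTF-16 file from this report would be rejected on its first byte, `\xFE`.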