JSON parser doesn't recognize UTF-16 surrogate pairs
Bug #1024448 reported by Dennis Knochenwefel
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Zorba | Fix Released | High | Paul J. Lucas |
Bug Description
The JSON parser doesn't recognize UTF-16 surrogate pairs. For example, the escape sequence "\ud83d\udc4a" is currently converted to two separate Unicode code points, when it should be recognized as a UTF-16 surrogate pair and converted to the single Unicode code point U+1F44A.
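For reference, the expected decoding can be sketched as follows. This is only an illustration of the standard UTF-16 surrogate arithmetic, not the code from the linked branch; the helper names are hypothetical and are not taken from src/util/unicode_util.h.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not Zorba's actual API.

// High (lead) surrogates occupy U+D800..U+DBFF.
constexpr bool is_high_surrogate(std::uint32_t u) {
  return u >= 0xD800 && u <= 0xDBFF;
}

// Low (trail) surrogates occupy U+DC00..U+DFFF.
constexpr bool is_low_surrogate(std::uint32_t u) {
  return u >= 0xDC00 && u <= 0xDFFF;
}

// Combine a surrogate pair into one code point.
// Precondition: is_high_surrogate(high) && is_low_surrogate(low).
constexpr std::uint32_t combine_surrogates(std::uint32_t high,
                                           std::uint32_t low) {
  return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
}
```

With these definitions, combine_surrogates(0xD83D, 0xDC4A) yields 0x1F44A. A parser would presumably apply this when a \uXXXX escape holding a high surrogate is immediately followed by one holding a low surrogate, and report an error for an unpaired surrogate.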
Related branches
lp:~paul-lucas/zorba/bug-1024448
- Dennis Knochenwefel: Approve
- Paul J. Lucas: Approve
Diff: 290 lines (+123/-43), 3 files modified:
- src/unit_tests/test_json_parser.cpp (+39/-28)
- src/util/json_parser.cpp (+46/-11)
- src/util/unicode_util.h (+38/-4)
Changed in zorba:
- importance: Undecided → High
- description: updated
Changed in zorba:
- status: Incomplete → In Progress
Changed in zorba:
- status: In Progress → Fix Committed
Changed in zorba:
- status: Fix Committed → Fix Released
If there are 3 problems, there should be 3 different bugs, not all lumped together into a single bug. I have nothing to do with either html:parse() or tidy.
As for the 3rd bug, you don't say what the error is that it should report. What exactly is wrong with JSON parsing?