Unassigned Unicode codepoints are reported as upper-case alphabetic characters with decimal value 0
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
SBCL | Fix Released | Low | Christophe Rhodes |
Bug Description
SBCL's character functions report unassigned Unicode codepoints as upper-case letters.
For example, U+0378 is unassigned (as of Unicode 6.2: http://
* (alpha-char-p #\u0378)
T ;; expected: NIL
Internal functions are also affected. Here's why unassigned codepoints are reported as upper-case letters (their general category comes back as index 0):
* (sb-impl:
0 ;; "Lu" -- see *general-
They also claim to represent the decimal value 0:
* (sb-impl:
0 ;; expected: NIL
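For comparison, the same three questions (general category, alphabetic, decimal value) can be put to an independent copy of the Unicode Character Database. The sketch below uses Python's standard `unicodedata` module purely as a cross-check; it says nothing about SBCL internals, and the `describe` helper is a hypothetical name introduced for illustration. It assumes the Unicode data bundled with the Python build still leaves U+0378 unassigned.

```python
import unicodedata


def describe(codepoint):
    """Summarise the properties this report is about for one codepoint."""
    ch = chr(codepoint)
    return {
        "category": unicodedata.category(ch),      # two-letter general category
        "alphabetic": ch.isalpha(),
        "upper": ch.isupper(),
        "decimal": unicodedata.decimal(ch, None),  # None when there is no decimal value
    }


unassigned = describe(0x0378)   # the codepoint from the report
capital_a = describe(0x0041)    # LATIN CAPITAL LETTER A, a genuine "Lu"
digit_seven = describe(0x0037)  # DIGIT SEVEN, a genuine "Nd"

for label, info in [("U+0378", unassigned), ("U+0041", capital_a), ("U+0037", digit_seven)]:
    print(label, info)
```

Here U+0378 should come back with category "Cn", not alphabetic, and with no decimal value, which matches the "expected" annotations above.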
One part of the problem could be that SBCL lacks the general category "Cn" ("Unassigned"):
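SBCL general categories: https://github.com/sbcl/sbcl/blob/master/tools-for-build/ucd.lisp#L169-L172
Unicode general categories: http://www.unicode.org/reports/tr44/#Property_Values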
I don't know that it's as easy as adding "Cn" to this list, though, because there's code in SBCL (like in target-char.lisp) that checks general category by index, like (< gc 5) or (= gc 12). Adding a new value here would change the indexes. (Maybe it's safe to add to the end?)
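The index concern can be illustrated with a small sketch, written in Python only so that it is easy to run. The category table here is hypothetical and shortened: its ordering is an assumption chosen to agree with the checks quoted above (index 0 is "Lu", `(< gc 5)` covers the letter categories, `(= gc 12)` is "Nd"). The real list lives in SBCL's ucd.lisp and may differ.

```python
# Hypothetical, shortened category table (an assumption, not SBCL's actual list).
CATEGORIES = ["Lu", "Ll", "Lt", "Lm", "Lo", "Cc", "Cf", "Co", "Cs",
              "Mc", "Me", "Mn", "Nd"]


def is_letter(gc):
    """Mirror of an index-based check like (< gc 5)."""
    return gc < 5


def is_decimal_digit(gc):
    """Mirror of an index-based check like (= gc 12)."""
    return gc == 12


def index_map(categories):
    """Map each category name to its position in the list."""
    return {name: i for i, name in enumerate(categories)}


before = index_map(CATEGORIES)
appended = index_map(CATEGORIES + ["Cn"])    # "Cn" added at the end
prepended = index_map(["Cn"] + CATEGORIES)   # "Cn" added at the front

# Appending leaves every existing index alone; prepending shifts all of them.
unchanged_when_appended = all(appended[n] == before[n] for n in before)
shifted_when_prepended = [n for n in before if prepended[n] != before[n]]

print(unchanged_when_appended, len(shifted_when_prepended))
print(is_letter(appended["Cn"]), is_decimal_digit(appended["Cn"]))
```

With "Cn" at the end, the existing index-based checks keep their meaning and "Cn" itself matches neither of them. Putting it anywhere earlier would silently change what `(< gc 5)` and `(= gc 12)` select.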
VERSION INFORMATION:
$ sbcl --version
SBCL 1.1.3
$ uname -a
Darwin Ken-Harris-
* *features*
(:ALIEN-CALLBACKS :ANSI-CL :BSD :C-STACK-
:COMPARE-
:DARWIN9-OR-BETTER :FLOAT-EQL-VOPS :GENCGC :IEEE-FLOATING-
:INLINE-CONSTANTS :INODE64 :LINKAGE-TABLE :LITTLE-ENDIAN
:MACH-
:OS-PROVIDES-
:OS-PROVIDES-PUTWC :OS-PROVIDES-
:SB-EVAL :SB-LDB :SB-PACKAGE-LOCKS :SB-SOURCE-
:SB-UNICODE :SBCL :STACK-
:STACK-
:STACK-
:UNWIND-
Changed in sbcl:
status: In Progress → Fix Committed
information type: Public → Public Security
information type: Public Security → Public
Changed in sbcl:
status: Fix Committed → Fix Released
Ken Harris <email address hidden> writes:
status inprogress
importance low
assignee csr21-cantab
done
> One part of the problem could be that SBCL lacks the general category
> "Cn" ("Unassigned"):
>
> SBCL general categories: https://github.com/sbcl/sbcl/blob/master/tools-for-build/ucd.lisp#L169-L172
> Unicode general categories: http://www.unicode.org/reports/tr44/#Property_Values
>
> I don't know that it's as easy as adding "Cn" to this list, though,
> because there's code in SBCL (like in target-char.lisp) that checks
> general category by index, like (< gc 5) or (= gc 12). Adding a new
> value here would change the indexes. (Maybe it's safe to add to the
> end?)
It is basically safe to add to the end, with some wrinkles in building
the tables in the first place. I have a fix for this, unfortunately
currently tangled up in the middle of all the rest of the Unicode tree
that I'm working on.