Comment 1 for bug 1178038

Christophe Rhodes (csr21-cantab) wrote : Re: [Bug 1178038] [NEW] Unassigned Unicode codepoints are reported as upper-case alphabetic characters with decimal value 0

Ken Harris <email address hidden> writes:

 status inprogress
 importance low
 assignee csr21-cantab
 done

> One part of the problem could be that SBCL lacks the general category
> "Cn" ("Unassigned"):
>
> SBCL general categories: https://github.com/sbcl/sbcl/blob/master/tools-for-build/ucd.lisp#L169-L172
> Unicode general categories: http://www.unicode.org/reports/tr44/#Property_Values
>
> I don't know that it's as easy as adding "Cn" to this list, though,
> because there's code in SBCL (like in target-char.lisp) that checks
> general category by index, like (< gc 5) or (= gc 12). Adding a new
> value here would change the indexes. (Maybe it's safe to add to the
> end?)

It is basically safe to add to the end, with some wrinkles in building
the tables in the first place. I have a fix for this, but unfortunately
it is currently tangled up in the middle of all the rest of the Unicode
tree that I'm working on.
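
To illustrate why appending is safe for index-based checks such as
(< gc 5) and (= gc 12), here is a minimal sketch, written in Python
rather than Lisp. The category order, the helper names and the table
itself are assumptions made for illustration; they are not copied from
ucd.lisp or target-char.lisp, and the sketch says nothing about the
table-building wrinkles mentioned above.

```python
# Hypothetical category table. The order is an assumption chosen so that
# indexes 0-4 are letter categories and index 12 is "Nd"; it is not
# taken from SBCL's sources.
CATEGORIES = ["Lu", "Ll", "Lt", "Lm", "Lo", "Cc", "Cf", "Co", "Cs",
              "Mc", "Me", "Mn", "Nd", "Nl", "No"]


def is_letter(gc):
    """Index-based check in the style of (< gc 5)."""
    return gc < 5


def is_decimal_digit(gc):
    """Index-based check in the style of (= gc 12)."""
    return gc == 12


def index_map(categories):
    """Map each category name to its position in the list."""
    return {name: i for i, name in enumerate(categories)}


before = index_map(CATEGORIES)

# Appending "Cn" leaves every existing index untouched.
appended = index_map(CATEGORIES + ["Cn"])
unchanged = all(appended[name] == i for name, i in before.items())

# Prepending "Cn" shifts every existing index by one, so the
# index-based checks above would start giving wrong answers.
prepended = index_map(["Cn"] + CATEGORIES)
shifted = all(prepended[name] == i + 1 for name, i in before.items())
```

With "Cn" at the end, its index (15 in this sketch) fails both checks,
so an unassigned codepoint would no longer look like a letter or a
decimal digit. With "Cn" at the front, it would take index 0 and pass
the letter check.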