Comment 2 for bug 1906584

Christophe Rhodes (csr21-cantab) wrote : Re: [Bug 1906584] Re: char-upcase, char-downcase misbehave on some characters

Douglas Katzman <email address hidden> writes:

> It's debatable. We're trying to accord with Unicode, not with an
> obsolete spec that didn't fully anticipate characters that themselves
> are neither upper nor lower-case, but have a way to convert to upper
> or lower case.
> To take one of your examples, https://www.compart.com/en/unicode/U+01F2 says that
> (code-char #x1f2) has an upper-case character (#x1f1) and a lower-case character (#x1f3).

I think we have the ability to distinguish between the Unicode
operations, which probably should never be done character-by-character
anyway, and the specified Lisp operations. Given that we have to make
that distinction, I think we should try to preserve the invariants of
the specified Lisp operations if we can.

I think we do already support treating various characters as not having
case even though Unicode says they do:

  (lower-case-p #\ß) ; => NIL
  (sb-unicode:lowercase-p #\ß) ; => T

and I think that we should probably do the same thing for these
characters when changing case, to preserve the specified invariants.
(Yes, they might do a different thing from Unicode-specified case
functions; that's fine, we have exported functions in SB-UNICODE for
that.) This affects the (currently) four titlecase characters with case
mappings to lower and upper case.
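
As a rough illustration of the invariants in question, here is a minimal
sketch in portable Common Lisp. The function name
CASE-INVARIANT-VIOLATIONS and the keyword tags are made up for this
example and are not part of SBCL. It assumes only one reading of the
specification: CHAR-UPCASE changes nothing but lowercase characters, and
CHAR-DOWNCASE changes nothing but uppercase characters.

```lisp
(defun case-invariant-violations (&key (start 0) (end char-code-limit))
  "Return (CHAR . TAG) pairs for characters with codes in [START, END)
whose case conversion changes a character that the corresponding
predicate does not classify as having that case."
  (loop for code from start below end
        for char = (code-char code)
        ;; CODE-CHAR may return NIL for codes with no character.
        ;; CHAR-UPCASE should leave anything that is not lowercase alone.
        when (and char
                  (char/= (char-upcase char) char)
                  (not (lower-case-p char)))
          collect (cons char :upcase-changes-non-lowercase)
        ;; CHAR-DOWNCASE should leave anything that is not uppercase alone.
        when (and char
                  (char/= (char-downcase char) char)
                  (not (upper-case-p char)))
          collect (cons char :downcase-changes-non-uppercase)))
```

Evaluating (case-invariant-violations :start #x1c4 :end #x1f4) on an
affected build would be expected to report the titlecase characters
discussed here, though the exact result depends on the implementation
and on the version of its Unicode tables.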

So the remaining problem is that tools-for-build/ucd.lisp is an
unreadable pile of magic. Every time I upgrade the version of Unicode I
say to myself that I need to rewrite it completely so that it's
understandable, and every time I run out of energy (lately I've run out
of energy even to start the Unicode upgrade process, so we're
substantially out of date). There is code in tools-for-build/ucd.lisp
that builds a cases table; it probably "just" needs a small modification
to hold this additional data.