Comment 2 for bug 1177986

Christophe Rhodes (csr21-cantab) wrote: Re: [Bug 1177986] [NEW] DIGIT-CHAR-P not correct for non-ASCII digit chars

 summary "ALPHANUMERIC/DIGIT-CHAR-P invariant broken with Unicode"
 status triaged
 importance medium
 done

Ken Harris <email address hidden> writes:

> The Hyperspec page for ALPHANUMERICP even makes this relationship
> explicit:
>
> (alphanumericp x)
> == (or (alpha-char-p x) (not (null (digit-char-p x))))
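The invariant quoted above can be checked mechanically by walking every character code. The following is a portable ANSI Common Lisp sketch. CHECK-ALPHANUMERICP-INVARIANT is a hypothetical helper name, not part of SBCL. Which characters (if any) it reports depends on the implementation and the Unicode tables it ships, so no particular result is claimed here.

```lisp
;; Hypothetical helper name; uses only ANSI CL functions.
(defun check-alphanumericp-invariant ()
  "Return a list of characters C for which (ALPHANUMERICP C) and
(OR (ALPHA-CHAR-P C) (NOT (NULL (DIGIT-CHAR-P C)))) disagree."
  (loop for code from 0 below char-code-limit
        for c = (code-char code)
        ;; CODE-CHAR may return NIL for codes with no character
        ;; (some implementations exclude surrogate code points).
        when (and c
                  ;; These predicates return generalized booleans, so
                  ;; compare their truth values rather than the raw values.
                  (not (eq (not (alphanumericp c))
                           (not (or (alpha-char-p c)
                                    (digit-char-p c))))))
          collect c))
```

An empty result means the invariant holds for every character the implementation can represent. Any character it returns is a concrete counterexample to the HyperSpec relationship.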

I haven't thought this through properly, but I think my preferred
resolution to this broken invariant is actually to restrict
digit-char-p to the ASCII set, rather than extending it to fullwidth
digit variants and similar. My reasoning is this: if you expect
(digit-char-p #\FULLWIDTH_DIGIT_TWO 11) to be 2 (and I agree that that's
reasonable, if not the only possible answer), you might also expect
(digit-char-p #\FULLWIDTH_LATIN_CAPITAL_LETTER_A 11) to be 10. That is
perhaps a little more surprising but still defensible, because we could
just take compatibility decompositions of characters, and the
compatibility decomposition of FULLWIDTH LATIN CAPITAL LETTER A is plain
A, right? Except that then (digit-char-p #\FEMININE_ORDINAL_INDICATOR 11)
would also be 10, since its compatibility decomposition is LATIN SMALL
LETTER A, and that is frankly not expected at all.
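One way to picture the ASCII-only option is a digit reader that ignores everything outside 0-9, A-Z and a-z. ASCII-DIGIT-CHAR-P below is a hypothetical name used only for illustration. It is not an SBCL function, nor necessarily how such a change would be implemented. Keeping the HyperSpec invariant under this option would also require ALPHANUMERICP to stop treating non-ASCII digits as alphanumeric unless ALPHA-CHAR-P already does. This sketch covers only the digit side.

```lisp
;; Hypothetical helper: ASCII-DIGIT-CHAR-P is an illustrative name, not an
;; existing SBCL function.  Only the 62 ASCII characters 0-9, A-Z and a-z
;; are ever given a digit weight.
(defparameter *ascii-digits-upper* "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
(defparameter *ascii-digits-lower* "0123456789abcdefghijklmnopqrstuvwxyz")

(defun ascii-digit-char-p (char &optional (radix 10))
  "Like DIGIT-CHAR-P, but return the weight of CHAR in RADIX only when
CHAR is an ASCII digit or letter; every other character yields NIL."
  (check-type char character)
  (check-type radix (integer 2 36))
  ;; POSITION uses EQL, which compares characters exactly, so fullwidth
  ;; variants and FEMININE ORDINAL INDICATOR never match.
  (let ((weight (or (position char *ascii-digits-upper*)
                    (position char *ascii-digits-lower*))))
    (when (and weight (< weight radix))
      weight)))
```

With this definition, (ascii-digit-char-p #\A 11) and (ascii-digit-char-p #\a 11) both return 10. Any fullwidth digit or letter, and FEMININE ORDINAL INDICATOR, returns NIL whatever the radix, which is exactly the restriction argued for above.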

Of course, restricting digit-char-p to interpreting only ASCII digits as
numbers will irritate those who want to work with Unicode. But I think
the answer to that is to provide and export richer Unicode
functionality, so that users can legitimately work with the Unicode data
that we store. (In my own slow way I am working on this: my GitHub fork
of SBCL has an update to Unicode 6.2 and the beginnings of
normalization, sadly not yet complete.)