Comment 4 for bug 1177986

Ken Harris (kengruven+lp) wrote:

To keep the ball rolling on this, I wrote a sample (inefficient) implementation of my latter idea. It involves no Unicode decomposition, and the ALPHANUMERICP invariant holds. It follows the Decimal_Digit and Hex_Digit properties, and additionally accepts the ASCII letters G-Z for radices above 16 (which I consider a 'legacy' part of the CL spec, as I've never seen it used).
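For illustration, here is a rough Python sketch of that rule. It is not the implementation attached to this bug and not SBCL code. It assumes Python's `unicodedata.decimal` as a stand-in for the Decimal_Digit property and hard-codes the only non-ASCII Hex_Digit letters (the fullwidth A-F forms). `digit_char_p` is a hypothetical name mirroring CL's `DIGIT-CHAR-P`, with `None` playing the role of NIL.

```python
import unicodedata
from typing import Optional

# Hypothetical table: the non-ASCII letters in the Unicode Hex_Digit property
# (FULLWIDTH LATIN CAPITAL/SMALL LETTER A..F). The fullwidth digits
# U+FF10..U+FF19 are Decimal_Digit, so unicodedata.decimal() already covers them.
_FULLWIDTH_HEX_LETTERS = {
    **{chr(0xFF21 + i): 10 + i for i in range(6)},  # FULLWIDTH A..F
    **{chr(0xFF41 + i): 10 + i for i in range(6)},  # FULLWIDTH a..f
}


def digit_char_p(ch: str, radix: int = 10) -> Optional[int]:
    """Return the weight of ch as a digit in radix, or None (CL's NIL).

    Sketch of the rule: Decimal_Digit characters weigh 0-9, Hex_Digit
    letters weigh 10-15, and the ASCII letters G-Z (either case) extend
    the range up to 35 for radices up to 36.
    """
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36")

    # Characters with a Unicode decimal value (general category Nd).
    weight = unicodedata.decimal(ch, None)
    if weight is None:
        if ch in _FULLWIDTH_HEX_LETTERS:
            weight = _FULLWIDTH_HEX_LETTERS[ch]
        elif ch.isascii() and ch.isalpha():
            # ASCII A-F are Hex_Digit; G-Z are the "legacy" CL extension.
            weight = ord(ch.upper()) - ord("A") + 10
    if weight is not None and weight < radix:
        return weight
    return None


print(digit_char_p("7"))           # 7
print(digit_char_p("\u0663"))      # 3 (ARABIC-INDIC DIGIT THREE)
print(digit_char_p("f", 16))       # 15
print(digit_char_p("\uff21", 16))  # 10 (FULLWIDTH LATIN CAPITAL LETTER A)
print(digit_char_p("z", 36))       # 35
print(digit_char_p("g", 16))       # None
```

Superscript digits such as U+00B2 have a digit value in Unicode but are not in category Nd, so under this rule they are not digits.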

While writing this, I came up with another reason that DIGIT-CHAR-P should recognize non-ASCII digits: consistency with ALPHA-CHAR-P. Since ALPHA-CHAR-P returns true for (lots of) non-ASCII alphabetic characters, I would expect DIGIT-CHAR-P to do the same for non-ASCII digit characters, i.e. return their digit weight rather than NIL.

Another possible choice, then, would be to make both ALPHA-CHAR-P and DIGIT-CHAR-P return true only for ASCII characters. That would be internally consistent. I don't personally think it would be preferable to extending DIGIT-CHAR-P to other Unicode digits, but I accept that it's one way to solve this problem, especially if SBCL is going to add more powerful Unicode functionality. (Then we'd probably end up with a "trivial-unicode" package to unify what all the different implementations do. Again, not my favorite solution, since I think we'd be throwing away the flexibility that the CL spec gave us here, and making people learn and use a completely new set of functions for Unicode-aware programs.)