Comment 3 for bug 1177986

Ken Harris (kengruven+lp) wrote :

Hmm, I see your point. I'd like to suggest a third possible option.

The Unicode standard has another flag on each character: Hex_Digit. This includes characters like FULLWIDTH_DIGIT_TWO and FULLWIDTH_LATIN_CAPITAL_LETTER_A, but not FEMININE_ORDINAL_INDICATOR or SUPERSCRIPT_TWO.

[http://en.wikipedia.org/wiki/Unicode_numerals#Hexadecimal_digits]
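To pin down which characters that is, here is a rough sketch in Python (not Lisp, and not anything from SBCL). Python's `unicodedata` module doesn't expose Hex_Digit, so the six ranges are written out by hand, and `is_hex_digit` is just a name made up for this illustration.

```python
import unicodedata

# Code point ranges carrying the Unicode Hex_Digit property:
# the ASCII hex digits plus their fullwidth forms.
HEX_DIGIT_RANGES = [
    (0x0030, 0x0039),  # DIGIT ZERO .. DIGIT NINE
    (0x0041, 0x0046),  # LATIN CAPITAL LETTER A .. F
    (0x0061, 0x0066),  # LATIN SMALL LETTER A .. F
    (0xFF10, 0xFF19),  # FULLWIDTH DIGIT ZERO .. NINE
    (0xFF21, 0xFF26),  # FULLWIDTH LATIN CAPITAL LETTER A .. F
    (0xFF41, 0xFF46),  # FULLWIDTH LATIN SMALL LETTER A .. F
]


def is_hex_digit(ch):
    """Return True if the single character ch falls in a Hex_Digit range."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HEX_DIGIT_RANGES)


if __name__ == "__main__":
    # The four characters mentioned above, looked up by their Unicode names.
    for name in (
        "FULLWIDTH DIGIT TWO",
        "FULLWIDTH LATIN CAPITAL LETTER A",
        "FEMININE ORDINAL INDICATOR",
        "SUPERSCRIPT TWO",
    ):
        print(name, is_hex_digit(unicodedata.lookup(name)))
```

The first two names come out as members of the set and the last two do not, matching the examples above.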

I don't know the exact definition of this flag, but it seems to cover the characters a user might reasonably type as hex digits, without either trying to be super-tricky (and throwing weird numerals at us) or needing to change keyboard layouts (in order to get pure ASCII from their number keys).

The big upside (and the reason that Unicode provides these properties, I believe) is that a user with a Japanese/Chinese/Korean keyboard setting can press the key marked "5" on their keyboard, and their software will recognize the fullwidth digit as the digit 5, even though it's not ASCII "5". It also means we only need to add the fullwidth forms of the 16 hex digits (22 code points, counting both letter cases), and don't need to do any decomposition.

The only downside I see is that this doesn't really scale beyond radix=16, but I think that accepting the Unicode Hex_Digit set for the digits 0 through F, and then only ASCII for "g"/"G" through "z"/"Z", would be a fair compromise. I don't think I've ever actually seen a program that relied on parsing numbers of radix higher than 16, using the 0-9,A-Z set. (There's Base-64 encoding, but that uses a different ordering, and is case-sensitive, and adds other symbols at the end -- you can't use Common Lisp's numeric reading/printing support for that, anyway, no matter what we choose here.) Clearly it can't be that important to support Unicode decomposition out there, since SBCL has never supported non-ASCII letters for this.

I would be perfectly happy saying that G-Z radix support is ASCII-only, to meet the specification, and radix<=16 also works with the fullwidth forms of the 16 hex digits.
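For concreteness, one reading of that compromise can be sketched as follows, again in Python purely for illustration. `digit_weight` is a hypothetical name, not an SBCL or Common Lisp function, and accepting the fullwidth A-F at radixes above 16 is my assumption about the intended behaviour.

```python
def digit_weight(ch, radix=10):
    """Return the weight of ch in the given radix, or None if ch is not a digit.

    Weights 0-15 accept ASCII or fullwidth forms; weights 16-35 are ASCII only.
    """
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36")
    cp = ord(ch)
    # Fold fullwidth 0-9, A-F and a-f onto ASCII. Each fullwidth form sits
    # exactly 0xFEE0 above its ASCII counterpart. Fullwidth G-Z is
    # deliberately not folded, so it is rejected below.
    if (0xFF10 <= cp <= 0xFF19
            or 0xFF21 <= cp <= 0xFF26
            or 0xFF41 <= cp <= 0xFF46):
        cp -= 0xFEE0
    if 0x30 <= cp <= 0x39:        # 0-9
        weight = cp - 0x30
    elif 0x41 <= cp <= 0x5A:      # A-Z
        weight = cp - 0x41 + 10
    elif 0x61 <= cp <= 0x7A:      # a-z
        weight = cp - 0x61 + 10
    else:
        return None
    return weight if weight < radix else None


if __name__ == "__main__":
    print(digit_weight("\uff15"))      # fullwidth digit five, radix 10
    print(digit_weight("\uff21", 16))  # fullwidth capital A, radix 16
    print(digit_weight("\uff27", 36))  # fullwidth capital G, radix 36
    print(digit_weight("z", 36))       # ASCII z, radix 36
```

Since the fullwidth forms are a fixed offset from ASCII, this is a single subtraction and needs no decomposition tables.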