Comment 8 for bug 1868232

Dan Watkins (oddbloke) wrote :

> in punnycode -- is the encoding of -, thus i wouldn't want to use that.
>
> In [36]: "-_☃.com".encode('idna')
> Out[36]: b'xn---_-gsx.com'

This isn't the case; "xn--" is the prefix used to indicate that a domain name label is encoded with Punycode. I don't believe Punycode modifies ASCII characters in its input, though it does of course change their position and their order relative to non-ASCII characters:

In [17]: "-\u3061".encode('idna') == "a\u3061".encode('idna').replace(b'a', b'-')
Out[17]: True
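
The same point can be checked more directly with Python's built-in "punycode" codec, which produces the raw encoding without the "xn--" prefix. This is only a minimal sketch; the sample label "a-b\u3061" is an illustrative choice, not something taken from the bug:

```python
label = "a-b\u3061"  # illustrative label: ASCII characters plus one non-ASCII character

raw = label.encode("punycode")

# The ASCII characters are copied through unchanged, followed by a "-" delimiter.
assert raw.startswith(b"a-b-")

# The idna codec puts the "xn--" prefix in front of the same Punycode output.
assert label.encode("idna") == b"xn--" + raw
```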

Regardless, I think increasing the length of the string is asking for problems, so I think "--" would be a mistake.

Replacing them with "-" also poses a problem: "-" isn't a valid character at the start or end of domain name labels; rewriting "foo_.example.com" to "foo-.example.com" isn't any more valid (and, in fact, as "-"s are _explicitly_ disallowed there, I bet it's actually worse in practical terms as naive implementations may reject the latter but not the former).

I think I'm going to go for "replace non-LDH characters with a single dash, then strip leading/trailing dashes". (It won't be too costly to modify, so if someone has a better idea please let me know.)
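
A rough sketch of what that rule could look like, not the actual patch. The function names are hypothetical, and two details are assumptions: each non-LDH character becomes exactly one dash (so the length never grows), and stripping is done per dot-separated label rather than on the whole hostname:

```python
import re

# Matches any character that is not a letter, digit, or hyphen (non-LDH).
_NON_LDH = re.compile(r"[^A-Za-z0-9-]")


def sanitize_label(label: str) -> str:
    """Replace each non-LDH character with one dash, then strip edge dashes."""
    return _NON_LDH.sub("-", label).strip("-")


def sanitize_hostname(hostname: str) -> str:
    """Apply sanitize_label to each dot-separated label (an assumption)."""
    return ".".join(sanitize_label(part) for part in hostname.split("."))


print(sanitize_hostname("foo_.example.com"))  # foo.example.com
```

Note that a label made up entirely of non-LDH characters comes out empty under this sketch, so a caller would still need to decide how to handle that case.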