Comment 15 for bug 1233501

Matthew S (matts8484) wrote : Re: Security group names cannot contain at characters

I also don't understand the EC2-strict-validation conditional. I think it's supposed to mean 'strict validation', as opposed to 'EC2 versus Amazon' - as far as I know, EC2 *is* Amazon's API for this.

I think that either this strict validation is too strict, or that StarCluster (http://star.mit.edu/cluster/) is wrong to use '@' characters in group names.
But since StarCluster is intended to work (and as far as I know, does work) with Amazon EC2 APIs, at a high level it should work with OpenStack's EC2 APIs, too. If Amazon don't follow their own specification, then maybe we shouldn't either, for bug-for-bug compatibility.

Re UTF-8, this code looks to me like it has a separate Unicode bug. But maybe this should be tracked separately to the rejection of '@' characters?

Here's my understanding of this code:

First, at line 701, the input string gets converted to UTF-8 if it's unicode. So characters which were printable Unicode characters will get converted to multi-byte UTF-8 sequences, where some of the bytes may have their high bits set.

Then, either at line 705 or 713 (depending on whether strict validation is on or not), a regex is generated which will reject bytes with their high bits set.
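To illustrate the ordering problem, here is a minimal Python 3 sketch. It is not the actual code from the file under discussion: the pattern ASCII_PRINTABLE and the function validate_after_encoding are hypothetical stand-ins that I'm assuming behave roughly like the real check (encode first, then match a byte-level pattern).

```python
import re

# Hypothetical stand-in for the validation pattern: printable ASCII bytes only.
ASCII_PRINTABLE = re.compile(rb"[\x20-\x7e]+")


def validate_after_encoding(name):
    """Mimic the order described above: encode to UTF-8 first, then match bytes."""
    encoded = name.encode("utf-8")
    return ASCII_PRINTABLE.fullmatch(encoded) is not None


# U+00FC is a printable character, but its UTF-8 form is two bytes,
# both of which have the high bit set.
name = "gr\u00fcppe"
high_bytes = [b for b in name.encode("utf-8") if b & 0x80]

ascii_ok = validate_after_encoding("my-group")
unicode_ok = validate_after_encoding(name)
```

With this ordering the plain ASCII name passes and the name containing U+00FC is rejected, even though every character in it is printable.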

For correct operation, either:
a. Good idea: the character-based regex check needs to be performed before the characters are converted to multi-byte UTF-8 sequences; or
b. Very bad idea: the regex needs to accept multi-byte UTF-8 sequences, and needs to enforce the UTF-8 validity rules. I'm not even sure this is possible with a regex. It shouldn't be attempted anyway; it would look hideous and would make the purpose of the regex obscure to readers of this code.
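Option (a) could look something like the following Python 3 sketch. The allowed character set (ALLOWED_CHARS, which here happens to include '@') and the function name validate_then_encode are assumptions of mine for illustration, not the project's actual policy or code. The point is only the order: match on characters, then encode.

```python
import re

# Hypothetical character policy: word characters (Unicode-aware for str
# patterns in Python 3), plus space, '.', '@' and '-'.
ALLOWED_CHARS = re.compile(r"[\w .@-]+")


def validate_then_encode(name):
    """Check the text character by character, and only then encode it."""
    if not ALLOWED_CHARS.fullmatch(name):
        raise ValueError("group name contains disallowed characters")
    # Encoding happens after validation, so multi-byte sequences never
    # reach the regex.
    return name.encode("utf-8")


encoded = validate_then_encode("gr\u00fcppe@cluster")

try:
    validate_then_encode("bad\x00name")
    rejected = False
except ValueError:
    rejected = True
```

Because the regex only ever sees characters, it stays short and readable, and UTF-8 validity is left entirely to the encoder.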