Comment 9 for bug 1233501

Matthew S (matts8484) wrote : Re: Security group names cannot contain at characters

The empty string shouldn't be a valid security-group name - it might be better to use:
^[\x20-\x7E\xC0-\xD6\xD8-\xF6\xF8\xFF]+$
Note the '+' instead of a '*' towards the end.
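
To illustrate the difference, here is a minimal Python sketch. The character class is copied verbatim from the pattern above; the names CHAR_CLASS, STAR, PLUS and accepts are mine, purely for illustration:

```python
import re

# Character class copied verbatim from the pattern above; only the quantifier differs.
CHAR_CLASS = r"[\x20-\x7E\xC0-\xD6\xD8-\xF6\xF8\xFF]"
STAR = re.compile("^" + CHAR_CLASS + "*$")  # zero or more characters
PLUS = re.compile("^" + CHAR_CLASS + "+$")  # one or more characters


def accepts(pattern, name):
    """Return True if the whole name is matched by the pattern."""
    return pattern.match(name) is not None


print(accepts(STAR, ""))             # True: '*' lets the empty name through
print(accepts(PLUS, ""))             # False: '+' requires at least one character
print(accepts(PLUS, "web@servers"))  # True: '@' (0x40) is inside \x20-\x7E
```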

But if you're going to go as far as supporting ISO 8859-x, then I think you may as well go 'all the way', and allow all UTF-8 constituent bytes, except for the dangerous/confusing low-ASCII control characters like CR, LF, TAB etc.

Then we would have full Unicode support (apart from the control characters), which I imagine would make East Asian users happy as well as us Europeans.
But more importantly, then I really could name my security group crazy things such as Klingon ;-)

So that would be:
^[\x20-\xfd]+$
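
As a rough sketch of how that byte range behaves, the following Python applies it to the UTF-8 encoding of a name. This is only my illustration, not how any existing validator works; is_valid_name is a made-up helper, and I use fullmatch instead of ^...$ because in Python "$" also matches just before a trailing newline:

```python
import re

# Byte range proposed above, applied to the UTF-8 bytes of the name.
NAME_BYTES = re.compile(rb"[\x20-\xfd]+")


def is_valid_name(name: str) -> bool:
    """Hypothetical helper: True if every UTF-8 byte of name is in \\x20-\\xfd."""
    # fullmatch anchors both ends, so a trailing newline cannot slip through.
    return NAME_BYTES.fullmatch(name.encode("utf-8")) is not None


for candidate in ["Zürich", "東京", "web\tservers", ""]:
    print(repr(candidate), is_valid_name(candidate))
```

The multi-byte names pass because all of their UTF-8 bytes are above \x7f and below \xfe, while the TAB (\x09) and the empty string are rejected.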

I had a look; Django seems to have good Unicode support:

https://docs.djangoproject.com/en/dev/ref/unicode/

And in our Grizzly deployment, the column that ultimately stores these names is UTF-8:

mysql> SELECT
    -> COLUMN_NAME,
    -> TABLE_NAME,
    -> CHARACTER_SET_NAME,
    -> COLUMN_TYPE
    -> FROM information_schema.COLUMNS
    -> WHERE TABLE_SCHEMA = 'nova' AND TABLE_NAME = 'security_groups';
+-------------+-----------------+--------------------+----------------+
| COLUMN_NAME | TABLE_NAME      | CHARACTER_SET_NAME | COLUMN_TYPE    |
+-------------+-----------------+--------------------+----------------+
| created_at  | security_groups | NULL               | datetime       |
| updated_at  | security_groups | NULL               | datetime       |
| deleted_at  | security_groups | NULL               | datetime       |
| id          | security_groups | NULL               | int(11)        |
| name        | security_groups | utf8               | varchar(255)   |
| description | security_groups | utf8               | varchar(255)   |
| user_id     | security_groups | utf8               | varchar(255)   |
| project_id  | security_groups | utf8               | varchar(255)   |
| deleted     | security_groups | NULL               | int(11)        |
+-------------+-----------------+--------------------+----------------+
9 rows in set (0.01 sec)

I'm worried that if users feed in ISO 8859-x name data, then one of two things might happen:
1. It might get interpreted as UTF-8. Most, though not all, byte sequences with the high bit set are invalid UTF-8, so the result would be garbage;
or
2. MySQL could waste a lot of CPU converting incoming ISO 8859-x data in queries to UTF-8 in order to execute them, then converting the results back to ISO 8859-x again. Or some other component would have to do this for it, with similar CPU costs.
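
The first case is easy to reproduce in Python. This is just a sketch to show the effect, with reinterpret_as_utf8 being a helper name I made up; it encodes text as ISO 8859-1 and then decodes the same bytes as if they were UTF-8:

```python
def reinterpret_as_utf8(text: str) -> str:
    """Encode text as ISO 8859-1, then decode those bytes as if they were UTF-8."""
    raw = text.encode("iso-8859-1")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The usual outcome for high-bit bytes: not a valid UTF-8 sequence.
        return "<invalid UTF-8: %r>" % raw


# 'é' is the single byte 0xE9 in ISO 8859-1, which is an incomplete UTF-8 sequence.
print(reinterpret_as_utf8("café"))

# 'Ã©' is the bytes 0xC3 0xA9, which happen to be valid UTF-8 for a different
# character ('é'), so the name is silently changed rather than rejected.
print(reinterpret_as_utf8("Ã©"))
```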

The extended-ASCII encodings such as ISO 8859-x are considered legacy these days anyway. I don't like the idea of adopting them for the first time now.

I dream of a completely Unicode world! :-)
It works quite well here with Launchpad, but elsewhere...