The empty string shouldn't be a valid security-group name - it might be better to use:
^[\x20-\x7E\xC0-\xD6\xD8-\xF6\xF8\xFF]+$
Note the '+' instead of a '*' towards the end.
But if you're going to go as far as supporting ISO 8859-x, then I think you may as well go 'all the way', and allow all UTF-8 constituent bytes, except for the dangerous/confusing low-ASCII control characters like CR, LF, TAB etc.
Then we would have full Unicode support (apart from the control characters), which I imagine would make East Asian users happy as well as us Europeans.
But more importantly, then I really could name my security group crazy things such as Klingon ;-)
So that would be:
^[\x20-\xfd]+$
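To make the difference between the two patterns concrete, here is a minimal Python sketch. It assumes the name is checked as encoded bytes; the helper `is_valid_name` and the constant names are hypothetical and are not taken from the actual validation code.

```python
import re

# Byte-level patterns copied verbatim from the comment above.
LATIN1_PATTERN = re.compile(rb'^[\x20-\x7E\xC0-\xD6\xD8-\xF6\xF8\xFF]+$')
UTF8_BYTES_PATTERN = re.compile(rb'^[\x20-\xfd]+$')


def is_valid_name(name, pattern=UTF8_BYTES_PATTERN, encoding='utf-8'):
    """Hypothetical helper: encode the name, then test every byte."""
    # fullmatch() is used because '$' in Python also matches just before
    # a trailing newline, which would let 'name\n' slip through match().
    return pattern.fullmatch(name.encode(encoding)) is not None


print(is_valid_name(''))              # False: '+' needs at least one byte
print(is_valid_name('web servers'))   # True
print(is_valid_name('東京'))           # True: every UTF-8 byte is in range
print(is_valid_name('bad\tname'))     # False: TAB (0x09) is below 0x20
print(is_valid_name('café', LATIN1_PATTERN, 'iso-8859-1'))  # True
print(is_valid_name('café', LATIN1_PATTERN, 'utf-8'))       # False
```

The last two lines show that the narrower pattern only accepts accented letters when they arrive as single ISO 8859-1 bytes; the same name encoded as UTF-8 fails on its continuation byte. Note also that the range \x20-\xfd still lets DEL (0x7F) through, and that a byte-range check like this does not verify that the input is well-formed UTF-8.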
I had a look; Django seems to have good Unicode support:
https://docs.djangoproject.com/en/dev/ref/unicode/
And in our Grizzly deployment, the column that eventually stores these names is UTF-8:
mysql> SELECT
    -> COLUMN_NAME,
    -> TABLE_NAME,
    -> CHARACTER_SET_NAME,
    -> COLUMN_TYPE
    -> COLLATION_NAME
    -> FROM information_schema.COLUMNS
    -> WHERE TABLE_SCHEMA = 'nova' and table_name='security_groups';
+-------------+-----------------+--------------------+----------------+
| COLUMN_NAME | TABLE_NAME      | CHARACTER_SET_NAME | COLLATION_NAME |
+-------------+-----------------+--------------------+----------------+
| created_at  | security_groups | NULL               | datetime       |
| updated_at  | security_groups | NULL               | datetime       |
| deleted_at  | security_groups | NULL               | datetime       |
| id          | security_groups | NULL               | int(11)        |
| name        | security_groups | utf8               | varchar(255)   |
| description | security_groups | utf8               | varchar(255)   |
| user_id     | security_groups | utf8               | varchar(255)   |
| project_id  | security_groups | utf8               | varchar(255)   |
| deleted     | security_groups | NULL               | int(11)        |
+-------------+-----------------+--------------------+----------------+
9 rows in set (0.01 sec)
I'm worried that if users feed in ISO 8859-x name data, then one of two things might happen:
1. It might get interpreted as UTF-8. For most (but not all) sequences with the high bit set, that would yield invalid UTF-8 byte sequences, i.e. garbage;
or
2. MySQL could waste a lot of CPU converting incoming ISO 8859-x data in queries to UTF-8 in order to execute them, then converting the results back to ISO 8859-x again. Or some other component would have to do this for it, with similar CPU costs.
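The first failure mode can be reproduced with a short Python sketch. The helper name `decodes_as_utf8` is hypothetical; it only illustrates what happens when ISO 8859-1 bytes are read as UTF-8, and says nothing about how any particular component actually handles them.

```python
def decodes_as_utf8(raw):
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# The usual case: a lone high-bit byte is not valid UTF-8.
latin1_name = 'café'.encode('iso-8859-1')    # b'caf\xe9'
print(decodes_as_utf8(latin1_name))          # False

# The "not all" case: some ISO 8859-1 byte pairs happen to be valid
# UTF-8, so they decode without error but to a different string.
accidental = 'Ã©'.encode('iso-8859-1')       # b'\xc3\xa9'
print(decodes_as_utf8(accidental))           # True
print(accidental.decode('utf-8'))            # é
```

The second case is arguably the nastier one, since the misinterpretation goes through silently instead of raising an error.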
The extended-ASCII encodings such as ISO 8859-x are considered legacy these days anyway. I don't like the idea of introducing them in new code now.
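Je rêve d'un monde complètement Unicode! :-) (I dream of a completely Unicode world!)
Ça marche assez bien ici avec Launchpad, mais ailleurs... (It works fairly well here with Launchpad, but elsewhere...)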