Security group names cannot contain at sign @ characters

Bug #1233501 reported by Matthew S
14
This bug affects 3 people
Affects Status Importance Assigned to Milestone
OpenStack Dashboard (Horizon)
Fix Released
Medium
Romain Hardouin

Bug Description

Attempting to use Dashboard to create a security group whose name contains an '@' (at) character fails.

Similar symptom and probably same root cause as bug #1224576 'Security group names cannot contain spaces': Django's 'validate_slug' validator only accepts letters, numbers, underscores and hyphens.

The purpose of this dashboard-level validation remains obscure to me. Is anyone able to explain the intent of this validation?

I can create security groups with names that contain an '@' character from the command line using "nova secgroup-create '@foo' bar", and after I've done so, Horizon seems functional so far - it can show and can even delete such a security group. It just can't create them.

I don't think validate_slug is the appropriate validator for these data.

Also, does 'validate_slug' really accept *all* "characters" as its documentation states? I am dubious.
What's a 'character' in this context, anyway?

Is a string consisting of a mixture of East Asian wide characters, Unicode private-use area characters that I personally intended to be interpreted as Klingon, and a Euro character for good measure, a valid security-group name?
These are all "characters" as defined by ISO-10646.

If a Django character is not the same as an ISO-10646 character, then I'd like to know what the differences are.

Revision history for this message
Matthew S (matts8484) wrote :

Actually its documentation says "letters", not "characters". So I'm wondering how a letter is different to a character, if it is.

Changed in horizon:
importance: Undecided → Medium
milestone: none → icehouse-1
status: New → Confirmed
Revision history for this message
Matthew S (matts8484) wrote :

Further to this, the reason why we want '@' characters in security group names to work in general is that StarCluster (http://star.mit.edu/cluster/) wants to create secgroups with names like these when it drives OpenStack via EC2 API.

It may be possible for the StarCluster/OpenStack EC2 interop to work even if Horizon continues to be unable to create such security groups, but even then, I think Horizon should be fixed for reasons of completeness and consistency with 'nova secgroup-create'.

Matthew S (matts8484)
tags: added: low-hanging-fruit
Changed in horizon:
assignee: nobody → Romain Hardouin (romain-hardouin)
Revision history for this message
Romain Hardouin (romain-hardouin) wrote :

Hi,

I looked at Nova client sources and it doesn't perform any validation.
Thus we are consistent if we just delete validate_slug.

Revision history for this message
Romain Hardouin (romain-hardouin) wrote :

Akihiro Motoki has added explanations about validation logic in comments of the bug #1224576.
We have to set up a validation logic, but not as restrictive as validate_slug.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to horizon (master)

Fix proposed to branch: master
Review: https://review.openstack.org/54007

Changed in horizon:
status: Confirmed → In Progress
Revision history for this message
Romain Hardouin (romain-hardouin) wrote : Re: Security group names cannot contain at characters

The validation that I've submitted consists of:
 * A [:print:] regex.
 * A length limited to 255 characters in accordance with Nova and Neutron back ends.

Feedbacks are welcome :)

Revision history for this message
Matthew S (matts8484) wrote :

Hi, this looks great to me. Thanks!

In my opinion it would be even better (maybe even perfect?) if it also allowed diacritics.
Then one could name one's security group (for example) "mon groupe sécurité".

But I understand this would add runtime overhead in Python's regex engine.
I'm an English speaker, and so are our users. So I defer to you speakers of languages with diacritics :-)

Revision history for this message
Romain Hardouin (romain-hardouin) wrote :

The most straightforward regex that come to my mind is to define a subset of the first 256 code points of Unicode (same as ISO 8859-1):
^[\x20-\x7E\xC0-\xD6\xD8-\xF6\xF8\xFF]*$

Thanks to this regex, we can typed diacritics e.g:
 - "Sécurité", as you mention
 - And of course many others: Ü, ß, Ø, ñ, ...

If I submit this regex, I have to comment it!

Revision history for this message
Matthew S (matts8484) wrote :

The empty string shouldn't be a valid security-group name - it might be better to use:
^[\x20-\x7E\xC0-\xD6\xD8-\xF6\xF8\xFF]+$
Note the '+' instead of a '*' towards the end.

But if you're going to go as far as supporting ISO 8859-x, then I think you may as well go 'all the way', and allow all UTF-8 constituent bytes, except for the dangerous/confusing low-ASCII control characters like CR, LF, TAB etc.

Then we would have full Unicode support (apart from the control characters), which I imagine would make East Asian users happy as well as us Europeans.
But more importantly, then I really could name my security group crazy things such as Klingon ;-)

So that would be:
^[\x20-\xfd]+$

I had a look; Django seems to have good Unicode support:

https://docs.djangoproject.com/en/dev/ref/unicode/

And in our Grizzly deployment, the column that eventually stored these names is UTF-8:

mysql> SELECT
    -> COLUMN_NAME,
    -> TABLE_NAME,
    -> CHARACTER_SET_NAME,
    -> COLUMN_TYPE
    -> COLLATION_NAME
    -> FROM information_schema.COLUMNS
    -> WHERE TABLE_SCHEMA = 'nova' and table_name='security_groups';
+-------------+-----------------+--------------------+----------------+
| COLUMN_NAME | TABLE_NAME | CHARACTER_SET_NAME | COLLATION_NAME |
+-------------+-----------------+--------------------+----------------+
| created_at | security_groups | NULL | datetime |
| updated_at | security_groups | NULL | datetime |
| deleted_at | security_groups | NULL | datetime |
| id | security_groups | NULL | int(11) |
| name | security_groups | utf8 | varchar(255) |
| description | security_groups | utf8 | varchar(255) |
| user_id | security_groups | utf8 | varchar(255) |
| project_id | security_groups | utf8 | varchar(255) |
| deleted | security_groups | NULL | int(11) |
+-------------+-----------------+--------------------+----------------+
9 rows in set (0.01 sec)

I'm worried that if users feed in ISO-8859-x name data, then one of two things might happen:
1. It might get interpreted as UTF-8, which for most but not all sequences with the high bit set, would yield invalid UTF-8 byte sequences, which are garbage;
or
2. MySQL could waste a lot of CPU converting incoming ISO8859-x data in queries to UTF-8 in order to execute the queries, then converting the results back to ISO8859-1 again. Or some other component would have to do this for it, with similar CPU costs.

The extended-ASCII encodings such as ISO8859-x are considered legacy these days anyway. I don't like the idea of using them for the first time now.

Je rêve d'un monde complètement Unicode! :-)
Ça marche assez bien ici avec Launchpad, mais ailleurs...

Revision history for this message
Romain Hardouin (romain-hardouin) wrote :

Matthew, don't worry about empty string, fields are required. So, if we leave the name or description empty, a message "This field is required" will be printed.
The "*" is here to be more generic if this validator need to be apply somewhere, on an potentially optional field (blank=True).

As for ISO-8859-1, maybe I was unclear. Forget the old name, terrific, ISO-8859-1 ;-)
Just read "subset of the first 256 code points of Unicode". This was my idea to allow diacritics.

I use the \x notation since to my mind: '\x20' == u'\u0020'. Is it confusing?
Also, I disallow \x7F to \x9F since they are control characters.

I agree with you, the best solution would be to allow all printable Unicode characters.
But AFAIK Python 2 regex doesn't allow unicode stuff [1] like \X or \p{L}\p{M}*+ which would also require possessive quantifiers.
So I don't know how to implement this full Unicode solution.

[1] : http://www.regular-expressions.info/unicode.html

Revision history for this message
Matthew S (matts8484) wrote :

The only problem I see is this: you're right that in most of the ISO 8859-x encodings (including -1), bytes with values 0x7f - 0x9f are control characters, in the UTF-8 encoding of Unicode, bytes with the high bit set ( > 0x80) are the lead byte of a multi-byte character sequence, except for 0xc0, 0xc1, 0xfe and 0xff, which are guaranteed never to appear in UTF-8 encoded data.

So if the regex disallows the ISO 8859-1 control characters, some valid UTF-8 data might also be rejected.

But all of this is assuming that the code is operating on this string as bytes, not as characters. If something has already converted this into a Python wide string a la u'foo', then characters with these values really are control characters, and not bytes that could be part of a UTF-8 multi-byte sequence. I don't know Django well enough to know what's going on here (sorry).

The key thing to grok is that 7-bit ASCII is a proper subset of UTF-8 - all ASCII is valid UTF-8.
But not all 8-bit 'extended ASCII' such as ISO 8859-1 is valid UTF-8.

On my Debian box, 'man utf-8' explains this much better than my attempt above.

This all is already quite confusing to me. I apologise if I have increased the complexity/confusion yet further!

Thierry Carrez (ttx)
Changed in horizon:
milestone: icehouse-1 → icehouse-2
Revision history for this message
Julie Pichon (jpichon) wrote :

I suspect perhaps the initial validation was trying to satisfy the strictest possible group name validation, which seems to be for EC2 group naming:

https://git.openstack.org/cgit/openstack/nova/tree/nova/api/ec2/cloud.py?id=feda2bf65e358303e46933b8b73e36e011389679#n702

Revision history for this message
Romain Hardouin (romain-hardouin) wrote :

Yes I had saw this. The if/else's comment (EC2 vs Amazon) had puzzled me.
So what do you think about [:print:]? Or UTF-8 as Matthew have suggested?

Revision history for this message
Romain Hardouin (romain-hardouin) wrote :

*has* suggested ;-)

Revision history for this message
Matthew S (matts8484) wrote :

I also don't understand the EC2-strict-validation conditional. I think it's supposed to mean 'strict validation', as opposed to 'EC2 versus Amazon' - as far as I know, EC2 *is* Amazon's API for this.

I think that either this strict validation is too strict, or that StarCluster (http://star.mit.edu/cluster/) is wrong to use '@' characters in group names.
But since StarCluster is intended to work (and as far as I know, does work) with Amazon EC2 APIs, from a high level, it should work with OpenStack's EC2 APIs, too. If Amazon don't follow their own specification, then maybe we shouldn't too, for bug-for-bug compatibility.

Re UTF-8, this code looks to me like it has a separate Unicode bug. But maybe this should be tracked separately to the rejection of '@' characters?

Here's my understanding of this code:

First, at line 701, the input string gets converted to UTF-8 if it's unicode. So characters which were printable Unicode characters will get converted to multi-byte UTF-8 sequences, where some of the bytes may have their high bits set.

Then, either at line 705 or 713 (depending whether strict validation is on or not), a regex is generated which will reject bytes with their high bits set.

For correct operation, either:
a. Good idea: the character-based regex check needs to be performed before the characters are converted to multi-byte UTF-8 sequences; or
b. Very bad idea: the regex needs to accept multi-byte UTF-8 sequences, and needs to enforce the UTF-8 validity rules. I'm not even sure this is possible with a regex. It shouldn't be attempted anyway; it would look hideous and would make the purpose of the regex obscure to readers of this code.

David Lyle (david-lyle)
Changed in horizon:
milestone: icehouse-2 → icehouse-3
Matthias Runge (mrunge)
Changed in horizon:
status: In Progress → Confirmed
assignee: Romain Hardouin (romain-hardouin) → nobody
David Medberry (med)
summary: - Security group names cannot contain at characters
+ Security group names cannot contain at sign @ characters
Thierry Carrez (ttx)
Changed in horizon:
milestone: icehouse-3 → icehouse-rc1
Changed in horizon:
assignee: nobody → Romain Hardouin (romain-hardouin)
Revision history for this message
Romain Hardouin (romain-hardouin) wrote :

We are under StringFreeze so I don't think that this bug could be done in icehouse RC 1 since we change the validation message.

David Lyle (david-lyle)
Changed in horizon:
milestone: icehouse-rc1 → next
Changed in horizon:
status: Confirmed → In Progress
Revision history for this message
Akihiro Motoki (amotoki) wrote :

I don't have a strong opinion on UTF8 names.

IIRC there was a discussion on UTF-8 name for Nova side (and it can be expanded to other OpenStack projects), though I don't remember and find the pointer of the discussion.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to horizon (master)

Reviewed: https://review.openstack.org/54007
Committed: https://git.openstack.org/cgit/openstack/horizon/commit/?id=80d9049fb6e6c7bf5eb262612e6ad3e8a541c051
Submitter: Jenkins
Branch: master

commit 80d9049fb6e6c7bf5eb262612e6ad3e8a541c051
Author: Romain Hardouin <email address hidden>
Date: Thu Apr 3 23:52:13 2014 +0200

    Allow all printable ASCII characters in security group names

    Thus far, the security group name was checked with Django's validate_slug.
    Consequently, some characters like spaces (bug #1224576) and @ (bug #1233501) were forbidden.
    This fix removes validate_slug and provides a less restrictive validation.
    Now, all printable ASCII characters -- i.e. from 0x20 to 0x7E -- are valid.
    Note that diacritics are disallowed.
    On top of that the length is now checked and limited to 255 characters in accordance with Nova.

    Security group forms have been factorized in one abstract base class: GroupBase.
    CreateGroup and UpdateGroup subclass GroupBase in order to be more DRY.

    Change-Id: Ifc8e5f75419a73a353b2138641773d36138ecea8
    Closes-Bug: #1224576
    Closes-Bug: #1233501

Changed in horizon:
status: In Progress → Fix Committed
Akihiro Motoki (amotoki)
Changed in horizon:
milestone: next → kilo-rc1
Thierry Carrez (ttx)
Changed in horizon:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in horizon:
milestone: kilo-rc1 → 2015.1.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.