Bug #1233501 “Security group names cannot contain at sign @ char...” : Bugs : OpenStack Dashboard (Horizon)

Revision history for this message

Matthew S (matts8484) wrote on 2013-10-01:

#1

Actually its documentation says "letters", not "characters". So I'm wondering how a letter is different to a character, if it is.

Gabriel Hurley (gabriel-hurley) on 2013-10-01

Changed in horizon:
importance:	Undecided → Medium
milestone:	none → icehouse-1
status:	New → Confirmed

Revision history for this message

Matthew S (matts8484) wrote on 2013-10-02:

#2

Further to this, the reason why we want '@' characters in security group names to work in general is that StarCluster (http://star.mit.edu/cluster/) wants to create secgroups with names like these when it drives OpenStack via EC2 API.

It may be possible for the StarCluster/OpenStack EC2 interop to work even if Horizon continues to be unable to create such security groups, but even then, I think Horizon should be fixed for reasons of completeness and consistency with 'nova secgroup-create'.

Matthew S (matts8484) on 2013-10-03

tags:

added: low-hanging-fruit

Romain Hardouin (romain-hardouin) on 2013-10-03

Changed in horizon:
assignee:	nobody → Romain Hardouin (romain-hardouin)

Revision history for this message

Romain Hardouin (romain-hardouin) wrote on 2013-10-03:

#3

Hi,

I looked at Nova client sources and it doesn't perform any validation.
Thus we are consistent if we just delete validate_slug.

Revision history for this message

Romain Hardouin (romain-hardouin) wrote on 2013-10-03:

#4

Akihiro Motoki has added explanations about validation logic in comments of the bug #1224576.
We have to set up a validation logic, but not as restrictive as validate_slug.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2013-10-26: Fix proposed to horizon (master)

#5

Fix proposed to branch: master
Review: https://review.openstack.org/54007

Changed in horizon:
status:	Confirmed → In Progress

Revision history for this message

Romain Hardouin (romain-hardouin) wrote on 2013-10-27: Re: Security group names cannot contain at characters

#6

The validation that I've submitted consists of:
* A [:print:] regex.
* A length limited to 255 characters in accordance with Nova and Neutron back ends.

Feedbacks are welcome :)

Revision history for this message

Matthew S (matts8484) wrote on 2013-10-28:

#7

Hi, this looks great to me. Thanks!

In my opinion it would be even better (maybe even perfect?) if it also allowed diacritics.
Then one could name one's security group (for example) "mon groupe sécurité".

But I understand this would add runtime overhead in Python's regex engine.
I'm an English speaker, and so are our users. So I defer to you speakers of languages with diacritics :-)

Revision history for this message

Romain Hardouin (romain-hardouin) wrote on 2013-10-28:

#8

The most straightforward regex that come to my mind is to define a subset of the first 256 code points of Unicode (same as ISO 8859-1):
^[\x20-\x7E\xC0-\xD6\xD8-\xF6\xF8\xFF]*$

Thanks to this regex, we can typed diacritics e.g:
- "Sécurité", as you mention
- And of course many others: Ü, ß, Ø, ñ, ...

If I submit this regex, I have to comment it!

Revision history for this message

Matthew S (matts8484) wrote on 2013-10-30:

#9

The empty string shouldn't be a valid security-group name - it might be better to use:
^[\x20-\x7E\xC0-\xD6\xD8-\xF6\xF8\xFF]+$
Note the '+' instead of a '*' towards the end.

But if you're going to go as far as supporting ISO 8859-x, then I think you may as well go 'all the way', and allow all UTF-8 constituent bytes, except for the dangerous/confusing low-ASCII control characters like CR, LF, TAB etc.

Then we would have full Unicode support (apart from the control characters), which I imagine would make East Asian users happy as well as us Europeans.
But more importantly, then I really could name my security group crazy things such as Klingon ;-)

So that would be:
^[\x20-\xfd]+$

I had a look; Django seems to have good Unicode support:

https://docs.djangoproject.com/en/dev/ref/unicode/

And in our Grizzly deployment, the column that eventually stored these names is UTF-8:

I'm worried that if users feed in ISO-8859-x name data, then one of two things might happen:
1. It might get interpreted as UTF-8, which for most but not all sequences with the high bit set, would yield invalid UTF-8 byte sequences, which are garbage;
or
2. MySQL could waste a lot of CPU converting incoming ISO8859-x data in queries to UTF-8 in order to execute the queries, then converting the results back to ISO8859-1 again. Or some other component would have to do this for it, with similar CPU costs.

The extended-ASCII encodings such as ISO8859-x are considered legacy these days anyway. I don't like the idea of using them for the first time now.

Je rêve d'un monde complètement Unicode! :-)
Ça marche assez bien ici avec Launchpad, mais ailleurs...

The empty string shouldn't be a valid security-group name - it might be better to  use:
^[\x20-\x7E\xC0-\xD6\xD8-\xF6\xF8\xFF]+$
Note the '+' instead of a '*' towards the end.

But if you're going to go as far as supporting ISO 8859-x, then I think you may as well go 'all the way', and allow all UTF-8 constituent bytes, except for the dangerous/confusing low-ASCII control characters like CR, LF, TAB etc.

Then we would have full Unicode support (apart from the control characters), which I imagine would make East Asian users happy as well as us Europeans.
But more importantly, then I really could name my security group crazy things such as Klingon ;-)

So that would be:
^[\x20-\xfd]+$

I had a look; Django seems to have good Unicode support:

https://docs.djangoproject.com/en/dev/ref/unicode/

And in our Grizzly deployment, the column that eventually stored these names is UTF-8:

I'm worried that if users feed in ISO-8859-x name data, then one of two things might happen:
1. It might get interpreted as UTF-8, which for most but not all sequences with the high bit set, would yield invalid UTF-8 byte sequences, which are garbage;
or
2. MySQL could waste a lot of CPU converting incoming ISO8859-x data in queries to UTF-8 in order to execute the queries, then converting the results back to ISO8859-1 again. Or some other component would have to do this for it, with similar CPU costs.

The extended-ASCII encodings such as ISO8859-x are considered legacy these days anyway. I don't like the idea of using them for the first time now.

Je rêve d'un monde complètement Unicode! :-)
Ça marche assez bien ici avec Launchpad, mais ailleurs...

Revision history for this message

Romain Hardouin (romain-hardouin) wrote on 2013-10-30:

#10

Matthew, don't worry about empty string, fields are required. So, if we leave the name or description empty, a message "This field is required" will be printed.
The "*" is here to be more generic if this validator need to be apply somewhere, on an potentially optional field (blank=True).

As for ISO-8859-1, maybe I was unclear. Forget the old name, terrific, ISO-8859-1 ;-)
Just read "subset of the first 256 code points of Unicode". This was my idea to allow diacritics.

I use the \x notation since to my mind: '\x20' == u'\u0020'. Is it confusing?
Also, I disallow \x7F to \x9F since they are control characters.

I agree with you, the best solution would be to allow all printable Unicode characters.
But AFAIK Python 2 regex doesn't allow unicode stuff [1] like \X or \p{L}\p{M}*+ which would also require possessive quantifiers.
So I don't know how to implement this full Unicode solution.

[1] : http://www.regular-expressions.info/unicode.html

Revision history for this message

Matthew S (matts8484) wrote on 2013-10-30:

#11

The only problem I see is this: you're right that in most of the ISO 8859-x encodings (including -1), bytes with values 0x7f - 0x9f are control characters, in the UTF-8 encoding of Unicode, bytes with the high bit set ( > 0x80) are the lead byte of a multi-byte character sequence, except for 0xc0, 0xc1, 0xfe and 0xff, which are guaranteed never to appear in UTF-8 encoded data.

So if the regex disallows the ISO 8859-1 control characters, some valid UTF-8 data might also be rejected.

But all of this is assuming that the code is operating on this string as bytes, not as characters. If something has already converted this into a Python wide string a la u'foo', then characters with these values really are control characters, and not bytes that could be part of a UTF-8 multi-byte sequence. I don't know Django well enough to know what's going on here (sorry).

The key thing to grok is that 7-bit ASCII is a proper subset of UTF-8 - all ASCII is valid UTF-8.
But not all 8-bit 'extended ASCII' such as ISO 8859-1 is valid UTF-8.

On my Debian box, 'man utf-8' explains this much better than my attempt above.

This all is already quite confusing to me. I apologise if I have increased the complexity/confusion yet further!

Thierry Carrez (ttx) on 2013-12-03

Changed in horizon:
milestone:	icehouse-1 → icehouse-2

Revision history for this message

Julie Pichon (jpichon) wrote on 2013-12-17:

#12

I suspect perhaps the initial validation was trying to satisfy the strictest possible group name validation, which seems to be for EC2 group naming:

https://git.openstack.org/cgit/openstack/nova/tree/nova/api/ec2/cloud.py?id=feda2bf65e358303e46933b8b73e36e011389679#n702

Revision history for this message

Romain Hardouin (romain-hardouin) wrote on 2013-12-17:

#13

Yes I had saw this. The if/else's comment (EC2 vs Amazon) had puzzled me.
So what do you think about [:print:]? Or UTF-8 as Matthew have suggested?

Revision history for this message

Romain Hardouin (romain-hardouin) wrote on 2013-12-17:

#14

*has* suggested ;-)

Revision history for this message

Matthew S (matts8484) wrote on 2013-12-17:

#15

I also don't understand the EC2-strict-validation conditional. I think it's supposed to mean 'strict validation', as opposed to 'EC2 versus Amazon' - as far as I know, EC2 *is* Amazon's API for this.

I think that either this strict validation is too strict, or that StarCluster (http://star.mit.edu/cluster/) is wrong to use '@' characters in group names.
But since StarCluster is intended to work (and as far as I know, does work) with Amazon EC2 APIs, from a high level, it should work with OpenStack's EC2 APIs, too. If Amazon don't follow their own specification, then maybe we shouldn't too, for bug-for-bug compatibility.

Re UTF-8, this code looks to me like it has a separate Unicode bug. But maybe this should be tracked separately to the rejection of '@' characters?

Here's my understanding of this code:

First, at line 701, the input string gets converted to UTF-8 if it's unicode. So characters which were printable Unicode characters will get converted to multi-byte UTF-8 sequences, where some of the bytes may have their high bits set.

Then, either at line 705 or 713 (depending whether strict validation is on or not), a regex is generated which will reject bytes with their high bits set.

For correct operation, either:
a. Good idea: the character-based regex check needs to be performed before the characters are converted to multi-byte UTF-8 sequences; or
b. Very bad idea: the regex needs to accept multi-byte UTF-8 sequences, and needs to enforce the UTF-8 validity rules. I'm not even sure this is possible with a regex. It shouldn't be attempted anyway; it would look hideous and would make the purpose of the regex obscure to readers of this code.

David Lyle (david-lyle) on 2014-01-21

Changed in horizon:
milestone:	icehouse-2 → icehouse-3

Matthias Runge (mrunge) on 2014-01-24

Changed in horizon:
status:	In Progress → Confirmed
assignee:	Romain Hardouin (romain-hardouin) → nobody

David Medberry (med) on 2014-02-18

summary:

- Security group names cannot contain at characters
+ Security group names cannot contain at sign @ characters

Thierry Carrez (ttx) on 2014-03-05

Changed in horizon:
milestone:	icehouse-3 → icehouse-rc1

Romain Hardouin (romain-hardouin) on 2014-03-18

Changed in horizon:
assignee:	nobody → Romain Hardouin (romain-hardouin)

Revision history for this message

Romain Hardouin (romain-hardouin) wrote on 2014-03-18:

#16

We are under StringFreeze so I don't think that this bug could be done in icehouse RC 1 since we change the validation message.

David Lyle (david-lyle) on 2014-03-18

Changed in horizon:
milestone:	icehouse-rc1 → next

OpenStack Infra (hudson-openstack) on 2014-04-03

Changed in horizon:
status:	Confirmed → In Progress

Revision history for this message

Akihiro Motoki (amotoki) wrote on 2014-04-05:

#17

I don't have a strong opinion on UTF8 names.

IIRC there was a discussion on UTF-8 name for Nova side (and it can be expanded to other OpenStack projects), though I don't remember and find the pointer of the discussion.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2015-03-14: Fix merged to horizon (master)

#18

Reviewed: https://review.openstack.org/54007
Committed: https://git.openstack.org/cgit/openstack/horizon/commit/?id=80d9049fb6e6c7bf5eb262612e6ad3e8a541c051
Submitter: Jenkins
Branch: master

commit 80d9049fb6e6c7bf5eb262612e6ad3e8a541c051
Author: Romain Hardouin <email address hidden>
Date: Thu Apr 3 23:52:13 2014 +0200

Allow all printable ASCII characters in security group names

    Thus far, the security group name was checked with Django's validate_slug.
    Consequently, some characters like spaces (bug #1224576) and @ (bug #1233501) were forbidden.
    This fix removes validate_slug and provides a less restrictive validation.
    Now, all printable ASCII characters -- i.e. from 0x20 to 0x7E -- are valid.
    Note that diacritics are disallowed.
    On top of that the length is now checked and limited to 255 characters in accordance with Nova.

Security group forms have been factorized in one abstract base class: GroupBase.
CreateGroup and UpdateGroup subclass GroupBase in order to be more DRY.

    Change-Id: Ifc8e5f75419a73a353b2138641773d36138ecea8
    Closes-Bug: #1224576
    Closes-Bug: #1233501

Changed in horizon:
status:	In Progress → Fix Committed

Akihiro Motoki (amotoki) on 2015-04-09

Changed in horizon:
milestone:	next → kilo-rc1

Thierry Carrez (ttx) on 2015-04-14

Changed in horizon:
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2015-04-30

Changed in horizon:
milestone:	kilo-rc1 → 2015.1.0

OpenStack Dashboard (Horizon)

Security group names cannot contain at sign @ characters

Bug Description

Other bug subscribers

Remote bug watches