Comment 1 for bug 724819

Chris Rossi (chris-archimedeanco) wrote :

I've looked into this, and I think I could fix it if the check were being done server side. Tags are validated using this regular expression:

"^[a-zA-Z0-9\-\._]+$"

This could be changed to the following, which is equivalent for ASCII input:

"^[\w\d\-\._]+$"

The difference is that in Python, if the re.UNICODE flag is supplied, the \w and \d metacharacters match any "word" or "digit" character as defined in Unicode, and not just ASCII a-zA-Z and 0-9 respectively.
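
As a rough illustration of that difference, here is a minimal sketch. The names ASCII_TAG, UNICODE_TAG, ASCII_W_TAG and is_valid_tag are made up for this example and are not taken from the code base. Note that in Python 3, str patterns are Unicode-aware by default, so re.UNICODE is redundant there and re.ASCII is what restores the ASCII-only behavior.

```python
import re

# The current pattern: ASCII letters, digits, hyphen, dot and underscore only.
ASCII_TAG = re.compile(r"^[a-zA-Z0-9\-\._]+$")

# The proposed pattern. With re.UNICODE, \w and \d follow Unicode definitions.
UNICODE_TAG = re.compile(r"^[\w\d\-\._]+$", re.UNICODE)

# Same proposed pattern, but forced to ASCII-only semantics (Python 3 flag).
ASCII_W_TAG = re.compile(r"^[\w\d\-\._]+$", re.ASCII)


def is_valid_tag(tag, pattern=UNICODE_TAG):
    """Return True if the whole tag matches the given pattern."""
    return pattern.match(tag) is not None


# ASCII input: both patterns agree.
print(is_valid_tag("foo-bar_1.2", ASCII_TAG))    # True
print(is_valid_tag("foo-bar_1.2", UNICODE_TAG))  # True

# Non-ASCII input: only the Unicode-aware pattern accepts it.
print(is_valid_tag("caf\u00e9", ASCII_TAG))      # False
print(is_valid_tag("caf\u00e9", UNICODE_TAG))    # True
print(is_valid_tag("caf\u00e9", ASCII_W_TAG))    # False
```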

While there does seem to be an implementation of this logic server side, I can't tell exactly when, if ever, it is called. In most cases the equivalent client side logic is called. This logic uses the same regular expression for validation, but per the ECMAScript standard, \w and \d are not Unicode-aware and match only their ASCII equivalents. (Some cursory googling reveals that some browsers break the standard here and implement these as Unicode-aware, but behavior is not uniform. Chromium appears to follow the standard.)

Some googling did turn up this JavaScript library, which claims to provide Unicode regular expressions:

http://xregexp.com/plugins/

Potentially, also, we could use a server call to do the validation and consolidate all tag validation into a single place in Python code.
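
A very rough sketch of what that consolidated check might look like, assuming the client sends the candidate tag to the server and gets back a small JSON body. TAG_RE and validate_tag_response are hypothetical names, and how this would be wired into an actual view or URL is left out entirely.

```python
import json
import re

# Hypothetical single source of truth for tag syntax, using the proposed
# Unicode-aware pattern.
TAG_RE = re.compile(r"^[\w\d\-\._]+$", re.UNICODE)


def validate_tag_response(tag):
    """Build the JSON body a client-side caller could consume.

    The response shape ({"tag": ..., "valid": ...}) is an assumption made
    for this sketch, not an existing API.
    """
    return json.dumps({"tag": tag, "valid": TAG_RE.match(tag) is not None})


print(validate_tag_response("caf\u00e9"))
print(validate_tag_response("not a tag"))
```

With something like this, the client would no longer need its own copy of the regular expression, at the cost of a round trip per validation.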

Since the solution for this is going to involve the client side code, this might be a good candidate for handing to Balasz.