When we try to use Chinese characters in a tag, we get an error:
Bug #724819 reported by Edvard
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
KARL3 | Invalid | Low | Unassigned | |
Bug Description
The tag is as follows (UTF-8):
國語
Error is as follows:
"Adding tag failed: Value contains characters that are not allowed in a tag."
Changed in karl3:
assignee: nobody → Chris Rossi (chris-archimedeanco)
importance: Undecided → Low
milestone: none → m56
Changed in karl3:
status: New → In Progress
Changed in karl3:
milestone: m61 → m62
Changed in karl3:
milestone: m62 → m63
Changed in karl3:
status: Confirmed → In Progress
I've looked into this, and I think I could fix it if the check were being done server side. Validation for tags is done using this regular expression:
"^[a-zA-Z0-9\-\._]+$"
This could be changed to the following, which is equivalent for ASCII input:
"^[\w\d\-\._]+$"
The difference is that in Python, if the re.UNICODE flag is supplied, the \w and \d metacharacters match any "word" or "digit" characters as defined in Unicode, and not just ASCII a-zA-Z and 0-9 respectively.
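To make the difference concrete, here is a small sketch comparing the two patterns. The names ASCII_TAG and UNICODE_TAG are made up for illustration and are not taken from the KARL code. In Python 3, str patterns are already Unicode-aware, so the flag is passed explicitly only to mirror the description above.

```python
import re

# Current pattern: ASCII letters, digits, hyphen, dot and underscore only.
ASCII_TAG = re.compile(r"^[a-zA-Z0-9\-\._]+$")

# Proposed pattern: \w and \d follow Unicode definitions when re.UNICODE is set.
UNICODE_TAG = re.compile(r"^[\w\d\-\._]+$", re.UNICODE)

for tag in ("python", "國語", "bad tag!"):
    # Print whether each pattern accepts the candidate tag.
    print(tag, bool(ASCII_TAG.match(tag)), bool(UNICODE_TAG.match(tag)))
```

The ASCII-only pattern rejects 國語 while the Unicode-aware one accepts it; both still reject a value containing a space or punctuation such as "!".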
While there does seem to be an implementation of this logic server side, I can't tell exactly when, if ever, it is called. In most cases the equivalent client-side logic is called. That logic uses the same regular expression for validation, but per the ECMAScript standard, \w and \d are not Unicode-aware and match only their ASCII equivalents. (Some cursory googling reveals that some browsers break the standard here and implement these as Unicode-aware, but behavior is not uniform. Chromium appears to follow the standard.)
Some googling did turn up this JS library, which claims to provide Unicode regular expressions:
http://xregexp.com/plugins/
Potentially, also, we could use a server call to do the validation and consolidate all tag validation into a single place in Python code.
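As a rough sketch of what that single place might look like, assuming a hypothetical helper named validate_tag (not an existing KARL function) and reusing the error text from the bug description:

```python
import re

# Unicode-aware version of the tag pattern proposed above.
TAG_RE = re.compile(r"^[\w\d\-\._]+$", re.UNICODE)

# Message text copied from the error shown in the bug description.
TAG_ERROR = "Value contains characters that are not allowed in a tag."


def validate_tag(value):
    """Return None if the tag is acceptable, otherwise an error message.

    Hypothetical helper: the name and return convention are assumptions.
    """
    if TAG_RE.match(value):
        return None
    return TAG_ERROR
```

How the client would reach this (a dedicated view, or part of the existing add-tag request) is left open here.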
Since the solution for this is going to involve the client side code, this might be a good candidate for handing to Balasz.