KARL3

When we try to use Chinese characters in a tag, we get an error:

Bug #724819 reported by Edvard on 2011-02-25

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	KARL3	Invalid	Low	Unassigned	KARL3 m63

Bug Description

Tag is as follows (UTF-8):

國語

Error is as follows:

"Adding tag failed: Value contains characters that are not allowed in a tag."

Paul Everitt (paul-agendaless) on 2011-04-23

Changed in karl3:
assignee:	nobody → Chris Rossi (chris-archimedeanco)
importance:	Undecided → Low
milestone:	none → m56

Chris Rossi (chris-archimedeanco) on 2011-06-20

Changed in karl3:
status:	New → In Progress

Chris Rossi (chris-archimedeanco) wrote on 2011-06-20:

I've looked into this and I think I could fix this if this check were being done server side. Validation for tags is done using this regular expression:

"^[a-zA-Z0-9\-\._]+$"

This could be changed to the following, which is equivalent for ASCII input:

"^[\w\d\-\._]+$"

The difference is that in Python, if the re.UNICODE flag is supplied, the \w and \d metacharacters match any "word" or "digit" characters as defined in Unicode, and not just ASCII a-zA-Z and 0-9 respectively.
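
A minimal sketch of the difference between the two patterns, written for Python 3 (an assumption; there, str patterns are already Unicode-aware, so passing re.UNICODE is redundant but harmless). The names ASCII_TAG, UNICODE_TAG and matches are made up for illustration and are not taken from the KARL code.

```python
import re

# Pattern quoted above: ASCII letters, digits, "-", "." and "_" only.
ASCII_TAG = re.compile(r"^[a-zA-Z0-9\-\._]+$")

# Proposed replacement: \w and \d follow the Unicode definitions of
# "word" and "digit" characters when the pattern is Unicode-aware.
UNICODE_TAG = re.compile(r"^[\w\d\-\._]+$", re.UNICODE)


def matches(pattern, tag):
    """Return True if the compiled pattern accepts the tag."""
    return pattern.match(tag) is not None


# Print each sample tag with the verdict of both patterns.
for tag in ["karl-3.0_beta", "國語", "two words"]:
    print(repr(tag), matches(ASCII_TAG, tag), matches(UNICODE_TAG, tag))
```

Both patterns agree on the pure ASCII samples; only the second one accepts the tag from the bug description.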

While there does seem to be an implementation of this logic server side, I can't tell exactly when, if ever, it is called. In most cases the equivalent client side logic is called. This logic uses the same regular expression for validation, but per the ECMAScript standard, \w and \d are not Unicode-aware and match only their ASCII equivalents. (Some cursory googling reveals that some browsers break the standard here and implement these as Unicode-aware, but behavior is not uniform. Chromium appears to follow the standard.)
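
To illustrate why the \w-based pattern would not help on a standards-following client, the ASCII-only behavior can be emulated in Python with the re.ASCII flag. This is only an emulation in Python 3, not the actual client-side code; the names below are hypothetical.

```python
import re

# With re.ASCII, \w and \d match only [a-zA-Z0-9_] and [0-9], which
# mimics the ASCII-only semantics described for ECMAScript above.
ASCII_ONLY_WORD_TAG = re.compile(r"^[\w\d\-\._]+$", re.ASCII)


def accepted_with_ascii_word_class(tag):
    """Return True if the tag passes when \\w and \\d are ASCII-only."""
    return ASCII_ONLY_WORD_TAG.match(tag) is not None


print(accepted_with_ascii_word_class("karl-3.0_beta"))
print(accepted_with_ascii_word_class("國語"))
```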

Some Googling also turned up this JS library, which claims to provide Unicode regular expressions:

http://xregexp.com/plugins/

Potentially, also, we could use a server call to do the validation and consolidate all tag validation into a single place in Python code.
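
As a rough sketch of what a single server-side validation point might look like: one function that every caller (including an ajax view) goes through. The function name, return shape and constants are assumptions for illustration, not the existing KARL API; the error text is the one from the bug description.

```python
import re

# Same character set as the current validation pattern.
TAG_RE = re.compile(r"^[a-zA-Z0-9\-\._]+$")

ERROR_MESSAGE = "Value contains characters that are not allowed in a tag."


def validate_tag(tag):
    """Return (ok, error); error is None when the tag is accepted.

    fullmatch is used so that the entire string, including any trailing
    newline, has to consist of allowed characters.
    """
    if TAG_RE.fullmatch(tag):
        return True, None
    return False, ERROR_MESSAGE


print(validate_tag("karl-3.0_beta"))
print(validate_tag("國語"))
```

Changing the tag policy would then mean editing TAG_RE in one place rather than keeping client and server patterns in sync.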

Since the solution for this is going to involve the client side code, this might be a good candidate for handing to Balazs.

Changed in karl3:
status:	In Progress → Confirmed

Paul Everitt (paul-agendaless) on 2011-06-21

Changed in karl3:
milestone:	m61 → m62

Paul Everitt (paul-agendaless) wrote on 2011-06-24:

Balazs, could you read Chris's last comment and see what you think?

Changed in karl3:
assignee:	Chris Rossi (chris-archimedeanco) → Balazs Ree (ree)

Paul Everitt (paul-agendaless) on 2011-07-01

Changed in karl3:
milestone:	m62 → m63

Balazs Ree (ree) on 2011-07-05

Changed in karl3:
status:	Confirmed → In Progress

Balazs Ree (ree) wrote on 2011-07-05:

First, a policy question. I may be wrong here, but the original policy I was aware of was that tags contain nothing but ASCII. This excludes not only Chinese but any non-ASCII Unicode characters. Other examples currently prohibited are the Hungarian accented letters: áéíóöőúüű

Then, the issue of filtering the accepted tags on the server. Although it would be possible to actually do this on the server, because we make a tag search at the same time when adding, even on pages (so ajax is involved anyway) I would be fo

For the implementation, I agree with Chris that the most sane solution is using xregexp. I tested this on the client and it works, satisfying the following requirements:

- from the ASCII set 0-127, only the alphanumerics and the characters _-. are accepted.

- from anything outside ASCII (any other Unicode character), only the word characters are accepted.
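
The two requirements above can be sketched per character as follows. This is a Python 3 approximation for the server side, not the xregexp code that was tested on the client; the helper names are hypothetical, and "word character" here means whatever Python's Unicode-aware \w matches.

```python
import re

# Requirement 1: within ASCII (0-127), only alphanumerics and "_", "-", ".".
ASCII_ALLOWED = re.compile(r"[a-zA-Z0-9\-\._]")

# Requirement 2: outside ASCII, only Unicode word characters.
WORD_CHAR = re.compile(r"\w")


def char_allowed(ch):
    """Apply the ASCII rule or the non-ASCII rule to a single character."""
    if ord(ch) < 128:
        return ASCII_ALLOWED.fullmatch(ch) is not None
    return WORD_CHAR.fullmatch(ch) is not None


def tag_allowed(tag):
    """Return True if the tag is non-empty and every character is allowed."""
    return len(tag) > 0 and all(char_allowed(ch) for ch in tag)


for tag in ["karl_3.0-x", "國語", "áéíóöőúüű", "a b", "tag!"]:
    print(repr(tag), tag_allowed(tag))
```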

Once Paul confirms that this complies with the policy, I can do the remaining fixup on the server.

Paul Everitt (paul-agendaless) wrote on 2011-07-05: Re: [Bug 724819] Re: When we try to use chinese characters in a tag, we an error:

Correct, the policy is that tags are pure ASCII: stuff that requires no quoting to display in a URL. So the only thing we need to do is ensure they fail gracefully.

--Paul
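
For reference, "requires no quoting to display in a URL" can be checked with the standard library. A small sketch, assuming Python 3 and a hypothetical helper name; note that quote() also leaves "~" unquoted, which the tag pattern does not allow, so this check is looser than the tag policy itself.

```python
from urllib.parse import quote


def needs_url_quoting(tag):
    """Return True if percent-encoding would change the tag."""
    # safe="" means no extra characters are exempt from quoting; ASCII
    # letters, digits and "_.-~" are never quoted by quote().
    return quote(tag, safe="") != tag


for tag in ["karl-3.0_beta", "國語", "two words"]:
    print(repr(tag), needs_url_quoting(tag), quote(tag, safe=""))
```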

Balazs Ree (ree) wrote on 2011-07-05:

I believe they currently do: the error message says that those characters are not allowed, so all is fine.

Paul Everitt (paul-agendaless) wrote on 2011-07-05:

OSF's decision early on was to only have ASCII tags.

Changed in karl3:
assignee:	Balazs Ree (ree) → nobody
status:	In Progress → Invalid
