Comment 3 for bug 842223

Martin Packman (gz) wrote :

> Please note that the issue that was pointed out by the reviewer is
> already fixed in bzr.dev, and is different from what you're pointing out
> here. That was about displaying "locations", which is different from the
> representation we use internally anyway.

I put that in the bug report not because I thought anyone had dropped the ball, but because it was relevant that the change was deliberate and not an oversight.

> rfc1738 considers tilde one of the unsafe characters and says that all
> unsafe characters must always be encoded within a URL.
> (page 30)

And in rfc3986 it's an unreserved character, hence the endless confusion over tildes in urls. :)
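A minimal sketch of how the two conventions diverge, assuming Python 3.7 or later (where `urllib.parse.quote` switched to RFC 3986 and stopped escaping `~`). The path `/home/~user/branch` is a hypothetical example, and the RFC 1738-style producer is simulated by hand rather than taken from any real library:

```python
from urllib.parse import quote

# Python 3.7+ quote() follows RFC 3986, so '~' is left as-is.
rfc3986_style = quote("/home/~user/branch")
# Hypothetical RFC 1738-style producer: always escapes the tilde.
rfc1738_style = rfc3986_style.replace("~", "%7E")

print(rfc3986_style)                   # /home/~user/branch
print(rfc1738_style)                   # /home/%7Euser/branch
print(rfc3986_style == rfc1738_style)  # False: same resource, different strings
```

Both strings identify the same location, but a plain string comparison treats them as different.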

It also calls out the problem of comparing URIs that are equivalent but not textually identical, and makes a recommendation:

<http://www.ietf.org/rfc/rfc3986.txt>

   URIs that differ in the replacement of an unreserved character with
   its corresponding percent-encoded US-ASCII octet are equivalent: they
   identify the same resource. However, URI comparison implementations
   do not always perform normalization prior to comparison (see Section
   6). For consistency, percent-encoded octets in the ranges of ALPHA
   (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
   underscore (%5F), or tilde (%7E) should not be created by URI
   producers and, when found in a URI, should be decoded to their
   corresponding unreserved characters by URI normalizers.
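A rough sketch of the normalization the RFC recommends, in Python. The helper names `normalize_unreserved` and `uris_equivalent` are hypothetical, not part of bzrlib or the standard library. The sketch decodes only escapes of unreserved characters; every other escape is kept, with its hex digits uppercased as RFC 3986 section 6.2.2.1 suggests:

```python
import re

# Unreserved characters per RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
_UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
_PCT_ENCODED = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_unreserved(uri: str) -> str:
    """Decode percent-encoded unreserved octets; keep all other escapes."""

    def _decode(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in _UNRESERVED:
            # e.g. %7E -> '~', %41 -> 'A'
            return char
        # Reserved or non-ASCII octet: must stay escaped; normalize hex case.
        return "%" + match.group(1).upper()

    return _PCT_ENCODED.sub(_decode, uri)


def uris_equivalent(a: str, b: str) -> bool:
    """Compare two URIs after normalizing unreserved-character escapes."""
    return normalize_unreserved(a) == normalize_unreserved(b)


print(normalize_unreserved("/home/%7euser/a%2fb"))  # /home/~user/a%2Fb
print(uris_equivalent("/home/%7Euser", "/home/~user"))  # True
```

Note that `%2F` is deliberately left encoded: `/` is a reserved character, so decoding it would change the path structure rather than just its spelling.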