Ubuntu Website Product

Ubuntu using non-printable characters in URLs/URIs

Reported by NoOp on 2011-05-08
This bug affects 1 person
Affects: Ubuntu Website
Importance: Undecided
Assigned to: Unassigned

Bug Description

Ubuntu web pages use unencoded non-printable characters (literal spaces) in URLs/URIs. Example:

https://wiki.ubuntu.com/NattyNarwhal/ReleaseNotes#Known Issues

There are many more (help.ubuntu.com is full of them), but that particular URL contains both a # sign and a non-printable space.

Another:
https://help.ubuntu.com/community/ReportingBugs#4. Collect information about the bug
The browser client has to percent-encode each non-printable space as %20:
https://help.ubuntu.com/community/ReportingBugs#4.%20Collect%20information%20about%20the%20bug
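For illustration, the encoding the browser performs on the URL above can be sketched in Python with the standard urllib.parse module. The helper name encode_fragment is hypothetical, and the sketch assumes the input fragment is not already percent-encoded (otherwise any existing "%" would be encoded a second time).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_fragment(url: str) -> str:
    """Percent-encode the fragment of a URL, keeping '#' as the delimiter.

    Hypothetical helper for illustration; assumes the fragment is not
    already percent-encoded.
    """
    parts = urlsplit(url)
    # safe="" makes quote() encode every character except unreserved ones
    # (letters, digits, '_', '.', '-', '~'), so each space becomes %20.
    encoded = quote(parts.fragment, safe="")
    return urlunsplit(parts._replace(fragment=encoded))


print(encode_fragment(
    "https://help.ubuntu.com/community/ReportingBugs#4. Collect information about the bug"
))
# https://help.ubuntu.com/community/ReportingBugs#4.%20Collect%20information%20about%20the%20bug
```

Only the fragment is touched here; the scheme, host, and path are passed through unchanged.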

RFC 1738
http://tools.ietf.org/html/rfc1738
2.2. URL Character Encoding Issues
...
<quote>
No corresponding graphic US-ASCII:

   URLs are written only with the graphic printable characters of the
   US-ASCII coded character set. The octets 80-FF hexadecimal are not
   used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent
   control characters; these must be encoded.

   Unsafe:

   Characters can be unsafe for a number of reasons. The space
   character is unsafe because significant spaces may disappear and
   insignificant spaces may be introduced when URLs are transcribed or
   typeset or subjected to the treatment of word-processing programs.
   The characters "<" and ">" are unsafe because they are used as the
   delimiters around URLs in free text; the quote mark (""") is used to
   delimit URLs in some systems. The character "#" is unsafe and should
   always be encoded because it is used in World Wide Web and in other
   systems to delimit a URL from a fragment/anchor identifier that might
   follow it. The character "%" is unsafe because it is used for
   encodings of other characters. Other characters are unsafe because
   gateways and other transport agents are known to sometimes modify
   such characters. These characters are "{", "}", "|", "\", "^", "~",
   "[", "]", and "`".

   All unsafe characters must always be encoded within a URL. For
   example, the character "#" must be encoded within URLs even in
   systems that do not normally deal with fragment or anchor
   identifiers, so that if the URL is copied into another system that
   does use them, it will not be necessary to change the URL encoding.
</quote>
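
As a rough way to audit links against the passage quoted above, the following Python sketch lists the characters in a string that the quoted RFC 1738 text says must be encoded. The names RFC1738_UNSAFE and unsafe_characters are hypothetical, and the check is deliberately naive: it also reports "#" when it is acting as the fragment delimiter and "%" when it starts a valid escape such as %20.

```python
# Characters named as unsafe in the quoted RFC 1738 section 2.2 text.
RFC1738_UNSAFE = set(' <>"#%{}|\\^~[]`')


def unsafe_characters(text: str) -> list:
    """Return the distinct characters in text that the quoted passage says
    must be encoded, in order of first appearance (hypothetical helper)."""
    found = []
    for ch in text:
        code = ord(ch)
        is_control = code <= 0x1F or code == 0x7F  # octets 00-1F and 7F
        is_non_ascii = code >= 0x80                # outside US-ASCII
        if (ch in RFC1738_UNSAFE or is_control or is_non_ascii) and ch not in found:
            found.append(ch)
    return found


print(unsafe_characters(
    "https://wiki.ubuntu.com/NattyNarwhal/ReleaseNotes#Known Issues"
))
# ['#', ' ']
```

Run against the first example URL in this report, it flags exactly the two characters mentioned in the description: the # sign and the space.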
