Comment 2 for bug 271559

Björn Tillenius (bjornt) wrote : Re: [Bug 271559] [NEW] International characters in an URL are not recognized as part of the URL in any text fields in Launchpad

On Thu, Sep 18, 2008 at 01:24:12AM -0000, Benoit St-André wrote:
> Public bug reported:
>
> I recently saw that an URL containing an international character (such
> as "é" for example) is not recognized as part of the URL by Launchpad
> (at least, in the Homepage Content field).
>
> If you look at my profile https://launchpad.net/~benoit-st-andre ,
> you'll see that the link to my Ubuntu Wiki page doesn't work, because it
> lacks the last letter, which is with an "é" as my name, like this
> https://wiki.ubuntu.com/BenoitStAndré
>
> By posting this bug, I'll see if this affects Launchpad entirely (if all
> the text fields are treated that way).
>
> I also tried putting an url encoded URL , without success (like
> https://wiki.ubuntu.com/BenoitStAndr%E9 , which doesn't go to the right
> page in the Ubuntu Wiki).

I think you've shown why international characters aren't allowed in
URLs. The browser has to encode the character somehow before sending
it to the web server. How should the browser encode 'é'? Well, that
depends on which character encoding the web server expects.

If you encode 'é' using UTF-8, you get the correct URL to your page:

    https://wiki.ubuntu.com/BenoitStAndr%C3%A9

Other web servers expect %E9 (the Latin-1 encoding of 'é') instead of
%C3%A9, and there's no way for Launchpad to know which encoding a given
server wants.
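
To illustrate the ambiguity, here is a minimal Python sketch that
percent-encodes the same page name under both encodings. It uses the
standard library's urllib.parse.quote; the variable names are only
illustrative, and this is not how Launchpad itself handles URLs.

```python
from urllib.parse import quote

# Page name taken from the bug report; the base URL is the Ubuntu Wiki.
base = "https://wiki.ubuntu.com/"
name = "BenoitStAndré"

# Same character, two different byte sequences depending on the encoding.
utf8_url = base + quote(name, encoding="utf-8")
latin1_url = base + quote(name, encoding="latin-1")

print(utf8_url)    # https://wiki.ubuntu.com/BenoitStAndr%C3%A9
print(latin1_url)  # https://wiki.ubuntu.com/BenoitStAndr%E9
```

Both results are valid percent-encoded URLs, but only the server can
say which one names the page.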