International characters in an URL are not recognized as part of the URL in any text fields in Launchpad

Bug #271559 reported by Benoit St-André
This bug report is a duplicate of: Bug #78898: URL linkification not Unicode aware.
Affects: Launchpad itself
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I recently noticed that when a URL contains an international character (such as "é"), Launchpad does not recognize that character as part of the URL (at least in the Homepage Content field).

If you look at my profile https://launchpad.net/~benoit-st-andre , you'll see that the link to my Ubuntu Wiki page doesn't work because it is missing its last letter, the "é" in my name: https://wiki.ubuntu.com/BenoitStAndré

By posting this bug, I'll see whether this affects all of Launchpad (that is, whether all text fields are treated this way).

I also tried a percent-encoded URL, without success: https://wiki.ubuntu.com/BenoitStAndr%E9 , which doesn't lead to the right page in the Ubuntu Wiki.

Benoit St-André (benoit-st-andre) wrote :

Exact same behavior here, so it affects all text fields in Launchpad.

Björn Tillenius (bjornt) wrote : Re: [Bug 271559] [NEW] International characters in an URL are not recognized as part of the URL in any text fields in Launchpad

On Thu, Sep 18, 2008 at 01:24:12AM -0000, Benoit St-André wrote:
> Public bug reported:
>
> I recently saw that an URL containing an international character (such
> as "é" for example) is not recognized as part of the URL by Launchpad
> (at least, in the Homepage Content field).
>
> If you look at my profile https://launchpad.net/~benoit-st-andre ,
> you'll see that the link to my Ubuntu Wiki page doesn't work, because it
> lacks the last letter, which is with an "é" as my name, like this
> https://wiki.ubuntu.com/BenoitStAndré
>
> By posting this bug, I'll see if this affects Launchpad entirely (if all
> the text fields are treated that way).
>
> I also tried putting an url encoded URL , without success (like
> https://wiki.ubuntu.com/BenoitStAndr%E9 , which doesn't go to the right
> page in the Ubuntu Wiki).

I think you've shown why international characters aren't allowed in
URLs. The browser needs to encode the character somehow before sending
it to the web server. How should the browser encode 'é'? Well, it
depends on the web server.

If you encode 'é' using UTF-8, you get the correct URL to your page:

    https://wiki.ubuntu.com/BenoitStAndr%C3%A9

There are other web servers expecting %E9 instead of %C3%A9, and there's
no way for Launchpad to know the correct encoding.
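
The difference between the two encodings can be reproduced with a short Python sketch. It only uses `urllib.parse` from the standard library; the variable names are illustrative, and nothing here is taken from Launchpad's code.

```python
from urllib.parse import quote, unquote

BASE = "https://wiki.ubuntu.com/"
PAGE = "BenoitStAndr\u00e9"  # "BenoitStAndré", written with an escape for clarity

# Percent-encode the page name under two different character encodings.
utf8_url = BASE + quote(PAGE, encoding="utf-8")
latin1_url = BASE + quote(PAGE, encoding="latin-1")

print(utf8_url)    # https://wiki.ubuntu.com/BenoitStAndr%C3%A9
print(latin1_url)  # https://wiki.ubuntu.com/BenoitStAndr%E9

# Decoding only round-trips when both sides agree on the encoding.
assert unquote(utf8_url, encoding="utf-8") == BASE + PAGE
assert unquote(latin1_url, encoding="latin-1") == BASE + PAGE
```

The same character yields two different byte sequences, which is why a server expecting one form does not find the page when it receives the other.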

Ursula Junque (ursinha) wrote :

<BjornT> Ursinha: well, i thought i'd leave that decision to someone else, but imo it's invalid
<BjornT> Ursinha: it would be possible to assume an ideal world, and encode non-ascii characters with utf-8

I'm assigning this to the foundations team. Francis, is this doable? Can you think of any problems that encoding every non-ASCII character with UTF-8 might cause?
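
As a rough illustration of the "assume UTF-8" idea, here is a minimal sketch of a linkifier that keeps non-ASCII characters inside the matched URL and percent-encodes them as UTF-8 in the `href`. This is not Launchpad's implementation: `URL_RE`, `SAFE`, and `linkify` are hypothetical names, and the pattern is deliberately naive (for example, it would swallow trailing punctuation that touches the URL).

```python
import re
from html import escape
from urllib.parse import quote

# Hypothetical pattern: any run of characters after the scheme that is not
# whitespace, an angle bracket, or a double quote, so "é" stays in the match.
URL_RE = re.compile(r"https?://[^\s<>\"]+")

# Characters quote() leaves untouched: URL delimiters, plus "%" so that
# sequences that are already percent-encoded are not encoded twice.
SAFE = ":/?#[]@!$&'()*+,;=%~"


def linkify(text):
    """Return HTML for text, turning each URL into a link with a UTF-8 href."""
    out = []
    pos = 0
    for match in URL_RE.finditer(text):
        # Escape the plain text between the previous URL and this one.
        out.append(escape(text[pos:match.start()]))
        url = match.group(0)
        href = quote(url, safe=SAFE, encoding="utf-8")
        out.append('<a href="{}">{}</a>'.format(escape(href), escape(url)))
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)


print(linkify("My page: https://wiki.ubuntu.com/BenoitStAndr\u00e9 , thanks"))
```

With this approach the visible link text keeps the "é" while the target becomes the `%C3%A9` form. The trade-off is the one raised above: a server that expects a different encoding would still receive the wrong bytes.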
