Eszett (ß) not recognised as part of URL

Bug #672704 reported by Paul Sladen
This bug report is a duplicate of: Bug #78898: URL linkification not Unicode aware.
This bug affects 1 person
Affects           Status      Importance  Assigned to  Milestone
Launchpad itself  Incomplete  Undecided   Unassigned

Bug Description

If the URL:

  http://de.wikipedia.org/wiki/ß

is pasted via Malone, then only the region up to the "/wiki/" is included as part of the link.

Tags: lp-bugs
Robert Collins (lifeless) wrote : Re: [Bug 672704] [NEW] Eszett (ß) not recognised as part of URL

Do you mean as a URL in Launchpad, or as free text with Launchpad URL
highlights?

If the latter, it cannot be part of a URL; URLs are a (strict) subset
of ASCII. If your browser is choosing to /guess/ that a percent-encoded
fragment of a URL is encoded in UTF-8, well, it can do
that, but it's still not a URL.

Robert Collins (lifeless) wrote :

For instance, the actual url you've supplied in this bug report is
'http://de.wikipedia.org/wiki/%C3%9F'.
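
As a rough illustration of where '%C3%9F' comes from: "ß" is encoded as the UTF-8 bytes C3 9F, and each byte is then percent-encoded. A minimal Python sketch using only the standard library (the variable names are illustrative and not taken from Launchpad):

```python
from urllib.parse import quote, unquote

# Path as a user would type or paste it, containing a non-ASCII character.
path = "/wiki/ß"

# quote() encodes non-ASCII characters as UTF-8 and percent-encodes each
# byte; "/" is left alone because it is in the default safe set.
encoded = quote(path)
print("http://de.wikipedia.org" + encoded)

# unquote() reverses the step, decoding the bytes as UTF-8 by default.
decoded = unquote(encoded)
print(decoded)
```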

Gavin Panella (allenap)
Changed in malone:
status: New → Incomplete
Paul Sladen (sladen) wrote :

Yup, I pasted this URL manually. I would have to go and re-read RFC 3986 to figure out exactly what a "URL producing agent" (in this case Launchpad synthesising the <a href="..."></a> for the bug display) should do in the case of non-reserved characters.

Stuart Bishop (stub) wrote :

I imagine we should do what users expect and what works, rather than slavishly follow the spec - I don't think that gains us anything. We already do this, for instance, by not marking up trailing punctuation in a URL - technically, it could form part of the URL, but in practice that is almost never the case.
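
A minimal sketch of that heuristic, assuming a simple regular-expression linkifier. This is not Launchpad's actual code; `URL_RE`, `TRAILING` and `find_urls` are hypothetical names. The pattern accepts any non-whitespace character, so "ß" stays inside the match, and trailing punctuation is trimmed afterwards:

```python
import re

# Hypothetical pattern: a scheme followed by any run of characters that are
# not whitespace, angle brackets or double quotes. Non-ASCII characters such
# as "ß" are therefore part of the match.
URL_RE = re.compile(r"https?://[^\s<>\"]+")

# Punctuation that rarely belongs to a URL when it ends the match.
TRAILING = ".,;:!?)'"

def find_urls(text):
    """Return the URLs found in text, with trailing punctuation trimmed."""
    return [m.group(0).rstrip(TRAILING) for m in URL_RE.finditer(text)]

print(find_urls("See http://de.wikipedia.org/wiki/ß."))
```

The trade-off is the one described above: a URL that genuinely ends in ")" or "." would be cut short, which is accepted here because such URLs are rare in free text.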

We will need to work on this to sort I18N domain names too. http://www.☣.net/
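
One possible way to turn such a pasted address into an ASCII-only href, sketched under some assumptions: the input is an absolute http(s) URL, the host is converted with Python's built-in "idna" codec (which implements the older IDNA 2003 rules), and the rest is percent-encoded as UTF-8. `to_href` is a hypothetical helper, not something Launchpad provides, and it drops any user:password part for brevity:

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when percent-encoding; "%" is included so that
# already-encoded sequences are not encoded a second time.
SAFE = "/%:@!$&'()*+,;=~"

def to_href(iri):
    """Convert a URL containing non-ASCII characters to an ASCII-only form."""
    parts = urlsplit(iri)
    # Encode each label of the host name with IDNA (Punycode).
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    path = quote(parts.path, safe=SAFE)
    query = quote(parts.query, safe=SAFE + "?")
    fragment = quote(parts.fragment, safe=SAFE + "?")
    return urlunsplit((parts.scheme, host, path, query, fragment))

print(to_href("http://de.wikipedia.org/wiki/ß"))
```

The visible link text can stay as the user typed it; only the href needs the ASCII form.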

Paul Sladen (sladen) wrote :

Indeed, this is an unclear problem domain with a wider scope. I just noticed:

  http://commons.wikimedia.org/wiki/File:Georgetown_PowerPlant_interior_pano.jpg

from reporting another bug, where Launchpad gets it right, but the algorithm in Gnome Terminal truncates the link at the colon.
