Comment 4 for bug 78898

Stuart Bishop (stub) wrote : Re: [Bug 78898] Re: URL linkification not Unicode aware

James Henstridge wrote:
> Sure. I am not suggesting that we ignore such links. We just need to
> work out what the regexps would need to look like to find IRIs in text.

I would suggest:

(?ux)((?:telnet:|mailto:|\w+://)[^\s]*[^\s.,()\[\]{}+_=\-\*!'"`;:?<>&|]+)

(This deliberately doesn't match ? and & at the end of a URL. Dropping them
there will be either harmless or what you want in almost all cases.)
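
To illustrate how the suggested pattern treats trailing punctuation and non-ASCII characters, here is a minimal sketch in Python 3. The names URL_RE and find_urls and the sample text are assumptions made up for this example; they are not taken from the Launchpad code.

```python
import re

# The pattern from the comment above, unchanged. In Python 3 the (?u) flag is
# redundant for str patterns, but it is still accepted.
URL_RE = re.compile(
    r"""(?ux)((?:telnet:|mailto:|\w+://)[^\s]*[^\s.,()\[\]{}+_=\-\*!'"`;:?<>&|]+)"""
)


def find_urls(text):
    """Return every URL-like substring the pattern finds in text."""
    return URL_RE.findall(text)


# Hypothetical sample text, made up for this illustration.
sample = (
    "See http://example.com/page?x=1&y=2. "
    "Also (http://例え.jp/パス), or mailto:stub@example.com!"
)
urls = find_urls(sample)
for url in urls:
    print(url)
```

The ? and & inside the first URL are kept because only the last character of a match is restricted. The closing full stop, parenthesis, comma and exclamation mark are left out of the matches.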

Do we have any other well-known protocols that use just : instead of the
more common :// ? I would hesitate to use just \w+: as the protocol match, as
it would give too many false positives.
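
As a rough illustration of the false-positive concern, the sketch below compares the suggested scheme match with a bare \w+: prefix. The names STRICT_RE, LOOSE_RE and TRAILING and the sample sentence are hypothetical, chosen only for this example.

```python
import re

# Shared tail of the pattern: any non-space run that does not end in punctuation.
TRAILING = r"""[^\s]*[^\s.,()\[\]{}+_=\-\*!'"`;:?<>&|]+"""

# Scheme match as suggested above: telnet:, mailto:, or anything followed by ://
STRICT_RE = re.compile(r"(?:telnet:|mailto:|\w+://)" + TRAILING, re.UNICODE)

# Looser alternative under discussion: any word followed by a single colon.
LOOSE_RE = re.compile(r"\w+:" + TRAILING, re.UNICODE)

# Hypothetical sample text, made up for this illustration.
sample = "Meeting at 12:30, see bug:78898 or http://example.com/"

strict_matches = STRICT_RE.findall(sample)
loose_matches = LOOSE_RE.findall(sample)
print(strict_matches)
print(loose_matches)
```

With the looser prefix, the time and the bug reference in the sample are picked up alongside the real URL, which is the kind of false positive described above.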

Do we care about special URLs like about: and blank: ?

--
Stuart Bishop <email address hidden> http://www.canonical.com/
Canonical Ltd. http://www.ubuntu.com/