James Henstridge wrote:
> Sure. I am not suggesting that we ignore such links. We just need to
> work out what the regexps would need to look like to find IRIs in text.
I would suggest:
(?ux)((?:telnet:|mailto:|\w+://)[^\s]*[^\s.,()\[\]{}+_=\-\*!'"`;:?<>&|]+)
(which deliberately doesn't match ? and & at the end of strings, which will
be either harmless or what you want in almost all cases).
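As a rough illustration of how the suggested pattern behaves, here is a minimal Python sketch. It assumes the regexp is meant for Python's re module (the (?ux) inline flags suggest so), and the names IRI_RE and find_iris are made up for this example, not taken from any existing code.

```python
import re

# The suggested pattern, spread over several lines. The x flag makes the
# regex engine ignore this layout whitespace and the comments.
IRI_RE = re.compile(r"""(?ux)
    (
        (?:telnet:|mailto:|\w+://)           # protocol prefix
        [^\s]*                               # any run of non-whitespace
        [^\s.,()\[\]{}+_=\-\*!'"`;:?<>&|]+   # last character is not punctuation
    )
""")


def find_iris(text):
    """Return every substring of text that the pattern treats as a link."""
    return [match.group(1) for match in IRI_RE.finditer(text)]


if __name__ == "__main__":
    sample = "See http://example.com/page?a=1&b=2, or mailto:someone@example.com."
    for link in find_iris(sample):
        print(link)
```

A ? or & in the middle of a link is kept, because only the final character class excludes them; the trailing comma and full stop in the sample are dropped for the same reason.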
Do we have any other well-known protocols that use just : instead of the
more common ://? I would hesitate to use just \w+: as the protocol match, as
it would give too many false positives.
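To make the false-positive concern concrete, the sketch below compares the suggested protocol match with a bare \w+: prefix. It assumes Python's re module; TAIL, STRICT_RE, LOOSE_RE and the sample sentence are invented for this illustration.

```python
import re

# Shared tail of the pattern: a run of non-whitespace that does not end
# in punctuation.
TAIL = r"""[^\s]*[^\s.,()\[\]{}+_=\-\*!'"`;:?<>&|]+"""

# Protocol match as suggested: telnet:, mailto:, or anything followed by ://
STRICT_RE = re.compile(r"(?u)((?:telnet:|mailto:|\w+://)" + TAIL + ")")

# Hypothetical looser variant: any word followed by a single colon.
LOOSE_RE = re.compile(r"(?u)(\w+:" + TAIL + ")")


def matches(pattern, text):
    """Return all substrings of text matched by the compiled pattern."""
    return [m.group(1) for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "Lunch at 12:30, see std::vector docs and http://example.com/."
    print("strict:", matches(STRICT_RE, sample))
    print("loose: ", matches(LOOSE_RE, sample))
```

With the looser prefix, a time of day and a C++ scoped name are both picked up as links, while the stricter prefix only finds the real URL.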
Do we care about special URLs like about: and blank:?
-- 
Stuart Bishop <email address hidden>    http://www.canonical.com/
Canonical Ltd.                          http://www.ubuntu.com/