URL match does not include some valid characters at the end of the URL

Bug #1559635 reported by .eepp
This bug affects 1 person
Affects: Terminator
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

I don't think this is an actual _bug_, but I see from the code that the URL match ends with a word boundary, and explicitly with a character that is not one of [\]'.}>) \t\r\n,\"], whereas some of those characters are legal in a URL.
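To illustrate the effect, here is a minimal sketch in Python. The pattern `URL_RE` and the helper `match_url` are hypothetical and much simpler than Terminator's real regex (the word boundary is left out); they only reproduce the idea of a final character class that rejects the trailing characters listed above.

```python
import re

# Hypothetical, simplified pattern (NOT Terminator's actual regex).
# The body accepts any run without whitespace, '<', '>' or '"';
# the last character must not be one of ] ' . } > ) , " or whitespace.
URL_RE = re.compile(r"""https?://[^\s<>"]*[^\]'.}>)\s,"]""")


def match_url(text):
    """Return the first URL-like match in text, or None."""
    m = URL_RE.search(text)
    return m.group(0) if m else None


# The closing parenthesis is legal in a URL but is dropped from the match.
print(match_url("https://en.wikipedia.org/wiki/The_Offspring_(album) rocks"))
# The sentence-ending period is dropped too, which is what most people want.
print(match_url("For more information, see http://something.hi/bla-bla."))
```

The same exclusion that saves the second case is what breaks the first one.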

My use case is receiving a URL in my IRC client, which runs in Terminator, that contains one of those permitted characters at the end: if I Ctrl+Click the link, it opens the URL but the last character is missing. This happens especially with Wikipedia links like this one: https://en.wikipedia.org/wiki/The_Offspring_(album) (the closing parenthesis is not part of the clickable link).

I don't know what to suggest here. I understand that allowing those legal characters at the end of a URL would bother more people than it would please. For example, man pages and other documents often contain a sentence like "For more information, see http://something.hi/bla-bla." In this case, the trailing period would become part of the URL even though it is the sentence's period. Strictly speaking, such URLs should always be enclosed in < and > (or double quotes), which are illegal characters in URLs.
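As a rough sketch of why the < > convention removes the ambiguity: when the delimiters are present, everything between them can be taken as the URL, trailing period included. `DELIMITED_RE` and `extract_delimited` are hypothetical names for illustration, not part of Terminator or libvte.

```python
import re

# Hypothetical helper: only matches URLs explicitly enclosed in < and >.
DELIMITED_RE = re.compile(r"<(https?://[^<>\s]+)>")


def extract_delimited(text):
    """Return the URL found between < and >, or None if there is none."""
    m = DELIMITED_RE.search(text)
    return m.group(1) if m else None


# The period inside the brackets belongs to the URL, the one outside does not.
print(extract_delimited("For more information, see <http://something.hi/bla-bla.>."))
# Without delimiters this helper cannot tell, so it returns None.
print(extract_delimited("For more information, see http://something.hi/bla-bla."))
```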

Egmont Koblinger (egmont-gmail) wrote :

Just FYI:

gnome-terminal just recently got a complete rewrite of its regexes:
https://bugzilla.gnome.org/show_bug.cgi?id=756038

The_Offspring_(album) is still not handled correctly though:
https://bugzilla.gnome.org/show_bug.cgi?id=763980

It won't be easy...

.eepp (eeppeliteloop) wrote :

There's no easy fix for that, because people do not systematically enclose the URLs in <> or "" or spaces.

There's probably a way to be smarter than now, but it will never solve 100% of use cases.

The question is always: is the character following the last matched character also part of the URL or not? I could imagine another key binding for that, for example Ctrl+Click opens the URL as matched, and Ctrl+Shift+Click opens the URL plus the following character. But I'm not sure that's possible considering Terminator is using libvte... I'm guessing there's a list of matches somewhere and maybe you can't know their positions in the output buffer.

Stephen Boddy (stephen-j-boddy) wrote :

There are a number of corner cases that become very hard to handle. The period '.' and the closing parenthesis ')' are hard to deal with. Is it the last character of the URL, or a terminator? (no pun intended :-) I can't think of a valid way to decide this for a period. I would say that it is very rare for a period to be the last char of a URL, and much more common for it to be the end of a sentence.

For the parentheses I think it may be possible to do a look-behind test, i.e. if the entire URL is preceded by an opening parenthesis, then it makes sense that the last closing one is not part of the URL. If the last part of the URL (typically a filename) contains an opening parenthesis, it makes sense that the last one is closing it. I'm not a genius when it comes to regexes, so I think someone with deep skills would need to come up with a viable parser for that.
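One possible way to approximate this without a look-behind is to count parentheses in the candidate match and drop trailing ')' characters only while they are unbalanced. This is a sketch of a counting heuristic, not a regex solution and not what Terminator does; `strip_unbalanced_parens` is a hypothetical name.

```python
def strip_unbalanced_parens(url):
    """Drop trailing ')' characters that have no matching '(' inside url.

    The input is assumed to be a candidate match that was allowed to end
    in ')'. A ')' closing a '(' inside the URL is kept; a ')' that can
    only belong to the surrounding text is removed.
    """
    while url.endswith(")") and url.count(")") > url.count("("):
        url = url[:-1]
    return url


# '(album)' is balanced inside the URL, so the ')' stays.
print(strip_unbalanced_parens("https://en.wikipedia.org/wiki/The_Offspring_(album)"))
# From text like "(see https://example.org/page)": the ')' closes the outer '('.
print(strip_unbalanced_parens("https://example.org/page)"))
# From text like "(https://.../The_Offspring_(album))": only the extra ')' goes.
print(strip_unbalanced_parens("https://en.wikipedia.org/wiki/The_Offspring_(album))"))
```

It still does nothing for the trailing period, and a URL that legitimately ends in an unmatched ')' would be truncated.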

I really don't like the idea of multiple bindings to select between treatments. Off the top of my head, it would also be an evil thing to implement.

Changed in terminator:
importance: Undecided → Low
status: New → Triaged
Egmont Koblinger (egmont-gmail) wrote :

Please see the update at the aforementioned second gnome-terminal bug. :)
