Hyperlink detection incorrectly matches trailing characters

Bug #1702274 reported by Torbjörn Lönnemark
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
Sakura | Fix Released | Medium | Unassigned |

Bug Description

Currently, when a URL is enclosed in delimiters, as in the following examples:

* «http://example.com/»
* <http://example.com/>
* (http://example.com/)
* "http://example.com/"

the match includes the trailing enclosing character, causing the Open link action to open an incorrect URL.

sakura also matches punctuation (e.g. '.', ',') at the end of URLs, which commonly occurs when a URL ends a clause in a written sentence. Matching these characters is almost never what you would want or expect.

gnome-terminal handles both of these classes correctly.

For reference: https://bugzilla.gnome.org/show_bug.cgi?id=756038.

David Gómez (dabisu)
Changed in sakura:
status: New → Confirmed
importance: Undecided → Medium
Torbjörn Lönnemark (tobbez) wrote:

It would be possible to achieve this by copying gnome-terminal's src/terminal-regex.h, including it in sakura.c, and replacing the current HTTP_REGEXP with REGEX_URL_AS_IS.

David Gómez (dabisu)
Changed in sakura:
status: Confirmed → Fix Committed
David Gómez (dabisu)
Changed in sakura:
status: Fix Committed → Fix Released
Markus Kurtz (mgkurtz) wrote:

This bug does not seem to be fixed. The HTTP regex still matches the URLs given in the bug description together with their trailing enclosing characters, and the mail regex matches '<<email address hidden>'. Simple regexes that should work in most cases would be:

#define HTTP_REGEXP "(ftp|http)s?://[^ \t\n\b]+[^.,!? \t\n\b()<>{}«»„“”‚‘’\\[\\]\'\"]"
#define MAIL_REGEXP "[^ \t\n\b()<>{}«»„“”‚‘’\\[\\]\'\"][^ \t\n\b]*@([^ \t\n\b()<>{}«»„“”‚‘’\\[\\]\'\"]+\\.)+([a-zA-Z]{2,})"
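The idea behind the proposed HTTP_REGEXP is: take the greedy run of non-whitespace characters after the scheme, then back off until the last character is not punctuation or a closing delimiter. The C sketch below spells that idea out procedurally so it can be checked against the examples in the bug description. It is not sakura's code and does not use a regex engine; find_url, URL_STOP, URL_TRAILING and URL_SCHEMES are hypothetical names, matching is case-sensitive, and only the four schemes the regex allows are recognized.

```c
#include <string.h>

/* Characters that end a URL candidate, as in [^ \t\n\b] in HTTP_REGEXP. */
static const char *const URL_STOP = " \t\n\b";

/* Characters that may not be the last character of a URL, as in the final
 * bracket expression of HTTP_REGEXP. The non-ASCII quotes are UTF-8 strings,
 * so an entry can be several bytes long (the source file must be UTF-8). */
static const char *const URL_TRAILING[] = {
    ".", ",", "!", "?", "(", ")", "<", ">", "{", "}", "[", "]", "'", "\"",
    "«", "»", "„", "“", "”", "‚", "‘", "’",
};

/* The schemes allowed by (ftp|http)s?:// */
static const char *const URL_SCHEMES[] = {"http://", "https://", "ftp://", "ftps://"};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Returns nonzero if text[start, end) ends with suffix. */
static int ends_with(const char *text, size_t start, size_t end, const char *suffix)
{
    size_t n = strlen(suffix);
    return end - start >= n && memcmp(text + end - n, suffix, n) == 0;
}

/* Hypothetical helper, not part of sakura: copies the first URL found in
 * text into out (NUL-terminated). Returns 1 on success, 0 if there is no
 * URL or out is too small. */
int find_url(const char *text, char *out, size_t out_size)
{
    for (size_t i = 0; text[i] != '\0'; i++) {
        for (size_t s = 0; s < COUNT(URL_SCHEMES); s++) {
            size_t scheme_len = strlen(URL_SCHEMES[s]);
            if (strncmp(text + i, URL_SCHEMES[s], scheme_len) != 0)
                continue;

            size_t body = i + scheme_len;
            /* Greedy step: take the whole run of non-stop characters. */
            size_t end = body + strcspn(text + body, URL_STOP);

            /* Back-off step: drop trailing punctuation and closing delimiters
             * until the last character is allowed to end a URL. */
            int trimmed;
            do {
                trimmed = 0;
                for (size_t t = 0; t < COUNT(URL_TRAILING) && !trimmed; t++) {
                    if (ends_with(text, body, end, URL_TRAILING[t])) {
                        end -= strlen(URL_TRAILING[t]);
                        trimmed = 1;
                    }
                }
            } while (trimmed);

            /* HTTP_REGEXP needs at least two characters after "://". */
            if (end - body < 2)
                continue;

            size_t len = end - i;
            if (len + 1 > out_size)
                return 0;
            memcpy(out, text + i, len);
            out[len] = '\0';
            return 1;
        }
    }
    return 0;
}
```

For example, find_url applied to "(http://example.com/)" yields http://example.com/, and applied to "Visit https://example.com/a." yields https://example.com/a. Like the regex, this heuristic also strips a legitimate closing parenthesis, as in Wikipedia-style URLs ending in ")"; the Daring Fireball regexes mentioned in this thread try to handle balanced parentheses.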

More complex regexes exist for this job, such as the different versions at
https://daringfireball.net/2010/07/improved_regex_for_matching_urls

David Gómez (dabisu) wrote:

Thanks Markus, I've updated the regular expressions.
