Lernid

URL's highlighted and opened include trailing punctuation

Bug #810122 reported by John S. Gruber on 2011-07-13

6

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Lernid	In Progress	Low	Mohammad AbuShady

Bug Description

For example with "http://upstart.ubuntu.com, " the trailing comma is considered part of the URL and shouldn't be. "Unable to load page" is produced in the browser. Removing the trailing comma produces a good URL.

Related branches

lp:~coalwater/lernid/fix-810122

Ready for review for merging into lp:lernid

John S. Gruber: Needs Fixing on 2012-04-19

Mohammad AbuShady (community): Needs Resubmitting (not sure :d) on 2012-04-18

Lernid Development Team: Pending requested 2012-04-18

John S. Gruber (jsjgruber) on 2011-07-13

Changed in lernid:
status:	New → Confirmed
importance:	Undecided → Low

Revision history for this message

Mohammad AbuShady (coalwater) wrote on 2011-07-13:

#1

Any hints of which files I should look into ? Or where the error appears? I still didn't play with lernid much so i don't know.

Revision history for this message

John S. Gruber (jsjgruber) wrote on 2011-07-13:

#2

You might want to start with lernid/widgets/Ircwidget.py line 126 where a regular expression is used to spot URL's in text that goes by. I assume ".", ",", and ")", etc. would produce similar problems.

If you decide to work on this bug report you should probably mark in "In Progress" with your self as "Assigned to".

It might be a good idea if you send me an e-mail, too before working on it. I'd love to help you fix this.

Revision history for this message

Mohammad AbuShady (coalwater) wrote on 2011-07-13:

#3

I'm having problems running lernid from source, first if i run it without sudo i get segmentation error, if i do it fails saying it cant access property 'connect' from type None.
Any thing i should know?

Mohammad AbuShady (coalwater) on 2011-07-19

Changed in lernid:
status:	Confirmed → In Progress
assignee:	nobody → Coalwater (coalwater)

Revision history for this message

Mohammad AbuShady (coalwater) wrote on 2011-07-19:

#4

The problem wasn't in the highlighting function, it was in the regex that detects the url, i changed the algorithm, the old one was looking for a space to stop seeking, i changed it to by asking it to seek till it finds an invalid character, and so far i'm just using commas, but before i propose a merge i would like to add all invalid characters to that check, so if any body has any invalid characters i should put in the regex check please tell me
The regex i changed:
http://bazaar.launchpad.net/~coalwater/lernid/fix-810122/revision/228#lernid/widgets/IrcWidget.py

John S. Gruber (jsjgruber) on 2011-07-19

summary:

- URL's highlighted and opened include trailing comma
+ URL's highlighted and opened include trailing punctuation

Revision history for this message

John S. Gruber (jsjgruber) wrote on 2011-07-19:

#5

Coalwater:
Before going too far, it may be wise to read the appropriate URI and URL RFC's. It appears to me that there is no fool-proof way to do this for all characters the instructor might be adding as punctuation rather than part of the URL, but I don't know that requiring URL's to be encapsulated in < > will work, it's hard to type.

Since whatever we do will have to be approximate, I wonder if looking for whitespace, and then backing out what seems to be punctuation from the end of the resulting URL would work. While an embedded "." is essential (as in http://ubuntu.com) a trailing one is probably punctuation. We will be risking that trailing "." is really part of the URL and not intended as punctuation, but it's much more probable that a trailing "." will be punctuation. A trailing close parenthesis ")" or "]" should also probably be eliminated, too, as should combinations, such as ")." "]." ".)" ".]" ")," ]," .

I'm interested in the thoughts of all others on this.

Revision history for this message

Mohammad AbuShady (coalwater) wrote on 2011-07-19:

#6

What about trying to use an already made regex, like i found this link by looking around a bit
http://alanstorm.com/url_regex_explained

Revision history for this message

Mohammad AbuShady (coalwater) wrote on 2011-07-19:

#7

funny thing it had a url to the owner of the regex, and it has your name.. http://daringfireball.net/2010/07/improved_regex_for_matching_urls

Revision history for this message

Elizabeth K. Joseph (lyz) wrote on 2011-07-19:

#8

I agree with John, since commas actually are valid I agree that it should only look for a trailing one with a space afterwords (so ", " etc).

Even this isn't fool-proof, but I think the likelihood of having a valid URL ending with a special character seems unlikely.

Revision history for this message

John S. Gruber (jsjgruber) wrote on 2011-07-21:

#9

@Coalwater,

Believe me, I'm no 'daring fireball' (on either count).

I see that the URL is located for highlighting in the place you are looking, and it is also located for automatically changing the browser page in Browser.py . Also note that the browser will do the appropriate thing if the instructor posts a link of the form http://www.google.com. (note the trailing "."). "http://www.google.com," doesn't work. So there are actually two things that need to work, tagging the text for color and to make URL's clickable, and the automatic browser changing in Browser.py (which already handles trailing periods).

Revision history for this message

Mohammad AbuShady (coalwater) wrote on 2011-08-07:

#10

http://bazaar.launchpad.net/~coalwater/lernid/fix-810122/revision/229#lernid/widgets/IrcWidget.py
so this version only ignores trailing commas and dots, cause from the conversation i understood that this would be sufficient right?
and i didn't change the browser widget because i thought it's unnecessary because it ignores only trailing dots and it works fine with links that are passed without the dot too, but if someone thinks i should i could do it.

Revision history for this message

John S. Gruber (jsjgruber) wrote on 2011-08-07:

#11

I think it would be best to ignore trailing punctuation from both Ircwidget.py and Browser.py and I still like the approach I mentioned in https://bugs.launchpad.net/lernid/+bug/810122/comments/5. IIRC, there is some code that now removes trailing periods that could be used as a model for what could be done to remove commas, too.

Revision history for this message

John S. Gruber (jsjgruber) wrote on 2011-09-04:

#12

coalwater, do you need any help on this one?

Revision history for this message

John S. Gruber (jsjgruber) wrote on 2011-09-05:

#13

Another example just happened:
<http://myapps.developer.ubuntu.com>.

The period was omitted but the matching > was kept, meaning it went to theh wrong place.

Revision history for this message

Mohammad AbuShady (coalwater) wrote on 2012-01-24:

#14

How's this one ?
http://www.rubular.com/r/6snbljICDQ
note that the one not beginning with 'http' doesn't work, test it with different cases, and if it's good I'll push it

Revision history for this message

John S. Gruber (jsjgruber) wrote on 2012-03-29:

#15

coalwater, I tried about everything I could think of and it worked. Please propose it if you are ready. I've been importing your other branches.

Thanks.

Revision history for this message

Mohammad AbuShady (coalwater) wrote on 2012-04-01:

#16

It's pushed, but i have a question first, currently it doesn't recognize those patterns ( example.com | www.example.com ) should i at least add support for the www one ?

Revision history for this message

John S. Gruber (jsjgruber) wrote on 2012-04-01: Re: [Bug 810122] Re: URL's highlighted and opened include trailing punctuation

#17

I don't think we should trigger without a complete url. Instructors should
be able to mention a site without starting a load, IMHO.
On Apr 1, 2012 2:01 PM, "Mohammad AbuShady" <email address hidden> wrote:

> It's pushed, but i have a question first, currently it doesn't recognize
> those patterns ( example.com | www.example.com ) should i at least add
> support for the www one ?
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/810122
>
> Title:
> URL's highlighted and opened include trailing punctuation
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/lernid/+bug/810122/+subscriptions
>

Revision history for this message

Mohammad AbuShady (coalwater) wrote on 2012-04-01:

#18

I've done the merge proposal, you could test the branch, I've tested it with the same cases that were in the rubular test they all worked.

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.