&amp entity fix

Bug #482046 reported by CoD
This bug affects 1 person
Affects: Dosage
Status: Fix Committed
Importance: High
Assigned to: Tristan Seligmann
Milestone: 1.7.0

Bug Description

According to all *HTML standards, the & character in URLs should be replaced with its corresponding entity, &amp;.

Dosage can't retrieve these URLs, so I made a small change to fetchManyMatches (in util.py) to make it work. With this bugfix, Dosage supports all HTML-compliant websites.

Just change line 23 in util.py from:

page = urlopen(url)

to:

page = urlopen(url.replace('&amp;','&'))

Tristan Seligmann (mithrandi) wrote :

I've implemented a version of this fix as a temporary workaround, but we really need to look at dealing with HTML entities properly in some fashion.
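
One possible way to handle entities more generally, sketched here for Python 3 (which postdates this report), is the standard library's html.unescape, which decodes named and numeric character references in a single pass. The helper name url_from_attribute and the sample URL are hypothetical and not part of Dosage; this is an illustration under those assumptions, not the fix that was committed.

```python
from html import unescape


def url_from_attribute(raw_value):
    """Decode HTML character references in an attribute value such as href."""
    # unescape handles named (&amp;), decimal (&#38;) and hex (&#x26;) forms.
    return unescape(raw_value)


# Hypothetical href value as it would appear in HTML source.
raw = 'http://example.com/comic.php?id=7&amp;page=2&#38;size=large'
print(url_from_attribute(raw))
```

Decoding the attribute value once, before the URL is fetched, avoids maintaining a hand-written list of entities.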

Changed in dosage:
assignee: CoD (cod-fsfe) → nobody
importance: Unknown → High
status: New → Confirmed

CoD (cod-italy) wrote : Re: [Bug 482046] Re: &amp entity fix

On Sunday, 3 January 2010 at 17:50, Tristan Seligmann wrote:
> I've implemented a version of this fix as a temporary workaround, but
> we really need to look at dealing with HTML entities properly in some
> fashion.

Uhm... I did it too when I created that bug.

Did you check my patch?

claudio

Tristan Seligmann (mithrandi) wrote :

Sorry, I must have missed something; which patch are you referring to? The only patch attached to this bug report is the trivial one that only handles &amp;.

CoD (cod-italy) wrote :

Oh yeah.. trivial... &amp;... must be that one :-)
It was my first contribution, so I didn't dig deeper into the code.

The entities we should deal with, imho, are listed in "Reserved
Characters" section of this page:
http://www.w3schools.com/tags/ref_entities.asp

Maybe we could create an external file listing all the replacement rules to
be applied before any other parsing. That way, all the modules would work
regardless of entities, and we could add more rules later if needed.
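
The rule-list idea above could look roughly like the sketch below. It keeps the rules in a Python list rather than an external file, and it assumes the rules are the five predefined XML entities; the actual list on the linked page is not reproduced here. The names ENTITY_RULES and apply_entity_rules are hypothetical.

```python
# Replacement rules for the reserved characters, applied in list order.
# '&amp;' is deliberately last: decoding it first would turn '&amp;lt;'
# into '&lt;' and then wrongly into '<'.
ENTITY_RULES = [
    ('&lt;', '<'),
    ('&gt;', '>'),
    ('&quot;', '"'),
    ('&apos;', "'"),
    ('&amp;', '&'),
]


def apply_entity_rules(text, rules=ENTITY_RULES):
    """Apply each (entity, character) rule to text, in list order."""
    for entity, char in rules:
        text = text.replace(entity, char)
    return text


# Hypothetical URL fragment as it would appear in HTML source.
print(apply_entity_rules('comic.php?id=7&amp;title=&quot;x&quot;'))
```

The ordering matters with chained replacements like this, which is one reason a single-pass decoder may be preferable to a growing rule list.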

Jonathan Jacobs (jjacobs) wrote :

A more complete method has already been committed to the "bunch-of-comics" branch.

CoD (cod-italy) wrote :

Ok... sorry for being an annoyance ;-)

Changed in dosage:
milestone: none → 1.7.0
status: Confirmed → Fix Committed
assignee: nobody → Tristan Seligmann (mithrandi)