urls found in stylesheets have extra quotes

Bug #530756 reported by Jean-Paul Calderone
Affects: lxml
Status: Fix Released
Importance: Medium
Assigned to: scoder
Milestone: 2.3

Bug Description

If a CSS url() value is quoted, the quotes are kept as part of the extracted link and confuse the link extraction logic:

>>> from lxml.html import fromstring
>>> fromstring("""
... <style>background: url('/foo/bar.png');</style>
... """)
<Element html at b7687bfc>
>>> tree = _
>>> list(tree.iterlinks())
[(<Element style at b7143ecc>, None, "'/foo/bar.png'", 16)]
>>> tree.make_links_absolute('http://example.com/')
>>> list(tree.iterlinks())
[(<Element style at b7508aac>, None, "http://example.com/'/foo/bar.png'", 16)]

scoder (scoder) wrote:

Fixed in current trunk:

>>> import lxml.etree as et
>>> print et.__version__, et.LIBXML_VERSION
2.3.dev (2, 7, 6)

>>> from lxml.html import fromstring
>>> tree = fromstring("""
... <style>background: url('/foo/bar.png');</style>
... """)
>>> list(tree.iterlinks())
[(<Element style at 2526d50>, None, '/foo/bar.png', 17)]
>>> tree.make_links_absolute('http://example.com/')
>>> list(tree.iterlinks())
[(<Element style at 2526d50>, None, 'http://example.com/foo/bar.png', 17)]
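
For illustration only, here is a minimal sketch of quote-aware url() handling that reproduces the offsets shown above. It is not lxml's actual implementation: the regular expression and the helper names iter_css_urls and make_css_urls_absolute are assumptions made up for this example, and it only handles simple url(...) forms without escapes.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern (not taken from lxml): matches url(...) whose target
# may be wrapped in a matching pair of single or double quotes.
_CSS_URL = re.compile(
    r"""url\(\s*(?P<quote>['"]?)(?P<url>[^'")]*)(?P=quote)\s*\)""", re.I)


def iter_css_urls(css_text):
    """Yield (url, start_offset) pairs with the surrounding quotes stripped."""
    for match in _CSS_URL.finditer(css_text):
        # start("url") points past the opening quote, at the URL itself.
        yield match.group("url"), match.start("url")


def make_css_urls_absolute(css_text, base_url):
    """Resolve every url(...) target against base_url, keeping the quoting."""
    def _replace(match):
        quote = match.group("quote")
        absolute = urljoin(base_url, match.group("url"))
        return "url(%s%s%s)" % (quote, absolute, quote)
    return _CSS_URL.sub(_replace, css_text)


css = "background: url('/foo/bar.png');"
links = list(iter_css_urls(css))
rewritten = make_css_urls_absolute(css, "http://example.com/")
```

With this sketch, links holds the unquoted URL at offset 17 (offset 16 is the opening quote), and rewritten keeps the original quotes around the absolute URL instead of embedding them in it.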

Changed in lxml:
assignee: nobody → Stefan Behnel (scoder)
importance: Undecided → Medium
milestone: none → 2.3
status: New → Fix Committed
scoder (scoder) wrote:

Fixed in lxml 2.3alpha1.

Changed in lxml:
status: Fix Committed → Fix Released