urls found in stylesheets have extra quotes
Bug #530756 reported by Jean-Paul Calderone
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Fix Released | Medium | scoder |
Bug Description
If a css url is quoted, the quotes confuse the link extraction logic:
>>> from lxml.html import fromstring
>>> fromstring("""
... <style>background: url('/foo/bar.png');</style>
... """)
<Element html at b7687bfc>
>>> tree = _
>>> list(tree.iterlinks())
[(<Element style at b7143ecc>, None, "'/foo/bar.png'", 16)]
>>> tree.make_links_absolute('http://example.com/')
>>> list(tree.iterlinks())
[(<Element style at b7508aac>, None, "http://
Fixed in current trunk:
>>> import lxml.etree as et
>>> print et.__version__, et.LIBXML_VERSION
2.3.dev (2, 7, 6)
>>> from lxml.html import fromstring
>>> tree = fromstring("""
... <style>background: url('/foo/bar.png');</style>
... """)
>>> list(tree.iterlinks())
[(<Element style at 2526d50>, None, '/foo/bar.png', 17)]
>>> tree.make_links_absolute('http://example.com/')
>>> list(tree.iterlinks())
[(<Element style at 2526d50>, None, 'http://example.com/foo/bar.png', 17)]
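For anyone stuck on a release without the fix, the idea can be sketched outside lxml. The snippet below is not lxml's implementation; it is a standard-library-only illustration, and `CSS_URL`, `iter_css_urls` and `absolutize` are hypothetical names made up for it. It strips a matching pair of quotes from each `url(...)` target and reports the offset of the unquoted URL, which is why the position moves from 16 to 17 in the output above.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern (an assumption, not lxml's own): matches url(...)
# with an optional matching pair of single or double quotes around the target.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(?P<url>[^'")]*?)\1\s*\)""", re.I)


def iter_css_urls(css_text):
    """Yield (url, pos) pairs; pos is the offset of the unquoted URL."""
    for match in CSS_URL.finditer(css_text):
        yield match.group("url"), match.start("url")


def absolutize(css_text, base_url):
    """Return the (url, pos) pairs with each URL resolved against base_url."""
    return [(urljoin(base_url, url), pos) for url, pos in iter_css_urls(css_text)]


css = "background: url('/foo/bar.png');"
print(list(iter_css_urls(css)))                # [('/foo/bar.png', 17)]
print(absolutize(css, "http://example.com/"))  # [('http://example.com/foo/bar.png', 17)]
```

Stripping the quotes before resolving matters because a value that still starts with a quote character is treated as a relative path rather than an absolute one.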