Cleaner removes all <link>s when cleaning javascript regardless of host_whitelist
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
lxml | Fix Released | Undecided | Christine Koppelt | |
Bug Description
When cleaning HTML with lxml.html. using the options below, every <link> is removed regardless of host_whitelist:
links = False
page_structure = False
javascript = True
It's because of this part of code in clean.py:
# line 311
elif self.style or self.javascript:
# We must get rid of included stylesheets if Javascript is not
# allowed, as you can put Javascript in them
for el in list(doc.
if 'stylesheet' in el.get('rel', '').lower():
All links are removed by drop_tree, but it seems they should be removed by kill_tags.
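To make the expected behaviour concrete, here is a minimal sketch of a whitelist-aware stylesheet filter. It is not lxml's code and not the fix from the pull request: it uses only the standard library (xml.etree.ElementTree and urllib.parse), and the function name drop_untrusted_stylesheets and the hosts trusted.example and other.example are hypothetical, chosen for illustration only.

```python
# Hypothetical sketch, NOT lxml's implementation: standard library only.
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def drop_untrusted_stylesheets(root, host_whitelist=()):
    """Remove <link rel="stylesheet"> elements whose host is not whitelisted."""
    allowed = {host.lower() for host in host_whitelist}
    # ElementTree has no getparent(), so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    for el in list(root.iter("link")):
        if "stylesheet" not in el.get("rel", "").lower():
            continue  # non-stylesheet links are left alone
        host = (urlsplit(el.get("href", "")).hostname or "").lower()
        if host in allowed:
            continue  # trusted host: keep the stylesheet
        parent = parents.get(el)
        if parent is not None:
            parent.remove(el)
    return root


# Assumed sample document with one trusted and one untrusted stylesheet.
doc = ET.fromstring(
    "<html><head>"
    '<link rel="stylesheet" href="http://trusted.example/a.css"/>'
    '<link rel="stylesheet" href="http://other.example/b.css"/>'
    '<link rel="alternate" href="http://other.example/feed"/>'
    "</head><body/></html>"
)
drop_untrusted_stylesheets(doc, host_whitelist=["trusted.example"])
remaining = [el.get("href") for el in doc.iter("link")]
print(remaining)
```

In this sketch the stylesheet from the whitelisted host and the non-stylesheet link survive, while the stylesheet from the other host is dropped. The behaviour reported above corresponds to dropping every stylesheet link without consulting the whitelist at all.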
Version info:
Python : sys.version_
lxml.etree : (2, 3, -99, 0)
libxml used : (2, 7, 7)
libxml compiled : (2, 7, 7)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)
Changed in lxml:
status: New → In Progress
Changed in lxml:
status: In Progress → Fix Committed
Changed in lxml:
milestone: none → 3.2
See pull request 115 on GitHub (https://github.com/lxml/lxml/pull/115)