Cleaner removes all <link>s when cleaning javascript regardless of host_whitelist

Bug #715687 reported by Mohammad Taha Jahangir on 2011-02-09
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: Undecided
Assigned to: Christine Koppelt
Milestone: 3.2

Bug Description

When cleaning HTML with lxml.html.clean.Cleaner using the following options, all stylesheet <link> elements are removed regardless of host_whitelist:
links = False
page_structure = False
javascript = True

This happens because of the following code in clean.py:
        # line 311
        elif self.style or self.javascript:
            # We must get rid of included stylesheets if Javascript is not
            # allowed, as you can put Javascript in them
            for el in list(doc.iter('link')):
                if 'stylesheet' in el.get('rel', '').lower():
                    # Note this kills alternate stylesheets as well
                    el.drop_tree()

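Every matching <link> is removed unconditionally with drop_tree(), so host_whitelist is never consulted. It seems they should instead be removed via kill_tags.add('link'), so that host_whitelist is taken into account.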
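
To make the difference concrete, here is a minimal, stdlib-only sketch that models both behaviours with xml.etree.ElementTree instead of lxml. It is not lxml's implementation or the eventual fix. The helper names (is_stylesheet, host_allowed, drop_stylesheets_unconditionally, drop_stylesheets_with_whitelist) and the sample URLs are hypothetical, and host_allowed is a simplified stand-in that compares only the URL's network location against the whitelist.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Hypothetical sample: a whitelisted stylesheet, a foreign stylesheet,
# and a non-stylesheet <link> (an icon) that the quoted loop ignores.
SAMPLE = (
    '<html><head>'
    '<link rel="stylesheet" href="http://cdn.example.com/site.css"/>'
    '<link rel="stylesheet" href="http://evil.example.org/x.css"/>'
    '<link rel="icon" href="http://evil.example.org/favicon.ico"/>'
    '</head><body/></html>'
)


def is_stylesheet(el):
    # Same test as the quoted clean.py loop.
    return 'stylesheet' in el.get('rel', '').lower()


def host_allowed(el, host_whitelist):
    # Hypothetical, simplified whitelist check: only the network
    # location of href is compared against the allowed hosts.
    href = el.get('href')
    return bool(href) and urlsplit(href).netloc in host_whitelist


def _drop_links(root, should_drop):
    # ElementTree has no parent pointers, so walk (parent, child) pairs,
    # snapshotting both levels before removing anything.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == 'link' and should_drop(child):
                parent.remove(child)


def drop_stylesheets_unconditionally(root):
    # Models the reported behaviour: host_whitelist is never consulted.
    _drop_links(root, is_stylesheet)


def drop_stylesheets_with_whitelist(root, host_whitelist):
    # Models the requested behaviour: keep stylesheets from allowed hosts.
    _drop_links(root, lambda el: is_stylesheet(el)
                and not host_allowed(el, host_whitelist))


if __name__ == '__main__':
    hosts = {'cdn.example.com'}

    root = ET.fromstring(SAMPLE)
    drop_stylesheets_unconditionally(root)
    print([link.get('href') for link in root.findall('.//link')])
    # ['http://evil.example.org/favicon.ico']

    root = ET.fromstring(SAMPLE)
    drop_stylesheets_with_whitelist(root, hosts)
    print([link.get('href') for link in root.findall('.//link')])
    # ['http://cdn.example.com/site.css', 'http://evil.example.org/favicon.ico']
```

The whitelist check is folded into the same removal pass rather than done afterwards, which mirrors the idea of routing <link> through a single kill path instead of dropping it early. lxml's own check in Cleaner is stricter than host_allowed here (for example, it also restricts the URL scheme).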

Version info:
Python : sys.version_info(major=3, minor=1, micro=2, releaselevel='final', serial=0)
lxml.etree : (2, 3, -99, 0)
libxml used : (2, 7, 7)
libxml compiled : (2, 7, 7)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)

Christine Koppelt (ch-ko123) wrote:

See pull request 115 on GitHub (https://github.com/lxml/lxml/pull/115).

Changed in lxml:
assignee: nobody → Christine Koppelt (ch-ko123)
Changed in lxml:
status: New → In Progress
Changed in lxml:
status: In Progress → Fix Committed
scoder (scoder) wrote:

Fixed in lxml 3.2.0.

Changed in lxml:
status: Fix Committed → Fix Released
scoder (scoder) on 2013-04-28
Changed in lxml:
milestone: none → 3.2