Cleaning html file cleans it wrong

Bug #671636 reported by Ravi on 2010-11-05
This bug affects 1 person
Affects: lxml
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am using the default Cleaner, i.e. lxml.html.clean.Cleaner(), to clean an HTML file: the home page of the FSF. The cleaned output is wrong. Some of the page structure is removed, and the start and end tags of the style element are removed while the contents between them remain in the output.
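A minimal sketch for checking this behaviour on a small input, rather than the full FSF page. The SAMPLE markup is a made-up stand-in, not the attached file, so it may not trigger the same result. It compares the default Cleaner with Cleaner(style=True), the option that asks lxml to remove style elements together with their contents. On recent lxml releases the cleaner is provided by the separate lxml_html_clean package, which is assumed to be installed here.

```python
from lxml.html.clean import Cleaner

# Hypothetical stand-in for the real page: one style element in the head.
SAMPLE = (
    "<html><head><title>Example</title>"
    '<style type="text/css">body { color: red; }</style>'
    "</head><body><p>Hello</p></body></html>"
)

# Default settings, as used in this report.
default_output = Cleaner().clean_html(SAMPLE)

# style=True asks the cleaner to drop style elements and their contents.
strict_output = Cleaner(style=True).clean_html(SAMPLE)

print("default:   ", default_output)
print("style=True:", strict_output)
```

If the CSS text shows up in the default output but not in the style=True output, passing style=True may be a workaround for pages where stylesheet content should not survive cleaning.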

Attached is the tar of the original file and the cleaned version.

Ravi (ra-ravi-rav-gmail) wrote :

The same happens with the rest of the style tags: the content between the start tag and the end tag is not removed.
