sanitizer mostly broken when used with html5lib

Bug #292401 reported by Håkan W
Affects: Python HTML Sanitizer
Status: Confirmed
Importance: High
Assigned to: dan mackinlay

Bug Description

Almost all doctests fail when the sanitizer is used with html5lib. See the attached failure log for details.

The errors seem to be mostly these two:

1) Adding unnecessary <p> elements

Expected:
    u'<p>A B C</p>'
Got:
    u'<p>A </p>B C<p></p>'

2) Outputting complete documents instead of the html fragment

Expected:
    u'<p>A </p><p>B C</p><p>D</p>'
Got:
    u'<p><html><head></head><body>A <p>B C</p>D</body></html></p>'
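
The second symptom looks like what happens when markup is parsed as a whole document rather than as a fragment. The sanitizer's own code is not shown in this report, so the following is only a sketch of that difference in html5lib itself. It assumes a recent html5lib release in which `html5lib.serialize` is available; `SNIPPET` and `render` are names made up for illustration.

```python
# Sketch only: requires the third-party html5lib package (pip install html5lib).
import html5lib

# Illustrative input, modelled on the doctest output quoted above.
SNIPPET = "A <p>B C</p>D"


def render(tree):
    """Serialize an html5lib etree back to markup, keeping optional tags."""
    return html5lib.serialize(tree, tree="etree", omit_optional_tags=False)


# parse() treats the input as a complete document, so the parser supplies
# the <html>, <head> and <body> wrappers itself.
as_document = render(html5lib.parse(SNIPPET))

# parseFragment() treats the input as the contents of a container element
# ("div" by default) and returns only the nodes that came from the snippet.
as_fragment = render(html5lib.parseFragment(SNIPPET))

print(as_document)
print(as_fragment)
```

If the sanitizer serializes the result of a full-document parse, the wrapper elements would end up in its output, which is consistent with the "Got" value above. Whether that is the actual cause here is an assumption.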

Håkan W (hwaara-gmail-deactivatedaccount) wrote :
dan mackinlay (dan-possumpalace) wrote :

*blush* - what a disaster. I don't know how I committed a version that broke 14 doctests. I can only suspect I pressed undo on that rather critical change without noticing before commit. Then I went on my annual holiday. Much shame on my part.

The fixed version (which still fails 4 doctests because of different handling of the body tag) has now been pushed to the repository, but I'm leaving this bug open until those are resolved too.

Changed in python-html-sanitizer:
assignee: nobody → dan-possumpalace
importance: Undecided → High
status: New → Confirmed