Comment 1 for bug 1925723

Isaac Muse (facelessuser) wrote :

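This is expected behavior, not incorrect parsing. You are using the col tag incorrectly: col is a void element (it takes no content and no closing tag), and it belongs inside a table's colgroup, not inside a p. None of the parsers will parse this HTML the way you expect. In fact, if you use the lxml library directly, you won't get what you expect either: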

>>> Code:

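from io import StringIO

from lxml import etree

HTML = '<p><col>text</col></p>'

# Parse with lxml's HTML parser, which repairs invalid markup.
parser = etree.HTMLParser()
tree = etree.parse(StringIO(HTML), parser)

# Serialize the repaired tree back to HTML; tostring() returns bytes.
result = etree.tostring(tree.getroot(), pretty_print=True, method="html")
print(result)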

>>> Result:

b'<html><body>\n<p></p>\n<col>text</body></html>\n'
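
If it helps to catch this kind of misuse before the markup reaches a parser, here is a minimal sketch that uses only the standard library's html.parser. It flags closing tags written for void elements such as col. The names VoidEndTagFinder and find_void_end_tags are hypothetical helpers made up for illustration; they are not part of Beautiful Soup or lxml. The void element list is my reading of the HTML standard, so treat it as an assumption.

```python
from html.parser import HTMLParser

# Void elements (assumed from the HTML standard): no content, no end tag.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class VoidEndTagFinder(HTMLParser):
    """Hypothetical helper: records end tags written for void elements."""

    def __init__(self):
        super().__init__()
        self.offenders = []

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> is harmless, so ignore it.
        # (The default implementation would also call handle_endtag.)
        pass

    def handle_endtag(self, tag):
        # html.parser reports tag names in lowercase.
        if tag in VOID_ELEMENTS:
            self.offenders.append(tag)


def find_void_end_tags(markup):
    """Return the void element names that appear with an explicit end tag."""
    finder = VoidEndTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.offenders


print(find_void_end_tags('<p><col>text</col></p>'))  # prints ['col']
```

This only tokenizes the input; it does not build a tree or check that col sits inside a colgroup, so it is a quick lint rather than a validator.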