Beautiful Soup

Comment 1 for bug 1925723

Isaac Muse (facelessuser) wrote on 2021-04-23:

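This is not incorrect parsing; it is the expected behavior. You are using the col tag incorrectly: col is a void element (it takes no content and no closing tag) and is only valid inside a colgroup within a table. None of the parsers will parse the HTML as you expect. In fact, if you use the lxml library directly, you still won't get what you expect: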

>>> Code:

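from io import StringIO

from lxml import etree

# A col element misused as a container inside a paragraph
HTML = '<p><col>text</col></p>'

# Parse with lxml's HTML parser directly, bypassing Beautiful Soup
parser = etree.HTMLParser()
tree = etree.parse(StringIO(HTML), parser)

# Serialize the repaired tree back to HTML (tostring returns bytes)
result = etree.tostring(tree.getroot(), pretty_print=True, method="html")
print(result)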

>>> Result:

b'<html><body>\n<p></p>\n<col>text</body></html>\n'
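
To see how each Beautiful Soup parser handles the same markup, something like the following sketch can help. The helper name parse_with_each is hypothetical (it is not part of Beautiful Soup), and the sketch assumes bs4 is installed; lxml and html5lib are optional and are skipped if missing. No output is shown because the result differs from parser to parser.

```python
from bs4 import BeautifulSoup, FeatureNotFound

HTML = '<p><col>text</col></p>'


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Hypothetical helper: parse markup with every parser that is installed.

    Returns a dict mapping parser name to the resulting BeautifulSoup object.
    """
    results = {}
    for name in parsers:
        try:
            results[name] = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # lxml and html5lib are optional third-party packages; skip
            # whichever ones are not available in this environment.
            continue
    return results


results = parse_with_each(HTML)
for name, soup in results.items():
    # Print each parser's repaired version of the markup for comparison.
    print(name, "->", soup)
```

Each parser repairs the invalid markup in its own way, but in none of them should you expect "text" to end up as the content of col.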