Beautiful Soup

lxml incorrect parsing

Bug #1925723 reported by Harnek on 2021-04-23

This bug affects 1 person

Affects         Status   Importance  Assigned to  Milestone
Beautiful Soup  Invalid  Undecided   Unassigned

Bug Description

Python version: 3.8.9
bs4 version: 4.9.3
OS: Windows

Smallest example (parsed with the lxml parser):
<col>Text</col>

Gives:
<col/>Text

Expected:
<col>Text</col>

Isaac Muse (facelessuser) wrote on 2021-04-23:

This is not incorrect parsing; it is the expected behavior. You are using the col tag incorrectly: in HTML, <col> is a void element, so it cannot contain text or take an end tag. None of the parsers will parse this HTML the way you expect. In fact, if you use the lxml library directly, you won't get what you expect either:

>>> Code:

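from io import StringIO

from lxml import etree

HTML = '<col>text</col>'

# Parse with lxml's HTML parser directly, bypassing Beautiful Soup.
parser = etree.HTMLParser()
tree = etree.parse(StringIO(HTML), parser)

# Serialize back to HTML to see how the void <col> element was handled.
result = etree.tostring(tree.getroot(), pretty_print=True, method="html")
print(result)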

>>> Result:

b'<html><body>\n\n<col>text</body></html>\n'
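
The same behavior can be sketched without lxml or Beautiful Soup. The following is a hypothetical, deliberately simplified serializer built on Python's standard html.parser module; it is not how lxml or Beautiful Soup is implemented, and its VOID_ELEMENTS set is an assumption taken from the HTML list of void elements. It shows why <col>Text</col> comes back as <col/>Text: the void <col> closes as soon as it opens, "Text" becomes the following sibling, and the stray </col> has nothing to close.

```python
from html.parser import HTMLParser

# Assumed list of HTML void elements: no content, no end tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class VoidAwareSerializer(HTMLParser):
    """Hypothetical sketch: re-serialize HTML, closing void elements
    immediately and dropping stray end tags. Attributes are ignored."""

    def __init__(self):
        super().__init__()
        self.parts = []      # output fragments
        self.open_tags = []  # stack of non-void elements still open

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            # A void element ends where it starts; following text is a sibling.
            self.parts.append(f"<{tag}/>")
        else:
            self.parts.append(f"<{tag}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # End tags for void elements (e.g. </col>), or for elements that
        # are not open, have nothing to close, so skip them.
        if tag in VOID_ELEMENTS or tag not in self.open_tags:
            return
        # Close anything still open inside this element, then the element.
        while self.open_tags:
            name = self.open_tags.pop()
            self.parts.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        self.parts.append(data)


def reserialize(html: str) -> str:
    parser = VoidAwareSerializer()
    parser.feed(html)
    parser.close()
    # Close any elements left open at the end of the input.
    parser.parts.extend(f"</{name}>" for name in reversed(parser.open_tags))
    return "".join(parser.parts)


print(reserialize("<col>Text</col>"))  # <col/>Text
print(reserialize("<p>Text</p>"))      # <p>Text</p>
```

A non-void element such as <p> keeps its text inside it, which is the structure the report expected from <col>.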

Harnek (harnek) wrote on 2021-04-23:

Thanks for the reply.
You are correct.
You can close this topic.

Leonard Richardson (leonardr) on 2021-04-23

Changed in beautifulsoup:
status:	New → Invalid
