Interesting. According to https://en.wikipedia.org/wiki/Processing_Instruction:
"""
An SGML processing instruction is enclosed within <? and >.
An XML processing instruction is enclosed within <? and ?>, and contains a target and optionally some content, which is the node value, that cannot contain the sequence ?>.
"""
Since HTML is based on SGML and not XML, this means that the parser is actually correct, but the display/repr isn't.
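To make the difference between the two rules concrete, here is a rough sketch using plain regular expressions. This is not how lxml or libxml2 parse anything; the helper names `sgml_pi_body` and `xml_pi_body` are made up for illustration, and the sample string is just an assumed PHP-style instruction.

```python
import re

# Hypothetical helpers for illustration only; a real parser does far more.
SGML_PI = re.compile(r"<\?(.*?)>", re.DOTALL)    # SGML rule: ends at the first ">"
XML_PI = re.compile(r"<\?(.*?)\?>", re.DOTALL)   # XML rule: ends at the first "?>"


def sgml_pi_body(text):
    """Return the text between "<?" and the first ">", or None if absent."""
    match = SGML_PI.search(text)
    return match.group(1) if match else None


def xml_pi_body(text):
    """Return the text between "<?" and the first "?>", or None if absent."""
    match = XML_PI.search(text)
    return match.group(1) if match else None


sample = "<?php echo 1 ?>"
print(repr(sgml_pi_body(sample)))  # 'php echo 1 ?'
print(repr(xml_pi_body(sample)))   # 'php echo 1 '
```

Under the SGML rule the trailing `?` is part of the instruction's content, so a repr that always appends `?>` would show one question mark too many.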
BTW, note that this:
p = doc.getroot().getchildren()[0].getchildren()[0].getchildren()[0]
is substantially less readable/efficient than just
p = doc.getroot()[0][0][0]
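A small self-contained sketch of the indexing style, using the standard library's `xml.etree.ElementTree` rather than lxml so it runs without extra packages (lxml elements support the same sequence-style indexing). The sample markup is an assumption, not the document from the original report.

```python
import xml.etree.ElementTree as ET

# Assumed sample document: three levels of nesting below the root.
markup = "<html><body><div><p>hello</p></div></body></html>"
doc = ET.ElementTree(ET.fromstring(markup))

# Elements behave like sequences of their children, so plain indexing works.
p = doc.getroot()[0][0][0]

# The stdlib dropped getchildren() in Python 3.9; list(elem) is the closest
# equivalent and, like getchildren(), builds a throwaway list at every step.
p_long = list(list(list(doc.getroot())[0])[0])[0]

print(p.tag, p.text)   # p hello
print(p_long.tag)      # p
```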