Comment 1 for bug 1708138

scoder (scoder) wrote:

Interesting. According to https://en.wikipedia.org/wiki/Processing_Instruction
"""
An SGML processing instruction is enclosed within <? and >.

An XML processing instruction is enclosed within <? and ?>, and contains a target and optionally some content, which is the node value, that cannot contain the sequence ?>.
"""

Since HTML is based on SGML rather than XML, the parser is actually correct here. It is the display/repr of the processing instruction that is wrong.
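Python's standard-library html.parser applies the same SGML rule, so it can illustrate the behaviour without lxml. This is a sketch; the class name PICollector and the sample markup are made up for illustration. The handle_pi() callback receives everything up to the first ">", so an XML-style trailing "?" ends up inside the data:

```python
from html.parser import HTMLParser


class PICollector(HTMLParser):
    """Record the data of every processing instruction the parser reports."""

    def __init__(self):
        super().__init__()
        self.pis = []

    def handle_pi(self, data):
        # html.parser uses SGML rules: the PI ends at the first '>',
        # so the XML-style closing '?' remains part of the data.
        self.pis.append(data)


parser = PICollector()
parser.feed("<p><?php echo 1 ?></p>")
parser.close()
print(parser.pis)  # ['php echo 1 ?']
```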

BTW, note that this:
p = doc.getroot().getchildren()[0].getchildren()[0].getchildren()[0]
is substantially less readable/efficient than just
p = doc.getroot()[0][0][0]
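A minimal sketch of the same comparison, using xml.etree.ElementTree as a stand-in for lxml's compatible element API. getchildren() was removed from ElementTree in Python 3.9, so list(element) plays its role here. The sample document is hypothetical:

```python
import xml.etree.ElementTree as ET

doc = ET.ElementTree(ET.fromstring("<html><body><div><p>text</p></div></body></html>"))

# Verbose: build a full child list at each level just to take its first item.
p_verbose = list(list(list(doc.getroot())[0])[0])[0]

# Idiomatic: elements support indexing directly, with no intermediate lists.
p = doc.getroot()[0][0][0]

print(p is p_verbose, p.tag, p.text)  # True p text
```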