Hi,
I was tracking down a bug in a larger Python project and have isolated it to lxml. The bug occurs with malformed HTML. I've created a simplified test script containing just the relevant malformed HTML. When the line "child.insert(0, parent)" runs, the script hangs and one CPU core is pinned at 100%, which suggests an infinite loop. Here's the test script:
#!/usr/bin/env python3
from lxml import etree, html, cssselect
import sys
print("%-20s: %s" % ('Python', sys.version_info))
print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))
PARSER = etree.HTMLParser(recover=True)
select_parent = cssselect.CSSSelector("#parent")
select_child = cssselect.CSSSelector('#child')
doc = html.fromstring("""
<div id="parent">
<div id="child">
<div></div>
""", parser=PARSER)
print(doc)
parent = select_parent(doc)[0]
print(parent)
child = select_child(doc)[0]
print(child)
# THIS LINE HANGS
child.insert(0, parent)
print("DONE!")
output:
Python              : sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)
lxml.etree          : (4, 9, 3, 0)
libxml used         : (2, 10, 3)
libxml compiled     : (2, 10, 3)
libxslt used        : (1, 1, 38)
libxslt compiled    : (1, 1, 38)
<Element div at 0x7ff24668f500>
<Element div at 0x7ff24668f500>
<Element div at 0x7ff24668f4c0>
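A possible workaround, sketched under an assumption that is not confirmed here: in the output above, "doc" and "parent" print the same address, so "parent" appears to be the root element and "child" one of its descendants. If the hang comes from inserting an element into its own descendant (which would make the tree cyclic), a guard in the calling code could refuse such inserts before they reach lxml. The helper names is_ancestor_or_self, safe_insert and demo are hypothetical and not part of lxml; the sketch relies only on the getparent() and insert() methods that lxml elements provide.

```python
def is_ancestor_or_self(candidate, node):
    """Return True if `candidate` is `node` itself or one of its ancestors.

    Walks upward via getparent(), which lxml elements provide.
    """
    current = node
    while current is not None:
        if current is candidate:
            return True
        current = current.getparent()
    return False


def safe_insert(target, index, new_child):
    """Insert `new_child` into `target`, refusing inserts that would create a cycle.

    Hypothetical helper: assumes the hang is triggered by inserting an
    ancestor into its own descendant.
    """
    if is_ancestor_or_self(new_child, target):
        raise ValueError("refusing to insert an element into its own descendant")
    target.insert(index, new_child)


def demo():
    """Apply the guard to the same shape of tree as the test script (needs lxml)."""
    from lxml import html

    doc = html.fromstring('<div id="parent"><div id="child"><div></div></div></div>')
    parent = doc.xpath('//*[@id="parent"]')[0]
    child = doc.xpath('//*[@id="child"]')[0]
    try:
        safe_insert(child, 0, parent)
    except ValueError as exc:
        print("refused:", exc)
```

Calling demo() with lxml installed exercises the guard on a well-formed version of the markup. This only sidesteps the problem in calling code; it does not address why lxml itself loops.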
Cheers,
Nick