inserting a parent into its child causes lxml to hang
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Fix Released | Medium | scoder |
Bug Description
Hi,
I was tracking down a bug in a larger Python project and have isolated it to lxml. The bug originally showed up with malformed HTML, but it also occurs after I fix the HTML. I've created a simplified test script containing just the relevant HTML. When the line "child.insert(0, parent)" runs, the script hangs and one CPU core is pinned at 100%, which suggests an infinite loop. Here's the test script:
#!/usr/bin/env python3
from lxml import etree, html, cssselect
import sys
print("%-20s: %s" % ('Python', sys.version_info))
print("%-20s: %s" % ('lxml.etree', etree.LXML_
print("%-20s: %s" % ('libxml used', etree.LIBXML_
print("%-20s: %s" % ('libxml compiled', etree.LIBXML_
print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_
print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_
PARSER = etree.HTMLParse
select_parent = cssselect.
select_child = cssselect.
doc = html.fromstring("""
<div id="parent">
<div id="child">
<div></div>
</div>
</div>
""", parser=PARSER)
print(doc)
parent = select_
print(parent)
child = select_
print(child)
# THIS LINE HANGS
child.insert(0, parent)
print("DONE!")
output:
Python : sys.version_
lxml.etree : (4, 9, 3, 0)
libxml used : (2, 10, 3)
libxml compiled : (2, 10, 3)
libxslt used : (1, 1, 38)
libxslt compiled : (1, 1, 38)
<Element div at 0x7ff24668f500>
<Element div at 0x7ff24668f500>
<Element div at 0x7ff24668f4c0>
If the inner <div></div> is removed, lxml instead raises "ValueError: cannot append parent to itself".
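On affected versions, one possible stopgap is to reject the cycle before lxml is called at all. The following is only a minimal sketch, not the fix that went into lxml: `is_ancestor_or_self`, `safe_insert` and `demo` are hypothetical helper names, and the only lxml calls assumed are the element methods `getparent()` and `insert()` plus `etree.fromstring()`.

```python
def is_ancestor_or_self(candidate, node):
    """Return True if `candidate` is `node` itself or one of its ancestors.

    Works on any object that exposes lxml's `getparent()` method.
    """
    current = node
    while current is not None:
        if current is candidate:
            return True
        current = current.getparent()
    return False


def safe_insert(target, index, element):
    """Insert `element` into `target` at `index`, refusing to create a cycle."""
    if is_ancestor_or_self(element, target):
        raise ValueError("cannot insert an element into its own descendant")
    target.insert(index, element)


def demo():
    # Imported lazily so the helpers above do not depend on lxml themselves.
    from lxml import etree

    # Same shape as the HTML in the report: parent > child > inner div.
    parent = etree.fromstring(
        '<div id="parent"><div id="child"><div/></div></div>'
    )
    child = parent[0]
    try:
        safe_insert(child, 0, parent)  # rejected before lxml's insert() runs
    except ValueError as exc:
        print("rejected:", exc)
```

Because the ancestor check runs before `insert()` is reached, calling `demo()` should not depend on whether the installed lxml already contains the fix.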
Cheers,
Nick
summary: malformed HTML causes lxml to hang → inserting a parent into its child causes lxml to hang
description: updated
Changed in lxml:
status: Fix Committed → Fix Released
Thanks for the report, fixed in https://github.com/lxml/lxml/commit/2343fc99c48858b4dffc67d3a52091053df6ce04