Inserting a parent into its child causes lxml to hang

Bug #2046398 reported by Nick Young
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: Medium
Assigned to: scoder
Milestone: 4.9.4

Bug Description

Hi,

I was tracking down a bug in a larger Python project and have isolated it to lxml. The bug originally occurred with malformed HTML, but it also occurs after I fix the HTML. I've created a simplified test script containing just the relevant HTML. When the line "child.insert(0, parent)" runs, the script hangs and one CPU core is pinned at 100%, which suggests an infinite loop. Here's the test script:

#!/usr/bin/env python3
from lxml import etree, html, cssselect

import sys

print("%-20s: %s" % ('Python', sys.version_info))
print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))

PARSER = etree.HTMLParser(recover=True)
select_parent = cssselect.CSSSelector("#parent")
select_child = cssselect.CSSSelector('#child')

doc = html.fromstring("""
<div id="parent">
    <div id="child">
        <div></div>
    </div>
</div>
""", parser=PARSER)
print(doc)
parent = select_parent(doc)[0]
print(parent)
child = select_child(doc)[0]
print(child)
# THIS LINE HANGS
child.insert(0, parent)
print("DONE!")

Output:

Python              : sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)
lxml.etree          : (4, 9, 3, 0)
libxml used         : (2, 10, 3)
libxml compiled     : (2, 10, 3)
libxslt used        : (1, 1, 38)
libxslt compiled    : (1, 1, 38)
<Element div at 0x7ff24668f500>
<Element div at 0x7ff24668f500>
<Element div at 0x7ff24668f4c0>

If the inner <div></div> is removed, lxml instead raises "ValueError: cannot append parent to itself" rather than hanging.
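For code that has to run on an affected lxml version, one possible caller-side workaround is to check the ancestor chain before inserting. The sketch below is only an illustration, not lxml's actual fix: the helper names `is_ancestor_or_self` and `safe_insert` are hypothetical, and the only element methods it relies on are `getparent()` and `insert()`.

```python
def is_ancestor_or_self(candidate, node):
    """Return True if `candidate` is `node` itself or one of its ancestors."""
    current = node
    while current is not None:
        if current is candidate:
            return True
        # getparent() returns None once the root element is reached.
        current = current.getparent()
    return False


def safe_insert(target, index, new_child):
    """Insert `new_child` into `target`, refusing to create a cycle.

    Hypothetical helper: raises ValueError before calling insert(), so the
    problematic call is never made with an ancestor as the new child.
    """
    if is_ancestor_or_self(new_child, target):
        raise ValueError("refusing to insert an element into its own descendant")
    target.insert(index, new_child)


if __name__ == "__main__":
    # Imported here so the helpers above do not depend on lxml being installed.
    from lxml import html

    doc = html.fromstring(
        '<div id="parent"><div id="child"><div></div></div></div>'
    )
    parent = doc       # the outer div
    child = doc[0]     # the div with id="child"
    try:
        safe_insert(child, 0, parent)
    except ValueError as exc:
        print("rejected:", exc)
```

The check walks at most the depth of the tree, so its cost is small compared with the insert itself. It also rejects inserting an element into itself, which is a deliberate choice in this sketch.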

Cheers,
Nick

Nick Young (nyou045)
summary: - malformed HTML causes lxml to hang
+ inserting a parent into its child causes lxml to hang
description: updated
scoder (scoder) wrote :
Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → Medium
status: New → Fix Committed
milestone: none → 4.9.4
scoder (scoder)
Changed in lxml:
status: Fix Committed → Fix Released