segfault when appending parsed xml with resolve_entities=False to another element
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Fix Released | High | scoder |
Bug Description
A library I'm working with is appending some parsed XML to an arbitrary element. Usually this works fine, but when the parser has resolve_
I was not able to figure out if this was a libxml2 or lxml issue. I tested this with multiple versions of Python and multiple Linux distributions.
#!/usr/bin/env python
import sys
from lxml import etree
print("%-20s: %s" % ('Python', sys.version_info))
print("%-20s: %s" % ('lxml.etree', etree.LXML_
print("%-20s: %s" % ('libxml used', etree.LIBXML_
print("%-20s: %s" % ('libxml compiled', etree.LIBXML_
print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_
print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_
parser = etree.XMLParser
broken = etree.XML(
<!ENTITY a "a">
<!ENTITY b "&a;">
]>
<data>&b;</data>
''', parser)
el = etree.Element(
el.append(broken) # this is what crashes
print('everything is okay!')
$ python lxmltest.py
Python : sys.version_
lxml.etree : (4, 3, 0, 0)
libxml used : (2, 9, 9)
libxml compiled : (2, 9, 9)
libxslt used : (1, 1, 33)
libxslt compiled : (1, 1, 33)
Segmentation fault
Thanks for the report and the excellent reproducer. It crashes due to infinite recursion in libxml2.
Luckily, it's easy to replace the call into libxml2 with custom code here, which is both more correct and faster.
Fix is here:
https://github.com/lxml/lxml/commit/201b712edf0478e6a94ace984c1e8435bf3bc3c3
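For anyone who wants to check whether a given installation is still affected without taking down their own process, here is a minimal sketch that runs the reproducer in a child interpreter and looks at its exit status. It is not part of the original report. The parser arguments (`resolve_entities=False`, taken from the bug title), the `<!DOCTYPE data [` line, and the tag name `root` are assumptions, and the helper names `classify` and `run_reproducer` are hypothetical.

```python
import subprocess
import sys

# Reproducer from the report, reconstructed. The parser arguments, the
# DOCTYPE line and the tag name 'root' are assumptions.
CHILD_SCRIPT = '''
from lxml import etree

parser = etree.XMLParser(resolve_entities=False)
broken = etree.XML("""<!DOCTYPE data [
<!ENTITY a "a">
<!ENTITY b "&a;">
]>
<data>&b;</data>
""", parser)
el = etree.Element('root')
el.append(broken)  # the call that crashed in the report
print('everything is okay!')
'''


def classify(returncode):
    """Turn a child exit status into a short description."""
    if returncode == 0:
        return 'ok'
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as -N
        # (a segmentation fault is normally signal 11).
        return 'crashed (signal %d)' % -returncode
    # Covers Python-level failures such as lxml not being installed.
    return 'error (exit code %d)' % returncode


def run_reproducer(timeout=30):
    """Run the reproducer in a separate interpreter and classify the result."""
    proc = subprocess.run(
        [sys.executable, '-c', CHILD_SCRIPT],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return classify(proc.returncode)


if __name__ == '__main__':
    print(run_reproducer())
```

Running the reproducer in a subprocess means a segfault only kills the child, so the check can be placed in a test suite. On Windows a crash shows up as a large positive exit code rather than a negative one, so it would be reported as `error` by this sketch.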