segfault when appending parsed xml with resolve_entities=False to another element

Bug #1814522 reported by Andreas Lutro
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: High
Assigned to: scoder
Milestone: 4.3.1

Bug Description

A library I'm working with appends parsed XML to an arbitrary element. Usually this works fine, but when the parser was created with resolve_entities=False, appending the parsed XML causes a segmentation fault. This only seems to happen when the parsed XML contains an entity that refers to another entity; the entity definitions themselves are not recursive.

I was not able to figure out whether this is a libxml2 or an lxml issue. I tested this with multiple versions of Python and on multiple Linux distributions.

#!/usr/bin/env python
import sys
from lxml import etree
print("%-20s: %s" % ('Python', sys.version_info))
print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))

parser = etree.XMLParser(resolve_entities=False)
broken = etree.XML('''<!DOCTYPE data [
<!ENTITY a "a">
<!ENTITY b "&a;">
]>
<data>&b;</data>
''', parser)

el = etree.Element('test')
el.append(broken) # this is what crashes
print('everything is okay!')

$ python lxmltest.py
Python              : sys.version_info(major=3, minor=5, micro=3, releaselevel='final', serial=0)
lxml.etree          : (4, 3, 0, 0)
libxml used         : (2, 9, 9)
libxml compiled     : (2, 9, 9)
libxslt used        : (1, 1, 33)
libxslt compiled    : (1, 1, 33)
Segmentation fault
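
For anyone checking whether their own install is affected, here is a minimal sketch (not part of the original report) that runs a script in a child interpreter, so the crash cannot take down the calling process. The helper name `segfaults` is hypothetical, and the check assumes a POSIX system, where `subprocess` reports a child killed by signal N as return code -N.

```python
import signal
import subprocess
import sys


def segfaults(source: str) -> bool:
    """Run `source` in a fresh interpreter and report whether it died from SIGSEGV.

    Hypothetical helper for illustration; assumes a POSIX platform.
    """
    result = subprocess.run(
        [sys.executable, "-c", source],
        stdout=subprocess.DEVNULL,  # the child's output is not needed here
        stderr=subprocess.DEVNULL,
    )
    # On POSIX, a negative return code -N means the child was killed by signal N.
    return result.returncode == -signal.SIGSEGV
```

Passing the script above as a string should give `True` on an affected build and `False` once a fixed lxml is installed. It also gives `False` if lxml cannot be imported at all, since the child then exits with an ordinary error rather than a signal.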

scoder (scoder) wrote :

Thanks for the report and the excellent reproducer. It crashes due to infinite recursion in libxml2.
Luckily, it's easy to replace the call into libxml2 with custom code here – it's both more correct and faster.
Fix is here:
https://github.com/lxml/lxml/commit/201b712edf0478e6a94ace984c1e8435bf3bc3c3

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → High
status: New → Fix Committed
milestone: none → 4.3.1
Andreas Lutro (anlutro) wrote :

Thank you! Any clue on when 4.3.1 will be released on pypi?

scoder (scoder) wrote :

Clue? No. Work? Yes.

Changed in lxml:
status: Fix Committed → Fix Released