segfault when appending parsed xml with resolve_entities=False to another element

Bug #1814522 reported by Andreas Lutro
Affects: lxml
Status: Fix Released
Importance: High
Assigned to: scoder (scoder)
Milestone: 4.3.1

Bug Description

A library I'm working with appends parsed XML to an arbitrary element. This usually works fine, but when the parser is created with resolve_entities=False, appending the parsed XML causes a segmentation fault. It only seems to happen when the parsed XML contains an entity that refers to another entity; there is no recursion in the XML itself.

I was not able to figure out if this was a libxml2 or lxml issue. I tested this with multiple versions of Python and multiple Linux distributions.

#!/usr/bin/env python
import sys
from lxml import etree
print("%-20s: %s" % ('Python', sys.version_info))
print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))

parser = etree.XMLParser(resolve_entities=False)
broken = etree.XML('''<!DOCTYPE data [
<!ENTITY a "a">
<!ENTITY b "&a;">
]>
<data>&b;</data>
''', parser)

el = etree.Element('test')
el.append(broken) # this is what crashes
print('everything is okay!')

$ python
Python              : sys.version_info(major=3, minor=5, micro=3, releaselevel='final', serial=0)
lxml.etree          : (4, 3, 0, 0)
libxml used         : (2, 9, 9)
libxml compiled     : (2, 9, 9)
libxslt used        : (1, 1, 33)
libxslt compiled    : (1, 1, 33)
Segmentation fault
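
Because the crash takes the whole interpreter down, it can help to run a reproducer like the one above in a child process and inspect its exit status. The following is only a rough sketch, not part of the original report. The helper names run_isolated and describe are made up for illustration, and it assumes Python 3.5 or later. The -X faulthandler option asks the child interpreter to dump a Python traceback to stderr when it receives a fatal signal.

```python
import signal
import subprocess
import sys


def run_isolated(source, timeout=60):
    """Run Python source in a child interpreter; return (returncode, stderr)."""
    proc = subprocess.run(
        # -X faulthandler: dump the Python traceback on SIGSEGV and friends.
        [sys.executable, "-X", "faulthandler", "-c", source],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a child exit status into a short human-readable string."""
    if returncode < 0:
        # On POSIX, a negative return code means "killed by signal N".
        return "killed by %s" % signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    # Harmless stand-in; pass the text of the reproducer script instead.
    code, err = run_isolated("print('everything is okay!')")
    print(describe(code))
    if err:
        print(err)
```

With the reproducer's source passed to run_isolated on an affected setup, one would expect a negative return code and a faulthandler traceback in stderr pointing at the append call, while the parent process keeps running.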

scoder (scoder) wrote :

Thanks for the report and the excellent reproducer. It crashes due to infinite recursion in libxml2.
Luckily, it's easy to replace the call into libxml2 with custom code here – it's both more correct and faster.
Fix is here:

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → High
status: New → Fix Committed
milestone: none → 4.3.1
Andreas Lutro (anlutro) wrote :

Thank you! Any clue on when 4.3.1 will be released on PyPI?

scoder (scoder) wrote :

Clue? No. Work? Yes.

Changed in lxml:
status: Fix Committed → Fix Released