segfault when parsing docbook XML with several external entities

Bug #502959 reported by Matthias Klose
This bug affects 2 people
Affects         Status         Importance   Assigned to   Milestone
libxml2         Fix Released   Critical
lxml            Fix Released   High         Unassigned    3.0
lxml (Debian)   Fix Released   Unknown

Bug Description

Seen with lxml 2.2.2 and 2.2.4.

Steps to reproduce:

$ wget http://www.diveintopython.org/download/diveintopython-xml-5.4.zip
$ unzip diveintopython-xml-5.4.zip
$ cd diveintopython-5.4/xml/
$ python
Python 2.6.4 (r264:75706, Dec 8 2009, 12:03:07)
[GCC 4.4.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.etree import parse
>>> parse("diveintopython.xml")
Segmentation fault
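
Because the crash takes down the whole interpreter, it can be easier to observe it from a child process. The following is a minimal sketch, not part of the original report: the helper names `run_isolated` and `describe_exit` are made up for illustration, and it assumes a POSIX system, where a negative return code means the child was killed by a signal (a segmentation fault is typically signal 11 on Linux).

```python
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a child interpreter and return its exit status."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return proc.returncode


def describe_exit(returncode):
    """Summarize an exit status; on POSIX a negative value means death by signal."""
    if returncode < 0:
        return "killed by signal %d" % -returncode
    if returncode == 0:
        return "exited cleanly"
    return "exited with status %d" % returncode
```

Running `describe_exit(run_isolated('from lxml.etree import parse; parse("diveintopython.xml")'))` from the xml/ directory would then report how the child ended, without losing the parent session.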

Lorenzo De Liso (blackz)
Changed in lxml:
assignee: nobody → Lorenzo De Liso (blackz)
status: New → In Progress
Lorenzo De Liso (blackz) wrote :

I don't see the library "lxml.etree", maybe the name is wrong?

Changed in lxml:
status: In Progress → Invalid
scoder (scoder) wrote : Re: [Bug 502959] [NEW] segfaults when importing xml

I can reproduce this. However, it seems to crash deep inside libxml2's
parser, and it isn't immediately obvious how lxml could trigger this.
(xmllint parses the file OK, but doesn't use a dictionary in doing so.)

(gdb) bt 25
#0 strlen () at ../sysdeps/x86_64/strlen.S:31
#1 0x00007ffff6593a94 in xmlDictLookup () from /usr/lib/libxml2.so.2
#2 0x00007ffff64f6d24 in ?? () from /usr/lib/libxml2.so.2
#3 0x00007ffff64e8fc3 in xmlParseReference () from /usr/lib/libxml2.so.2
#4 0x00007ffff64e6c28 in xmlParseContent () from /usr/lib/libxml2.so.2
#5 0x00007ffff64e675b in xmlParseElement () from /usr/lib/libxml2.so.2
#6 0x00007ffff64e6c1a in xmlParseContent () from /usr/lib/libxml2.so.2
#7 0x00007ffff64e675b in xmlParseElement () from /usr/lib/libxml2.so.2
#8 0x00007ffff64e6c1a in xmlParseContent () from /usr/lib/libxml2.so.2
#9 0x00007ffff64e675b in xmlParseElement () from /usr/lib/libxml2.so.2
#10 0x00007ffff64e6c1a in xmlParseContent () from /usr/lib/libxml2.so.2
#11 0x00007ffff64e675b in xmlParseElement () from /usr/lib/libxml2.so.2
#12 0x00007ffff64e6c1a in xmlParseContent () from /usr/lib/libxml2.so.2
#13 0x00007ffff64e675b in xmlParseElement () from /usr/lib/libxml2.so.2
#14 0x00007ffff64e6c1a in xmlParseContent () from /usr/lib/libxml2.so.2
#15 0x00007ffff64e675b in xmlParseElement () from /usr/lib/libxml2.so.2
#16 0x00007ffff64e6c1a in xmlParseContent () from /usr/lib/libxml2.so.2
#17 0x00007ffff64e7f3a in xmlParseCtxtExternalEntity () from /usr/lib/libxml2.so.2
#18 0x00007ffff6597dbf in xmlSAX2GetEntity () from /usr/lib/libxml2.so.2
#19 0x00007ffff64dcf89 in xmlParseEntityRef () from /usr/lib/libxml2.so.2
#20 0x00007ffff64e8b3f in xmlParseReference () from /usr/lib/libxml2.so.2
#21 0x00007ffff64e6c28 in xmlParseContent () from /usr/lib/libxml2.so.2
#22 0x00007ffff64e675b in xmlParseElement () from /usr/lib/libxml2.so.2
#23 0x00007ffff64ed81a in xmlParseDocument () from /usr/lib/libxml2.so.2
#24 0x00007ffff64edb05 in ?? () from /usr/lib/libxml2.so.2
#25 0x00007ffff6cb0223 in __pyx_f_4lxml_5etree_11_BaseParser__parseDocFromFile (__pyx_v_self=0x7ffff5bf7478, __pyx_v_c_filename=0x7ffff7f5e094 "diveintopython.xml") at src/lxml/lxml.etree.c:71908

Lorenzo De Liso (blackz) wrote : Re: segfaults when importing xml

Maybe you should try with "StringIO" and then "parse"

Changed in lxml:
status: Invalid → In Progress
scoder (scoder) wrote : Re: [Bug 502959] Re: segfaults when importing xml

> I don't see the library "lxml.etree", maybe the name is wrong?

python-lxml needs to be installed separately. "diveintopython" is only used
because the XML files it contains happen to trigger the crash.

Stefan

Changed in lxml (Debian):
status: Unknown → Confirmed
scoder (scoder) wrote : Re: segfaults when importing xml

This is an upstream bug in libxml2. There's nothing lxml can do about this.

Changed in lxml:
assignee: Lorenzo De Liso (blackz) → nobody
importance: Undecided → High
status: In Progress → Confirmed
scoder (scoder) wrote :

This can be reproduced in plain libxml2 using

xmllint --noent diveintopython-5.4/xml/diveintopython.xml

summary: - segfaults when importing xml
+ segfault when parsing docbook XML with several external entities
Changed in libxml2:
status: Unknown → New
Changed in libxml2:
importance: Unknown → Critical
scoder (scoder) wrote :

This is fixed in libxml2 2.9.0.
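
To check whether a given installation already links against a libxml2 release with the fix, something like the following sketch may help. It is an illustration rather than anything posted in this thread: `has_fix` and `runtime_libxml_version` are hypothetical helper names, and the only lxml attribute relied on is `lxml.etree.LIBXML_VERSION`, which reports the libxml2 version in use at runtime as a tuple of integers.

```python
FIXED_IN = (2, 9, 0)  # libxml2 release named in this report as containing the fix


def has_fix(libxml_version, fixed_in=FIXED_IN):
    """Return True if the given libxml2 version tuple is at least the fixed release."""
    return tuple(libxml_version) >= tuple(fixed_in)


def runtime_libxml_version():
    """Return the libxml2 version lxml runs against, or None if lxml is missing."""
    try:
        from lxml import etree
    except ImportError:
        return None
    return etree.LIBXML_VERSION
```

For example, `has_fix(runtime_libxml_version())` could be evaluated once lxml is known to be importable. A version check only says which release is loaded; distributions sometimes backport fixes to older releases, so a lower version number does not prove the crash is still present.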

Changed in lxml:
milestone: none → 3.0
status: Confirmed → Fix Released
Changed in libxml2:
status: New → Fix Released
Changed in lxml (Debian):
status: Confirmed → Fix Released