crash when mixing with C lib that calls libxml2

Bug #1748019 reported by Cole Robinson on 2018-02-07
This bug affects 1 person
Affects: lxml | Importance: Undecided | Assigned to: Unassigned

Bug Description

Reproducer:
$ cat test.py
import os
import libvirt
import lxml.etree
libvirt.open("test://%s/testdriver.xml" % os.getcwd())

$ wget https://raw.githubusercontent.com/virt-manager/virt-manager/master/tests/testdriver.xml
$ python test.py
Segmentation fault (core dumped)

Backtrace is:

Program received signal SIGSEGV, Segmentation fault.
__pyx_f_4lxml_5etree__local_resolver (__pyx_v_c_url=0x55555578e630 "/home/crobinso/src/virt-manager/testdriver.xml", __pyx_v_c_pubid=0x0, __pyx_v_c_context=0x55555595ea30) at src/lxml/etree.c:40478
40478 __pyx_t_3 = __Pyx_PyObject_Call(__pyx_t_4, __pyx_t_7, NULL); if (unlikely(!__pyx_t_3)) __PYX_ERR(1, 280, __pyx_L1_error)
gdb$ bt
#0 __pyx_f_4lxml_5etree__local_resolver (__pyx_v_c_url=0x55555578e630 "/home/crobinso/src/virt-manager/testdriver.xml", __pyx_v_c_pubid=0x0, __pyx_v_c_context=0x55555595ea30) at src/lxml/etree.c:40478
#1 0x00007fffee17d88d in xmlLoadExternalEntity () from /lib64/libxml2.so.2
#2 0x00007fffee16a767 in xmlCtxtReadFile () from /lib64/libxml2.so.2
#3 0x00007fffef6d1494 in virXMLParseHelper () from /lib64/libvirt.so.0
#4 0x00007fffef7b6281 in testConnectOpen () from /lib64/libvirt.so.0
#5 0x00007fffef76eb3f in virConnectOpenInternal () from /lib64/libvirt.so.0
#6 0x00007fffef76fd70 in virConnectOpen () from /lib64/libvirt.so.0
#7 0x00007fffefc46470 in libvirt_virConnectOpen () from /usr/lib64/python2.7/site-packages/libvirtmod.so
#8 0x00007ffff7afe62e in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#9 0x00007ffff7aff288 in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#10 0x00007ffff7afc283 in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#11 0x00007ffff7aff288 in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#12 0x00007ffff7aff499 in PyEval_EvalCode () from /lib64/libpython2.7.so.1.0
#13 0x00007ffff7b0578f in run_mod () from /lib64/libpython2.7.so.1.0
#14 0x00007ffff7b0573a in PyRun_FileExFlags () from /lib64/libpython2.7.so.1.0
#15 0x00007ffff7b0562e in PyRun_SimpleFileExFlags () from /lib64/libpython2.7.so.1.0
#16 0x00007ffff7b0b8ce in Py_Main () from /lib64/libpython2.7.so.1.0
#17 0x00007ffff6c5900a in __libc_start_main () from /lib64/libc.so.6
#18 0x000055555555478a in _start ()

I'm on Fedora 27:

$ rpm -q python2-lxml python2-libvirt python2-libxml2
python2-lxml-4.1.1-1.fc27.x86_64
python2-libvirt-3.7.0-1.fc27.x86_64
python2-libxml2-2.9.7-1.fc27.x86_64

Python : sys.version_info(major=2, minor=7, micro=14, releaselevel='final', serial=0)
lxml.etree : (4, 1, 1, 0)
libxml used : (2, 9, 7)
libxml compiled : (2, 9, 5)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)

It crashes with Python 3 as well.

Cole Robinson (crobinso) wrote :

I reproduced this with the lxml 4.0.0 and 3.7.2 packages from Fedora, but not with pip-installed lxml, so maybe this is specific to the distribution build.
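One way to tell which kind of build is in use: lxml exposes `etree.LIBXML_VERSION` (the libxml2 it is running with) and `etree.LIBXML_COMPILED_VERSION` (the one it was built against), which are the values behind the "libxml used" and "libxml compiled" lines above. This is a minimal sketch; the helper name `libxml2_versions` is hypothetical.

```python
from typing import Dict, Optional, Tuple


def libxml2_versions() -> Optional[Dict[str, Tuple[int, ...]]]:
    """Return the libxml2 versions lxml is running with and was compiled against."""
    try:
        from lxml import etree
    except ImportError:
        return None  # lxml is not installed, nothing to compare
    return {
        "used": etree.LIBXML_VERSION,
        "compiled": etree.LIBXML_COMPILED_VERSION,
    }


versions = libxml2_versions()
if versions is not None and versions["used"] != versions["compiled"]:
    # A runtime version that differs from the build-time one means a shared
    # libxml2 was loaded at runtime, i.e. lxml is not statically linked.
    print("lxml is running against a shared libxml2:", versions)
```

A mismatch such as 2.9.7 used vs. 2.9.5 compiled means lxml loaded a shared libxml2 at runtime; matching versions alone do not prove a static build.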

Cole Robinson (crobinso) wrote :

I filed a Fedora bug: https://bugzilla.redhat.com/show_bug.cgi?id=1544019

To the lxml maintainers: what distro do you build the lxml pip archives on?

scoder (scoder) wrote :

The PyPI wheels are manylinux1 builds (PEP 513) but statically link the latest libxml2 and libxslt versions (at the time of their creation). That's probably why they do not conflict with the external libraries.

Fedora (and most Linux distros) most likely build their lxml package against the system-provided libraries, which totally makes sense for them. But it also means that multiple users of libxml2 in one process share the same library, and their configurations might conflict. Specifically, if both try to use different URL/file resolvers and configure them globally in libxml2, then there will definitely be a conflict between the two.
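To make the conflict concrete, here is a toy Python model. It is not libxml2's or lxml's real API: there is one shared library with a single process-wide loader slot, an lxml-like resolver that (in this model) treats every parser context's private data as its own, and a foreign, libvirt-like context that carries something else. All class and function names are hypothetical. The caught exception only stands in for the segfault in the backtrace, where lxml's `_local_resolver` is invoked from `xmlLoadExternalEntity` while libvirt's `virXMLParseHelper` is parsing.

```python
from typing import Callable, Optional


class ParserContext:
    """Hypothetical stand-in for a libxml2 parser context."""

    def __init__(self, owner: str, private: Optional[dict] = None) -> None:
        self.owner = owner
        self.private = private  # per-parser data stashed by whoever created it


Loader = Callable[[str, ParserContext], str]


class SharedLibxml2:
    """Toy model of one shared libxml2 with a single process-wide loader slot."""

    def __init__(self) -> None:
        self.loader: Optional[Loader] = None

    def set_external_entity_loader(self, loader: Optional[Loader]) -> None:
        self.loader = loader  # last writer wins, for every user of the library

    def load_external_entity(self, url: str, ctxt: ParserContext) -> str:
        if self.loader is None:
            return f"default:{url}"
        return self.loader(url, ctxt)


def lxml_like_resolver(url: str, ctxt: ParserContext) -> str:
    # Assumes the context's private data is lxml's; another library's data breaks this.
    return f"{ctxt.private['resolvers']}:{url}"


lib = SharedLibxml2()
lib.set_external_entity_loader(lxml_like_resolver)  # like importing lxml.etree


def try_load(url: str, ctxt: ParserContext) -> str:
    try:
        return lib.load_external_entity(url, ctxt)
    except (KeyError, TypeError):
        return "crash"  # toy analogue of the SIGSEGV in the backtrace


ours = ParserContext("lxml", {"resolvers": "lxml"})
foreign = ParserContext("libvirt", {"error_state": "libvirt"})
print(try_load("a.xml", ours))              # lxml:a.xml
print(try_load("testdriver.xml", foreign))  # crash
```

The point of the model is the "last writer wins" slot: the foreign parser never asked for lxml's resolver, but it gets it anyway because both libraries share one global setting.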

scoder (scoder) wrote :

I'll close this as "won't fix", since it's been documented for years. Unless someone finds a better way to work around this, static builds will be the way to handle it.
http://lxml.de/installation.html#using-lxml-with-python-libxml2

Changed in lxml:
status: New → Won't Fix

Cole Robinson (crobinso) wrote :

I think libvirt can be extended here to work around the conflict with something like:

diff --git a/src/util/virxml.c b/src/util/virxml.c
index 6e87605ea..248c5d7b2 100644
--- a/src/util/virxml.c
+++ b/src/util/virxml.c
@@ -810,9 +810,12 @@ virXMLParseHelper(int domcode,
     pctxt->sax->error = catchXMLError;

     if (filename) {
+        xmlExternalEntityLoader origloader = xmlGetExternalEntityLoader();
+        xmlSetExternalEntityLoader(xmlNoNetExternalEntityLoader);
         xml = xmlCtxtReadFile(pctxt, filename, NULL,
                               XML_PARSE_NONET |
                               XML_PARSE_NOWARNING);
+        xmlSetExternalEntityLoader(origloader);
     } else {
         xml = xmlCtxtReadDoc(pctxt, BAD_CAST xmlStr, url, NULL,
                              XML_PARSE_NONET |

That said, maybe lxml could do something similar? Rather than registering a libxml2 callback globally, it could install it only around the APIs that actually do file resolving. It seems there are only a few entry points for that, basically initial parsing and XInclude, but I didn't dig too deeply. Calling those functions shouldn't have much performance impact; they just swap pointers in C. I'm curious to hear your thoughts.
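The get/set/restore pattern from the patch above can be sketched in Python against a toy global loader slot. All names here are hypothetical; this is not lxml's internal API. The `try`/`finally` restores the previous loader even if parsing raises.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Optional

Loader = Callable[[str], str]

# Toy stand-in for a single process-wide loader slot (hypothetical).
_current_loader: Optional[Loader] = None


def set_loader(loader: Optional[Loader]) -> None:
    global _current_loader
    _current_loader = loader


def get_loader() -> Optional[Loader]:
    return _current_loader


def load(url: str) -> str:
    # Falls back to a "default" loader when nothing is installed.
    return _current_loader(url) if _current_loader else f"default:{url}"


@contextmanager
def scoped_loader(loader: Loader) -> Iterator[None]:
    """Install ``loader`` only for the duration of the block, then restore the old one."""
    previous = get_loader()
    set_loader(loader)
    try:
        yield
    finally:
        set_loader(previous)  # restored even if parsing raises


def lxml_like_parse(url: str) -> str:
    # Hypothetical entry point: the resolver is active only while this parse runs.
    with scoped_loader(lambda u: f"lxml:{u}"):
        return load(url)


print(lxml_like_parse("a.xml"))  # lxml:a.xml
print(load("testdriver.xml"))    # default:testdriver.xml
```

Swapping a process-wide slot this way is cheap, but it is not thread-safe on its own: if two threads enter such a block concurrently, one can restore a loader while the other is still parsing.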

scoder (scoder) wrote :

I agree that this can be done in some places, and it would probably improve the situation.

However, lxml also allows callbacks into user code from within the parser. Setting and resetting a couple of thread-local variables at each callback (e.g. for every tag and every chunk of text) would probably still end up being costly.
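One reading of this cost concern, sketched as a toy Python model (hypothetical names, not lxml's implementation): if lxml's loader were installed only while lxml is parsing, then user code called back from inside the parser, which might itself call into another libxml2 user, would need the previous loader restored for the callback and lxml's reinstalled afterwards. The counter shows that this adds two swaps per parser event on top of the one-off swaps at entry and exit.

```python
from typing import Callable, List, Optional

Loader = Callable[[str], str]


class LoaderSlot:
    """Toy process-wide loader slot that counts how often it is swapped (hypothetical)."""

    def __init__(self) -> None:
        self.loader: Optional[Loader] = None
        self.swaps = 0

    def set(self, loader: Optional[Loader]) -> None:
        self.loader = loader
        self.swaps += 1


def run_parse(slot: LoaderSlot, events: List[str],
              user_callback: Callable[[str], None]) -> None:
    """Parse with lxml's loader installed, but hand the slot back during each callback."""
    outer = slot.loader
    ours: Loader = lambda url: f"lxml:{url}"
    slot.set(ours)
    try:
        for event in events:  # e.g. one per start tag or text chunk
            slot.set(outer)   # user code may call into another libxml2 user
            try:
                user_callback(event)
            finally:
                slot.set(ours)  # reinstall before the parser continues
    finally:
        slot.set(outer)


slot = LoaderSlot()
run_parse(slot, [f"tag{i}" for i in range(1000)], lambda ev: None)
print(slot.swaps)  # 2002: two per event, plus one on entry and one on exit
```

Each swap is only a pointer assignment, but with one callback per tag or text chunk the number of swaps grows with the size of the document.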

Changed in lxml:
status: Won't Fix → Triaged