Crash on accessing text nodes by xpath
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
lxml | Fix Released | High | scoder | | |
Bug Description
Accessing text nodes from within an XSLT extension function causes a hard crash in Python.
I am trying to integrate Pygments into a DocBook toolchain using the technique recommended at http://
The crash makes Windows display its "python has crashed" dialog, and there is no error message from lxml itself. I have narrowed the actual cause down to an XPath expression. Below are the files for a minimal test case that reproduces the error without using Pygments; they are also attached as a .zip.
The script processes testxml.xml with testxsl.xsl to create test.fo. It registers an extension function that is called from a template, which is in turn called by the template for programlisting. In this minimal test case the function simply returns an emphasis element containing the matched element's name and its number of text nodes. It attempts to access the text nodes of the context element with .xpath("./text()"), which causes the crash. Removing this line eliminates the error. Other XPath expressions that access these text nodes (.//text(), text(), ./node(), etc.) all cause the crash as well, but accessing the document's text nodes (//text()) does not, although it did in my actual, more complex version. Accessing elements here is fine; only the text or node variants cause a problem, and the text nodes can still be accessed with .itertext().
Accessing the same element and running these XPath expressions in the console does not cause a crash (no XSLT involved, just XPath to grab elements); the crash only occurs in the extension function when it is called by a stylesheet. Additionally, counting the text nodes with count(text()) does not cause a crash; only attempting to read and return them does.
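As a possible workaround until a fix is available, the extension function can collect the text through the element API instead of an XPath text() query. The following is only a minimal sketch, not the code from the attached test case: the namespace URI urn:example:test, the function name count-text, the helper direct_text_nodes and the tiny inline stylesheet and document are all hypothetical, and it assumes the node-set argument arrives as a list of elements.

```python
from lxml import etree


def direct_text_nodes(element):
    """Collect the direct text children of an element without XPath.

    Roughly what "./text()" would select: the element's own leading text
    plus the tail text that follows each child element.
    """
    parts = []
    if element.text:
        parts.append(element.text)
    for child in element:
        if child.tail:
            parts.append(child.tail)
    return parts


def count_text(context, nodes):
    # Hypothetical extension function: 'nodes' is the node-set passed
    # from the stylesheet; only its first element is inspected here.
    element = nodes[0]
    return "%s:%d" % (element.tag, len(direct_text_nodes(element)))


# Register the function globally, the same way test.py does it.
# The namespace URI is an assumption made up for this sketch.
ns = etree.FunctionNamespace("urn:example:test")
ns.prefix = "test"
ns["count-text"] = count_text

stylesheet = etree.XML(b"""
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:test="urn:example:test">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:value-of select="test:count-text(//listing)"/>
  </xsl:template>
</xsl:stylesheet>
""")

document = etree.XML("<doc><listing>a<b/>c</listing></doc>")
transform = etree.XSLT(stylesheet)
summary = str(transform(document))
print(summary)
```

The helper only walks .text and .tail, so it never asks libxml2 for text node results inside the running transformation, which is the step that appears to trigger the crash.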
I am unsure whether this is caused by libxml2 or by lxml. Below is testxml.xml, which is transformed by testxsl.xsl using test.py. These three files are also attached as a zip.
Versions used:
Python : sys.version_
lxml.etree : (3, 3, 5, 0)
libxml used : (2, 9, 0)
libxml compiled : (2, 9, 0)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)
OS: Windows 7
Docbook stylesheets: 1.78.0
testxml.xml (document to be transformed):
<?xml version="1.0" encoding="utf-8"?>
<article xmlns="http://
<info>
<title>Test article</title>
<author>
<personname>
<honorific>
<firstname>
<surname>
</personname>
</author>
</info>
<section>
<title>Section 1</title>
<para>Test section</para>
<programlisting language="java">
public class Hello {
public static void main(String[] args) {
System.
}
}
</programlisting>
</section>
</article>
testxsl.xsl (document to do transform, docbook stylesheets are at c:\stylesheets\
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://
<xsl:import href="file:
<xsl:param name="highlight
<xsl:template name="apply-
<fo:block>
<xsl:if test="function-
<xsl:variable name="language"
<xsl:
</xsl:if>
</fo:block>
</xsl:template>
</xsl:stylesheet>
test.py (program to perform transform, including extension function):
#!python2
from lxml import etree
def fo_highlight(
root = etree.Element(
root.text=
print("OK BEFORE THIS LINE")
# This next line causes a crash #
# Works ok if this line is commented out #
text_nodes = context.
print("OK AFTER THIS LINE")
return root
tst = etree.FunctionN
tst.prefix = 'test'
tst['highlight'] = fo_highlight
transform = etree.XSLT(
document = etree.parse(
result = transform(document)
open("test.
print(transform
Thanks for the very detailed bug report. I could easily reproduce it with the docbook XSL files from here:
http://sourceforge.net/projects/docbook/files/docbook-xsl/1.78.1/
So easily that the bug was pretty obvious. I'll push a fix soon.