lxml

Crash on accessing text nodes by xpath

Bug #1354652 reported by Matthew Halverson on 2014-08-09

This bug affects 1 person

Affects: lxml
Status: Fix Released
Importance: High
Assigned to: scoder
Milestone: lxml 3.3

Bug Description

Accessing text nodes through an XPath expression inside an XSLT extension function causes a hard crash of Python.

I am trying to integrate Pygments into a DocBook tool chain using the technique recommended at http://www.lunaryorn.de/articles/docbook_pygments.html (now only reachable through the archived copy at http://web.archive.org/web/20110212210158/http://www.lunaryorn.de/articles/docbook_pygments.html), and this is causing a crash. The crash is not related to Pygments, as the test case below does not use it.

The crash makes Windows show its "Python has crashed" dialog, and lxml itself produces no error message. I have narrowed the cause of the crash down to a single XPath expression. Below are the files for a minimal test case that reproduces the error without Pygments; they are also attached as a .zip.

The script processes testxml.xml with testxsl.xsl to create test.fo. It registers an extension function that is called from a template (apply-highlighting), which in turn is called by the template for programlisting. In this minimal test case the function simply returns an emphasis element containing the matched element's name and its number of text nodes. The function then tries to access the text nodes of the context element with .xpath("./text()"), and that call causes the crash. Removing this line eliminates the error. Other XPath expressions that access these text nodes (.//text(), text(), ./node(), etc.) crash as well, but accessing the document's text nodes with //text() does not (although it did in my actual, more complex version). Selecting elements works fine; only the text() and node() variants cause a problem, and the text nodes can still be read with .itertext().
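A minimal sketch of the .itertext() workaround mentioned above, assuming a simplified, self-contained stylesheet in place of the DocBook import. The inline XML and XSL strings and the out root element are hypothetical stand-ins, not the reporter's files. The function is registered with etree.FunctionNamespace just as test.py does; it only avoids the crashing xpath("./text()") call and does not fix the underlying bug.

```python
from lxml import etree

# Hypothetical stand-in for testxml.xml, trimmed to one programlisting.
XML = b"""<article xmlns="http://docbook.org/ns/docbook">
<programlisting language="java">class Hello {}</programlisting>
</article>"""

# Hypothetical stand-in for testxsl.xsl: no DocBook import, the extension
# function is called directly from a programlisting template.
XSL = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:db="http://docbook.org/ns/docbook"
    xmlns:test="http://test.org"
    exclude-result-prefixes="db test">
  <xsl:template match="/">
    <out><xsl:apply-templates select="//db:programlisting"/></out>
  </xsl:template>
  <xsl:template match="db:programlisting">
    <xsl:copy-of select="test:highlight('java')"/>
  </xsl:template>
</xsl:stylesheet>"""


def fo_highlight(context, language):
    node = context.context_node
    # Workaround from the report: read the text via itertext()
    # instead of node.xpath("./text()").
    text = "".join(node.itertext())
    root = etree.Element("emphasis")
    root.text = text.strip()
    return root


# Register the function globally under the test.org namespace, as test.py does.
ns = etree.FunctionNamespace("http://test.org")
ns["highlight"] = fo_highlight

transform = etree.XSLT(etree.fromstring(XSL))
result = transform(etree.fromstring(XML).getroottree())
print(str(result))
```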

Running the same XPath expressions on the same element from the console (plain XPath, no XSLT) does not cause a crash; the crash only occurs inside the extension function when it is called from a stylesheet. Counting the text nodes with count(text()) does not crash either; only selecting and returning them does.
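For comparison, a sketch of the console case described above: the same ./text() and count(text()) expressions evaluated with plain XPath on a parsed tree, no XSLT involved. The XML is a trimmed copy of testxml.xml, and the db prefix is an arbitrary choice for the DocBook namespace. Text results come back as lxml "smart strings" that know their parent element.

```python
from lxml import etree

# Trimmed copy of testxml.xml; only the programlisting matters here.
XML = b"""<article xmlns="http://docbook.org/ns/docbook">
<section>
<programlisting language="java">
public class Hello {}
</programlisting>
</section>
</article>"""

# Prefix chosen for this example; any prefix bound to the DocBook URI works.
NS = {"db": "http://docbook.org/ns/docbook"}

root = etree.fromstring(XML)
listing = root.xpath("//db:programlisting", namespaces=NS)[0]

# Plain XPath on a parsed tree, outside any stylesheet.
text_nodes = listing.xpath("./text()")
count = listing.xpath("count(text())")

for t in text_nodes:
    # Each result is a string that also records where it came from.
    print(repr(str(t)), t.is_text, t.getparent().tag)
print(count)
```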

I am unsure whether this is caused by libxml2 or by lxml. Below are testxml.xml, which is transformed by testxsl.xsl using test.py; the three files are also attached as a zip.

Versions used:
Python : sys.version_info(major=2, minor=7, micro=8, releaselevel='final', serial=0)
lxml.etree : (3, 3, 5, 0)
libxml used : (2, 9, 0)
libxml compiled : (2, 9, 0)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

OS: Windows 7
Docbook stylesheets: 1.78.0
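The Python and library versions listed above can be collected with a short script along these lines. This is a sketch that reads lxml's module-level version constants; the label width in the format string is an arbitrary choice.

```python
import sys
from lxml import etree

# Same fields as the "Versions used" list, read from lxml's version constants.
versions = [
    ("Python", sys.version_info),
    ("lxml.etree", etree.LXML_VERSION),
    ("libxml used", etree.LIBXML_VERSION),
    ("libxml compiled", etree.LIBXML_COMPILED_VERSION),
    ("libxslt used", etree.LIBXSLT_VERSION),
    ("libxslt compiled", etree.LIBXSLT_COMPILED_VERSION),
]

for name, value in versions:
    print("%-17s: %s" % (name, value))
```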

testxml.xml (the document to be transformed):

<?xml version="1.0" encoding="utf-8"?>
<article xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude" version="5.0" xml:lang="en">
<info>
  <title>Test article</title>
  <author>
   <personname>
    <honorific>Mr</honorific>
    <firstname>Me</firstname>
    <surname>Myself</surname>
   </personname>
  </author>
</info>
<section>
  <title>Section 1</title>
  <para>Test section</para>
  <programlisting language="java">
public class Hello {
public static void main(String[] args) {
  System.out.println("Hello World!");
}
}
  </programlisting>
</section>
</article>

testxsl.xsl (the stylesheet that performs the transform; the DocBook stylesheets are installed at c:\stylesheets\docbook):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:test="http://test.org" version="1.0">

<xsl:import href="file:///c:/stylesheets/docbook/fo/docbook.xsl"/>

<xsl:param name="highlight.source">1</xsl:param>

<xsl:template name="apply-highlighting">
  <fo:block>
   <xsl:if test="function-available('test:highlight')">
    <xsl:variable name="language">java</xsl:variable>
    <xsl:apply-templates select="test:highlight($language)"/>
   </xsl:if>
  </fo:block>
</xsl:template>

</xsl:stylesheet>

test.py (the script that runs the transform and defines the extension function):
#!python2
from lxml import etree

def fo_highlight(context, language):
    root = etree.Element("emphasis")
    root.text = context.context_node.tag + " " + str(context.context_node.xpath("count(text())"))
    print("OK BEFORE THIS LINE")
    # This next line causes a crash #
    # Works ok if this line is commented out #
    text_nodes = context.context_node.xpath("./text()")
    print("OK AFTER THIS LINE")
    return root

tst = etree.FunctionNamespace('http://test.org')
tst.prefix = 'test'
tst['highlight'] = fo_highlight

transform = etree.XSLT(etree.parse("testxsl.xsl"))
document = etree.parse("testxml.xml")
result = transform(document)
open("test.fo","w").write(str(result))
print(transform.error_log)
