Crash on accessing text nodes by xpath

Bug #1354652 reported by Matthew Halverson
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: High
Assigned to: scoder

Bug Description

Accessing text nodes in an XSLT extension function causes a hard crash in Python.

I am trying to integrate pygments into a docbook toolchain using the technique recommended at http://www.lunaryorn.de/articles/docbook_pygments.html (now only reachable through http://web.archive.org/web/20110212210158/http://www.lunaryorn.de/articles/docbook_pygments.html), and this is causing a crash. The error is not related to pygments, as the test cases below do not use it.

This crash causes Windows to display the "python has crashed" window, and there is no error message from lxml itself. I have narrowed the actual cause of the crash down to an xpath expression. Below I include some files that represent a minimal test case to recreate the error (which does not use pygments); they are also attached as a .zip.

The script attempts to process testxml.xml with testxsl.xsl to create test.fo. It registers an extension function that is called by a template, which is in turn called by the template for programlisting. In this minimal test case the function simply returns an emphasis element containing the matched element name and the number of text nodes. It attempts to access the text nodes of the context element with .xpath("./text()"), which causes the crash. Removing this line eliminates the error. Other xpath expressions that access these text nodes (.//text(), text(), ./node(), etc.) all cause the crash as well, but accessing the document text nodes (//text()) does not (although it did in my actual, more complex version). Accessing elements here is fine; only the text or node variants cause a problem, and the text nodes can still be read with .itertext().

Accessing the same element and running these xpath expressions in the console does not cause a crash (not using xslt, just using xpath to grab elements); it only occurs in the extension function when it is called by a stylesheet. Additionally, counting the text nodes (count(text())) does not cause a crash; only attempting to read and return them does.
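
The .itertext() alternative mentioned above can be sketched as follows. This is only an illustration, not the reporter's actual workaround: the helper name direct_text is hypothetical, and it relies solely on the ElementTree-compatible .text and .tail attributes, so the sketch falls back to the standard library when lxml is not installed.

```python
try:
    from lxml import etree  # the library discussed in this report
except ImportError:  # fallback so the sketch also runs without lxml
    import xml.etree.ElementTree as etree


def direct_text(element):
    """Return the text chunks that are direct children of `element`.

    Roughly what "./text()" selects, but built from .text and .tail
    instead of an XPath text-node query.
    """
    chunks = []
    if element.text:
        chunks.append(element.text)
    for child in element:
        if child.tail:
            chunks.append(child.tail)
    return chunks


# Hypothetical sample element, not taken from the attached test files.
listing = etree.fromstring(
    "<programlisting>int x;<co/> int y;<b>bold</b></programlisting>"
)
print(direct_text(listing))         # direct text children only
print("".join(listing.itertext()))  # all descendant text, like .//text()
```

Inside an extension function, the same helper could be applied to context.context_node in place of the .xpath("./text()") call; whether that avoids the crash on an affected build is an assumption based on the observation that .itertext() works.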

I am unsure whether this is caused by libxml2 or by lxml. Below is testxml.xml, which is transformed by testxsl.xsl using test.py. These three files are also attached as a zip.

Versions used:
Python : sys.version_info(major=2, minor=7, micro=8, releaselevel='final', serial=0)
lxml.etree : (3, 3, 5, 0)
libxml used : (2, 9, 0)
libxml compiled : (2, 9, 0)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

OS: Windows 7
Docbook stylesheets: 1.78.0
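
The lxml-related entries in the version listing above can be produced with a short script. The sketch below assumes the version constants exposed by lxml.etree (LXML_VERSION, LIBXML_VERSION, LIBXML_COMPILED_VERSION, LIBXSLT_VERSION, LIBXSLT_COMPILED_VERSION); the helper name version_report is hypothetical, and the exact command the reporter used is not stated.

```python
import sys

# Labels as in the listing above, paired with lxml.etree attribute names.
FIELDS = [
    ("lxml.etree", "LXML_VERSION"),
    ("libxml used", "LIBXML_VERSION"),
    ("libxml compiled", "LIBXML_COMPILED_VERSION"),
    ("libxslt used", "LIBXSLT_VERSION"),
    ("libxslt compiled", "LIBXSLT_COMPILED_VERSION"),
]


def version_report(module):
    """Return 'label : value' lines for Python and the given etree module."""
    lines = ["Python : %r" % (sys.version_info,)]
    for label, attr in FIELDS:
        # Missing attributes are reported as None instead of raising.
        lines.append("%s : %r" % (label, getattr(module, attr, None)))
    return lines


if __name__ == "__main__":
    try:
        from lxml import etree
    except ImportError:
        print("lxml is not installed")
    else:
        print("\n".join(version_report(etree)))
```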

testxml.xml (document to be transformed):

<?xml version="1.0" encoding="utf-8"?>
<article xmlns="http://docbook.org/ns/docbook" xmlns:xi="http://www.w3.org/2001/XInclude" version="5.0" xml:lang="en">
 <info>
  <title>Test article</title>
  <author>
   <personname>
    <honorific>Mr</honorific>
    <firstname>Me</firstname>
    <surname>Myself</surname>
   </personname>
  </author>
 </info>
 <section>
  <title>Section 1</title>
  <para>Test section</para>
  <programlisting language="java">
public class Hello {
 public static void main(String[] args) {
  System.out.println("Hello World!");
 }
}
  </programlisting>
 </section>
</article>

testxsl.xsl (stylesheet that performs the transform; the docbook stylesheets are at c:\stylesheets\docbook):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:test="http://test.org" version="1.0">

 <xsl:import href="file:///c:/stylesheets/docbook/fo/docbook.xsl"/>

 <xsl:param name="highlight.source">1</xsl:param>

 <xsl:template name="apply-highlighting">
  <fo:block>
   <xsl:if test="function-available('test:highlight')">
    <xsl:variable name="language">java</xsl:variable>
    <xsl:apply-templates select="test:highlight($language)"/>
   </xsl:if>
  </fo:block>
 </xsl:template>

</xsl:stylesheet>

test.py (program to perform transform, including extension function):
#!python2
from lxml import etree

def fo_highlight(context,language):
 root = etree.Element("emphasis")
 root.text=context.context_node.tag+" "+str(context.context_node.xpath("count(text())"))
 print("OK BEFORE THIS LINE")
 # This next line causes a crash #
 # Works ok if this line is commented out #
 text_nodes = context.context_node.xpath("./text()")
 print("OK AFTER THIS LINE")
 return root

tst = etree.FunctionNamespace('http://test.org')
tst.prefix = 'test'
tst['highlight'] = fo_highlight

transform = etree.XSLT(etree.parse("testxsl.xsl"))
document = etree.parse("testxml.xml")
result = transform(document)
open("test.fo","w").write(str(result))
print(transform.error_log)

Matthew Halverson (mhalver123) wrote :
scoder (scoder) wrote :

Thanks for the very detailed bug report. I could easily reproduce it with the docbook XSL files from here:

http://sourceforge.net/projects/docbook/files/docbook-xsl/1.78.1/

So easily that the bug was pretty obvious. I'll push a fix soon.

Matthew Halverson (mhalver123) wrote :

Glad to know that you were able to identify it that well from my description. I was worried that I was being overly verbose.

scoder (scoder) wrote :
Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → High
milestone: none → 3.3
status: New → Fix Committed
Matthew Halverson (mhalver123) wrote :

I'll try to build it and test it out, but I will likely have to wait till the next build is posted to the python package index, as I have not had very good luck building anything that relies on libxml or libxslt under Windows myself (I am a bit lost in the C universe). For now, I have found an alternative method for my original purposes that sidesteps the bug.

scoder (scoder) wrote :

Fixed in lxml 3.3.6.
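
Code that has to run on installations both with and without the fix could guard on the release number. A minimal sketch, assuming only that 3.3.6 is the first fixed release as stated above; has_text_node_fix is a hypothetical helper name, not part of lxml.

```python
FIXED_IN = (3, 3, 6)  # first release containing the fix, per this report


def has_text_node_fix(lxml_version):
    """True if an lxml version tuple is at least 3.3.6.

    Accepts tuples such as lxml.etree.LXML_VERSION, e.g. (3, 3, 5, 0).
    """
    return tuple(lxml_version[:3]) >= FIXED_IN


print(has_text_node_fix((3, 3, 5, 0)))  # the version from the report
print(has_text_node_fix((3, 3, 6, 0)))  # the release named as fixed
```

In real use the argument would be etree.LXML_VERSION from the installed lxml.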

Changed in lxml:
status: Fix Committed → Fix Released