Error log locations are zeroed out

Bug #1756920 reported by Roma Klapaukh
This bug affects 2 people
Affects   Status    Importance  Assigned to  Milestone
lxml      Triaged   Undecided   Unassigned

Bug Description

Entries in the iterparse context's error_log may have their line and column set to 0 (and their path set to None) when they report schema validation errors.

This happens in Python (lxml) but not in xmllint.

OS info:
$ uname -a
Darwin Nix.local 17.4.0 Darwin Kernel Version 17.4.0: Sun Dec 17 09:19:54 PST 2017; root:xnu-4570.41.2~1/RELEASE_X86_64 x86_64

Requested information:
>>> print("%-20s: %s" % ('Python', sys.version_info))
Python : sys.version_info(major=3, minor=6, micro=3, releaselevel='final', serial=0)
>>> print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
lxml.etree : (4, 2, 0, 0)
>>> print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
libxml used : (2, 9, 8)
>>> print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
libxml compiled : (2, 9, 8)
>>> print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
libxslt used : (1, 1, 32)
>>> print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))
libxslt compiled : (1, 1, 32)

Files / code to reproduce:

test.py
--------
#!/usr/bin/env python3

from lxml.etree import parse, XMLSchema, iterparse, XMLSyntaxError

xml_file = 'test.xml'
xsd_file = 'library.xsd'

xsd_document = parse(xsd_file)
schema = XMLSchema(xsd_document)

context = iterparse(xml_file, schema=schema)

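try:
    for _, elem in context:
        pass
except XMLSyntaxError:
    # The schema violation is raised as XMLSyntaxError; the details are in the
    # error log. The loop variable no longer shadows the caught exception.
    for entry in context.error_log:
        print('At', entry.line, ':', entry.column, '(', entry.path, ')', entry.message)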

library.xsd
------------
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:d="http://test.com/library"
   xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified"
   targetNamespace="http://test.com/library">
   <xs:element name="document">
         <xs:complexType>
         <xs:sequence>
            <xs:element name="metadata" type="xs:string"/>
         </xs:sequence>
      </xs:complexType>
    </xs:element>
</xs:schema>

test.xml
---------
<?xml version="1.0"?>
<d:document xmlns:d="http://test.com/library" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://test.com/library">
  <d:dog/>
</d:document>

Python output:
---------------
$ ./test.py
At 0 : 0 ( None ) Element '{http://test.com/library}dog': This element is not expected. Expected is ( {http://test.com/library}metadata ).

^^ Note that the line is set to 0 rather than 3

xmllint output:
----------------
$ xmllint --version
xmllint: using libxml version 20905
   compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ICU ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib Lzma

$ xmllint --noout --schema library.xsd test.xml
test.xml:3: element dog: Schemas validity error : Element '{http://test.com/library}dog': This element is not expected. Expected is ( {http://test.com/library}metadata ).
test.xml fails to validate

scoder (scoder) wrote :

Probably worth investigating what xmllint does differently here. Or just put a print statement in xmlerror.pxi to see what exact error information libxml2 reports here, and if there is anything else that can be extracted from it.
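
As a lighter-weight first step than patching xmlerror.pxi, the fields that lxml exposes on each log entry can be dumped from Python. The sketch below is only an illustration: the helper name dump_iterparse_errors is hypothetical, and the schema and document are inlined copies of the reporter's files (trimmed to the namespace declarations that matter) so that the snippet runs on its own. It does not show what libxml2 passes internally, only what reaches the Python level.

```python
from io import BytesIO

from lxml import etree

# Inlined copies of the reporter's library.xsd and test.xml (assumption:
# reading from memory behaves like reading from the files on disk).
XSD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
   elementFormDefault="qualified"
   targetNamespace="http://test.com/library">
   <xs:element name="document">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="metadata" type="xs:string"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xs:schema>
"""

XML = b"""<?xml version="1.0"?>
<d:document xmlns:d="http://test.com/library">
  <d:dog/>
</d:document>
"""


def dump_iterparse_errors(xml_bytes=XML, xsd_bytes=XSD):
    """Hypothetical helper: run a validating iterparse and collect the
    public fields of every entry in the parser's error log."""
    schema = etree.XMLSchema(etree.fromstring(xsd_bytes))
    context = etree.iterparse(BytesIO(xml_bytes), schema=schema)
    try:
        for _event, _elem in context:
            pass
    except etree.XMLSyntaxError:
        # Validation failures are reported through this exception;
        # the details are read from the error log below.
        pass
    return [
        {
            "domain": entry.domain_name,
            "type": entry.type_name,
            "level": entry.level_name,
            "filename": entry.filename,
            "line": entry.line,
            "column": entry.column,
            "message": entry.message,
        }
        for entry in context.error_log
    ]


if __name__ == "__main__":
    for row in dump_iterparse_errors():
        print(row)
```

Comparing the line and filename fields here with the xmllint output should show which parts of the error information are lost on the iterparse path.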

Changed in lxml:
status: New → Triaged
Tb_ (thomasb81) wrote :

This looks like a duplicate of https://bugs.launchpad.net/lxml/+bug/2003322,
which suggests a workaround.

Tb_ (thomasb81) wrote :

With the lxml.etree.XMLSchema.assertValid method, we get the same result as xmllint.

It seems to me that when we call:
https://github.com/lxml/lxml/blob/d9edac0bde3a982e4303f7f0d73ab4dd907fc43b/src/lxml/xmlschema.pxi#L133

or

https://github.com/lxml/lxml/blob/d9edac0bde3a982e4303f7f0d73ab4dd907fc43b/src/lxml/xmlschema.pxi#L198

the line attribute of the _xmlError structure does not seem to be filled in the same way.

Does the SAX parsing approach prevent the line number of an issue from being captured?
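
The assertValid observation above can be sketched as follows: parse the document first, then validate the finished tree and read the schema object's own error log. This is a minimal illustration, not a fix for the iterparse path. The helper name validate_after_parse is hypothetical, and the schema and document are inlined copies of the reporter's files so that the snippet is self-contained. It also gives up the streaming behaviour of iterparse, since the whole tree is built before validation.

```python
from lxml import etree

# Inlined copies of the reporter's library.xsd and test.xml (assumption:
# parsing from memory is equivalent to parsing the files on disk).
XSD = b"""<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
   elementFormDefault="qualified"
   targetNamespace="http://test.com/library">
   <xs:element name="document">
      <xs:complexType>
         <xs:sequence>
            <xs:element name="metadata" type="xs:string"/>
         </xs:sequence>
      </xs:complexType>
   </xs:element>
</xs:schema>
"""

XML = b"""<?xml version="1.0"?>
<d:document xmlns:d="http://test.com/library">
  <d:dog/>
</d:document>
"""


def validate_after_parse(xml_bytes, xsd_bytes):
    """Hypothetical helper: validate an already parsed tree and return
    (line, column, message) for each schema error, or [] if valid."""
    schema = etree.XMLSchema(etree.fromstring(xsd_bytes))
    document = etree.fromstring(xml_bytes)
    try:
        schema.assertValid(document)
    except etree.DocumentInvalid:
        # On this path the errors are recorded on the schema object,
        # not on a parser context.
        return [(e.line, e.column, e.message) for e in schema.error_log]
    return []


if __name__ == "__main__":
    for line, column, message in validate_after_parse(XML, XSD):
        print("At", line, ":", column, message)
```

Because validation runs against nodes that already carry their source line, the entries here are expected to have a non-zero line, in contrast to the entries produced during iterparse.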
