No line number when XmlSchema used in XMLParser

Bug #2003322 reported by Daniel Herding
This bug affects 3 people
Affects: lxml
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

When an XMLSchema is passed to the XMLParser constructor and invalid XML is then parsed, the resulting XMLSyntaxError does not report the line in the XML document where the error occurs. The stack trace always shows "File "<string>", line 0", regardless of where the error actually is.

See the attached example, which is based on snippets from:
https://lxml.de/validation.html

python3 pyGen/xmlValidationNoLineNumber.py
Traceback (most recent call last):
  File "pyGen/xmlValidationNoLineNumber.py", line 16, in <module>
    root = etree.fromstring(
  File "src/lxml/etree.pyx", line 3254, in lxml.etree.fromstring
  File "src/lxml/parser.pxi", line 1913, in lxml.etree._parseMemoryDocument
  File "src/lxml/parser.pxi", line 1793, in lxml.etree._parseDoc
  File "src/lxml/parser.pxi", line 1082, in lxml.etree._BaseParser._parseUnicodeDoc
  File "src/lxml/parser.pxi", line 615, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 725, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 654, in lxml.etree._raiseParseError
  File "<string>", line 0
lxml.etree.XMLSyntaxError: Element 'c': This element is not expected. Expected is ( b ).
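The attached script is not reproduced in this report. The following is a rough, self-contained stand-in that should trigger the same XMLSyntaxError. It makes two assumptions. First, the schema (`<a>` must contain exactly one `<b>`) is reconstructed from the error message and is not the original. Second, `parse_with_schema_parser`, `SCHEMA_XML` and `INVALID_XML` are hypothetical names used only for illustration.

```python
from lxml import etree

# Assumed schema (reconstructed from the error message, not from the original
# attachment): <a> must contain exactly one <b>.
SCHEMA_XML = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="a">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="b" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

# Invalid document: <c> appears on line 2 where <b> is expected.
INVALID_XML = """<a>
    <c></c>
</a>"""


def parse_with_schema_parser(text):
    """Hypothetical helper: validate while parsing (the failing case)."""
    schema = etree.XMLSchema(etree.XML(SCHEMA_XML))
    parser = etree.XMLParser(schema=schema)  # schema attached to the parser
    return etree.fromstring(text, parser)


if __name__ == "__main__":
    try:
        parse_with_schema_parser(INVALID_XML)
    except etree.XMLSyntaxError as err:
        # On the versions listed in this report, this is expected to show 0
        # rather than 2 (matching the "line 0" in the traceback).
        print("lineno:", err.lineno)
        print("message:", err)
```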

Workaround: instead of passing the XMLSchema to the XMLParser, parse the document without a schema and then call schema.assertValid() on the root node. The resulting DocumentInvalid message then contains the correct line number.

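from lxml import etree

parser = etree.XMLParser()  # no schema attached to the parser
root = etree.fromstring(
'''<a>
    <c></c>
</a>''', parser)

# schema_root is the XSD root element from the attached example (not reproduced here)
schema = etree.XMLSchema(schema_root)
schema.assertValid(root)  # raises DocumentInvalid, which reports line 2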

Traceback (most recent call last):
  File "pyGen/xmlValidationCorrect.py", line 20, in <module>
    schema.assertValid(root)
  File "src/lxml/etree.pyx", line 3640, in lxml.etree._Validator.assertValid
lxml.etree.DocumentInvalid: Element 'c': This element is not expected. Expected is ( b )., line 2
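The workaround can be wrapped in a small helper. This is a sketch, not an lxml API: `validate_with_line_numbers` is a hypothetical name, and the schema is an assumption reconstructed from the error message. Instead of relying on a single exception, it calls `schema.validate()` and reads `schema.error_log`, whose entries expose `line` and `message`, so the caller gets every validation error together with its line number.

```python
from lxml import etree

# Assumed schema (reconstructed from the error message): <a> must contain one <b>.
SCHEMA_XML = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="a">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="b" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def validate_with_line_numbers(xml_text, schema):
    """Hypothetical helper: parse without a schema, then validate separately.

    Returns a list of (line, message) tuples taken from schema.error_log;
    an empty list means the document is valid.
    """
    root = etree.fromstring(xml_text, etree.XMLParser())  # no schema here
    if schema.validate(root):
        return []
    return [(entry.line, entry.message) for entry in schema.error_log]


if __name__ == "__main__":
    schema = etree.XMLSchema(etree.XML(SCHEMA_XML))
    invalid = """<a>
    <c></c>
</a>"""
    for line, message in validate_with_line_numbers(invalid, schema):
        print(f"line {line}: {message}")
```

Returning a list rather than raising keeps the choice of how to report errors with the caller; calling `schema.assertValid(root)` instead inside the helper would restore the exception-based behaviour shown in the traceback above.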

Python : sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)
lxml.etree : (4, 9, 1, 0)
libxml used : (2, 9, 14)
libxml compiled : (2, 9, 14)
libxslt used : (1, 1, 35)
libxslt compiled : (1, 1, 35)

Daniel Herding (dherding) wrote :

The same problem occurs when using etree.parse() instead of etree.fromstring().

Tb_ (thomasb81)
Changed in lxml:
status: New → Confirmed
Tb_ (thomasb81) wrote :

Cross-reference to what appears to be the same issue: https://bugs.launchpad.net/lxml/+bug/1756920
