position in lxml error message seems wrong
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Invalid | Undecided | Unassigned |
Bug Description
submitted at : https:/
URL : https:/
Summary : error position in lxml exception message seems wrong
Further information:
Environment: virtual environment on Mac OS X 10.8
Output from bug reporting guidelines script:
Python : sys.version_
lxml.etree : (3, 4, 2, 0)
libxml used : (2, 7, 8)
libxml compiled : (2, 7, 8)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)
Problem: When extra content after the root XML element is given to lxml for parsing, lxml correctly reports "Extra content at the end of the document", but the column number in the error message seems wrong if the root element has attributes.
Expected behavior: The same as xmllint (which uses the same underlying libxml), which indicates the correct position of the error:
# verify version
(venv)host:~ user$ xmllint --version
xmllint: using libxml version 20708
compiled with: Threads Tree Output Push Reader Patterns Writer
SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer
XInclude ISO8859X Unicode Regexps Automata Expr Schemas Schematron
Modules Debug Zlib
# valid XML (for self-test):
(venv)host:~ user$ echo "<root/>" | xmllint -
<?xml version="1.0"?>
<root/>
# NOTE: This page (https:/
# and I don't know what this report looks like in the end; the ^ do point at the correct position
# invalid xml (extra content):
(venv)host:~ user$ echo "<root/> extra content" | xmllint -
-:1: parser error : Extra content at the end of the document
<root/> extra content
^
# invalid xml (extra content) with attribute:
(venv)host:~ user$ echo "<root attr01=
-:1: parser error : Extra content at the end of the document
<root attr01="value01"/> extra content
Actual behavior: Demonstrated by this script:
#!/usr/bin/env python
from lxml import etree
test_xml_list = ["<root/>", "<root/> extra content", "<root attr01=
for test_xml in test_xml_list:
print 'parse "%s":' % test_xml
try:
except etree.XMLSyntax
print e
print 'test_xml[
print
Output:
(venv)host:~ user$ ./lxml_test.py
parse "<root/>":
parse "<root/> extra content":
Extra content at the end of the document, line 1, column 9
test_xml[
parse "<root attr01="value01"/> extra content":
Extra content at the end of the document, line 1, column 16
test_xml[
The column in the error message is correct for the first failing case (no attributes), but wrong for the second (root element with an attribute).
These error messages come from the parser in libxml2. There isn't much that lxml could do about them.
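To check whether a given lxml/libxml2 build still shows the discrepancy, a minimal sketch along the following lines may help. It is not the reporter's original script. The names `SAMPLES`, `reported_position` and `expected_column` are hypothetical, and the third sample string is reconstructed from the output shown in the report. The sketch reads the line and column from `XMLSyntaxError.position` and compares the column with the 1-based index of the first extra character, assuming the extra content starts with the word "extra".

```python
from lxml import etree

# Inputs modelled on the report; the third string is reconstructed
# from the output quoted there (an assumption, not the original script).
SAMPLES = [
    "<root/>",
    "<root/> extra content",
    '<root attr01="value01"/> extra content',
]


def reported_position(xml_text):
    """Return (line, column) from lxml's syntax error, or None if parsing succeeds."""
    try:
        etree.fromstring(xml_text)
    except etree.XMLSyntaxError as exc:
        # position is a (lineno, offset) tuple filled in from libxml2's error
        return exc.position
    return None


def expected_column(xml_text, marker="extra"):
    """Return the 1-based column of the first extra character, or None if absent."""
    index = xml_text.find(marker)
    return None if index < 0 else index + 1


if __name__ == "__main__":
    for sample in SAMPLES:
        print(repr(sample), reported_position(sample), expected_column(sample))
```

The column that lxml reports depends on the libxml2 version in use, so the sketch prints both values side by side rather than assuming a particular result.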