2015-05-23 14:02:28 |
Steven Samuel Cole |
description |
Summary: error position in lxml exception message seems wrong
Further information:
Environment: virtual environment on Mac OS X 10.8
Output from bug reporting guidelines script:
Python : sys.version_info(major=2, minor=7, micro=2, releaselevel='final', serial=0)
lxml.etree : (3, 4, 2, 0)
libxml used : (2, 7, 8)
libxml compiled : (2, 7, 8)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)
Problem: When extra content after the root XML element is passed to lxml for parsing, it correctly reports "Extra content at the end of the document", but the column number included in the error message seems wrong - IF the root element has attributes.
Expected behavior: The same as xmllint (using the same underlying libxml) which indicates the correct position of the error:
# verify version
(venv)host:~ user$ xmllint --version
xmllint: using libxml version 20708
compiled with: Threads Tree Output Push Reader Patterns Writer
SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer
XInclude ISO8859X Unicode Regexps Automata Expr Schemas Schematron
Modules Debug Zlib
# valid XML (for self-test):
(venv)host:~ user$ echo "<root/>" | xmllint -
<?xml version="1.0"?>
<root/>
# NOTE: This page (https://bugs.launchpad.net/lxml/+filebug) doesn't seem to support any markup
# and I don't know what this report looks like in the end; the ^ do point at the correct position
# invalid xml (extra content):
(venv)host:~ user$ echo "<root/> extra content" | xmllint -
-:1: parser error : Extra content at the end of the document
<root/> extra content
^
# invalid xml (extra content) with attribute:
(venv)host:~ user$ echo "<root attr01=\"value01\"/> extra content" | xmllint -
-:1: parser error : Extra content at the end of the document
<root attr01="value01"/> extra content
^
Actual behavior: Demonstrated by this script:
#!/usr/bin/env python
from lxml import etree
test_xml_list = ["<root/>", "<root/> extra content", "<root attr01=\"value01\"/> extra content"]
for test_xml in test_xml_list:
    print 'parse "%s":' % test_xml
    try:
        etree.fromstring(test_xml)
    except etree.XMLSyntaxError as e:
        print e
        print 'test_xml[:e.position[1]]:', test_xml[:e.position[1]]
    print
Output:
(venv)host:~ user$ ./lxml_test.py
parse "<root/>":
parse "<root/> extra content":
Extra content at the end of the document, line 1, column 9
test_xml[:e.position[1]]: <root/> e
parse "<root attr01="value01"/> extra content":
Extra content at the end of the document, line 1, column 16
test_xml[:e.position[1]]: <root attr01="va
The column in the error message is correct for the first failing case (root element without attributes), but wrong for the second (root element with an attribute). |
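To make "correct" concrete, here is a minimal sketch of the column one would expect in each case. It assumes the reported column is the 1-based position of the first non-whitespace character after the root element, which is what column 9 corresponds to in the first failing case. The helper name expected_column is hypothetical and not part of lxml; it only handles single-line input with a self-closing root element, as in the samples above.

```python
def expected_column(xml_text):
    """Return the 1-based column of the first non-whitespace character
    after a self-closing root element, or None if nothing follows it.

    Assumption: single-line input whose root element ends with "/>".
    """
    end = xml_text.index("/>") + 2           # index just past the root element
    rest = xml_text[end:]
    stripped = rest.lstrip()
    if not stripped:
        return None                          # no extra content at all
    leading_ws = len(rest) - len(stripped)   # whitespace before the extra content
    return end + leading_ws + 1              # convert 0-based index to 1-based column


samples = [
    "<root/>",
    "<root/> extra content",
    "<root attr01=\"value01\"/> extra content",
]
for sample in samples:
    print("%s -> %s" % (sample, expected_column(sample)))
```

Under that assumption the sketch gives 9 for the second sample, matching the lxml message, and 26 for the third, where the lxml message says 16.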
submitted at : https://bugs.launchpad.net/lxml/+filebug
URL : https://bugs.launchpad.net/lxml/+bug/1458175
|