Activity log for bug #1458175

Date Who What changed Old value New value Message
2015-05-23 13:58:21 Steven Samuel Cole bug added bug
2015-05-23 14:02:28 Steven Samuel Cole description

Old value:

    Summary: error position in lxml exception message seems wrong

    Further information:

    Environment: virtual environment on Mac OS X 10.8

    Output from bug reporting guidelines script:

        Python : sys.version_info(major=2, minor=7, micro=2, releaselevel='final', serial=0)
        lxml.etree : (3, 4, 2, 0)
        libxml used : (2, 7, 8)
        libxml compiled : (2, 7, 8)
        libxslt used : (1, 1, 26)
        libxslt compiled : (1, 1, 26)

    Problem:

    When extra content after a root XML element is given to lxml for parsing, it correctly reports "Extra content at the end of the document", but the column number included in the error message seems wrong - IF the root element has attributes.

    Expected behavior:

    The same as xmllint (using the same underlying libxml), which indicates the correct position of the error:

        # verify version
        (venv)host:~ user$ xmllint --version
        xmllint: using libxml version 20708
        compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib

        # valid XML (for self-test):
        (venv)host:~ user$ echo "<root/>" | xmllint -
        <?xml version="1.0"?>
        <root/>

        # NOTE: This page (https://bugs.launchpad.net/lxml/+filebug) doesn't seem to support any markup
        # and I don't know what this report looks like in the end; the ^ do point at the correct position

        # invalid xml (extra content):
        (venv)host:~ user$ echo "<root/> extra content" | xmllint -
        -:1: parser error : Extra content at the end of the document
        <root/><root/>
                ^

        # invalid xml (extra content) with attribute:
        (venv)host:~ user$ echo "<root attr01=\"value01\"/> extra content" | xmllint -
        -:1: parser error : Extra content at the end of the document
        <root attr01="value01"/> extra content
                                 ^

    Actual behavior:

    Demonstrated by this script:

        #!/usr/bin/env python

        from lxml import etree

        test_xml_list = ["<root/>",
                         "<root/> extra content",
                         "<root attr01=\"value01\"/> extra content"]

        for test_xml in test_xml_list:
            print 'parse "%s":' % test_xml
            try:
                etree.fromstring(test_xml)
            except etree.XMLSyntaxError as e:
                print e
                print 'test_xml[:e.position[1]]:', test_xml[:e.position[1]]
            print

    Output:

        (venv)host:~ user$ ./lxml_test.py
        parse "<root/>":

        parse "<root/> extra content":
        Extra content at the end of the document, line 1, column 9
        test_xml[:e.position[1]]: <root/> e

        parse "<root attr01="value01"/> extra content":
        Extra content at the end of the document, line 1, column 16
        test_xml[:e.position[1]]: <root attr01="va

    The error message column information is correct for the first case, but wrong for the second.

New value:

    submitted at : https://bugs.launchpad.net/lxml/+filebug
    URL : https://bugs.launchpad.net/lxml/+bug/1458175
    Summary : error position in lxml exception message seems wrong

    Further information:

    Environment: virtual environment on Mac OS X 10.8

    Output from bug reporting guidelines script:

        Python : sys.version_info(major=2, minor=7, micro=2, releaselevel='final', serial=0)
        lxml.etree : (3, 4, 2, 0)
        libxml used : (2, 7, 8)
        libxml compiled : (2, 7, 8)
        libxslt used : (1, 1, 26)
        libxslt compiled : (1, 1, 26)

    Problem:

    When extra content after a root XML element is given to lxml for parsing, it correctly reports "Extra content at the end of the document", but the column number included in the error message seems wrong - IF the root element has attributes.

    Expected behavior:

    The same as xmllint (using the same underlying libxml), which indicates the correct position of the error:

        # verify version
        (venv)host:~ user$ xmllint --version
        xmllint: using libxml version 20708
        compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude ISO8859X Unicode Regexps Automata Expr Schemas Schematron Modules Debug Zlib

        # valid XML (for self-test):
        (venv)host:~ user$ echo "<root/>" | xmllint -
        <?xml version="1.0"?>
        <root/>

        # NOTE: This page (https://bugs.launchpad.net/lxml/+filebug) doesn't seem to support any markup
        # and I don't know what this report looks like in the end; the ^ do point at the correct position

        # invalid xml (extra content):
        (venv)host:~ user$ echo "<root/> extra content" | xmllint -
        -:1: parser error : Extra content at the end of the document
        <root/> extra content
                ^

        # invalid xml (extra content) with attribute:
        (venv)host:~ user$ echo "<root attr01=\"value01\"/> extra content" | xmllint -
        -:1: parser error : Extra content at the end of the document
        <root attr01="value01"/> extra content
                                 ^

    Actual behavior:

    Demonstrated by this script:

        #!/usr/bin/env python

        from lxml import etree

        test_xml_list = ["<root/>",
                         "<root/> extra content",
                         "<root attr01=\"value01\"/> extra content"]

        for test_xml in test_xml_list:
            print 'parse "%s":' % test_xml
            try:
                etree.fromstring(test_xml)
            except etree.XMLSyntaxError as e:
                print e
                print 'test_xml[:e.position[1]]:', test_xml[:e.position[1]]
            print

    Output:

        (venv)host:~ user$ ./lxml_test.py
        parse "<root/>":

        parse "<root/> extra content":
        Extra content at the end of the document, line 1, column 9
        test_xml[:e.position[1]]: <root/> e

        parse "<root attr01="value01"/> extra content":
        Extra content at the end of the document, line 1, column 16
        test_xml[:e.position[1]]: <root attr01="va

    The error message column information is correct for the first case, but wrong for the second.
2019-01-30 18:15:15 scoder lxml: status New Invalid
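
The column arithmetic in the description can be checked without lxml installed. The following is a minimal Python 3 sketch, not part of the original report. It assumes a single-line document whose root element is self-closing and ends at the first "/>". The name expected_column is a hypothetical helper (it does not exist in lxml or libxml2), and the reported columns 9 and 16 are copied from the script output in the description rather than produced by the parser here.

```python
# Hypothetical helper for illustration only; not an lxml or libxml2 API.
def expected_column(xml_text):
    """Return the 1-based column of the first non-whitespace character
    after a self-closing root element, or None if nothing follows it.

    Assumption: the document is a single line and the root element
    ends at the first occurrence of '/>'.
    """
    end = xml_text.find("/>")
    if end == -1:
        return None
    rest_start = end + 2  # 0-based index just past '/>'
    for offset, ch in enumerate(xml_text[rest_start:]):
        if not ch.isspace():
            return rest_start + offset + 1  # convert to a 1-based column
    return None


# (input, column reported by lxml according to the description)
cases = [
    ("<root/> extra content", 9),
    ('<root attr01="value01"/> extra content', 16),
]

for xml_text, reported in cases:
    expected = expected_column(xml_text)
    verdict = "match" if reported == expected else "mismatch"
    print(xml_text)
    print(" " * (expected - 1) + "^")
    print("expected column:", expected, "reported column:", reported, verdict)
    print()
```

Under these assumptions the first case agrees with the reported column, while the second does not, which is the discrepancy the reporter describes. The sketch says nothing about why the reported column differs.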