2018-01-09 11:30:49 |
Volker Diels-Grabsch |
description |
The old bug #1341590 reappeared for etree.XMLParser(remove_blank_text=True)
https://bugs.launchpad.net/lxml/+bug/1341590
When loading an XML document with a large number of lines, the sourceline is correct for early lines, later stays constant (and wrong) at 65535, and is correct again for even later lines.
Tested with libxml2 version 2.9.4 and python-lxml version 4.1.0 in Python 2.7.14. |
The old bug #1341590 reappeared for etree.XMLParser(remove_blank_text=True).
When loading an XML document with a large number of lines, the sourceline is correct for early lines, later stays constant (and wrong) at 65535, and is correct again for even later lines.
Tested with libxml2 version 2.9.4 and python-lxml version 4.1.0 in Python 2.7.14. |
|
2018-01-09 12:16:55 |
Volker Diels-Grabsch |
description |
The old bug #1341590 reappeared for etree.XMLParser(remove_blank_text=True).
When loading an XML document with a large number of lines, the sourceline is correct for early lines, later stays constant (and wrong) at 65535, and is correct again for even later lines.
Tested with libxml2 version 2.9.4 and python-lxml version 4.1.0 in Python 2.7.14. |
The old bug #1341590 reappeared for etree.XMLParser(remove_blank_text=True).
When loading an XML document with a large number of lines, the sourceline is correct for early lines, later stays constant (and wrong) at 65535, and is correct again for even later lines.
Tested with libxml2 version 2.9.4 and python-lxml version 4.1.0 in Python 2.7.14.
To add to this, I just found an even stranger case: with remove_blank_text=True the 65535 issue appears, and with remove_blank_text=False an off-by-one error in sourceline appears.
remove_blank_text=True
Expected: 65536, got: 65535
Expected: 65537, got: 65535
Expected: 65538, got: 65535
Expected: 65539, got: 65535
Expected: 65540, got: 65535
remove_blank_text=False
Expected: 65535, got: 65536
Expected: 65536, got: 65537
Expected: 65537, got: 65538
Expected: 65538, got: 65539
Expected: 65539, got: 65540
Expected: 65540, got: 65541
Example program:
from lxml.etree import XMLParser, fromstring
for remove_blank_text in [True, False]:
    print('remove_blank_text={!r}'.format(remove_blank_text))
    lines = 65540
    xmldata = '<a>' + ('<b/>\n' * lines) + '</a>'
    tree = fromstring(xmldata, XMLParser(remove_blank_text=remove_blank_text))
    ok = True
    for i, e in enumerate(tree.iterfind('b')):
        line = i + 1
        if line != e.sourceline:
            ok = False
            print(' Expected: {}, got: {}'.format(line, e.sourceline))
    if ok:
        print(' OK') |
|
2018-01-09 12:18:55 |
Volker Diels-Grabsch |
description |
The old bug #1341590 reappeared for etree.XMLParser(remove_blank_text=True).
When loading an XML document with a large number of lines, the sourceline is correct for early lines, later stays constant (and wrong) at 65535, and is correct again for even later lines.
Tested with libxml2 version 2.9.4 and python-lxml version 4.1.0 in Python 2.7.14.
To add to this, I just found an even stranger case: with remove_blank_text=True the 65535 issue appears, and with remove_blank_text=False an off-by-one error in sourceline appears.
remove_blank_text=True
Expected: 65536, got: 65535
Expected: 65537, got: 65535
Expected: 65538, got: 65535
Expected: 65539, got: 65535
Expected: 65540, got: 65535
remove_blank_text=False
Expected: 65535, got: 65536
Expected: 65536, got: 65537
Expected: 65537, got: 65538
Expected: 65538, got: 65539
Expected: 65539, got: 65540
Expected: 65540, got: 65541
Example program:
from lxml.etree import XMLParser, fromstring
for remove_blank_text in [True, False]:
    print('remove_blank_text={!r}'.format(remove_blank_text))
    lines = 65540
    xmldata = '<a>' + ('<b/>\n' * lines) + '</a>'
    tree = fromstring(xmldata, XMLParser(remove_blank_text=remove_blank_text))
    ok = True
    for i, e in enumerate(tree.iterfind('b')):
        line = i + 1
        if line != e.sourceline:
            ok = False
            print(' Expected: {}, got: {}'.format(line, e.sourceline))
    if ok:
        print(' OK') |
The old bug #1341590 reappeared for etree.XMLParser(remove_blank_text=True).
When loading an XML document with a large number of lines, the sourceline is correct for early lines, later stays constant (and wrong) at 65535, and is correct again for even later lines.
Tested with libxml2 version 2.9.4 and python-lxml version 4.1.0 in Python 2.7.14.
To add to this, I just found an even stranger case: with remove_blank_text=False, an off-by-one error in sourceline appears.
Example program:
----------------------------------------------------------------------
from lxml.etree import XMLParser, fromstring
for remove_blank_text in [True, False]:
    print('remove_blank_text={!r}'.format(remove_blank_text))
    lines = 65540
    # One <b/> element per line; the first <b/> sits on line 1.
    xmldata = '<a>' + ('<b/>\n' * lines) + '</a>'
    tree = fromstring(xmldata, XMLParser(remove_blank_text=remove_blank_text))
    ok = True
    for i, e in enumerate(tree.iterfind('b')):
        line = i + 1
        # Report every element whose sourceline differs from its real line.
        if line != e.sourceline:
            ok = False
            print(' Expected: {}, got: {}'.format(line, e.sourceline))
    if ok:
        print(' OK')
----------------------------------------------------------------------
Output:
----------------------------------------------------------------------
remove_blank_text=True
Expected: 65536, got: 65535
Expected: 65537, got: 65535
Expected: 65538, got: 65535
Expected: 65539, got: 65535
Expected: 65540, got: 65535
remove_blank_text=False
Expected: 65535, got: 65536
Expected: 65536, got: 65537
Expected: 65537, got: 65538
Expected: 65538, got: 65539
Expected: 65539, got: 65540
Expected: 65540, got: 65541
---------------------------------------------------------------------- |
|
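The remove_blank_text=True output above looks like a line counter that saturates at the largest 16-bit unsigned value. The sketch below only models that symptom in plain Python so the pattern is easy to see. It is an assumption for illustration, not a confirmed diagnosis of lxml or libxml2, and the names UINT16_MAX, stored_line and mismatches are hypothetical. It does not model the off-by-one seen with remove_blank_text=False.

```python
# Hypothetical model of the symptom, not lxml or libxml2 code.
UINT16_MAX = 0xFFFF  # 65535, the largest value a 16-bit unsigned field can hold


def stored_line(actual_line):
    """Assumed behaviour: the line number is clamped to a 16-bit unsigned field."""
    return min(actual_line, UINT16_MAX)


def mismatches(first, last):
    """Return (expected, got) pairs for lines where the clamped value is wrong."""
    return [(n, stored_line(n))
            for n in range(first, last + 1)
            if stored_line(n) != n]


for expected, got in mismatches(65534, 65540):
    print(' Expected: {}, got: {}'.format(expected, got))
```

Under this assumption, lines up to 65535 are reported correctly and every later line is reported as 65535, which matches the remove_blank_text=True section of the output.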