Activity log for bug #1742121

Date Who What changed Old value New value Message
2018-01-09 11:26:20 Volker Diels-Grabsch bug added bug
2018-01-09 11:30:49 Volker Diels-Grabsch description
Old value: The old bug #1341590 reappeared for etree.XMLParser(remove_blank_text=True)
https://bugs.launchpad.net/lxml/+bug/1341590
When loading an XML document with a large number of lines, the sourceline is correct for early lines, later stays constant (and wrong) at 65535, and is again correct for even later lines. Tested with libxml2 version 2.9.4 and python-lxml version 4.1.0 in Python 2.7.14.
New value: The old bug #1341590 reappeared for etree.XMLParser(remove_blank_text=True). When loading an XML document with a large number of lines, the sourceline is correct for early lines, later stays constant (and wrong) at 65535, and is again correct for even later lines. Tested with libxml2 version 2.9.4 and python-lxml version 4.1.0 in Python 2.7.14.
2018-01-09 12:16:55 Volker Diels-Grabsch description
Old value: The old bug #1341590 reappeared for etree.XMLParser(remove_blank_text=True). When loading an XML document with a large number of lines, the sourceline is correct for early lines, later stays constant (and wrong) at 65535, and is again correct for even later lines. Tested with libxml2 version 2.9.4 and python-lxml version 4.1.0 in Python 2.7.14.
New value: The old bug #1341590 reappeared for etree.XMLParser(remove_blank_text=True). When loading an XML document with a large number of lines, the sourceline is correct for early lines, later stays constant (and wrong) at 65535, and is again correct for even later lines. Tested with libxml2 version 2.9.4 and python-lxml version 4.1.0 in Python 2.7.14.
To add to this, I just found an even stranger case where for remove_blank_text=True the 65535 issue appears, and for remove_blank_text=False
remove_blank_text=True
 Expected: 65536, got: 65535
 Expected: 65537, got: 65535
 Expected: 65538, got: 65535
 Expected: 65539, got: 65535
 Expected: 65540, got: 65535
remove_blank_text=False
 Expected: 65535, got: 65536
 Expected: 65536, got: 65537
 Expected: 65537, got: 65538
 Expected: 65538, got: 65539
 Expected: 65539, got: 65540
 Expected: 65540, got: 65541
Example program:
from lxml.etree import XMLParser, fromstring
for remove_blank_text in [True, False]:
    print('remove_blank_text={!r}'.format(remove_blank_text))
    lines = 65540
    xmldata = '<a>' + ('<b/>\n' * lines) + '</a>'
    tree = fromstring(xmldata, XMLParser(remove_blank_text=remove_blank_text))
    ok = True
    for i, e in enumerate(tree.iterfind('b')):
        line = i + 1
        if line != e.sourceline:
            ok = False
            print(' Expected: {}, got: {}'.format(line, e.sourceline))
    if ok:
        print(' OK')
2018-01-09 12:18:55 Volker Diels-Grabsch description
Old value: The old bug #1341590 reappeared for etree.XMLParser(remove_blank_text=True). When loading an XML document with a large number of lines, the sourceline is correct for early lines, later stays constant (and wrong) at 65535, and is again correct for even later lines. Tested with libxml2 version 2.9.4 and python-lxml version 4.1.0 in Python 2.7.14.
To add to this, I just found an even stranger case where for remove_blank_text=True the 65535 issue appears, and for remove_blank_text=False
remove_blank_text=True
 Expected: 65536, got: 65535
 Expected: 65537, got: 65535
 Expected: 65538, got: 65535
 Expected: 65539, got: 65535
 Expected: 65540, got: 65535
remove_blank_text=False
 Expected: 65535, got: 65536
 Expected: 65536, got: 65537
 Expected: 65537, got: 65538
 Expected: 65538, got: 65539
 Expected: 65539, got: 65540
 Expected: 65540, got: 65541
Example program:
from lxml.etree import XMLParser, fromstring
for remove_blank_text in [True, False]:
    print('remove_blank_text={!r}'.format(remove_blank_text))
    lines = 65540
    xmldata = '<a>' + ('<b/>\n' * lines) + '</a>'
    tree = fromstring(xmldata, XMLParser(remove_blank_text=remove_blank_text))
    ok = True
    for i, e in enumerate(tree.iterfind('b')):
        line = i + 1
        if line != e.sourceline:
            ok = False
            print(' Expected: {}, got: {}'.format(line, e.sourceline))
    if ok:
        print(' OK')
New value: The old bug #1341590 reappeared for etree.XMLParser(remove_blank_text=True). When loading an XML document with a large number of lines, the sourceline is correct for early lines, later stays constant (and wrong) at 65535, and is again correct for even later lines. Tested with libxml2 version 2.9.4 and python-lxml version 4.1.0 in Python 2.7.14.
To add to this, I just found an even stranger case where for remove_blank_text=False an off-by-one error in sourceline appears.
Example program:
----------------------------------------------------------------------
from lxml.etree import XMLParser, fromstring

for remove_blank_text in [True, False]:
    print('remove_blank_text={!r}'.format(remove_blank_text))
    lines = 65540
    xmldata = '<a>' + ('<b/>\n' * lines) + '</a>'
    tree = fromstring(xmldata, XMLParser(remove_blank_text=remove_blank_text))
    ok = True
    for i, e in enumerate(tree.iterfind('b')):
        line = i + 1
        if line != e.sourceline:
            ok = False
            print(' Expected: {}, got: {}'.format(line, e.sourceline))
    if ok:
        print(' OK')
----------------------------------------------------------------------
Output:
----------------------------------------------------------------------
remove_blank_text=True
 Expected: 65536, got: 65535
 Expected: 65537, got: 65535
 Expected: 65538, got: 65535
 Expected: 65539, got: 65535
 Expected: 65540, got: 65535
remove_blank_text=False
 Expected: 65535, got: 65536
 Expected: 65536, got: 65537
 Expected: 65537, got: 65538
 Expected: 65538, got: 65539
 Expected: 65539, got: 65540
 Expected: 65540, got: 65541
----------------------------------------------------------------------
2018-01-09 12:19:44 Volker Diels-Grabsch bug task added libxml2
2018-01-09 12:20:42 Volker Diels-Grabsch attachment added Test program to reproduce the issue https://bugs.launchpad.net/libxml2/+bug/1742121/+attachment/5033423/+files/test_1742121.py
2018-01-10 20:20:26 scoder lxml: status New Invalid
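
Note on the value 65535 reported above: it equals 2**16 - 1, the largest number a 16-bit unsigned integer can hold. The sketch below does not call lxml or libxml2. It only models a hypothetical line counter that saturates at that limit, and it assumes, without confirmation from this report, that such a limit is what produces the remove_blank_text=True output. The names saturating_line and mismatches are invented for illustration.

```python
# Illustration only: models a line counter kept in a 16-bit unsigned field.
# ASSUMPTION: the report does not confirm that this is the actual cause.
UINT16_MAX = 2 ** 16 - 1  # 65535


def saturating_line(line):
    """Return the line number as a saturating 16-bit field would report it."""
    return min(line, UINT16_MAX)


def mismatches(total_lines):
    """List (expected, got) pairs where the capped value differs from the true line."""
    return [
        (line, saturating_line(line))
        for line in range(1, total_lines + 1)
        if saturating_line(line) != line
    ]


if __name__ == '__main__':
    # Same document size as the example program in the bug description.
    for expected, got in mismatches(65540):
        print(' Expected: {}, got: {}'.format(expected, got))
```

Under this assumption the sketch prints the same five mismatches (lines 65536 to 65540, each reported as 65535) as the remove_blank_text=True case. It does not model the off-by-one results seen with remove_blank_text=False.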