2017-02-20 11:56:49 |
Daniel PUIU |
description |
Python : sys.version_info(major=3, minor=4, micro=4, releaselevel='final', serial=0)
lxml.etree : (3, 6, 0, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)
For very long XML documents, the returned sourceline is wrong.
I have built a custom XML file to show the problem (attached to this bug):
<?xml version="1.0" encoding="UTF-8"?>
<root>
<child>
<grandchild>GC0</grandchild>
<grandchild>GC1</grandchild>
<grandchild>GC2</grandchild>
<grandchild>GC3</grandchild>
<grandchild>GC4</grandchild>
<grandchild>GC5</grandchild>
<grandchild>GC6</grandchild>
<grandchild>GC7</grandchild>
<grandchild>GC8</grandchild>
<grandchild>GC9</grandchild>
</child>
...
<child>
<grandchild>GC0</grandchild>
<grandchild>GC1</grandchild>
<grandchild>GC2</grandchild>
<grandchild>GC3</grandchild>
<grandchild>GC4</grandchild>
<grandchild>GC5</grandchild>
<grandchild>GC6</grandchild>
<grandchild>GC7</grandchild>
<grandchild>GC8</grandchild>
<grandchild>GC9</grandchild>
</child>
</root>
I have 32768 child nodes. Starting with the child at index 5461 (children[5461]), the returned sourceline is wrong by at least 1 line:
That node is at line 65535, but sourceline returns 65536.
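The line numbers quoted above can be checked independently of lxml. The following is only a sketch: it assumes the attached file has exactly the layout shown in the excerpt (one tag per line, no indentation, no blank lines), and the helper names build_document, expected_child_line, actual_child_lines and reported_lines are hypothetical, not part of the original report. Under that assumption the child at index k starts on line 3 + 12 * k.

```python
# Hypothetical reproduction helper; names and layout are assumptions,
# not taken from the file attached to the report.
N_CHILDREN = 32768        # number of <child> elements, as stated in the report
N_GRANDCHILDREN = 10      # <grandchild> elements per <child>


def build_document(n_children=N_CHILDREN):
    """Return the test XML as a string, one tag per line."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<root>"]
    for _ in range(n_children):
        lines.append("<child>")
        for i in range(N_GRANDCHILDREN):
            lines.append("<grandchild>GC%d</grandchild>" % i)
        lines.append("</child>")
    lines.append("</root>")
    return "\n".join(lines) + "\n"


def expected_child_line(index):
    """1-based line of the opening <child> tag at 0-based position index."""
    # 2 header lines, then 12 lines per child
    # (opening tag, 10 grandchildren, closing tag).
    return 3 + (N_GRANDCHILDREN + 2) * index


def actual_child_lines(document):
    """Scan the text and return the 1-based line of every <child> tag."""
    return [n for n, line in enumerate(document.splitlines(), start=1)
            if line == "<child>"]


def reported_lines(document, index):
    """(parent sourceline, grandchild sourceline) pairs as lxml reports them.

    Needs lxml installed; it is not called by the script below.
    """
    from lxml import etree
    root = etree.fromstring(document.encode("utf-8"))
    child = root[index]
    return [(gc.getparent().sourceline, gc.sourceline) for gc in child]


if __name__ == "__main__":
    doc = build_document()
    lines = actual_child_lines(doc)
    print(lines[5461], expected_child_line(5461))
    print(lines[-1], expected_child_line(N_CHILDREN - 1))
```

Run as a script, this prints the real line of children[5461] and of the last child, each next to the value from the formula. Calling reported_lines(doc, 5461) on a machine with lxml installed gives the values lxml itself returns, for comparison with the real ones.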
The following code:
for grandchild in children[5461].getchildren():
    print(grandchild.getparent().sourceline, grandchild.sourceline)
prints
65536 65536
65536 65537
65536 65538
65536 65539
65536 65540
65536 65541
65536 65542
65536 65543
65536 65544
65536 65545
For a higher level of nesting, the difference between the real sourceline and the returned sourceline grows.
In this case, the following code:
for grandchild in children[-1].getchildren():
    print(grandchild.getparent().sourceline, grandchild.sourceline)
prints
393208 393208
393208 393209
393208 393210
393208 393211
393208 393212
393208 393213
393208 393214
393208 393215
393208 393216
393208 393217
so the difference is still 1. |
windows 10 x64, python x64, lxml x64
|
|