incorrect sourceline for long xmls
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Invalid | Undecided | Unassigned |
Bug Description
Python : sys.version_
lxml.etree : (3, 6, 0, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 3)
libxslt used : (1, 1, 29)
libxslt compiled : (1, 1, 29)
windows 10 x64, python x64, lxml x64
For very long XML files, the returned sourceline is wrong.
I have built a custom XML file to show my problem (attached to this bug):
<?xml version="1.0" encoding="UTF-8"?>
<root>
<child>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
</child>
...
<child>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
<grandchild/>
</child>
</root>
I have 32768 child nodes. Starting with the 5461st child, the returned sourceline is off by at least 1 line:
the 5461st node is at line 65535, but sourceline returns 65536.
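The expected line numbers can be checked with a little arithmetic. The sketch below assumes the layout of the sample above (XML declaration on line 1, `<root>` on line 2, twelve lines per `<child>` block) and reads "5461st" as the zero-based index `children[5461]`; the helper names are hypothetical and only illustrate the calculation.

```python
# Layout assumed from the sample: line 1 is the XML declaration,
# line 2 is <root>, and every <child> block spans 12 lines
# (opening tag, 10 grandchildren, closing tag).
HEADER_LINES = 2
LINES_PER_CHILD = 12


def child_line(index):
    """1-based line of the opening <child> tag for a 0-based index."""
    return HEADER_LINES + index * LINES_PER_CHILD + 1


def grandchild_lines(index):
    """1-based lines of the ten grandchildren of children[index]."""
    start = child_line(index) + 1
    return list(range(start, start + 10))


print(child_line(5461))            # 65535
print(grandchild_lines(5461)[0])   # 65536
print(grandchild_lines(5461)[-1])  # 65545
```

Under these assumptions `children[5461]` is the first child whose grandchildren all lie past line 65535.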
The following code:
for grandchild in children[
print(
prints
65536 65536
65536 65537
65536 65538
65536 65539
65536 65540
65536 65541
65536 65542
65536 65543
65536 65544
65536 65545
For a higher level of nesting, the difference between the real sourceline and the returned sourceline grows.
In this case the following code:
for grandchild in children[
print(
prints
393208 393208
393208 393209
393208 393210
393208 393211
393208 393212
393208 393213
393208 393214
393208 393215
393208 393216
393208 393217
so the difference is still 1.
I created a second xml which looks like this:
<root>
<level1>
<child1><child2><child3></child3></child2></child1>
<child4></child4><child5></child5><child6></child6>
</level1>
...
<level1>
<child1><child2><child3></child3></child2></child1>
<child4></child4><child5></child5><child6></child6>
</level1>
</root>
c1 = root.xpath('.//child1')
Starting with the 16384th child1 element it won't return the right sourceline:
>>> c1[16382].sourceline
65531
>>> c1[16383].sourceline
65535
>>> c1[16384].sourceline
65535
>>> c1[16385].sourceline
65535
>>> c1[16386].sourceline
65535
>>> c1[-1].sourceline
65535
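A minimal reproduction sketch for the second file follows. It assumes four lines per `<level1>` block with `<root>` on line 1, a layout inferred from the reported line numbers (65531 for index 16382, 65535 for index 16383). The values above look as if the line number saturates at 65535, the largest 16-bit unsigned value; that reading is an assumption, modelled here by `capped`. All helper names are hypothetical, and `check` requires lxml to be installed.

```python
MAX_LINE = 65535  # assumed saturation point (largest 16-bit unsigned value)


def build_xml(n_blocks):
    """Build the sample: <root> on line 1, then 4 lines per <level1> block."""
    block = (
        b"<level1>\n"
        b"<child1><child2><child3></child3></child2></child1>\n"
        b"<child4></child4><child5></child5><child6></child6>\n"
        b"</level1>\n"
    )
    return b"<root>\n" + block * n_blocks + b"</root>\n"


def true_child1_line(index):
    """1-based line of the <child1> with the given 0-based index."""
    return 1 + index * 4 + 2


def capped(line):
    """Model of a line number that saturates at MAX_LINE."""
    return min(line, MAX_LINE)


def check(n_blocks=20000):
    """Print index, lxml's sourceline, and the computed true line."""
    from lxml import etree  # imported lazily; the helpers above need only the stdlib
    root = etree.fromstring(build_xml(n_blocks))
    c1 = root.xpath(".//child1")
    for i in (16382, 16383, 16384, len(c1) - 1):
        print(i, c1[i].sourceline, true_child1_line(i))
```

Calling `check()` prints one row per probed element, so the reported and computed line numbers can be compared side by side; the exact values may depend on the lxml and libxml2 versions in use.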