misleading doc for _ElementTree.iter
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Fix Released | Low | scoder |
Bug Description
The API reference tells us that _ElementTree.iter "loops over all elements in this tree".
However it ignores comments and processing instructions outside the root element.
Example:
>>> import lxml.etree
>>> test = '''<!-- a comment --><root><!-- another comment --></root><!-- third comment -->'''
>>>
>>> root = lxml.etree.
>>> tree = root.getroottree()
>>> print(type(tree))
<type 'lxml.etree.
>>> for e in tree.iter():
... print(e)
...
<Element root at 0x102b3fa70>
<!-- another comment -->
So tree.iter() ignores the first and last comments.
Well, are the comments really "in this tree"? There seems to be no formal definition of "tree" in the XML standard, so I assume the tree refers to the _ElementTree object, which contains the comments:
>>> tree.getroot(
<!-- a comment -->
>>> etree.tostring(
'<!-- a comment --><root><!-- another comment --></root><!-- third comment -->'
I propose changing the document along the following lines:
"Create an iterator for the root element. The iterator loops over all elements of the tree, excluding comments and processing instructions outside the root. It does so in document order."
Or, alternatively you could change the implementation so that it iterates over all the elements.
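Until then, anyone who needs the top-level comments and processing instructions can collect them by hand. The following is only a minimal sketch of that idea, not how lxml implements or would implement it: `iter_document` is a hypothetical helper name, and it assumes that `getprevious()` and `getnext()` on the root element return the sibling comments and processing instructions, as the session above suggests.

```python
from lxml import etree


def iter_document(tree):
    """Yield every node of the document in document order,
    including comments/PIs that sit outside the root element.

    Hypothetical helper, not part of the lxml API.
    """
    root = tree.getroot()

    # Walk backwards from the root to find the nodes that precede it,
    # then reverse so they come out in document order.
    before = []
    node = root.getprevious()
    while node is not None:
        before.append(node)
        node = node.getprevious()
    yield from reversed(before)

    # The root and everything below it.
    yield from root.iter()

    # Nodes that follow the root element.
    node = root.getnext()
    while node is not None:
        yield node
        node = node.getnext()


xml = "<!-- a comment --><root><!-- another comment --></root><!-- third comment -->"
tree = etree.fromstring(xml).getroottree()

for node in iter_document(tree):
    print(node)
```

The preceding siblings are buffered in a list because they can only be reached by walking backwards from the root.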
Version info of my system:
Python : sys.version_
lxml.etree : (3, 3, 5, 0)
libxml used : (2, 9, 0)
libxml compiled : (2, 9, 1)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)
Changed in lxml:
milestone: none → 3.4
It actually says
"""
Creates an iterator for the root element. The iterator loops over
all elements in this tree, in document order.
"""
Note the "root element" bit. I've added a comment to make it clearer:
https://github.com/lxml/lxml/commit/44ec7b535d1a342b3c4b6070ebd2a4c3e29595f7