misleading doc for _ElementTree.iter

Bug #1342469 reported by Olli Pottonen
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: Low
Assigned to: scoder

Bug Description

The API reference tells us that _ElementTree.iter "loops over all elements in this tree".
However, it ignores comments and processing instructions outside the root element.

Example:
>>> import lxml.etree
>>> test = '''<!-- a comment --><root><!-- another comment --></root><!-- third comment -->'''
>>>
>>> root = lxml.etree.fromstring(test)
>>> tree = root.getroottree()
>>> print(type(tree))
<type 'lxml.etree._ElementTree'>
>>> for e in tree.iter():
...     print(e)
...
<Element root at 0x102b3fa70>
<!-- another comment -->

So tree.iter() ignores the first and last comments.

Well, are the comments really "in this tree"? The XML standard does not seem to define "tree" formally, so I assume it refers to the _ElementTree object, which does contain the comments:
>>> tree.getroot().getprevious()
<!-- a comment -->
>>> lxml.etree.tostring(tree)
'<!-- a comment --><root><!-- another comment --></root><!-- third comment -->'

I propose changing the documentation along the following lines:
"Create an iterator for the root element. The iterator loops over all elements of the tree, excluding comments and processing instructions outside the root. It does so in document order."

Alternatively, you could change the implementation so that it iterates over all nodes in the tree, including those outside the root element.
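
In the meantime, a user-side workaround is possible. The following is only a sketch, not part of lxml: the helper name iter_document_nodes is made up for illustration, and it assumes that itersiblings() on the root element reaches the top-level comments in the same way getprevious() does above.

```python
from lxml import etree


def iter_document_nodes(tree):
    """Yield all nodes of the document in document order, including
    comments and processing instructions outside the root element.

    Hypothetical helper for illustration; not an lxml API.
    """
    root = tree.getroot()
    # itersiblings(preceding=True) walks backwards from the root,
    # so reverse the result to restore document order.
    for node in reversed(list(root.itersiblings(preceding=True))):
        yield node
    # tree.iter() covers the root element and everything below it.
    for node in tree.iter():
        yield node
    # Finally, the siblings that follow the root element.
    for node in root.itersiblings():
        yield node


xml = "<!-- a comment --><root><!-- another comment --></root><!-- third comment -->"
tree = etree.fromstring(xml).getroottree()
nodes = list(iter_document_nodes(tree))
for node in nodes:
    print(node)
```

With the example document from this report, this should visit all three comments as well as the root element.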

Version info of my system:
Python : sys.version_info(major=2, minor=7, micro=7, releaselevel='final', serial=0)
lxml.etree : (3, 3, 5, 0)
libxml used : (2, 9, 0)
libxml compiled : (2, 9, 1)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

scoder (scoder) wrote :

It actually says

"""
        Creates an iterator for the root element. The iterator loops over
        all elements in this tree, in document order.
"""

Note the "root element" bit. I've added a comment to make it clearer:

https://github.com/lxml/lxml/commit/44ec7b535d1a342b3c4b6070ebd2a4c3e29595f7
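
To illustrate the "root element" wording, here is a small check, written as a sketch rather than as a statement about lxml internals, that compares what tree.iter() and root.iter() visit. The variable names via_tree and via_root are only for this example.

```python
from lxml import etree

xml = "<!-- a comment --><root><child/><!-- another comment --></root><!-- third comment -->"
root = etree.fromstring(xml)
tree = root.getroottree()

# Serialize each visited node (without its tail text) so the two
# traversals can be compared by content.
via_tree = [etree.tostring(node, with_tail=False) for node in tree.iter()]
via_root = [etree.tostring(node, with_tail=False) for node in root.iter()]

print(via_tree == via_root)
print(len(via_tree))
```

Both traversals start at the root element, so neither visits the comments before or after it.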

Changed in lxml:
importance: Undecided → Low
status: New → Fix Committed
scoder (scoder)
Changed in lxml:
milestone: none → 3.4
scoder (scoder) wrote :

Fix released in lxml 3.4.2.

Changed in lxml:
assignee: nobody → scoder (scoder)
status: Fix Committed → Fix Released