iterwalk tag support enhancements

Bug #1748309 reported by david greisen
This bug affects 1 person
Affects: lxml
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

1) It would be nice if we could specify multiple tags with *tags, similar to iterdescendants (http://lxml.de/api/lxml.etree._Element-class.html#iterdescendants).

2) It would be great to have a method, called something like iterwalker.set_tags(*tags), that we could call during iteration to change the tags returned by the iterator. For example:

>>> root = etree.XML('''
... <root>
... <a> <b /> </a>
... <c> <d /> <e /> </c>
... <e />
... </root>
... ''')

>>> context = etree.iterwalk(root, events=("start", "end"))

>>> for action, elem in context:
...     print("%s: %s" % (action, elem.tag))
...     if action == 'start' and elem.tag == 'a':
...         context.skip_subtree()  # ignore <b>
...     if elem.tag == 'c':
...         if action == 'start':
...             context.set_tags('d')
...         else:
...             context.set_tags()

start: root
start: a
end: a
start: c
start: d
end: d
end: c
start: e
end: e
end: root

Tags: enhancement
scoder (scoder) wrote :

1) You can pass a sequence of tags as tag=('a', 'b', 'c'). That may not be the ideal API, but it stems from the original one-tag interface and is actually available consistently across all iteration functions.
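The sequence form mentioned above might be used as in the following minimal sketch. It assumes lxml is installed; the sample document mirrors the one in the bug description, and the tag names chosen for filtering are only illustrative.

```python
from lxml import etree

# Same tree shape as in the bug description.
root = etree.XML("<root><a><b/></a><c><d/><e/></c><e/></root>")

# iterwalk(): a tuple passed through the single ``tag`` keyword
# restricts the reported events to <b> and <d> elements.
events = [
    (action, elem.tag)
    for action, elem in etree.iterwalk(
        root, events=("start", "end"), tag=("b", "d")
    )
]
print(events)

# Element.iter() accepts the same sequence form.
names = [elem.tag for elem in root.iter(tag=("a", "e"))]
print(names)
```

The tree is still walked in full; the tag filter only controls which elements produce events.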

2) In many cases, creating a new iterator (with .iter() etc.) should be enough, unless you really need start-end iteration. Creating a new iterwalker also isn't all that expensive. Personally, I think that it leads to better code to process subtrees in a new loop, rather than switching between iteration contexts globally.

Also, it seems that what you want is more like a stack interface, which allows going back to the previously configured tags. That also suggests a recursive approach with nested (new) iterators.
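One possible reading of the nested-iterator suggestion is sketched below. It is not code from the lxml project: the function name walk_with_nested_filter is hypothetical, and the sketch assumes the outer walker's skip_subtree() can be called after an inner iterwalk() has processed the same subtree. Instead of changing the tag filter on a running iterator, the <c> subtree is handed to a new iterwalk() limited to <d>.

```python
from lxml import etree

root = etree.XML("<root><a><b/></a><c><d/><e/></c><e/></root>")


def walk_with_nested_filter(tree):
    """Collect "action: tag" lines, using a nested iterator for <c>.

    Hypothetical helper for illustration only.
    """
    lines = []
    context = etree.iterwalk(tree, events=("start", "end"))
    for action, elem in context:
        lines.append("%s: %s" % (action, elem.tag))
        if action != "start":
            continue
        if elem.tag == "a":
            context.skip_subtree()  # ignore <b>
        elif elem.tag == "c":
            # Process the subtree in its own loop, restricted to <d>.
            inner = etree.iterwalk(elem, events=("start", "end"), tag="d")
            for sub_action, sub_elem in inner:
                lines.append("%s: %s" % (sub_action, sub_elem.tag))
            # The subtree was handled above, so the outer walker skips it.
            context.skip_subtree()
    return lines


result = walk_with_nested_filter(root)
print("\n".join(result))
```

Because the inner iterator ends when the loop over the subtree ends, the outer walker simply carries on unfiltered, so no explicit step is needed to restore the previous tag configuration.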

Changed in lxml:
status: New → Triaged
david greisen (dgreisen) wrote :

Thanks for the info. It is not clear from the documentation that you can pass a sequence to the tag argument. Could that be clarified?

The recursive approach is working well for us. There is no reason to keep this open on our end.

scoder (scoder) wrote :

The docstring of iterwalk() was updated in 4.2, and I'll also update some others for clarity.
Closing as "won't fix" as the proposed changes will not be implemented.

Changed in lxml:
status: Triaged → Won't Fix