iterwalk tag support enhancements

Bug #1748309 reported by david greisen on 2018-02-08
Affects: lxml
Importance: Undecided
Assigned to: Unassigned

Bug Description

1) It would be nice if we could specify multiple tags with *tags, similar to iterdescendants (http://lxml.de/api/lxml.etree._Element-class.html#iterdescendants).

2) It would be great to have a method, called something like iterwalker.set_tags(*tags), that we could call during iteration to change the tags returned by the iterator. For example:

>>> root = etree.XML('''
... <root>
... <a> <b /> </a>
... <c> <d /> <e /> </c>
... <e />
... </root>
... ''')

>>> context = etree.iterwalk(root, events=("start", "end"))

>>> for action, elem in context:
...     print("%s: %s" % (action, elem.tag))
...     if action == 'start' and elem.tag == 'a':
...         context.skip_subtree()  # ignore <b>
...     if elem.tag == 'c':
...         if action == 'start':
...             context.set_tags('d')
...         else:
...             context.set_tags()

start: root
start: a
end: a
start: c
start: d
end: d
end: c
start: e
end: e
end: root

scoder (scoder) wrote :

1) You can pass a sequence of tags as tag=('a', 'b', 'c'). That may not be the ideal API, but it stems from the original one-tag interface and is actually available consistently across all iteration functions.
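A minimal sketch of what this looks like in practice, reusing the tree from the report (written compactly here). The variable names are illustrative only; the tag= keyword with a tuple is the interface described in the comment above, and .iter() is shown for comparison because it accepts several tags positionally.

```python
from lxml import etree

# Same structure as the tree in the bug description, without whitespace.
root = etree.XML("<root><a><b/></a><c><d/><e/></c><e/></root>")

# Pass several tags to iterwalk() as a sequence via the tag= keyword.
walked = [
    (action, elem.tag)
    for action, elem in etree.iterwalk(root, events=("start", "end"), tag=("a", "d"))
]

# .iter() takes multiple tags as positional arguments instead.
iterated = [elem.tag for elem in root.iter("a", "d")]

print(walked)
print(iterated)
```

Only events for the listed tags are reported; the walker still descends through the other elements to reach them.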

2) In many cases, creating a new iterator (with .iter() etc.) should be enough, unless you really need start-end iteration. Creating a new iterwalker also isn't all that expensive. Personally, I think that it leads to better code to process subtrees in a new loop, rather than switching between iteration contexts globally.

Also, it seems that you want something more like a stack interface, which allows going back to the previously configured tags. That also suggests a recursive approach with nested (new) iterators.
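One way the nested-iterator approach might look for the example in the report is sketched below. This is an illustration under assumptions, not code from the thread: the function name walk_with_nested_filter is hypothetical, and it only uses iterwalk() with tag= and skip_subtree(), both of which appear above. The inner walker gets its own tag filter for the <c> subtree, and the outer walker then skips that subtree so nothing is reported twice.

```python
from lxml import etree

root = etree.XML("<root><a><b/></a><c><d/><e/></c><e/></root>")


def walk_with_nested_filter(tree_root):
    """Hypothetical helper: collect (action, tag) pairs, filtering inside <c>."""
    seen = []
    context = etree.iterwalk(tree_root, events=("start", "end"))
    for action, elem in context:
        seen.append((action, elem.tag))
        if action == "start" and elem.tag == "a":
            context.skip_subtree()  # ignore <b>
        elif action == "start" and elem.tag == "c":
            # Process the <c> subtree in a new loop with its own tag filter.
            inner = etree.iterwalk(elem, events=("start", "end"), tag="d")
            for sub_action, sub_elem in inner:
                seen.append((sub_action, sub_elem.tag))
            # The subtree is handled; keep the outer walker out of it.
            context.skip_subtree()
    return seen


result = walk_with_nested_filter(root)
for action, tag in result:
    print("%s: %s" % (action, tag))
```

Because the filter lives in the inner iterator, there is nothing to reset once the <c> subtree is done: the outer loop simply continues with its original, unfiltered configuration.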

Changed in lxml:
status: New → Triaged
david greisen (dgreisen) wrote :

Thanks for the info. It is not clear from the documentation that you can pass a sequence to the tag argument. Could that be clarified?

The recursive approach is working well for us, so there is no reason to keep this open on our end.

scoder (scoder) wrote :

The docstring of iterwalk() was updated in 4.2, and I'll also update some others for clarity.
Closing as "won't fix" as the proposed changes will not be implemented.

Changed in lxml:
status: Triaged → Won't Fix