iterwalk tag support enhancements
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Won't Fix | Undecided | Unassigned |
Bug Description
1) It would be nice if we could specify multiple tags with *tags similar to iterdescendants (http://
2) It would be great if we could have a function called something like iterwalker.
>>> root = etree.XML('''
... <root>
... <a> <b /> </a>
... <c> <d /> <e /> </c>
... <e />
... </root>
... ''')
>>> context = etree.iterwalk(
>>> for action, elem in context:
...     print("%s: %s" % (action, elem.tag))
...     if action == 'start' and elem.tag == 'a':
...         context.
...     if elem.tag == 'c':
...         if action == 'start':
...             context.
...         else:
...             context.set_tags()
start: root
start: a
end: a
start: c
start: d
end: d
end: c
start: e
end: e
end: root
1) You can pass a sequence of tags as tag=('a', 'b', 'c'). That may not be the ideal API, but it stems from the original one-tag interface and is actually available consistently across all iteration functions.
2) In many cases, creating a new iterator (with .iter() etc.) should be enough, unless you really need start-end iteration. Creating a new iterwalker also isn't all that expensive. Personally, I think that it leads to better code to process subtrees in a new loop, rather than switching between iteration contexts globally.
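To illustrate point 1, here is a minimal sketch of passing a sequence of tags through the tag keyword. The sample document and the chosen tag names are only for illustration, and the snippet assumes a reasonably recent lxml.

```python
from lxml import etree

root = etree.XML("<root><a><b/></a><c><d/><e/></c><e/></root>")

# tag accepts a single name or a sequence of names; only elements
# whose tag matches produce events.
events = [
    (action, elem.tag)
    for action, elem in etree.iterwalk(
        root, events=("start", "end"), tag=("a", "e"))
]
print(events)

# The same keyword is accepted by plain element iteration.
names = [elem.tag for elem in root.iter(tag=("a", "e"))]
print(names)
```

Elements that do not match (root, b, c and d here) are still traversed, they just do not generate events.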
Also, it seems that what you want is more like a stack interface, one that allows going back to the previously configured tags. That also suggests a recursive approach with nested (new) iterators.
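A rough sketch of the nested-iterator approach, reusing the document from the report, might look like the following. The helper name walk_events and the per-tag rules (skip everything below a, report only d below c) are assumptions chosen to match the output requested above, not an existing lxml API.

```python
from lxml import etree

root = etree.XML("<root><a><b/></a><c><d/><e/></c><e/></root>")

def walk_events(root):
    """Hypothetical helper: yield (action, tag) pairs, using a fresh
    iterator per subtree instead of reconfiguring a single walker."""
    yield ("start", root.tag)
    for child in root:
        if child.tag == "a":
            # Do not descend into <a> at all.
            yield ("start", child.tag)
            yield ("end", child.tag)
        elif child.tag == "c":
            # Inside <c>, a new iterwalk restricted to <d> handles the subtree.
            yield ("start", child.tag)
            for action, elem in etree.iterwalk(
                    child, events=("start", "end"), tag="d"):
                yield (action, elem.tag)
            yield ("end", child.tag)
        else:
            # Everything else gets an unfiltered walk of its own subtree.
            for action, elem in etree.iterwalk(
                    child, events=("start", "end")):
                yield (action, elem.tag)
    yield ("end", root.tag)

events = list(walk_events(root))
for action, tag in events:
    print("%s: %s" % (action, tag))
```

Because each inner loop owns its iterator, the previous tag configuration is restored simply by leaving the loop, which gives the stack-like behaviour without any shared state.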