lxml

iterwalk tag support enhancements

Bug #1748309 reported by david greisen on 2018-02-08

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
lxml	Won't Fix	Undecided	Unassigned	-

Bug Description

1) It would be nice if we could specify multiple tags with *tags, similar to iterdescendants (http://lxml.de/api/lxml.etree._Element-class.html#iterdescendants).

2) It would be great if we could have a method called something like iterwalker.set_tags(*tags) that we could call during iteration to change the tags that will be returned by the iterator. For example:

>>> root = etree.XML('''
... <root>
... <a> <b /> </a>
... <c> <d /> <e /> </c>
... <e />
... </root>
... ''')

>>> context = etree.iterwalk(root, events=("start", "end"))

>>> for action, elem in context:
...     print("%s: %s" % (action, elem.tag))
...     if action == 'start' and elem.tag == 'a':
...         context.skip_subtree()  # ignore <b>
...     if elem.tag == 'c':
...         if action == 'start':
...             context.set_tags('d')  # proposed method: only report <d> inside <c>
...         else:
...             context.set_tags()  # proposed method: report all tags again

start: root
start: a
end: a
start: c
start: d
end: d
end: c
start: e
end: e
end: root

scoder (scoder) wrote on 2018-02-09:

1) You can pass a sequence of tags as tag=('a', 'b', 'c'). That may not be the ideal API, but it stems from the original one-tag interface and is actually available consistently across all iteration functions.

2) In many cases, creating a new iterator (with .iter() etc.) should be enough, unless you really need start-end iteration. Creating a new iterwalker also isn't all that expensive. Personally, I think that it leads to better code to process subtrees in a new loop, rather than switching between iteration contexts globally.

Also, it seems that you want something more like a stack interface, which allows going back to the previously configured tags. That also suggests a recursive approach with nested (new) iterators.
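
A minimal sketch of the nested-iterator approach described above, assuming lxml 4.1 or later (for skip_subtree()). It aims to reproduce the event sequence from the bug description without any set_tags() method: the outer walker reports every tag, and a second iterwalk() restricted to 'd' handles the subtree of <c>. The function name walk_events and the compacted sample document are illustrative choices, not part of lxml or of the original report.

```python
from lxml import etree

root = etree.XML("<root><a><b/></a><c><d/><e/></c><e/></root>")


def walk_events(tree_root):
    """Collect (action, tag) pairs, switching tag filters per subtree."""
    seen = []
    context = etree.iterwalk(tree_root, events=("start", "end"))
    for action, elem in context:
        seen.append((action, elem.tag))
        if action != "start":
            continue
        if elem.tag == "a":
            # Ignore everything below <a>.
            context.skip_subtree()
        elif elem.tag == "c":
            # Process the <c> subtree with a new walker that only reports <d>,
            # then tell the outer walker not to descend into it again.
            inner = etree.iterwalk(elem, events=("start", "end"), tag="d")
            seen.extend((sub_action, sub_elem.tag) for sub_action, sub_elem in inner)
            context.skip_subtree()
    return seen


events = walk_events(root)
for action, tag in events:
    print("%s: %s" % (action, tag))
```

Because each nested walker carries its own tag filter, leaving the inner loop returns to the outer filter automatically, which gives the stack-like behaviour without any shared mutable state.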

Changed in lxml:
status:	New → Triaged

david greisen (dgreisen) wrote on 2018-03-14:

Thanks for the info. It is not clear from documentation that you can pass a sequence to the tag attribute. Could that be clarified?

The recursive approach is working well for us. There is no reason to keep this open on our end.
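
For anyone with the same question, here is a small sketch of the sequence form of the tag argument mentioned earlier in this thread, as it appears to work with iterwalk(). The sample document and the variable names are only illustrative.

```python
from lxml import etree

root = etree.XML("<root><a><b/></a><c><d/><e/></c><e/></root>")

# Pass several tag names as one sequence through the 'tag' keyword;
# only elements whose tag is in the sequence generate events.
walker = etree.iterwalk(root, events=("start", "end"), tag=("a", "d"))
filtered = [(action, elem.tag) for action, elem in walker]

for action, tag in filtered:
    print("%s: %s" % (action, tag))
```

Elements that do not match (here <root>, <b>, <c> and <e>) are still traversed, they just do not produce events.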

scoder (scoder) wrote on 2018-03-16:

The docstring of iterwalk() was updated in 4.2, and I'll also update some others for clarity.
Closing as "won't fix" as the proposed changes will not be implemented.

Changed in lxml:
status:	Triaged → Won't Fix
