Minor problem appending new element

Bug #1861766 reported by Frank Millman
This bug affects 1 person
Affects: lxml
Status: Triaged
Importance: Undecided
Assigned to: Unassigned

Bug Description

Python : sys.version_info(major=3, minor=7, micro=2, releaselevel='final', serial=0)
lxml.etree : (4, 3, 2, 0)
libxml used : (2, 9, 5)
libxml compiled : (2, 9, 5)
libxslt used : (1, 1, 30)
libxslt compiled : (1, 1, 30)

In Python I can iterate through a list, and on a certain condition append a new item to the list, which is then included in the iteration.

>>> x = ['a', 'b', 'c']
>>> for y in x:
...     print(y)
...     if y == 'b':
...         x.append('d')
...
a
b
c
d
>>> x
['a', 'b', 'c', 'd']

The same thing works in lxml:

>>> from lxml import etree
>>> lmx = '<x><y z="a"/><y z="b"/><y z="c"/></x>'
>>> xml = etree.fromstring(lmx)
>>> for y in xml:
...     print(etree.tostring(y))
...     if y.get('z') == 'b':
...         xml.append(etree.Element('y', attrib={'z': 'd'}))
...
b'<y z="a"/>'
b'<y z="b"/>'
b'<y z="c"/>'
b'<y z="d"/>'
>>> etree.tostring(xml)
b'<x><y z="a"/><y z="b"/><y z="c"/><y z="d"/></x>'

However, if the condition is met on the last item, the Python list still includes the appended item in the iteration, but lxml does not. In the following, the only change is checking for 'c' instead of 'b'.

>>> x = ['a', 'b', 'c']
>>> for y in x:
...     print(y)
...     if y == 'c':
...         x.append('d')
...
a
b
c
d
>>> x
['a', 'b', 'c', 'd']

>>> lmx = '<x><y z="a"/><y z="b"/><y z="c"/></x>'
>>> xml = etree.fromstring(lmx)
>>> for y in xml:
...     print(etree.tostring(y))
...     if y.get('z') == 'c':
...         xml.append(etree.Element('y', attrib={'z': 'd'}))
...
b'<y z="a"/>'
b'<y z="b"/>'
b'<y z="c"/>'
>>> etree.tostring(xml)
b'<x><y z="a"/><y z="b"/><y z="c"/><y z="d"/></x>'

As you can see, the last element is correctly appended, but is not included in the iteration.

BTW, I see that ElementTree in the standard library does not have this problem.
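
A possible workaround, offered only as a sketch: walk the children by index and re-check len() on every step, so that anything appended during the loop is still reached. The helper name iter_children_live is made up for this example and is not part of lxml. The sketch relies only on len(element) and element[i], which both lxml and the standard library ElementTree support, so it falls back to the standard library if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # fallback so the sketch still runs without lxml
    import xml.etree.ElementTree as etree


def iter_children_live(parent):
    """Yield children by index (hypothetical helper, not an lxml API).

    len(parent) is re-evaluated before every step, so children appended
    while the loop is running are visited as well.
    """
    i = 0
    while i < len(parent):
        yield parent[i]
        i += 1


xml = etree.fromstring('<x><y z="a"/><y z="b"/><y z="c"/></x>')
seen = []
for y in iter_children_live(xml):
    seen.append(y.get('z'))
    if y.get('z') == 'c':
        # Append while standing on the last child.
        xml.append(etree.Element('y', attrib={'z': 'd'}))

print(seen)
```

Note that this only covers appending. If elements are removed or inserted before the current position during the loop, the index no longer lines up with the children, and repeated indexing may be slow on elements with very many children.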

scoder (scoder) wrote :

ET in the stdlib is actually backed by a Python list of children, whereas lxml uses a C-level tree structure.

The iterators in lxml tend to look ahead one item in order to allow replacements of the last returned element in the tree to work. I think you can only have one of the two.
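
To illustrate the trade-off described above, here is a small pure-Python model. It is not lxml's actual implementation, just a singly linked list with two generator-based iterators. All names (Node, LinkedList, iter_lookahead, iter_lazy, run) are invented for the example. The look-ahead iterator reads the successor before handing out the current node, so an append made while standing on the last node goes unnoticed.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None


class LinkedList:
    """Toy stand-in for a tree node's child chain (illustrative only)."""

    def __init__(self, values=()):
        self.head = None
        self.tail = None
        for value in values:
            self.append(value)

    def append(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node


def iter_lookahead(lst):
    """Read the successor BEFORE yielding the current node."""
    node = lst.head
    while node is not None:
        nxt = node.next  # fixed now; later changes to node.next are not seen
        yield node.value
        node = nxt


def iter_lazy(lst):
    """Read the successor only AFTER the loop body has run."""
    node = lst.head
    while node is not None:
        yield node.value
        node = node.next


def run(iterate, trigger):
    """Iterate over a, b, c and append 'd' when `trigger` is reached."""
    lst = LinkedList("abc")
    seen = []
    for value in iterate(lst):
        seen.append(value)
        if value == trigger:
            lst.append("d")
    return seen


print(run(iter_lookahead, "b"))  # append happens before 'c' looks ahead
print(run(iter_lookahead, "c"))  # successor of 'c' was already read as None
print(run(iter_lazy, "c"))
```

In this model the lazy variant picks up the appended item, but it depends on the current node's link still being valid after the loop body has run, which is the kind of situation the look-ahead is meant to protect against.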

Changed in lxml:
status: New → Triaged