attrib.pop for comments/PI broken

Bug #1281139 reported by Kovid Goyal
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
lxml
Fix Released
Low
scoder

Bug Description

attrib.pop('xxx', None) raises AttributeError: 'dictproxy' object has no attribute 'pop' for comments/PIs/ This used to work in older lxml versions, and it breaks commonly used code like

[x.attrib.pop('style', None) for x in root.iter()]

to remove an attrib from all elements.

I strongly suggest implementing a dummy pop method for comments/PIs or updating lxml will break a *lot* of code.

Test case:

python -c 'from lxml.etree import fromstring; r = fromstring("<r><!--x--></r>"); [x.attrib.pop("xxx", None) for x in r.iter()]'

Version info:
>>> import sys
>>> from lxml import etree
>>>
>>> print("%-20s: %s" % ('Python', sys.version_info))
Python : sys.version_info(major=2, minor=7, micro=6, releaselevel='final', serial=0)
>>> print("%-20s: %s" % ('lxml.etree', etree.LXML_VERSION))
lxml.etree : (3, 3, 0, 0)
>>> print("%-20s: %s" % ('libxml used', etree.LIBXML_VERSION))
libxml used : (2, 9, 1)
>>> print("%-20s: %s" % ('libxml compiled', etree.LIBXML_COMPILED_VERSION))
libxml compiled : (2, 9, 1)
>>> print("%-20s: %s" % ('libxslt used', etree.LIBXSLT_VERSION))
libxslt used : (1, 1, 28)
>>> print("%-20s: %s" % ('libxslt compiled', etree.LIBXSLT_COMPILED_VERSION))
libxslt compiled : (1, 1, 28)

Revision history for this message
scoder (scoder) wrote :

> ... or updating lxml will break a *lot* of code.

Well, it's certainly the first time I see this use case, so I may raise my doubts that it's all that common. But yes, it's a regression, so it should be fixed.

Changed in lxml:
importance: Undecided → Low
milestone: none → 3.3
status: New → Confirmed
Revision history for this message
Kovid Goyal (kovid) wrote :

I use it all the time in calibre, and calibre has been installed ~ 20 million times.

Revision history for this message
scoder (scoder) wrote :

any reason you're not using .iter('*') to iterate only over Elements ?

Revision history for this message
Kovid Goyal (kovid) wrote :

No particular reason, I sometimes use iter(etree.Element) and sometimes use iter(). The latter is when I am lazy/forgetful, or when the code is not written by me.

The point is that using iter() is the default usage. Most people wil forget about comments/PIs when writing code. Therefore, breaking it is likely to break a lot of code.

Revision history for this message
scoder (scoder) wrote :

The reason why I made the "attrib" value of PIs/Comments read-only is that I consider it better to raise an error on modifications when the underlying object doesn't support them, than to silently accept these modifications and just drop them when the mapping object gets discarded. So this change was really intentional.

I find it reasonable for Comment/PI to have an immutable mapping as their "attrib" value. The fact that "attrib.pop(x, default)" doesn't actually modify the (empty) mapping is rather a coincidence. As you can see from the Python docs, .pop() is not an API method of immutable mappings but only of mutable ones:

http://docs.python.org/3/library/collections.abc.html#module-collections.abc

That being said, I already agreed that this is an unfortunate regression. It breaks code, so there should at least have been a deprecation phase. But the mere fact that there *is* an "attrib" proprety on Comments at all shows that the intention is to keep their interface compatible with that of Elements.

I also take your point that it's easy to forget about non-Elements in the tree. ElementTree's parser even discards them by default, and I might convince myself at some point that that's the better default behaviour, because it requires users to be explicit about non-Elements in the tree. I guess that behaviour won't change in lxml, though.

So, I think the only reasonable way to move on with this is to implement a new mapping type in lxml that has the complete interface of MutableMapping but raises TypeError on modifications. Shouldn't be too hard, given that DictMixin and MutableMapping are just a conditional import away.

Revision history for this message
scoder (scoder) wrote :

Actually, this is only a problem with Comments, not PIs (which have a sort of pseudo attribute emulation).

Revision history for this message
scoder (scoder) wrote :
Changed in lxml:
assignee: nobody → scoder (scoder)
status: Confirmed → Fix Committed
Revision history for this message
Kovid Goyal (kovid) wrote :

Thanks for the fix :)

Revision history for this message
scoder (scoder) wrote :

Fixed in lxml 3.3.2.

Changed in lxml:
status: Fix Committed → Fix Released
Revision history for this message
Attila Lendvai (attila-lendvai) wrote :

FTR, this bit me also while using ebook-convert. not urgent, so just waiting for the fix to reach debian...

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.