Inconsistent hash output for _Comment and _Element

Bug #1690134 reported by Cindy
This bug affects 1 person
Affects: lxml
Status: Opinion
Importance: Wishlist
Assigned to: Unassigned

Bug Description

__hash__ output is inconsistent, which makes my dictionary keyed on _Comment objects unusable. The same thing was happening with _Element.__hash__, but I am unable to reproduce it at the moment.

test.xml:
<Test>
    <!--
        A comment.
    -->
</Test>

OS: Windows 7, IDLE - Python 2.7.12 Shell output:
>>> from lxml import etree
>>> parser = etree.XMLParser(strip_cdata=False, recover=True, encoding='UTF-8')
>>> new_xml = etree.parse('test.xml', parser)
>>> c = new_xml.getroot()
>>> etree._Comment.__hash__(c[0])
-2144458933
>>> etree._Comment.__hash__(c[0])
3024733
>>> etree._Comment.__hash__(c[0])
3024738
>>> c[0]
<!--\n A comment.\n -->

scoder (scoder) wrote :

This works as (currently) designed. Elements (and Comments) don't implement their own hash function, which means that they hash by their object ID. These objects get created on the fly, which shows in your example. You are simply hashing three different objects.
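
To illustrate the point about wrappers being created on the fly, here is a toy model in plain Python. It is not lxml code: Node, Proxy and Parent are hypothetical names, and the model always builds a new wrapper on access, so it does not reproduce how lxml actually caches or frees its proxies. It only shows that two wrappers around the same underlying node get different default hashes, because the default hash follows object identity.

```python
class Node(object):
    """Stands in for a long-lived underlying node (hypothetical, not lxml)."""
    def __init__(self, text):
        self.text = text


class Proxy(object):
    """Short-lived wrapper; it inherits object's identity-based hash."""
    def __init__(self, node):
        self.node = node


class Parent(object):
    """Builds a fresh Proxy on every index access, like c[0] in the report."""
    def __init__(self, nodes):
        self._nodes = nodes

    def __getitem__(self, index):
        return Proxy(self._nodes[index])


root = Parent([Node("A comment.")])

first = root[0]
second = root[0]  # a different wrapper around the same node

same_node = first.node is second.node          # the underlying node is shared
same_wrapper = first is second                 # the wrappers are distinct objects
hashes_equal = hash(first) == hash(second)     # so their default hashes differ
```

In this model a dictionary keyed on `first` cannot be queried with `second`, which mirrors the symptom in the report.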

I would consider a pull request, though, that implements a hash function based on the underlying libxml2 node pointer (i.e. the "_c_node" attribute).
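
As a rough sketch of what hashing by the underlying node could look like, again in a plain-Python toy model rather than lxml's Cython source: NodeHashedProxy is a hypothetical name, and id(node) stands in for the libxml2 node pointer, which is not reachable from Python code. Equality is defined on the same basis as the hash so that dictionary lookups stay consistent.

```python
class Node(object):
    """Stands in for the underlying node (hypothetical, not lxml)."""
    def __init__(self, text):
        self.text = text


class NodeHashedProxy(object):
    """Wrapper whose hash and equality follow the wrapped node."""
    def __init__(self, node):
        self._node = node

    def __eq__(self, other):
        if not isinstance(other, NodeHashedProxy):
            return NotImplemented
        return self._node is other._node

    def __ne__(self, other):
        # Python 2 does not derive __ne__ from __eq__.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # id(node) plays the role of the node pointer in this sketch.
        return hash(id(self._node))


node = Node("A comment.")
notes = {NodeHashedProxy(node): "seen"}

# A freshly built wrapper around the same node finds the entry.
lookup = notes[NodeHashedProxy(node)]
```

The dictionary key keeps the node alive here, so the id cannot be reused by another object while the entry exists; a real implementation would have to consider node lifetime in the same way.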

Changed in lxml:
importance: Undecided → Wishlist
status: New → Opinion