lxml

PythonElementClassLookup cannot access children with XMLPullParser

Bug #1952560 reported by lovetox on 2021-11-28

This bug affects 1 person

| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| lxml | Confirmed | Low | Unassigned | |

Bug Description

Hi,

The docs mention that elements inside the ClassLookup support the getchildren() call.

This does not seem to be true for XMLPullParser, which makes this sort of ClassLookup not very useful with this parser.

Sample Code

```python
from lxml import etree
from lxml.etree import ElementBase

class MyElementClass(ElementBase):
    pass

class MyLookup(etree.PythonElementClassLookup):
    def lookup(self, document, element):
        # Children are expected here, but none are available with XMLPullParser
        print(element.tag, element.getchildren())
        return MyElementClass

parser = etree.XMLPullParser(events=['start', 'end'])
parser.set_element_class_lookup(MyLookup())

xml = '<stream><iq><jid>asd</jid></iq></stream>'

parser.feed(xml)
```

Python : sys.version_info(major=3, minor=9, micro=7, releaselevel='final', serial=0)
lxml.etree : (4, 6, 3, 0)
libxml used : (2, 9, 12)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)

scoder (scoder) on 2021-11-29

summary:

- PythonElementClassLookup does not work with XMLPullParser
+ PythonElementClassLookup cannot access children with XMLPullParser

scoder (scoder) wrote on 2021-11-29:

Hmm. The elements are looked up on the "start" event, where they don't have children yet. That happens regardless of whether you ask for the event or not (events=['end'] has the same issue).

The reason is that incremental parsing needs to keep the elements (safely) alive until they are processed by the user, so it creates a Python instance for them (using your element lookup) to assure correct cleanup also in the case of extraction or deletion. That instance then stays alive until the subtree is processed, including the "end" event. It does not get recreated at the "end" event, where the children would be available.
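As a possible workaround sketch (not a fix for the limitation itself), the children can be inspected once the "end" event is delivered through `read_events()` instead of inside `lookup()`. The class names `MyElement` and `MyLookup` and the `seen` list are illustrative only, and the sketch assumes the class to instantiate can be chosen without looking at the children.

```python
from lxml import etree

class MyElement(etree.ElementBase):
    pass

class MyLookup(etree.PythonElementClassLookup):
    def lookup(self, document, element):
        # Called when the element is first seen; children are not available yet,
        # so the decision here is based on nothing but the element itself.
        return MyElement

parser = etree.XMLPullParser(events=['end'])
parser.set_element_class_lookup(MyLookup())
parser.feed('<stream><iq><jid>asd</jid></iq></stream>')

seen = []
for event, element in parser.read_events():
    # At the "end" event the subtree is complete, so children can be read.
    seen.append((element.tag, [child.tag for child in element]))

print(seen)
```

Any logic that depends on the children would then live in the event loop (or in methods of the element class called from it) rather than in the lookup.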

It's definitely a current limitation. But it wouldn't be easy to change, so it's likely to stay that way.

I'll leave the ticket open in case someone wants to give it a try, but don't expect that to happen by itself.

Changed in lxml:
importance:	Undecided → Low
status:	New → Confirmed
