PythonElementClassLookup cannot access children with XMLPullParser

Bug #1952560 reported by lovetox
| Affects | Status | Importance | Assigned to | Milestone |
|---------|--------|------------|-------------|-----------|
| lxml | Confirmed | Low | Unassigned | |

Bug Description

Hi,

the docs mention that the elements passed to a PythonElementClassLookup support the getchildren() call.

This does not seem to be true for XMLPullParser: getchildren() returns no children there, which makes this kind of class lookup not very useful with this parser.

Sample Code

```python
from lxml import etree
from lxml.etree import ElementBase

class MyElementClass(ElementBase):
    pass

class MyLookup(etree.PythonElementClassLookup):
    def lookup(self, document, element):
        print(element.tag, element.getchildren())
        return MyElementClass

parser = etree.XMLPullParser(events=['start', 'end'])
parser.set_element_class_lookup(MyLookup())

xml = '<stream><iq><jid>asd</jid></iq></stream>'

parser.feed(xml)
```

Python : sys.version_info(major=3, minor=9, micro=7, releaselevel='final', serial=0)
lxml.etree : (4, 6, 3, 0)
libxml used : (2, 9, 12)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)

scoder (scoder)
summary: - PythonElementClassLookup does not work with XMLPullParser
+ PythonElementClassLookup cannot access children with XMLPullParser
scoder (scoder) wrote :

Hmm. The elements are looked up on the "start" event, where they don't have children yet. That happens regardless of whether you ask for the event or not (events=['end'] has the same issue).

The reason is that incremental parsing needs to keep the elements (safely) alive until they are processed by the user, so it creates a Python instance for each of them (using your element lookup) to ensure correct cleanup, also in the case of extraction or deletion. That instance then stays alive until the subtree is processed, including the "end" event. It does not get recreated at the "end" event, where the children would be available.
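
To make the timing described above concrete, here is a minimal sketch that records which children the lookup sees and compares that with what is visible once the "end" events are read. The names `RecordingLookup`, `children_seen_in_lookup` and `children_at_end` are made up for this illustration. Going by the explanation above, the lookup is expected to see no children, but that is an assumption about the lxml version in use, so the code only prints the two views.

```python
from lxml import etree

# Child tags observed inside lookup(), keyed by element tag (hypothetical helper).
children_seen_in_lookup = {}


class RecordingLookup(etree.PythonElementClassLookup):
    def lookup(self, document, element):
        # `element` is a read-only proxy. Record only the first observation per
        # tag, in case the lookup is called again for the same element later.
        children_seen_in_lookup.setdefault(
            element.tag, [child.tag for child in element.getchildren()]
        )
        return None  # None delegates to the fallback (default) lookup


parser = etree.XMLPullParser(events=("end",))
parser.set_element_class_lookup(RecordingLookup())
parser.feed("<stream><iq><jid>asd</jid></iq></stream>")

# At the "end" event the subtree is complete, so the children are available.
children_at_end = {}
for event, element in parser.read_events():
    children_at_end[element.tag] = [child.tag for child in element]

print("seen in lookup:", children_seen_in_lookup)
print("seen at end:   ", children_at_end)
```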

It's definitely a current limitation. But it wouldn't be easy to change, so it's likely to stay that way.

I'll leave the ticket open in case someone wants to give it a try, but don't expect that to happen by itself.
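
Until that changes, one possible workaround (a sketch, not something suggested in this ticket) is to pick the class from information that already exists at "start" time, such as the tag, and to move any logic that needs the children into a method that is called when the "end" event is read. `IqElement`, `child_tags` and `TagLookup` are hypothetical names used only for this example.

```python
from lxml import etree


class IqElement(etree.ElementBase):
    def child_tags(self):
        # Safe to call once the "end" event for this element has been read.
        return [child.tag for child in self]


class TagLookup(etree.PythonElementClassLookup):
    def lookup(self, document, element):
        # Decide by tag only; do not rely on children inside the lookup.
        if element.tag == "iq":
            return IqElement
        return None  # use the default element class for everything else


parser = etree.XMLPullParser(events=("end",))
parser.set_element_class_lookup(TagLookup())
parser.feed("<stream><iq><jid>asd</jid></iq></stream>")

results = []
for event, element in parser.read_events():
    if isinstance(element, IqElement):
        # Child-dependent processing happens here, not in lookup().
        results.append(element.child_tags())

print(results)
```

This keeps the custom class in place for the whole lifetime of the element while deferring everything that depends on the subtree to the point where the subtree is known to be complete.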

Changed in lxml:
importance: Undecided → Low
status: New → Confirmed