PythonElementClassLookup cannot access children with XMLPullParser
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Confirmed | Low | Unassigned |
Bug Description
Hi,
The docs mention that the element passed to a PythonElementClassLookup's lookup() method supports the getchildren() call.
This does not seem to hold for XMLPullParser, which makes this kind of lookup much less useful with that parser.
Sample Code
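```python
from lxml import etree
from lxml.etree import ElementBase

class MyElementClass(ElementBase):
    pass

class MyLookup(etree.PythonElementClassLookup):
    def lookup(self, document, element):
        # 'element' is a read-only proxy; with XMLPullParser its
        # getchildren() list comes back empty here.
        print(element.tag, [child.tag for child in element.getchildren()])
        return MyElementClass

parser = etree.XMLPullParser()
parser.set_element_class_lookup(MyLookup())
xml = '<stream><item/><item/></stream>'  # placeholder: the original input was cut off
parser.feed(xml)
parser.close()
```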
Python : sys.version_
lxml.etree : (4, 6, 3, 0)
libxml used : (2, 9, 12)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)
Summary changed from "PythonElementClassLookup does not work with XMLPullParser" to "PythonElementClassLookup cannot access children with XMLPullParser".
Hmm. The elements are looked up at the "start" event, when they do not have children yet. That happens regardless of whether you request "start" events or not (events=['end'] has the same issue).
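The timing can be checked with a small sketch that records how many children each element has when lookup() runs, and compares that with the same element at its "end" event. `RecordingLookup` and `compare_lookup_and_end` are hypothetical names for illustration; the sketch assumes that returning None from lookup() keeps lxml's default (fallback) element class.

```python
from lxml import etree


class RecordingLookup(etree.PythonElementClassLookup):
    """Hypothetical helper: records each proxy's child count at lookup time."""

    def __init__(self):
        super().__init__()
        self.seen = {}  # tag -> number of children visible inside lookup()

    def lookup(self, document, element):
        # Keep only the first lookup per tag, i.e. the "start"-time call.
        self.seen.setdefault(element.tag, len(element))
        return None  # None: use the fallback (default) element class


def compare_lookup_and_end(xml):
    """Return (child counts seen in lookup(), child counts at the "end" event)."""
    lookup = RecordingLookup()
    parser = etree.XMLPullParser(events=('end',))  # only "end" events requested
    parser.set_element_class_lookup(lookup)
    parser.feed(xml)
    at_end = {el.tag: len(el) for _, el in parser.read_events()}
    parser.close()
    return lookup.seen, at_end


seen, at_end = compare_lookup_and_end('<stream><item><a/><b/></item></stream>')
print(seen)    # child counts during lookup()
print(at_end)  # child counts once each "end" event is delivered
```

Even though only "end" events are requested, the lookup sees `item` without its two children, while the element delivered at "end" has them.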
The reason is that incremental parsing needs to keep the elements (safely) alive until the user has processed them, so it creates a Python instance for each one (using your element lookup) to ensure correct cleanup even if elements are extracted or deleted. That instance then stays alive until the subtree has been processed, including the "end" event. It is not recreated at the "end" event, where the children would be available.
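Because the instance created at "start" is the one delivered at "end", one possible workaround (a sketch, not an official lxml recommendation) is to choose the class only from information available at "start" time, such as the tag or attributes, and move child-dependent logic into the element class so it runs lazily after the "end" event. `ItemElement`, `TagOnlyLookup` and `collect_child_tags` are hypothetical names for illustration.

```python
from lxml import etree


class ItemElement(etree.ElementBase):
    """Hypothetical element class whose child-dependent logic runs lazily."""

    @property
    def child_tags(self):
        # Evaluated on access, so read it after the "end" event, once children exist.
        return [child.tag for child in self]


class TagOnlyLookup(etree.PythonElementClassLookup):
    """Hypothetical lookup that decides using only data known at "start" time."""

    def lookup(self, document, element):
        if element.tag == 'item':
            return ItemElement
        return None  # fall back to the default element class


def collect_child_tags(chunks):
    """Feed XML chunks and collect child tags of each completed <item>."""
    parser = etree.XMLPullParser(events=('end',))
    parser.set_element_class_lookup(TagOnlyLookup())
    results = []
    for chunk in chunks:
        parser.feed(chunk)
        for _, el in parser.read_events():
            if isinstance(el, ItemElement):
                results.append(el.child_tags)
    parser.close()
    return results


print(collect_child_tags(['<stream><item><a/><b/></item>', '<item/></stream>']))
```

The trade-off is that the class itself cannot depend on the children; anything that does has to be computed once the subtree is complete.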
It's definitely a current limitation. But it wouldn't be easy to change, so it's likely to stay that way.
I'll leave the ticket open in case someone wants to give it a try, but don't expect that to happen by itself.