alternative interface for iterative parsing that does not build a complete tree
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Triaged | Wishlist | Unassigned |
Bug Description
When using iterparse, lxml builds the whole tree in memory instead of releasing each element after it has been processed.
I know this is not really a memory leak but a feature. Still, what is the point of iterative parsing if you end up with the whole tree in memory anyway?
The documentation [1] explains how to work around the memory consumption, but it would be much better if iterparse had an option not to build the whole tree in memory.
It could work like the streaming mode of xmltodict [2], where you specify an element depth and all elements at a smaller depth are simply ignored. For example, `depth=1` would mean that the root element is ignored completely.
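To make the request concrete, here is a rough sketch of the behaviour I have in mind, written on top of the existing iterparse. The name `iter_at_depth` and its `depth` argument are hypothetical and not part of lxml; the root counts as depth 0 here, which is my reading of the `depth=1` example above. The stdlib fallback import is only there so the sketch also runs without lxml installed.

```python
import io

try:
    from lxml import etree  # the library this report is about
except ImportError:
    # Assumption: the stdlib iterparse is close enough for this sketch.
    import xml.etree.ElementTree as etree


def iter_at_depth(source, depth=1):
    """Hypothetical helper: yield each element at `depth` (root = 0),
    then discard it so the tree never grows beyond one item."""
    stack = []  # elements that are currently open, root first
    for event, elem in etree.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()
        level = len(stack)  # depth of the element that just ended
        if level > depth:
            continue  # belongs to a subtree that is yielded as a whole
        if level == depth:
            yield elem
        # The element is finished: drop its content ...
        elem.clear()
        # ... and drop the already processed siblings before it.
        if stack:
            parent = stack[-1]
            while len(parent) and parent[0] is not elem:
                del parent[0]


xml = (
    b"<root>"
    b"<item id='1'><name>a</name></item>"
    b"<item id='2'><name>b</name></item>"
    b"</root>"
)
# Each <item> is complete when it is yielded and is cleared afterwards.
names = [item.findtext("name") for item in iter_at_depth(io.BytesIO(xml), depth=1)]
print(names)
```

The caller has to finish with each element before asking for the next one, because it is cleared as soon as the generator resumes. A built-in option could presumably avoid creating the ignored ancestors at all, which this sketch cannot do.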
[1] http://
[2] https:/
I'm writing this bug report because I found a lot of content on the internet where the same issue is raised over and over again:
https:/
http://
https:/
http://
summary: "iterparse memory leak" → "alternative interface for iterative parsing that does not build a complete tree"
Changed in lxml:
importance: Undecided → Wishlist
Changed in lxml:
status: New → Triaged