alternative interface for iterative parsing that does not build a complete tree

Bug #1688805 reported by Mantas Zimnickas
This bug affects 2 people
Affects  Status   Importance  Assigned to  Milestone
lxml     Triaged  Wishlist    Unassigned

Bug Description

When using iterparse, lxml builds the whole tree in memory instead of releasing elements after each iteration.

I know this is not really a memory leak but rather a feature. But what is the point of iterative parsing if, in the end, you still have the whole tree in memory?
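A minimal sketch of the behaviour described above, assuming a synthetic document of 1000 `<item/>` elements. It uses `lxml.etree.iterparse` when lxml is installed and otherwise falls back to the standard library's `xml.etree.ElementTree.iterparse`, which builds its tree the same way. `count_retained` is a hypothetical helper name. Even though every item is handled as soon as its end event arrives, the root still holds all of them once the loop finishes.

```python
import io

try:
    from lxml import etree  # the parser this report is about
except ImportError:
    # Fallback so the sketch runs without lxml; the stdlib iterparse also builds a full tree.
    import xml.etree.ElementTree as etree


def count_retained(xml_bytes):
    """Iterate over a document and report (items handled, children still held by root)."""
    root = None
    handled = 0
    for event, elem in etree.iterparse(io.BytesIO(xml_bytes), events=("start", "end")):
        if event == "start" and root is None:
            root = elem  # the first "start" event belongs to the document root
        elif event == "end" and elem is not root:
            handled += 1  # "process" the item, but do nothing to release it
    return handled, len(root)


doc = b"<items>" + b"<item/>" * 1000 + b"</items>"
print(count_retained(doc))  # (1000, 1000): every handled item is still in the tree
```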

There is documentation [1] explaining how to work around the memory consumption, but it might be much better to have an option that makes iterparse not build the whole tree in memory.

It could be something similar to xmltodict's streaming mode [2], where you can specify an element depth. All elements with a smaller depth are simply ignored. For example, `depth=1` would mean that the root element should be completely ignored.
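To make the proposal concrete, here is a sketch of how a depth-based mode could behave if written as a wrapper around an existing iterparse. `iterparse_at_depth` and its `depth` parameter are hypothetical and not part of lxml; depth is counted from 0 at the root, so `depth=1` yields the root's children and never the root itself. The sketch uses the standard library's `xml.etree.ElementTree.iterparse` so it runs without lxml. lxml's documentation on this topic [1] advises against discarding the element currently being handled, so with lxml the cleanup would more likely follow its `clear(keep_tail=True)` plus preceding-sibling deletion idiom instead of unlinking the element itself.

```python
import io
import xml.etree.ElementTree as ET


def iterparse_at_depth(source, depth):
    """Yield each complete element found at ``depth`` (root = 0), then discard it.

    Hypothetical helper illustrating the proposed option; it is not an lxml API.
    Elements shallower than ``depth`` are never yielded, and each yielded
    subtree is cleared and unlinked once the caller asks for the next item,
    so the caller must finish with an element before advancing.
    """
    stack = []  # elements opened but not yet closed, root first
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()  # elem is now complete; len(stack) is its depth
        if len(stack) == depth:
            yield elem
            elem.clear()  # drop the subtree's children, text and attributes
            if stack:
                stack[-1].remove(elem)  # unlink the empty shell from its parent


doc = (b"<artists><artist id='1'><name>A</name></artist>"
       b"<artist id='2'><name>B</name></artist></artists>")
for artist in iterparse_at_depth(io.BytesIO(doc), depth=1):
    print(artist.get("id"), artist.findtext("name"))
```

With `depth=1` the loop prints `1 A` and then `2 B`, and the `<artists>` root is never handed to the caller.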

[1] http://lxml.de/parsing.html#modifying-the-tree
[2] https://github.com/martinblech/xmltodict#streaming-mode

I'm writing this bug report because I found a lot of content on the internet where the same issue is addressed over and over again:

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/
http://stackoverflow.com/a/7171543/475477
https://codereview.stackexchange.com/q/2449
http://stackoverflow.com/a/9814580/475477

scoder (scoder)
summary: - iterparse memory leak
+ alternative interface for iterative parsing that does not build a complete tree
Changed in lxml:
importance: Undecided → Wishlist
scoder (scoder)
Changed in lxml:
status: New → Triaged