lxml

alternative interface for iterative parsing that does not build a complete tree

Bug #1688805 reported by Mantas Zimnickas on 2017-05-06

This bug affects 2 people

Affects   Status    Importance   Assigned to   Milestone
lxml      Triaged   Wishlist     Unassigned

Bug Description

When using iterparse, lxml builds the whole tree in memory instead of releasing elements after each iteration.

I know this is not really a memory leak but rather a feature. But what is the point of iterative parsing if you still end up with the whole tree in memory?

The documentation [1] explains how to work around the memory consumption, but maybe it would be much better to have an option for iterparse not to build the whole tree in memory?
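
For reference, here is a minimal sketch of the kind of workaround I mean. It assumes a flat document of `item` records; the sample data and the `collect_names` helper are made up for illustration, and the fallback import is only there so the snippet also runs without lxml installed. It shows that calling `clear()` on each processed element frees its content, while the emptied elements stay attached to the root.

```python
import io

try:
    from lxml import etree  # the library this report is about
except ImportError:
    # Assumption: the stdlib module is close enough for this sketch.
    import xml.etree.ElementTree as etree

# Hypothetical sample document with five small records.
XML = (
    "<root>"
    + "".join("<item><name>n%d</name></item>" % i for i in range(5))
    + "</root>"
).encode()


def collect_names(source):
    """Read every <item>, keep only its name, then clear the element."""
    names = []
    context = etree.iterparse(source, events=("end",))
    for _event, elem in context:
        if elem.tag == "item":
            names.append(elem.findtext("name"))
            elem.clear()  # drops children and text of the processed item
    # The emptied <item> elements are still children of the root.
    return names, len(context.root)


names, shells = collect_names(io.BytesIO(XML))
print(names, shells)
```

Even after clearing, the root still references one empty element per record, so for very large inputs some extra cleanup of already processed siblings is needed on top of this.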

It could be something similar to the xmltodict streaming mode [2], where you can specify an element depth. All elements at a smaller depth are simply ignored. For example, `depth=1` would mean that the root element is completely ignored.
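
To make the idea concrete, here is a rough sketch of how such a depth option could behave, written as a wrapper around the existing iterparse. `iter_at_depth` and its `depth` parameter are hypothetical names, not part of lxml, and the fallback import is again only there so the snippet runs without lxml. The wrapper yields every element that closes at the requested depth (the root is depth 0) and detaches it from its parent afterwards, so the tree never grows beyond one such element at a time.

```python
import io

try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib module is close enough for this sketch.
    import xml.etree.ElementTree as etree


def iter_at_depth(source, depth=1):
    """Hypothetical helper: yield elements that close at `depth`, then drop them."""
    stack = []  # currently open elements, root first
    for event, elem in etree.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()  # elem just closed; len(stack) is now its depth
        if len(stack) == depth:
            yield elem  # the caller processes the complete subtree here
            if stack:
                # Detach the processed element so its parent stops referencing it.
                stack[-1].remove(elem)


# Hypothetical sample document.
SAMPLE = b"<root><a><x>1</x></a><a><x>2</x></a></root>"

depth1_tags = [e.tag for e in iter_at_depth(io.BytesIO(SAMPLE), depth=1)]
depth2_text = [e.text for e in iter_at_depth(io.BytesIO(SAMPLE), depth=2)]
print(depth1_tags, depth2_text)
```

Elements above the requested depth are never yielded, which matches the "root element is ignored" behaviour described above. Any tail text following a detached element is not preserved by this sketch.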

[1] http://lxml.de/parsing.html#modifying-the-tree
[2] https://github.com/martinblech/xmltodict#streaming-mode

I'm writing this bug report because I found a lot of content on the internet where the same issue is addressed over and over again:

https://www.ibm.com/developerworks/xml/library/x-hiperfparse/
http://stackoverflow.com/a/7171543/475477
https://codereview.stackexchange.com/q/2449
http://stackoverflow.com/a/9814580/475477

scoder (scoder) on 2017-05-09

summary:	- iterparse memory leak
	+ alternative interface for iterative parsing that does not build a complete tree
Changed in lxml:
importance:	Undecided → Wishlist

scoder (scoder) on 2017-11-04

Changed in lxml:
status:	New → Triaged
