Memory leaks updating Element.attrib dictionary
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Fix Released | Medium | scoder |
Bug Description
My code needs to parse, modify, then serialise medium-size XML files (10-700MB). I began using cElementTree but found that lxml serialises much faster. However, I'm getting what looks like a memory leak from lxml which does not occur with cElementTree. I can't easily reproduce the bug using a smaller string input XML snippet, but it is very consistent with the same input file, which does not leak memory with cElementTree.
I'm using lxml 2.2.2 and libxml2 2.6.32 on Ubuntu.
filename = "blah.xml"  # 600MB
new_values = dict(section=...)  # attribute values truncated in the report

# running this script from the shell grows the process memory by about 5MB
for i in xrange(100):
    elTree = ElementTree(file=filename)
    el = elTree.getroot()
    for k, v in new_values.items():
        el.set(k, v)

# This loop irreversibly increases process memory by 365MB
for i in xrange(100):
    elTree = ElementTree(file=filename)
    el = elTree.getroot()
    el.attrib.update(new_values)
description: updated
security vulnerability: yes → no
visibility: private → public
With "leak", do you mean it isn't given back to the system? How do you measure the memory usage? Note that the Python interpreter does not necessarily free memory that it has allocated immediately when it is no longer used, so the size of the interpreter process is not necessarily a good measure.
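For reference, one way to probe memory use from inside the process, rather than eyeballing the interpreter's size in top, is the stdlib resource module. A minimal sketch; note that the units of ru_maxrss are platform dependent (kilobytes on Linux, bytes on macOS), so treat it as a rough, Unix-only measure:

```python
import resource

# Peak resident set size of this process so far, as reported by the OS.
# Sampling this before and after a loop gives a crude leak indicator
# that is less noisy than watching the process in top.
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("peak RSS: %d (kB on Linux, bytes on macOS)" % peak)
```

Comparing this value before and after the suspect loop, with a gc.collect() in between, distinguishes memory the allocator is holding for reuse from memory that genuinely keeps growing.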
Also note that using "element.attrib" creates cross-referenced objects that need garbage collection. You may want to run "gc.collect()" within or after the last loop to see if the memory is really permanently "leaking" or just temporarily allocated.
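To illustrate the cross-reference point, here is a minimal, lxml-free sketch of how cyclic objects stay allocated until the cyclic collector runs; the Node class is a hypothetical stand-in for an Element and the attrib proxy that points back at it:

```python
import gc

class Node(object):
    """Hypothetical stand-in for one half of a cross-referenced pair,
    e.g. an Element and its attrib proxy."""
    def __init__(self):
        self.partner = None

gc.disable()  # mimic a long stretch between automatic collector runs
for _ in range(1000):
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # reference cycle; refcounting alone cannot free it
del a, b  # all 1000 cycles are now unreachable, but still allocated

freed = gc.collect()  # an explicit pass reclaims them
gc.enable()
print("objects collected:", freed)  # thousands: the cycles only looked like a leak
```

This is why the process can appear to "leak" right after the loop: the memory is reclaimable, it just has not been collected yet.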