pickle format memory leaks

Bug #566902 reported by Ximin Luo
This bug affects 1 person
Affects   Status      Importance   Assigned to    Milestone
igraph    Confirmed   Low          Tamás Nepusz

Bug Description

The pickle format (or at least reading it) leaks memory; pickle_loop(large_graph, True) in the attached test demonstrates this.

I don't know whether this is igraph's fault or pickle's, but I also found that:
- pickling a simpler data structure does not leak memory: see pickle_loop(large_range, True) in the attached test
- igraph does not leak memory when writing with, e.g., write_graphml: see write_loop(large_graph, True) in the attached test

This is probably low priority, but I thought I'd report it anyway, since it may be useful to people who need to serialise and deserialise lots of graphs and run into this memory leak. If you don't need pickle's ability to serialise almost any Python object as a graph attribute, then GraphML+zlib performs better than pickle (in both speed and serialised size) and doesn't leak memory either.

Ximin Luo (infinity0) wrote :
Tamás Nepusz (ntamas) wrote :

Some more details:

- the leak happens with smaller graphs as well (if they have at least a single edge)
- the leak does not happen if I create the graph with the same call that unpickling would have used. Since unpickling just calls the appropriate __init__ method, this is weird.

Will investigate further when time allows.

Changed in igraph:
importance: Undecided → Low
assignee: nobody → Tamás Nepusz (ntamas)
status: New → Confirmed
Ximin Luo (infinity0) wrote :

If I examine the heap with objgraph, it reports a lot of tuple objects. However, if I pick a bunch of them at random and get their backrefs, it claims that all of them are dangling (int, int) tuples (representing edges?), which doesn't make sense, since those should be getting garbage collected. Maybe it's more of a problem with pickle?

Tamás Nepusz (ntamas) wrote :

Yes, these tuples definitely represent edges. When a Graph object is pickled, Python calls the __reduce__ method of the Graph object, which returns the constructor and the arguments that should be passed to it in order to reproduce the Graph during unpickling. So, during the pickling process, a huge list of tuples is created to represent the edge list. When unpickling, it is the argument list and the name of the Graph constructor that are actually unpickled, and the constructor is then simply called with the appropriate arguments.
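To make the __reduce__ mechanism concrete, here is a minimal sketch using a toy class. ToyGraph and its attributes are hypothetical stand-ins invented for illustration; this is not igraph's actual implementation. It shows the two routes being compared in this thread: a normal pickle round trip, and calling the constructor by hand with the same arguments.

```python
import pickle


class ToyGraph:
    """Hypothetical stand-in for a graph class; not igraph's real code."""

    def __init__(self, n, edges):
        self.n = n
        self.edges = [tuple(e) for e in edges]

    def __reduce__(self):
        # Return the callable to invoke on unpickling plus its arguments.
        # The edge list is materialised here as a list of (int, int) tuples.
        return (ToyGraph, (self.n, list(self.edges)))


g = ToyGraph(3, [(0, 1), (1, 2)])

# Route 1: a normal pickle round trip. Only the reference to ToyGraph and
# the argument tuple are stored; loads() then calls ToyGraph(*args).
restored = pickle.loads(pickle.dumps(g))

# Route 2: call the constructor by hand with the very same arguments.
constructor, args = g.__reduce__()
rebuilt = constructor(*args)

print(restored.edges, rebuilt.edges)
```

Both routes end in the same constructor call, which is why a leak that only shows up on the pickle route is surprising.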

One thing I noticed yesterday is that if I explicitly force the Python GC to collect garbage after every unpickling (by calling gc.collect()), the memory usage grows more slowly for small graphs, but it still grows. At the same time, the number of uncollectable objects as returned by gc.collect() stays zero during the whole process.
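The experiment described above could be sketched roughly as follows. This is an assumption-laden illustration, not the attached test: the payload is a plain list of (int, int) tuples standing in for an edge list, and tracemalloc is used here only as a convenient stdlib way to watch memory. Note that tracemalloc only sees allocations made through Python's allocator, so memory allocated directly by a C library would not show up in it.

```python
import gc
import pickle
import tracemalloc

# Stand-in payload: a list of (int, int) tuples shaped like an edge list.
payload = pickle.dumps([(i, i + 1) for i in range(1000)])

tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()

unreachable_counts = []
for _ in range(20):
    obj = pickle.loads(payload)
    del obj
    # gc.collect() returns the number of unreachable objects it found;
    # objects it could not free are listed in gc.garbage.
    unreachable_counts.append(gc.collect())

current, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

growth = current - baseline
print("traced growth in bytes:", growth)
print("gc.collect() results:", unreachable_counts)
print("objects in gc.garbage:", len(gc.garbage))
```

Swapping the payload for a pickled graph and comparing the growth figure across iterations is one way to check whether the growth persists after forced collection.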

Gábor Csárdi (gabor.csardi) wrote : Continue on github

The development of igraph has moved to GitHub, so please do not comment on this bug here. You are of course welcome to comment on GitHub instead:
https://github.com/igraph/igraph/issues/271
