Possible memory leak on Dynamic Range Filter

Bug #784606 reported by André Panisson on 2011-05-18
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Gephi
Fix Released
High
Mathieu Bastian

Bug Description

This problem appears when applying many times the Dynamic Range Filter in a large dynamic graph.
I'm using the Toolkit built from the 0.8 release sources, but this leak could affect the desktop application when using the Dynamic Range Filter.

Attached, a zip containing a gexf file with a large dynamic graph, and a java file that applies a dynamic range filter many times in succession, in order to take snapshots of the graph at different ranges.
The application starts with low memory usage after loading the graph, but when the filter is applied some times, the memory usage starts to increase.

It seems that the EventManager thread does not bear the amount of events generated by the filtering (the rate of produced events is much larger than the rate of consumed events).

André Panisson (panisson) wrote :

I have some ideas on how to face this problem.
First off, I would like to know if there's any possibility of creating new views without creating new node instances, but reusing the existing ones. This could reduce the amount of memory usage.

The possible solutions I have in mind are:

1. Choosing different strategies when applying a filter:
Actually, it seems that the strategy used when applying a filter is by duplicating the entire view and filter out the graph objects that do not pass the filter one by one. In a situation where I have a graph with thousands of nodes and the resulting filtered view has few nodes, the best strategy is by creating an empty view and adding the nodes that pass the filter. That would be faster and produce less events.

2. Fortunately, when Mathiew implemented the EventManager, he implemented it in such a way that the events could be grouped by type before being consumed. The problem is that when applying a filter, nodes are removed one by one from the view, and for each REMOVE_NODES event fired, the corresponding REMOVE_EDGES events are also fired, and there's no possibility of grouping the alternating event types in this way. A better strategy would be firing all REMOVE_EDGES events before firing the REMOVE_NODES events, in order to give to the EventManager the possibility to group them and creating only two event groups when removing a list of nodes.

Nodes are uniquely attached to views. The reason for that is that nodes have references to the edges they are linked to, as well as their position, in case of a hierarchical graphs. Edges are hierarchy can be different from one view to anther, so it's not possible to reuse instances. However, as you may know NodeData are common between views.

1.Yes, I agree. We could somehow count the number of objects passing the filter and add/remove according to what the most efficient.

2. Yes I didn't realize this when I designed this compression. I noticed the problem before, but I don't want to complexity the compression algorithm, as it could result in inconsistence in events finally sent, but can be discussed. Your suggestion is right, we could make small modification in the filters code to delete all edges before nodes. Currently, we remove nodes, which internally in DHNS removes edges.

Thanks André, this system has rooms for improvements and happy for your help

Changed in gephi:
status: New → Confirmed
milestone: none → 0.8alpha
assignee: nobody → Mathieu Bastian (mathieu.bastian)
importance: Undecided → High
Changed in gephi:
status: Confirmed → Fix Committed

Fix in rev 2336.

I made some significant changes:

1. I merged the ADD_NODES and ADD_EDGES event, as well as the REMOVE_NODES and REMOVE_EDGES. Therefore, a ADD_NODES_AND_EDGES can contains both nodes (accessible from addedNodes()) and edges (accessible from addedEdges()). Though this change is breaking the API compatibility, I think it make sense and should be an easy change for API users. That modification allows to dramatically reduce the number of events. When receiving a ADD_NODES_AND_EDGES event, API user should test if the addedNodes() or addedEdges() array are null before proceed.

2. I revised the EventQueueManager class and improved the way events are queued and dispatched. The manager now properly tune it's own event rate so the number of events in the queue should be stable. In other words if the queue becomes very loaded, the manager will speed-up and fires more events until the queue is empty again. I did some profiling and the new strategy seems to keep memory usage stable and use less CPU.

André would you mind test with your example?

André Panisson (panisson) wrote :

Hello, I tested the new implementation with my examples, and there was a big performance improvement. I also changed my plugin according to the API changes for the version 0.8 beta, and it was easy to adapt. Good work Mathieu, I consider this issue as closed.

Changed in gephi:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers