Misleading average edge weights in dynamic networks

Bug #695371 reported by Axel Bruns
Affects: Gephi
Status: Confirmed
Importance: Medium
Assigned to: Sébastien Heymann

Bug Description

Hi,

not sure if this is a bug or a feature, but it's very confusing.

I'm experimenting with dynamic network visualisations where the weight of edges is different during different time intervals. I'm defining these in GEXF format, for example:

<[1000.0, 2000.0, 1.0]; [3000.0, 4000.0, 1.0]>

(the edge has a weight of 1 between t=1000 and t=2000, and between 3000 and 4000; by default, the weight is 0 elsewhere).

However, the way Gephi calculates its edge weights for visualisations is quite misleading; it simply seems to use the formula

(weight in period 1 + weight in period 2 + ... + weight in period n) / n

for all periods which fall into the currently selected timeframe. For a timeframe from 0 to 5000, the edge weight resulting from the example above would be:

(1 + 1) / 2 = 1

If the edges above were defined as

<[0.0, 1000.0, 0.0]; [1000.0, 2000.0, 1.0]; [2000.0, 3000.0, 0.0]; [3000.0, 4000.0, 1.0]; [4000.0, 5000.0, 0.0]>

(which describes exactly the same edge, but explicitly sets the edge weight to 0 for other periods), then the result would be different:

(0 + 1 + 0 + 1 + 0) / 5 = 0.4

And if the periods were broken up further (e.g. [0.0, 500.0, 0.0]; [500.0, 1000.0, 0.0]; etc.), we could generate further alternative results.
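
To make the dependence on notation concrete, here is a minimal Java sketch of the unweighted average as described above. It is not Gephi's code: the class name, the method name and the {start, end, weight} array layout are assumptions made for illustration only.

```java
final class NaiveEdgeWeight {
    // Each row is {start, end, weight}, mirroring a GEXF interval [start, end, weight].
    // Hypothetical helper: averages the weights of all intervals that overlap
    // the selected timeframe, ignoring how long each interval lasts.
    static double unweightedAverage(double[][] intervals, double from, double to) {
        double sum = 0.0;
        int count = 0;
        for (double[] iv : intervals) {
            boolean overlapsTimeframe = iv[0] < to && iv[1] > from;
            if (overlapsTimeframe) {
                sum += iv[2];
                count++;
            }
        }
        // An edge with no interval in the timeframe is treated as weight 0.
        return count == 0 ? 0.0 : sum / count;
    }

    public static void main(String[] args) {
        double[][] implicitZeros = {{1000, 2000, 1}, {3000, 4000, 1}};
        double[][] explicitZeros = {
            {0, 1000, 0}, {1000, 2000, 1}, {2000, 3000, 0}, {3000, 4000, 1}, {4000, 5000, 0}
        };
        System.out.println(unweightedAverage(implicitZeros, 0, 5000)); // prints 1.0
        System.out.println(unweightedAverage(explicitZeros, 0, 5000)); // prints 0.4
    }
}
```

Both inputs describe the same edge, yet the result changes with the number of intervals used to describe it.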

Similarly, the average edge weights for

<[0.0, 2.0, 0.0]; [2.0, 5000.0, 1.0]>

and

<[0.0, 4998.0, 0.0]; [4998.0, 5000.0, 1.0]>

both come out as (0 + 1) / 2 = 0.5, even though the first edge is visible for almost the entire period between 0 and 5000, while the second edge only appears at t=4998.

Is there a way to revise the edge weight calculation algorithm in Gephi to take into account the length of each defined period, so that it produces a reliable total regardless of how the edge weights are described? I think the following formula should do the trick:

((weight in period 1 * length of period 1) + (weight in period 2 * length of period 2) + ... + (weight in period n * length of period n)) / length of entire timeframe = average weight

E.g., for my two examples:

(1 * 1000 + 1 * 1000) / 5000 = (0 * 1000 + 1 * 1000 + 0 * 1000 + 1 * 1000 + 0 * 1000) / 5000 = 0.4

and

(1 * 4998 + 0 * 2) / 5000 = 0.9996 versus (0 * 4998 + 1 * 2) / 5000 = 0.0004
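
The proposed formula can be sketched in Java as follows. Again, this is only an illustration and not Gephi's implementation: the class and method names are hypothetical, intervals are plain {start, end, weight} arrays, and the sketch assumes that intervals do not overlap each other. Clipping each interval to the selected timeframe is an extra assumption not spelled out in the formula; gaps between intervals count as weight 0.

```java
final class TimeWeightedEdgeWeight {
    // Each row is {start, end, weight}. Assumes the intervals do not overlap.
    static double timeWeightedAverage(double[][] intervals, double from, double to) {
        if (to <= from) {
            throw new IllegalArgumentException("timeframe must have positive length");
        }
        double weightedSum = 0.0;
        for (double[] iv : intervals) {
            // Length of the part of this interval that lies inside the timeframe.
            double overlap = Math.min(iv[1], to) - Math.max(iv[0], from);
            if (overlap > 0) {
                weightedSum += iv[2] * overlap;
            }
        }
        // Time not covered by any interval contributes weight 0.
        return weightedSum / (to - from);
    }

    public static void main(String[] args) {
        double[][] implicitZeros = {{1000, 2000, 1}, {3000, 4000, 1}};
        double[][] explicitZeros = {
            {0, 1000, 0}, {1000, 2000, 1}, {2000, 3000, 0}, {3000, 4000, 1}, {4000, 5000, 0}
        };
        System.out.println(timeWeightedAverage(implicitZeros, 0, 5000)); // prints 0.4
        System.out.println(timeWeightedAverage(explicitZeros, 0, 5000)); // prints 0.4
    }
}
```

With this calculation, splitting a period into smaller pieces or spelling out the zero-weight periods no longer changes the result.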

Hopefully that's just a small change to the algorithm?

Many thanks, and hope this makes some sense,

Axel Bruns

Axel Bruns (a-bruns) wrote :

Ping. Be great to get this fixed soon...

Changed in gephi:
status: New → Confirmed
importance: Undecided → Low
milestone: none → 0.7beta
tags: added: dynamics
removed: dynamic
Changed in gephi:
importance: Low → Medium
assignee: nobody → Sébastien Heymann (sebastien.heymann)
Luiz Ribeiro (luizribeiro) wrote :

The unit tests from DynamicTest testGetValue_Estimator and testGetValue_3args are probably wrong too: the DynamicDouble object created by the makeTree1 method doesn't make any sense to me, since there is more than one double value assigned to the same instant of time.

For example, on lines 471 and 472 from DynamicTypeTest.java:
intervals.add(new Interval<Double>(16.0, 21.0, 5.0));
intervals.add(new Interval<Double>(15.0, 23.0, 4.0));

Two different values (5.0 and 4.0) are assigned to the overlapping interval [16.0, 21.0]. Is this really wrong, or am I missing something important here? I could fix this issue easily by rewriting the algorithm in the DynamicFloat getValue method, but I think we first need to fix the unit tests.
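
A quick way to check test data for this kind of conflict is a pairwise overlap scan. The sketch below is a hypothetical helper, not part of Gephi or its test suite: it uses plain {start, end, value} arrays rather than Gephi's Interval class, and it assumes that intervals which merely share an endpoint do not conflict.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class OverlapCheck {
    // Each row is {start, end, value}. Returns one {overlapStart, overlapEnd}
    // entry for every pair of intervals that share a span of positive length
    // while carrying different values.
    static List<double[]> conflictingSpans(double[][] intervals) {
        List<double[]> found = new ArrayList<>();
        for (int i = 0; i < intervals.length; i++) {
            for (int j = i + 1; j < intervals.length; j++) {
                double start = Math.max(intervals[i][0], intervals[j][0]);
                double end = Math.min(intervals[i][1], intervals[j][1]);
                boolean differentValues = intervals[i][2] != intervals[j][2];
                if (start < end && differentValues) {
                    found.add(new double[] {start, end});
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // The two intervals quoted from the unit test.
        double[][] fromTest = {{16.0, 21.0, 5.0}, {15.0, 23.0, 4.0}};
        for (double[] span : conflictingSpans(fromTest)) {
            System.out.println("conflict on " + Arrays.toString(span));
        }
    }
}
```

The scan is quadratic in the number of intervals, which should be acceptable for small test fixtures; sorting by start time first would be the usual choice for larger inputs.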

If anyone confirms that the unit tests are indeed wrong, I'll start working on this issue ASAP.

Sébastien Heymann (sebastien.heymann) wrote :

Hi,

The bug is much more complicated than it appears. If we change the definition of an average over time, we should also change the other Estimator functions accordingly:

http://gephi.org/docs/api/org/gephi/data/attributes/api/Estimator.html

We currently don't know what a good definition of the median value and the sum would be, because they can be very sensitive to noise when sampling/capturing a dynamic network.

The second issue is a design problem in the code: currently an estimator function can only return the same type as the values. But the average of Integers is not necessarily an Integer. This is much more a feature improvement than a simple bug, so we'll work on it while running the GSoC proposal on Timeline improvement.
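
To illustrate the return-type problem, here is one possible shape for an estimator whose result type is separate from its value type. This is purely a hypothetical sketch: TypedEstimator and EstimatorSketch are invented names and do not correspond to Gephi's Estimator API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical interface: V is the type of the stored values,
// R is the type of the estimate, which may differ from V.
interface TypedEstimator<V, R> {
    R estimate(List<V> values);
}

final class EstimatorSketch {
    // The average of Integers is returned as a Double, so 1 and 2 give 1.5
    // instead of being truncated to 1. An empty list yields NaN.
    static final TypedEstimator<Integer, Double> AVERAGE =
        values -> values.stream().mapToInt(Integer::intValue).average().orElse(Double.NaN);

    // The maximum keeps the value type; it throws on an empty list.
    static final TypedEstimator<Integer, Integer> MAX =
        values -> Collections.max(values);

    public static void main(String[] args) {
        List<Integer> weights = Arrays.asList(1, 2);
        System.out.println(AVERAGE.estimate(weights)); // prints 1.5
        System.out.println(MAX.estimate(weights));     // prints 2
    }
}
```

Separating the two type parameters lets each estimator declare its own result type, at the cost of a wider change to any code that currently assumes estimates and values share a type.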

Axel Bruns (a-bruns) wrote :

Hi guys,

just wanted to say I appreciate your work on tracing this bug. Hope it can be fixed by the GSoC project.

Axel
