Misleading average edge weights in dynamic networks
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Gephi | Confirmed | Medium | Sébastien Heymann |
Bug Description
Hi,
not sure if this is a bug or a feature, but it's very confusing.
I'm experimenting with dynamic network visualisations where the weight of edges is different during different time intervals. I'm defining these in GEXF format, for example:
<[1000.0, 2000.0, 1.0]; [3000.0, 4000.0, 1.0]>
(the edge has a weight of 1 between t=1000 and t=2000, and between 3000 and 4000; by default, the weight is 0 elsewhere).
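To make the notation concrete, here is a minimal Python sketch that reads the interval string above into `(start, end, weight)` tuples and looks up the weight at a given time. The helper names `parse_intervals` and `weight_at` are hypothetical and not part of Gephi or any GEXF library. Treating each interval as including its start but excluding its end is an assumption made only so that adjacent intervals do not overlap.

```python
import re

# Hypothetical helper for the notation used in this report, e.g.
# "<[1000.0, 2000.0, 1.0]; [3000.0, 4000.0, 1.0]>".
_INTERVAL = re.compile(r"\[\s*([^,\]]+),\s*([^,\]]+),\s*([^,\]]+)\]")


def parse_intervals(text):
    """Return a list of (start, end, weight) tuples of floats."""
    return [tuple(float(x) for x in m.groups()) for m in _INTERVAL.finditer(text)]


def weight_at(intervals, t, default=0.0):
    """Weight at time t, or the default (0) outside every interval.

    Assumption: intervals are treated as start <= t < end.
    """
    for start, end, weight in intervals:
        if start <= t < end:
            return weight
    return default


edge = parse_intervals("<[1000.0, 2000.0, 1.0]; [3000.0, 4000.0, 1.0]>")
print(edge)                   # [(1000.0, 2000.0, 1.0), (3000.0, 4000.0, 1.0)]
print(weight_at(edge, 1500))  # 1.0
print(weight_at(edge, 2500))  # 0.0
```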
However, the way Gephi calculates its edge weights for visualisations is quite misleading; it simply seems to use the formula
(weight in period 1 + weight in period 2 + ... + weight in period n) / n
for all periods which fall into the currently selected timeframe. For a timeframe from 0 to 5000, the edge weight resulting from the example above would be:
(1 + 1) / 2 = 1
If the edges above were defined as
<[0.0, 1000.0, 0.0]; [1000.0, 2000.0, 1.0]; [2000.0, 3000.0, 0.0]; [3000.0, 4000.0, 1.0]; [4000.0, 5000.0, 0.0]>
(which describes exactly the same edge, but explicitly sets the edge weight to 0 for other periods), then the result would be different:
(0 + 1 + 0 + 1 + 0) / 5 = 0.4
And if the periods were broken up further (e.g. [0.0, 500.0, 0.0]; [500.0, 1000.0, 0.0]; etc.), we could generate yet more different results for the same edge.
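The behaviour described above can be reproduced with a short Python sketch. This is only my reading of what Gephi appears to do, not its actual code: the function name `naive_average` is hypothetical, and the rule that a period counts only when it lies entirely inside the timeframe is an assumption.

```python
def naive_average(intervals, frame_start, frame_end):
    """Unweighted mean of the weights of all periods inside the timeframe.

    Assumption: a period counts only if it lies entirely within the timeframe.
    """
    weights = [w for (s, e, w) in intervals if s >= frame_start and e <= frame_end]
    return sum(weights) / len(weights) if weights else 0.0


# The same edge, described implicitly and with explicit zero-weight periods.
implicit = [(1000.0, 2000.0, 1.0), (3000.0, 4000.0, 1.0)]
explicit = [
    (0.0, 1000.0, 0.0),
    (1000.0, 2000.0, 1.0),
    (2000.0, 3000.0, 0.0),
    (3000.0, 4000.0, 1.0),
    (4000.0, 5000.0, 0.0),
]

print(naive_average(implicit, 0.0, 5000.0))  # 1.0
print(naive_average(explicit, 0.0, 5000.0))  # 0.4
```

The two descriptions are equivalent, yet the result depends on how many periods were written down.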
Similarly, the average edge weights for
<[0.0, 2.0, 0.0]; [2.0, 5000.0, 1.0]>
and
<[0.0, 4998.0, 0.0]; [4998.0, 5000.0, 1.0]>
both come out as (0 + 1) / 2 = 0.5, even though the first edge is visible for almost the entire period between 0 and 5000, while the second edge only appears at t=4998.
Is there a way to revise the edge weight calculation in Gephi so that it takes the length of each defined period into account and produces the same average regardless of how the edge weights are described? I think the following formula should do the trick:
((weight in period 1 * length of period 1) + (weight in period 2 * length of period 2) + ... + (weight in period n * length of period n)) / length of entire timeframe = average weight
E.g., for the first pair of examples (both descriptions of the same edge):
(1 * 1000 + 1 * 1000) / 5000 = (0 * 1000 + 1 * 1000 + 0 * 1000 + 1 * 1000 + 0 * 1000) / 5000 = 0.4
and for the second pair:
(0 * 2 + 1 * 4998) / 5000 = 0.9996 versus (0 * 4998 + 1 * 2) / 5000 = 0.0004
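As a rough illustration of the proposed formula, here is a minimal Python sketch. The function name `time_weighted_average` is hypothetical. Clipping each period to the selected timeframe is an extra assumption that the formula above does not spell out; time not covered by any period contributes a weight of 0.

```python
def time_weighted_average(intervals, frame_start, frame_end):
    """Sum of weight * period length, divided by the timeframe length.

    Assumption: periods are clipped to the timeframe; uncovered time counts as 0.
    """
    if frame_end <= frame_start:
        raise ValueError("timeframe must have positive length")
    total = 0.0
    for start, end, weight in intervals:
        overlap = min(end, frame_end) - max(start, frame_start)
        if overlap > 0:
            total += weight * overlap
    return total / (frame_end - frame_start)


implicit = [(1000.0, 2000.0, 1.0), (3000.0, 4000.0, 1.0)]
explicit = [
    (0.0, 1000.0, 0.0),
    (1000.0, 2000.0, 1.0),
    (2000.0, 3000.0, 0.0),
    (3000.0, 4000.0, 1.0),
    (4000.0, 5000.0, 0.0),
]
early = [(0.0, 2.0, 0.0), (2.0, 5000.0, 1.0)]
late = [(0.0, 4998.0, 0.0), (4998.0, 5000.0, 1.0)]

print(time_weighted_average(implicit, 0.0, 5000.0))  # 0.4
print(time_weighted_average(explicit, 0.0, 5000.0))  # 0.4
print(time_weighted_average(early, 0.0, 5000.0))     # 0.9996
print(time_weighted_average(late, 0.0, 5000.0))      # 0.0004
```

With this calculation, both descriptions of the first edge agree, and the two edges in the second pair are clearly distinguished.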
Hopefully that's just a small change to the algorithm?
Many thanks, and hope this makes some sense,
Axel Bruns
Changed in gephi: | |
status: | New → Confirmed |
importance: | Undecided → Low |
milestone: | none → 0.7beta |
tags: |
added: dynamics removed: dynamic |
Changed in gephi: | |
importance: | Low → Medium |
assignee: | nobody → Sébastien Heymann (sebastien.heymann) |
Ping. Be great to get this fixed soon...