NullPointerException on importing CSV data in Data Laboratory

Bug #654030 reported by Mathieu Bastian
Affects  Status        Importance  Assigned to    Milestone
Gephi    Fix Released  High        Eduardo Ramos
0.7      Fix Released  High        Unassigned

Bug Description

Graph API throws NPE because source or target is null

Stack trace:
java.lang.NullPointerException
 at org.gephi.graph.dhns.core.GraphFactoryImpl.newEdge(GraphFactoryImpl.java:128)
 at org.gephi.graph.dhns.core.GraphFactoryImpl.newEdge(GraphFactoryImpl.java:123)
 at org.gephi.graph.dhns.core.GraphFactoryImpl.newEdge(GraphFactoryImpl.java:47)
 at org.gephi.datalab.impl.GraphElementsControllerImpl.buildEdge(GraphElementsControllerImpl.java:431)
 at org.gephi.datalab.impl.GraphElementsControllerImpl.createEdge(GraphElementsControllerImpl.java:89)
 at org.gephi.datalab.impl.AttributeColumnsControllerImpl.importCSVToEdgesTable(AttributeColumnsControllerImpl.java:597)
 at org.gephi.datalab.plugin.manipulators.general.ui.ImportCSVUIWizardAction.performAction(ImportCSVUIWizardAction.java:85)
 at org.gephi.datalab.plugin.manipulators.general.ImportCSV.execute(ImportCSV.java:40)
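
The trace shows newEdge failing on a null source or target. The report does not say that blank CSV fields caused this (the import failed at a different place each time), so the following is only a precautionary sketch: it lists data rows whose source or target field is empty before the file goes anywhere near the importer. The function name and the default column names are assumptions for illustration, not part of Gephi.

```python
import csv
import io


def rows_missing_endpoints(csv_text, source_col="source", target_col="target"):
    """Return the 1-based data-row numbers whose source or target is empty.

    The column names are assumptions; adjust them to the CSV header in use.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    bad_rows = []
    for row_number, row in enumerate(reader, start=1):
        source = (row.get(source_col) or "").strip()
        target = (row.get(target_col) or "").strip()
        if not source or not target:
            bad_rows.append(row_number)
    return bad_rows


sample = "source,target\na,b\n,c\nd,\n"
print(rows_missing_endpoints(sample))  # [2, 3]
```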

Revision history for this message
Axel Bruns (a-bruns) wrote :

Attached is the edges CSV file (zipped) that I'm trying to import (data on Twitter @replies, extracted from a public Twapperkeeper archive).

Gephi chokes on the import after some time (never at the same place, it seems), and throws up the error; after that, it's possible in the Data Laboratory to find out where it finished (i.e. what the last imported edge was - the one with the highest edge ID).

It's also possible to continue the import if you remove the already imported edges from the CSV, and import the new truncated file. With the attached CSV, I had to repeat that process more than half a dozen times to import all the edges...

Hope that helps...

Axel

Revision history for this message
Eduardo Ramos (eduramiba) wrote :

Hi, this bug is now fixed and you will be able to get an update from Gephi 0.7 beta soon.
Also, note that your file has some parallel edges, which Gephi can't create.
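
For anyone wanting to check a file for parallel edges before importing, here is a minimal sketch. It assumes a header row with "source" and "target" columns and treats edges as directed, so a,b and b,a count as different edges; both choices are assumptions, not something stated in this report.

```python
import csv
import io
from collections import Counter


def find_parallel_edges(csv_text, source_col="source", target_col="target"):
    """Return {(source, target): count} for every pair that occurs more than once."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter((row[source_col], row[target_col]) for row in reader)
    return {pair: n for pair, n in counts.items() if n > 1}


sample = "source,target\na,b\nb,a\na,b\n"
print(find_parallel_edges(sample))
```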

Revision history for this message
Eduardo Ramos (eduramiba) wrote :

And thanks for the file, it was very useful :)

Revision history for this message
Axel Bruns (a-bruns) wrote :

Brilliant, thanks for that.

On the parallel edges - yes, but they're probably occurring during different timeframes, right? Will Gephi permit that?

Axel

Revision history for this message
Mathieu Bastian (mathieu.bastian) wrote :

Bugfix deployed on AutoUpdate.

If you import your data using the Time Frame import (as described here: http://wiki.gephi.org/index.php/Import_Dynamic_Data), it will handle the parallel edges.

Otherwise no; at this level I don't think we can guess that they belong to different timeframes anyway, right?

Changed in gephi:
status: Confirmed → Fix Committed
Revision history for this message
Axel Bruns (a-bruns) wrote :

Thanks again for this, Mathieu - really appreciate the fast response.

Just following up again on the parallel edges question: what I'm wondering is whether there's a way to get Gephi to accept edges which appear only in specific time slices when importing CSVs of edges.

For example, say I want an edge to exist only during 2001-2003 and 2005-2007 - which could be expressed like this:

source,target,Time Interval
a,b,<[2001,2003]>
a,b,<[2005,2007]>

At the moment, importing this CSV into Gephi ignores the second line (it's regarded as a parallel edge).
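
One possible workaround, sketched here under assumptions, is to merge such rows into a single edge before importing, joining the intervals with "; " in the same way the Data Laboratory writes multi-interval values on export (e.g. "<[1.0, 10.0); [20.0, Infinity]>"). Note that the interval contains a comma, so the field has to be quoted in the CSV, as Gephi's own export does. The function name is hypothetical, and the sketch does not sort the intervals or check them for overlap.

```python
import csv
import io


def merge_parallel_intervals(csv_text):
    """Collapse rows sharing source and target into one row with a combined interval."""
    reader = csv.DictReader(io.StringIO(csv_text))
    merged = {}
    for row in reader:
        key = (row["source"], row["target"])
        interval = row["Time Interval"].strip()
        # Drop the outer angle brackets so the inner intervals can be joined.
        if interval.startswith("<") and interval.endswith(">"):
            interval = interval[1:-1].strip()
        merged.setdefault(key, []).append(interval)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["source", "target", "Time Interval"])
    for (source, target), intervals in merged.items():
        writer.writerow([source, target, "<" + "; ".join(intervals) + ">"])
    return out.getvalue()


sample = (
    "source,target,Time Interval\n"
    'a,b,"<[2001,2003]>"\n'
    'a,b,"<[2005,2007]>"\n'
)
print(merge_parallel_intervals(sample))
```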

Does Gephi do slices at all at the moment, and if so, how do you express them in the internal time interval notation ?

Axel

Revision history for this message
Axel Bruns (a-bruns) wrote :

Mathieu,

just following up on this again: I've experimented some more, and I'm now at a point where I'm not sure whether I'm doing something wrong, or whether there's still an error in Gephi somewhere.

I've created a simple test network in Gephi: two nodes (labelled 'c' and 'g'), which are visible between times 0 and 30, which are linked with an edge between times 1 and 10, and again between 20 and Infinity.

If I export the nodes and edges tables as CSVs from Gephi's Data Laboratory, this is what I get:

Nodes.csv:

Id,Label,Time Interval
c,,"<[0.0, 30.0]>"
g,,"<[0.0, 30.0]>"

Edges.csv:

Source,Target,Type,Id,Label,Weight,Time Interval
c,g,Directed,1,,1.0,"<[1.0, 10.0); [20.0, Infinity]>"
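
To make the notation in the Time Interval column concrete, here is a small parsing sketch. It assumes the usual mathematical reading of the brackets, with "[" and "]" including the endpoint and "(" and ")" excluding it; that reading is an assumption about Gephi's format, and both function names are made up for illustration.

```python
import re

# One interval: opening bracket, start, comma, end, closing bracket.
_INTERVAL = re.compile(
    r"([\[(])\s*([^,;\[\]()]+?)\s*,\s*([^,;\[\]()]+?)\s*([\])])"
)


def parse_time_interval(text):
    """Parse '<[1.0, 10.0); [20.0, Infinity]>' into
    (start, end, start_closed, end_closed) tuples."""
    intervals = []
    for open_bracket, start, end, close_bracket in _INTERVAL.findall(text):
        intervals.append(
            (float(start), float(end), open_bracket == "[", close_bracket == "]")
        )
    return intervals


def is_visible(intervals, t):
    """Return True if time t falls inside any of the parsed intervals."""
    for start, end, start_closed, end_closed in intervals:
        after_start = t >= start if start_closed else t > start
        before_end = t <= end if end_closed else t < end
        if after_start and before_end:
            return True
    return False


edge_intervals = parse_time_interval("<[1.0, 10.0); [20.0, Infinity]>")
for t in (5.0, 10.0, 15.0, 20.0):
    print(t, is_visible(edge_intervals, t))
```

Python's float() accepts the string "Infinity", so the open-ended interval needs no special handling.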

Importing these two CSVs back into the Gephi Data Laboratory (in a clean Workspace) works fine, but even if I make sure that the field type for 'Time Interval' is set to TimeInterval, I'm still not able to do any dynamic visualisation - the dynamic filter function simply doesn't work.

Even only importing the nodes CSV file by itself (without the edges CSV) doesn't work - the lower and upper limits of the timeline slider are (correctly) set to 0 and 30, but on the 'filters' tab, no filter is available in the 'dynamic' subsection, and clicking on the 'Filter' button does nothing...

Is there something I'm doing wrong here ?

Axel

Revision history for this message
Mathieu Bastian (mathieu.bastian) wrote :

You can't import time interval data directly through the CSV import in the Data Laboratory, but there is an easy way to convert a numerical column into a time interval. Import your time data as DOUBLE or INT and see this wiki tutorial for how to get the time interval from it: http://wiki.gephi.org/index.php/Import_Dynamic_Data#Tranform_existing_column_in_Time_Interval

The tutorial also covers importing slices from a GEXF file.

The fact that you're properly importing a TIME_INTERVAL column but it is not activated as the dynamic column could be improved: we could activate it by default if no other TIME_INTERVAL column exists. This is currently not done, as it is not the default use case.

Revision history for this message
Axel Bruns (a-bruns) wrote :

Mathieu,

many thanks for this again. I think what I'm trying to do is more complicated than that, though:

First, as far as I can tell, the first approach (transforming numerical columns into a time interval) doesn't work for nodes or edges which appear, disappear, and then re-appear - e.g. <[1.0, 10.0); [20.0, Infinity]> in my example above, which is visible from 1 to 10, and then again from 20 to the end. I'm not sure whether Gephi allows me to select four columns (e.g. start1, end1, start2, end2) to convert into one single time interval - but even that would get very messy in the case of a node which appears and disappears very frequently over time...

And I've been reluctant to try the GEXF route, mainly because of the nature of my data: if I understand GEXF slices right (and I may not), they work only for fairly well-defined and consistent time intervals (day by day, hour by hour, etc.). However, I'm dealing with continuous Twitter data which has no clear time intervals - tweets may appear at random, any second of the timeline, and I don't want to lose too much of that resolution by defining hour-by-hour slices, for example. Also, converting my Twitter data (which is in CSV form by default) to GEXF for import into Gephi would be very work-intensive - being able to import the CSVs of nodes and edges which I already have would be a lot easier...;-)

So if there's a way to add functionality in Gephi to activate the Time Interval column which I can already import, that would be the path of least resistance for my purposes... I know it's a very obscure request, but if there's any way to do it, that would be fantastic!

Axel

Revision history for this message
Mathieu Bastian (mathieu.bastian) wrote :

I see, you're at the most advanced level already :-) With several slices for elements, I would indeed recommend using GEXF, as the format can nicely represent intervals. I can give you more details about slices. You have two ways to format your intervals, DATE or DOUBLE (cf. the tutorial), so you can be as precise as double values allow. If your values represent nanoseconds, that is fine; you don't need to define proper slices.
In the GEXF format, slices are just a set of intervals; you don't need to declare them in advance or anything like that.

I created bug 659017 to enable dynamics when importing a time interval column through the CSV wizard. That would make direct import of time intervals possible, provided they are correctly formatted.

Revision history for this message
Axel Bruns (a-bruns) wrote :

Thanks for that clarification - will see what I can do with GEXF. Unfortunately, converting from a CSV list of edges (with start and end times) is not very straightforward... (Do you know of any tool that can do this?)
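
As a rough idea of what such a conversion could look like, here is a minimal sketch that is not the script later linked in this thread. It assumes a CSV with the hypothetical columns source, target, start and end (one row per appearance of an edge), groups rows by source and target, and writes each group as one edge with a slice per row. The namespace and version attributes follow the GEXF 1.1 draft as far as I know and should be checked against the GEXF documentation before use.

```python
import csv
import io
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.1draft"  # assumption: GEXF 1.1 draft namespace


def edges_csv_to_gexf(csv_text):
    """Turn rows of source,target,start,end into a dynamic GEXF document string."""
    slices_by_edge = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["source"], row["target"])
        slices_by_edge.setdefault(key, []).append(
            (float(row["start"]), float(row["end"]))
        )

    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.1"})
    graph = ET.SubElement(
        gexf, "graph", {"mode": "dynamic", "defaultedgetype": "directed"}
    )
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")

    # Every identifier that appears as a source or target becomes a node.
    for node_id in sorted({n for pair in slices_by_edge for n in pair}):
        ET.SubElement(nodes_el, "node", {"id": node_id, "label": node_id})

    for edge_id, ((source, target), slices) in enumerate(slices_by_edge.items()):
        slices.sort()
        edge = ET.SubElement(edges_el, "edge", {
            "id": str(edge_id),
            "source": source,
            "target": target,
            "start": str(slices[0][0]),
            "end": str(max(end for _, end in slices)),
        })
        slices_el = ET.SubElement(edge, "slices")
        for start, end in slices:
            ET.SubElement(slices_el, "slice", {"start": str(start), "end": str(end)})

    return ET.tostring(gexf, encoding="unicode")


sample = "source,target,start,end\na,b,2005,2007\na,b,2001,2003\nb,c,1,2\n"
print(edges_csv_to_gexf(sample))
```

Node lifetimes are left out here; a real conversion would probably also give each node its own start and end.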

Thanks for setting up the new bug as well !

Axel

Revision history for this message
Eduardo Ramos (eduramiba) wrote :

Hi, thanks for reporting bug 659017. I fixed that problem, and the importer should now use the first time interval column it finds as the default dynamic column. Note that you will still need to indicate in the wizard that the column has the TimeInterval type.

As for multiple slices, I think you will need to use the GEXF format to achieve that, since this CSV importer was intended as a simple importer for table-formatted data.

Revision history for this message
Mathieu Bastian (mathieu.bastian) wrote :

Thanks Eduardo for your quick fix.

I deployed the patch on AutoUpdate. Axel, you can update your Gephi; it should work now.

Revision history for this message
Axel Bruns (a-bruns) wrote :

Eduardo, Mathieu,

brilliant - many thanks for this very fast work. I've just tried this with the very simple example I posted in comment #7 above, and everything works perfectly - scrolling through the timeline, the edge disappears between 10.0 and 20.0 as it should, but is visible at all other times between 1.0 and 30.0...

Thanks again - this will make importing my Twitter data a lot easier.

Axel

Revision history for this message
Axel Bruns (a-bruns) wrote :

Guys,

sorry to follow up on this again. There's one last thing missing that would be extremely helpful for me.

Edge _weights_ in Gephi can be time-dependent as well - you can see this if you have a network with time intervals and export an edge list as CSV from the Data Laboratory, for example. The format for time-dependent weights looks like this:

<[100.0, Infinity, 1.0]>

Or presumably, if there are different weights at different times,

<[100.0, 200.0, 1.0); [200.0, 300.0, 2.0]> etc.

Currently, Gephi doesn't accept time-dependent weights when importing edge list CSVs into the Data Laboratory, though. Here's a simple edge list which I've exported from the Data Laboratory:

Source,Target,Type,Id,Label,Weight,Time Interval
a,b,Directed,2,,"<[100.0, Infinity, 1.0]>","<[100.0, 200.0); [400.0, Infinity]>"
b,c,Directed,3,,"<[200.0, Infinity, 1.0]>","<[200.0, 400.0)>"

When re-importing this into Gephi, the weights are simply set to 1. Is there any chance of fixing this? Hopefully it would be just a matter of being able to choose the data type for the weight field when importing CSVs?
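
For reference, the weight notation above can be read as a list of (start, end, value) entries. The sketch below parses it and looks up the weight at a given time; it assumes the same bracket convention as for plain time intervals (square brackets include the endpoint, round brackets exclude it), which is an assumption about the format, and the function names are made up for illustration.

```python
import re

# One entry: opening bracket, start, end, value, closing bracket.
_ENTRY = re.compile(
    r"([\[(])\s*([^,;\[\]()]+?)\s*,\s*([^,;\[\]()]+?)\s*,"
    r"\s*([^,;\[\]()]+?)\s*([\])])"
)


def parse_dynamic_weight(text):
    """Parse '<[100.0, 200.0, 1.0); [200.0, 300.0, 2.0]>' into
    (start, end, value, start_closed, end_closed) tuples."""
    entries = []
    for open_bracket, start, end, value, close_bracket in _ENTRY.findall(text):
        entries.append((
            float(start), float(end), float(value),
            open_bracket == "[", close_bracket == "]",
        ))
    return entries


def weight_at(entries, t, default=None):
    """Return the weight of the first entry whose interval contains t."""
    for start, end, value, start_closed, end_closed in entries:
        after_start = t >= start if start_closed else t > start
        before_end = t <= end if end_closed else t < end
        if after_start and before_end:
            return value
    return default


weights = parse_dynamic_weight("<[100.0, 200.0, 1.0); [200.0, 300.0, 2.0]>")
print(weight_at(weights, 150.0), weight_at(weights, 200.0), weight_at(weights, 350.0))
```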

Axel

Revision history for this message
Eduardo Ramos (eduramiba) wrote :

Hi Axel, importing dynamic attributes from CSV works fine if the column already exists and is dynamic.
So to import CSV data with dynamic edge weights into an existing project, the Weight column has to be of type dynamic float, not float. The edges keep the value 1 because the Weight column is float and can't hold a dynamic value.

Revision history for this message
Axel Bruns (a-bruns) wrote :

Hi Eduardo,

Thanks for this. I've now bitten the bullet and scripted something that creates a working GEXF file from my data - the method is explained at http://www.mappingonlinepublics.net/2010/10/20/dynamic-networks-in-gephi-from-twapperkeeper-to-gexf/.

As far as I can tell, it _is_ possible for edge weights to be dynamic, though - in GEXF, for example, this would be expressed in the following fashion:

<edge source="user1" target="user2" start="0" end="5400" weight="0">
 <attvalues>
  <attvalue for="weight" value="1" start="0" end="1800"/>
  <attvalue for="weight" value="2" start="1800" end="3600"/>
  <attvalue for="weight" value="1" start="3600" end="5400"/>
 </attvalues>
 <slices>
  <slice start="0" end="1800" />
  <slice start="1800" end="3600" />
  <slice start="3600" end="5400" />
 </slices>
</edge>

This works in the GEXF files I've created now...
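
An edge element of this shape can be generated along the following lines. This is only a sketch, not the script from the linked post: the function name is hypothetical, and the static weight="0" on the edge simply mirrors the example above.

```python
import xml.etree.ElementTree as ET


def dynamic_edge_element(source, target, weighted_slices):
    """Build an <edge> with one <attvalue> and one <slice> per (start, end, weight)."""
    starts = [start for start, _, _ in weighted_slices]
    ends = [end for _, end, _ in weighted_slices]
    edge = ET.Element("edge", {
        "source": source,
        "target": target,
        "start": str(min(starts)),
        "end": str(max(ends)),
        "weight": "0",  # mirrors the example; the dynamic values are in attvalues
    })
    attvalues = ET.SubElement(edge, "attvalues")
    slices = ET.SubElement(edge, "slices")
    for start, end, weight in weighted_slices:
        ET.SubElement(attvalues, "attvalue", {
            "for": "weight",
            "value": str(weight),
            "start": str(start),
            "end": str(end),
        })
        ET.SubElement(slices, "slice", {"start": str(start), "end": str(end)})
    return edge


edge = dynamic_edge_element(
    "user1", "user2", [(0, 1800, 1), (1800, 3600, 2), (3600, 5400, 1)]
)
print(ET.tostring(edge, encoding="unicode"))
```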

Axel

Changed in gephi:
status: Fix Committed → Fix Released