R: better support for complex attributes

Bug #1012434 reported by Gábor Csárdi
This bug affects 1 person
Affects: igraph
Status: Confirmed
Importance: High
Assigned to: Gábor Csárdi
Milestone: (not set)

Bug Description

It should be possible to use complex attributes in GraphML and GML readers and writers. We would need to serialize and unserialize attributes for this.

Tamás Nepusz (ntamas) wrote :

Related bug: https://bugs.launchpad.net/igraph/+bug/918138 -- this is from the Python side. Basically, we have to extend the attribute handler interface so that the attribute handlers can provide serializer/unserializer hooks for data types that are not numbers or strings.

I think we roughly need the following changes in the attribute handler:

- Remove IGRAPH_ATTRIBUTE_R_OBJECT and IGRAPH_ATTRIBUTE_PY_OBJECT from igraph_attribute_type_t and replace them with a single IGRAPH_ATTRIBUTE_OBJECT type.
- Add functions like get_object_{graph,vertex,edge}_attr to the attribute handler interface. These functions should take the name of a graph/vertex/edge attribute and optionally a vertex/edge selector and return an igraph_vector_ptr_t, each element of which should point to a buffer holding the serialized values of the attributes.

The attribute handlers of both the Python and the R interfaces should then be adapted to implement the new attribute handler functions, and also to handle IGRAPH_ATTRIBUTE_OBJECT attribute types properly in the getinfo, add_vertices and add_edges methods.

Finally, we should figure out how to write attributes of object types into GraphML or GML. The biggest obstacle here is that neither of these formats has an "object" data type; they only have "strings". One option would be to use a hidden graph attribute that lists the names of vertex/edge attributes that "look like" strings but in fact contain serialized data. We must also make sure that serialized data is escaped properly when saved and unescaped when read.
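
The hidden-attribute idea could look roughly like the sketch below. It uses plain Python dicts instead of the igraph API, and everything in it is an assumption made for illustration: the marker name `_igraph_serialized_attrs`, the helper names `encode_attrs` and `decode_attrs`, and the choice of pickle plus Base64 as the serialized form. Base64 is used because its output is plain ASCII without quotes, angle brackets or ampersands, so it sidesteps the escaping problem.

```python
import base64
import pickle

# Hypothetical name of the hidden graph attribute; igraph does not define it.
MARKER = "_igraph_serialized_attrs"


def encode_attrs(vertex_attrs):
    """Return (string-safe vertex attributes, graph attributes with the marker)."""
    encoded, serialized_names = {}, []
    for name, values in vertex_attrs.items():
        if all(isinstance(v, (str, int, float)) for v in values):
            # Numbers and strings can be written by the format as they are.
            encoded[name] = list(values)
        else:
            # Anything else is pickled, then Base64-encoded into an ASCII string.
            encoded[name] = [
                base64.b64encode(pickle.dumps(v)).decode("ascii") for v in values
            ]
            serialized_names.append(name)
    # Assumes attribute names contain no commas.
    return encoded, {MARKER: ",".join(serialized_names)}


def decode_attrs(vertex_attrs, graph_attrs):
    """Undo encode_attrs. Only unpickle data that comes from a trusted file."""
    names = set(filter(None, graph_attrs.get(MARKER, "").split(",")))
    return {
        name: [pickle.loads(base64.b64decode(v)) for v in values]
        if name in names
        else list(values)
        for name, values in vertex_attrs.items()
    }
```

A reader that does not know about the marker would still load the file; it would simply see the encoded attributes as opaque strings.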

Gábor Csárdi (gabor.csardi) wrote :

I am actually not sure that we need to write complex attributes to GraphML or GML files. If the files written by Python-igraph can be interpreted only by Python-igraph, and the same is true for R, then what is the point of the whole thing?

GML can store complex attributes: it has a list data type, which may contain strings, numbers or other lists. So I think it would be better to convert Python and R lists to GML lists (whenever possible) and back. Does GraphML have something similar?

Tamás Nepusz (ntamas) wrote :

GraphML has no "list" data type and the default attribute extension of the GraphML specification supports numeric and string types (and maybe boolean). According to the GraphML primer, the preferred way to extend GraphML with custom types is to redefine parts of the XML schema definition to describe the custom types one would like to add. See http://graphml.graphdrawing.org/primer/graphml-primer.html, Chapter 4.2.

Also, I believe that GML lists are more like "ordered key-value pairs", which would correspond to Python dicts (and named lists in R?).
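
To illustrate the correspondence, here is a minimal sketch that writes a nested Python dict as GML-style key-value text. The function names `to_gml` and `gml_escape` are hypothetical and not part of igraph. The sketch assumes that keys are valid GML keys and that quotes and ampersands in strings are written as `&quot;` and `&amp;`, which is my reading of the GML convention.

```python
def gml_escape(text):
    """Escape the characters that would break a quoted GML string."""
    return text.replace("&", "&amp;").replace('"', "&quot;")


def to_gml(mapping, indent=0):
    """Render a (possibly nested) dict as GML key-value lines."""
    pad = "  " * indent
    lines = []
    for key, item in mapping.items():
        if isinstance(item, dict):
            # A nested dict becomes a bracketed GML list of key-value pairs.
            lines.append(f"{pad}{key} [")
            if item:
                lines.append(to_gml(item, indent + 1))
            lines.append(f"{pad}]")
        elif isinstance(item, bool):
            # No boolean type is assumed here, so booleans are written as 0 or 1.
            lines.append(f"{pad}{key} {int(item)}")
        elif isinstance(item, (int, float)):
            lines.append(f"{pad}{key} {item}")
        else:
            lines.append(f'{pad}{key} "{gml_escape(str(item))}"')
    return "\n".join(lines)


print(to_gml({"label": "a", "pos": {"x": 1, "y": 2.5}}))
```

Plain Python lists without keys have no direct counterpart in this scheme, which is the mismatch pointed out above.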

Regarding your initial concern (the relevance of writing complex attributes with a native serialization format): theoretically, nothing forces the higher level interface to use a _native_ serialization format. The higher level interface could allow the user to hook into the serialization process and specify custom serializers/deserializers for data types, falling back to the platform native serialization format if nothing else succeeds.
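
A rough sketch of such a hook mechanism, in Python, might look like this. None of the names (`register`, `serialize`, `deserialize`, the `tag:payload` layout) exist in igraph; they are assumptions made for the example. User-registered serializers are tried first, and pickle acts as the platform-native fallback.

```python
import base64
import json
import pickle

# Hypothetical registries: type -> (tag, dump) and tag -> load.
_DUMPERS = {}
_LOADERS = {}


def register(type_, tag, dump, load):
    """Register a custom string serializer/deserializer pair for a type."""
    _DUMPERS[type_] = (tag, dump)
    _LOADERS[tag] = load


def serialize(value):
    """Use the first matching custom serializer, else fall back to pickle."""
    for type_, (tag, dump) in _DUMPERS.items():
        if isinstance(value, type_):
            return f"{tag}:{dump(value)}"
    payload = base64.b64encode(pickle.dumps(value)).decode("ascii")
    return f"pickle:{payload}"


def deserialize(text):
    """Dispatch on the tag in front of the first colon."""
    tag, _, payload = text.partition(":")
    if tag == "pickle":
        # Only unpickle data that comes from a trusted file.
        return pickle.loads(base64.b64decode(payload))
    if tag not in _LOADERS:
        raise ValueError(f"no deserializer registered for tag {tag!r}")
    return _LOADERS[tag](payload)


# Example: dicts are stored as JSON, which other tools can read as well.
register(dict, "json", json.dumps, json.loads)
```

With this layout a dict attribute is written in a portable form, while a value with no registered serializer (a Python set, say) still round-trips through the pickle fallback.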

Gábor Csárdi (gabor.csardi) wrote :

As for GraphML, we can maybe check how other software tools extended the schema, and emulate them.

As for serialization. OK, this is better. So effectively this would allow the user to specify how exactly they want to save the complex attributes, right? If yes, then we can do this.

I still think, however, that it is not as important as the other issue, i.e. GML and GraphML complex attributes that can be used by other software (without tweaking the serialization). In Python you can always pickle, and in R you can save/load, so saving/reloading graphs with complex attributes is not a problem.

If it is possible to solve the GML/GraphML stuff as a special case of serialization, that would probably be the best. If you know what I mean.

Gábor Csárdi (gabor.csardi) wrote : Continue on github

The development of igraph has moved to github, so please do not comment on this bug here. You are of course welcome to comment on github, here:
https://github.com/igraph/igraph/issues/109
