igraph

Bug #545663
Comment #5

Tamás Nepusz (ntamas) wrote on 2010-03-24:

igraph 0.6 now contains a partial fix for this problem. It encodes all Unicode objects to UTF-8 when saving a graph, but string attributes are still read back as "regular" (byte) strings and not as Unicode objects. The whole situation is a bit complicated because practically none of the graph formats except GraphML and DOT specify how non-ASCII characters should be encoded in the output. (DOT uses UTF-8 by default; GraphML uses whatever XML encoding is specified in the header, which is always UTF-8 for graphs saved from igraph.)
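To make the asymmetry concrete, here is a minimal sketch in plain Python 3, where `bytes` stands in for the "regular" strings of Python 2. It does not use igraph at all; `save_value` and `load_value` are hypothetical helpers that only mimic the behaviour described above (encode on write, no decoding on read).

```python
# Sketch only: these helpers are hypothetical and are not part of igraph.
# In Python 3, `bytes` plays the role of Python 2's "regular" str.

def save_value(value):
    """Encode text to UTF-8 on write; leave byte strings untouched."""
    if isinstance(value, str):
        return value.encode("utf-8")
    return value


def load_value(raw):
    """Return the stored bytes as they are, without decoding them."""
    return raw


original = "Tamás"
stored = save_value(original)   # UTF-8 bytes, as written to the file
loaded = load_value(stored)     # still bytes after reading back

print(type(loaded).__name__)               # bytes
print(loaded == original)                  # False
print(loaded.decode("utf-8") == original)  # True
```

The value survives the round trip only if the caller knows to decode it as UTF-8, which is exactly the information most graph formats do not record.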

At the moment I'm not sure what the best solution would be (and I wonder how the R interface handles this problem). I could modify igraph's attribute handler to always use Unicode objects, assuming UTF-8 encoding for the input file, or I could simply postpone the problem until everyone migrates to Python 3, where all strings are Unicode by default anyway.
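The first option could look roughly like the sketch below. It is written in Python 3 syntax and does not touch igraph's real attribute handler; `decode_attributes` and its `encoding` parameter are hypothetical names, and the attribute dictionary is made-up sample data.

```python
# Sketch only: `decode_attributes` is a hypothetical helper, not igraph API.

def decode_attributes(attrs, encoding="utf-8"):
    """Return a copy of attrs with every byte-string value decoded to text.

    Non-string values (numbers, None, ...) are passed through unchanged.
    """
    decoded = {}
    for key, value in attrs.items():
        if isinstance(value, bytes):
            value = value.decode(encoding)
        decoded[key] = value
    return decoded


# Sample attributes as they might come back from a file: text as raw bytes.
raw = {"name": b"Tam\xc3\xa1s", "weight": 1.5}
attrs = decode_attributes(raw)

print(attrs["name"] == "Tamás")  # True
print(attrs["weight"])           # 1.5
```

The cost of assuming UTF-8 is that a file written in another encoding (Latin-1, for example) makes `bytes.decode` raise `UnicodeDecodeError`, so a real handler would need either an explicit encoding argument like the one above or an error policy.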