segfaults writing unicode

Bug #545663 reported by Ximin Luo
This bug affects 1 person
Affects: igraph
Status: Confirmed
Importance: Low
Assigned to: Tamás Nepusz
Milestone: (none)

Bug Description

writing unicode output causes igraph to segfault:

$ python
Python 2.5.5 (r255:77872, Feb 2 2010, 00:25:36)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, igraph
>>> igraph.__version__
'0.5.3'
>>> gg = igraph.Graph(n=1, edges=[], vertex_attrs={'id': [u'\u0391\u03b8\u03ae\u03bd\u03b1']})
>>> gg.write_dot(sys.stdout)
/* Created by igraph 0.5.3 */
graph {
  0 [
Segmentation fault

the unicode string in the example is just "Αθήνα" ("Athens") and not some obscure construct.

the segfault occurs for all the write formats i tested (dot, graphml, graphmlz, pajek), except pickle. it also occurs with python 2.6 on debian.

----
as a side note, write_pickle gives
  File "/usr/lib/python2.5/site-packages/igraph/__init__.py", line 844, in write_pickle
    if file_was_opened: fname.close()
UnboundLocalError: local variable 'file_was_opened' referenced before assignment

which can be fixed by adding "else: file_was_opened=False" to the appropriate if-block; write_pickle() succeeds without segfaulting after this
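a minimal sketch of that pattern, not the actual igraph source: write_pickle_sketch is a hypothetical stand-in, and it sets the flag up front instead of in an else branch, which has the same effect. the point is only that file_was_opened must be assigned on every path before it is checked.

```python
import io
import pickle


def write_pickle_sketch(obj, fname):
    """Pickle obj to fname, which may be a path or an open binary file.

    Hypothetical stand-in for write_pickle(); not the igraph implementation.
    """
    # Assign the flag before any branching so the check below is always safe.
    file_was_opened = False
    if not hasattr(fname, "write"):
        # A path was given: open it here and remember that we own the handle.
        fname = open(fname, "wb")
        file_was_opened = True
    try:
        pickle.dump(obj, fname)
    finally:
        # Close only what was opened here; the caller owns anything passed in.
        if file_was_opened:
            fname.close()
```

passing an already open file object (for example io.BytesIO()) now works and leaves that object open for the caller.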

Ximin Luo (infinity0) wrote :

it seems to be segfaulting when writing out the attribute - formats like gml which don't output the attribute don't segfault.

Ximin Luo (infinity0) wrote :

temporary fix is to call u.encode("utf-8") on every unicode string, which converts it into a byte string. the output file displays correctly when read as utf-8, and can be loaded back into igraph, with the attributes remaining as byte strings rather than unicode strings.
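roughly what i mean, as a sketch: encode_text_values is a hypothetical helper name, not part of igraph. it leaves non-text values (numbers, existing byte strings) untouched and encodes everything else that is a unicode string.

```python
def encode_text_values(values, encoding="utf-8"):
    """Return a copy of values with every unicode string encoded to bytes.

    Hypothetical helper, not an igraph function.
    """
    # type(u"") is `unicode` on Python 2 and `str` on Python 3.
    text_type = type(u"")
    return [
        v.encode(encoding) if isinstance(v, text_type) else v
        for v in values
    ]


# The attribute value from the report: "Athens" in Greek.
ids = encode_text_values([u"\u0391\u03b8\u03ae\u03bd\u03b1"])
```

the resulting list can then be passed as the 'id' entry of vertex_attrs in place of the original unicode strings.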

Gábor Csárdi (gabor.csardi) wrote :

I think this is a known problem, see e.g.:
http://lists.gnu.org/archive/html/igraph-help/2010-02/msg00063.html
Maybe it was already fixed, Tamas will be able to tell, I cannot remember.

Tamás Nepusz (ntamas) wrote :

There is some sort of a "partial fix" for this problem in igraph 0.6 now. igraph 0.6 encodes all Unicode objects to UTF-8 when saving a graph, but the string attributes will still be read back as "regular" strings and not Unicode objects. The whole situation is a bit complicated because practically none of the graph formats except GraphML and DOT specify the way one should encode non-ASCII characters in the output. (DOT uses UTF-8 by default, GraphML uses whatever XML encoding is specified in the header -- this is always UTF-8 for graphs saved from igraph).

At the moment I'm not sure what the best solution would be (and I wonder how the R interface handles this problem). I could modify igraph's attribute handler to use Unicode objects no matter what, assuming UTF-8 encoding for the input file, or I could simply postpone the problem until everyone migrates to Python 3 where all strings are Unicode by default anyway.

Tamás Nepusz (ntamas) wrote :

Note to self: it looks like the only places where the Python attribute handler generates string objects are in igraphmodule_i_attribute_{init,add_vertices,add_edges}. There are explicit calls to PyString_FromString and PyDict_SetItemString in these functions; they can be replaced by PyUnicode_FromEncodedObject to enable Unicode support. Maybe I should add a module-level attribute to igraph.core (say, _unicode_attributes) that is None when ordinary strings should be used. Otherwise it would contain the name of an encoding supported by Python (say, UTF-8), and this would enable Unicode mode. When Unicode mode is enabled, graphs loaded from GraphML files would automatically have Unicode objects for string attributes.
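A rough Python-level sketch of the switch described above, under the assumption that the setting is either None or an encoding name. The real change would live in the C attribute handler; make_string_attribute is a hypothetical name used only for illustration, and the setting is passed as an argument here rather than read from igraph.core.

```python
def make_string_attribute(raw, unicode_attributes=None):
    """Build the Python value for a string attribute read from a file.

    raw is the byte string as stored in the file. unicode_attributes mirrors
    the proposed _unicode_attributes setting: None keeps ordinary byte
    strings, an encoding name (say, "utf-8") turns on Unicode mode.
    Hypothetical function, not part of igraph.
    """
    if unicode_attributes is None:
        # Ordinary-string mode: hand the bytes back unchanged.
        return raw
    # Unicode mode: decode with the configured encoding.
    return raw.decode(unicode_attributes)


raw_id = b"\xce\x91\xce\xb8\xce\xae\xce\xbd\xce\xb1"
plain = make_string_attribute(raw_id)
decoded = make_string_attribute(raw_id, "utf-8")
```

With the setting left at None nothing changes for existing code, which is the reason for making Unicode mode opt-in in this sketch.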

Changed in igraph:
assignee: nobody → Tamás Nepusz (ntamas)
importance: Undecided → Low
status: New → Confirmed
Gábor Csárdi (gabor.csardi) wrote : Continue on github

The development of igraph has moved to github, so please do not comment on this bug here. You are of course welcome to comment on github, here:
https://github.com/igraph/igraph/issues/269
