Comment 10 for bug 128496

Martin von Gagern (gagern) wrote : Backtrace and debugging for non-ascii filenames

OK, I hit this as well and did some heavy debugging. First, the backtrace of where I am: mixed Python and C, with the C frames reordered to match Python's most-recent-call-last order.

bzr: ERROR: svn.core.SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 797, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 499, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 807, in ignore_pipe
    result = func(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 173, in run
    tree, file_list = tree_files(file_list)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 64, in tree_files
    return internal_tree_files(file_list, default_branch)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 88, in internal_tree_files
    return WorkingTree.open_containing(default_branch)[0], file_list
  File "/usr/lib/python2.5/site-packages/bzrlib/workingtree.py", line 325, in open_containing
    return control.open_workingtree(), relpath
  File "~/.bazaar/plugins/svn/workingtree.py", line 743, in open_workingtree
    return SvnWorkingTree(self, self.local_path, self.open_branch())
  File "~/.bazaar/plugins/svn/workingtree.py", line 88, in __init__
    status = svn.wc.revision_status(self.basedir, None, True, None, None)
  File "/usr/lib/svn-python/libsvn/wc.py", line 2310, in svn_wc_revision_status
SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

#11 0xb7725c76 in _wrap_svn_wc_revision_status ()
   from /usr/lib/python2.5/site-packages/libsvn/_wc.so
#10 0xb79083f3 in svn_wc_revision_status (result_p=0xbfe173a0,
    wc_path=0xa10e184 "/home/mvg/src/java/ornament", trail_url=0x0,
    committed=1, cancel_func=0xb7ade7fb <svn_swig_py_cancel_func>,
    cancel_baton=0x4e5168c0, pool=0xa27b108)
    at subversion/libsvn_wc/revision_status.c:123
#9 0xb7acc313 in close_edit (edit_baton=0xa30ad18, pool=0xa27b108)
    at subversion/libsvn_delta/cancel.c:334
#8 0xb790ba4b in close_edit (edit_baton=0xa30a6f0, pool=0xa27b108)
    at subversion/libsvn_wc/status.c:2033
#7 0xb79092e2 in get_dir_status (eb=0xa30a6f0, parent_entry=0x0,
    adm_access=0xa27b1d8, entry=0x0, ignore_patterns=0xa30a7a0,
    depth=svn_depth_infinity, get_all=1, no_ignore=0, skip_this_dir=0,
    status_func=0xb79080d0 <analyze_status>, status_baton=0xbfe17334,
    cancel_func=0xb7ade7fb <svn_swig_py_cancel_func>, cancel_baton=0x4e5168c0,
    pool=0xa27b108) at subversion/libsvn_wc/status.c:828
#6 0xb7976164 in svn_io_get_dirents2 (dirents=0xbfe17168,
    path=0xa27b200 "/home/mvg/src/java/ornament", pool=0xa2870a0)
    at subversion/libsvn_subr/io.c:1976
#5 0xb798263b in svn_path_cstring_to_utf8 (path_utf8=0xbfe17048,
    path_apr=0xa287840 "debugSym_4-z\303\244hlige Drehung_Tile.png",
    pool=0xa2870a0) at subversion/libsvn_subr/path.c:1387
#4 0xb798f611 in svn_utf_cstring_to_utf8 (dest=0xbfe17048,
    src=0xa287840 "debugSym_4-z\303\244hlige Drehung_Tile.png", pool=0xa2870a0)
    at subversion/libsvn_subr/utf.c:752
#3 0xb798f536 in convert_cstring (dest=0xbfe17048,
    src=0xa287840 "debugSym_4-z\303\244hlige Drehung_Tile.png",
    node=0xa287468, pool=0xa2870a0) at subversion/libsvn_subr/utf.c:729
#2 0xb798ed36 in convert_to_stringbuf (node=0xa287468,
    src_data=0xa287840 "debugSym_4-z\303\244hlige Drehung_Tile.png",
    src_length=36, dest=0xbfe16f94, pool=0xa2870a0)
    at subversion/libsvn_subr/utf.c:493
#1 0x41037672 in apr_xlate_conv_buffer () from /usr/lib/libaprutil-1.so.0
#0 iconv (cd=<value optimized out>, inbuf=<value optimized out>, inbytesleft=Could not find the frame base for "iconv".
)
    at iconv.c:36

Although the value of cd is optimized out in my lib, it can be inferred from node at frame #2 if you know the memory layout of these structures. According to utf.c from subversion, node->handle is a pointer to an apr_xlate_t. This type is incomplete in apr_xlate.h from package apr-util; its definition is given in xlate.c. It has four pointers followed by an iconv_t member. According to iconv.h, iconv_t is a void*, but iconv.c, which is part of glibc, casts it to a __gconv_t, which is itself a pointer to the conversion descriptor. So the address of that member can be treated as a __gconv_t* and dereferenced twice, and I can use these steps to look at what iconv is actually doing:

(gdb) p (**(__gconv_t*)((void*)node->handle + 4*sizeof(void*)))
$15 = {__nsteps = 2, __steps = 0xa256f80, __data = 0xa268d38}
(gdb) p (**(__gconv_t*)((void*)node->handle + 4*sizeof(void*))).__steps[0]
$16 = {__shlib_handle = 0x0, __modname = 0x0, __counter = 1,
  __from_name = 0xb7f06be3 "ANSI_X3.4-1968//",
  __to_name = 0x42acde2f "INTERNAL",
  __fct = 0x429d5db0 <__gconv_transform_ascii_internal>,
  __btowc_fct = 0x429d47b0 <__gconv_btwoc_ascii>, __init_fct = 0,
  __end_fct = 0, __min_needed_from = 4, __max_needed_from = 4,
  __min_needed_to = 1, __max_needed_to = 1, __stateful = 0, __data = 0x0}
(gdb) p (**(__gconv_t*)((void*)node->handle + 4*sizeof(void*))).__steps[1]
$17 = {__shlib_handle = 0x0, __modname = 0x0, __counter = 1,
  __from_name = 0x42acde2f "INTERNAL",
  __to_name = 0xb7f07cc7 "ISO-10646/UTF8/",
  __fct = 0x429d91e0 <__gconv_transform_internal_utf8>, __btowc_fct = 0,
  __init_fct = 0, __end_fct = 0, __min_needed_from = 4, __max_needed_from = 4,
  __min_needed_to = 1, __max_needed_to = 6, __stateful = 0, __data = 0x0}
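
The pointer arithmetic in the gdb expressions above can be mimicked with a small Python ctypes sketch. It is only an illustration of the "skip four pointers, then dereference twice" idea: FakeXlate, FakeGconvInfo and all field names are made up here and are not the real APR or glibc definitions.

```python
import ctypes

PTR_SIZE = ctypes.sizeof(ctypes.c_void_p)


class FakeGconvInfo(ctypes.Structure):
    # Hypothetical stand-in for the structure a __gconv_t points to.
    _fields_ = [("nsteps", ctypes.c_size_t)]


class FakeXlate(ctypes.Structure):
    # Hypothetical stand-in for apr_xlate_t as described above:
    # four pointers followed by the iconv handle.
    _fields_ = [
        ("p0", ctypes.c_void_p),
        ("p1", ctypes.c_void_p),
        ("p2", ctypes.c_void_p),
        ("p3", ctypes.c_void_p),
        ("ich", ctypes.POINTER(FakeGconvInfo)),
    ]


def read_nsteps(xlate):
    # Address of the handle member: base address plus four pointer sizes.
    member_addr = ctypes.addressof(xlate) + 4 * PTR_SIZE
    # First dereference: read the pointer stored at that address.
    handle = ctypes.POINTER(FakeGconvInfo).from_address(member_addr)
    # Second dereference: follow the pointer to the descriptor itself.
    return handle.contents.nsteps


info = FakeGconvInfo(nsteps=2)
xlate = FakeXlate(ich=ctypes.pointer(info))
print(read_nsteps(xlate))
```

The sketch relies on the handle member sitting directly after the four pointers without padding, which holds when every preceding member is pointer-sized, as in the layout described above.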

According to gconv_builtin.h, "ASCII" is an alias for "ANSI_X3.4-1968//", and the function name __gconv_transform_ascii_internal confirms this. So iconv tries to convert from ASCII to UTF-8, which is bound to fail for this filename: the bytes \303\244 (the UTF-8 encoding of "ä") are not valid ASCII. As my system uses UTF-8, I wonder what made it choose ASCII as the source charset in the first place. That needs some more debugging, but this comment is long enough as it is, so I'll be back later.
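
For illustration, here is a minimal Python sketch of why ASCII cannot work as the source charset for this filename. It does not call iconv or Subversion; it uses Python's own codecs as a stand-in, and the helper name can_decode is made up for this example. The byte string is copied from the src argument in the gdb backtrace.

```python
# Filename bytes as shown in the backtrace; octal \303\244 is 0xC3 0xA4.
NAME = b"debugSym_4-z\xc3\xa4hlige Drehung_Tile.png"


def can_decode(data, encoding):
    """Return True if data is valid in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print(len(NAME))                  # compare with src_length in frame #2
print(can_decode(NAME, "ascii"))  # bytes >= 0x80 are outside ASCII
print(can_decode(NAME, "utf-8"))
```

The length matches the src_length=36 reported in frame #2, and the only bytes that ASCII rejects are the two that encode the "ä".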