Unable to open native working tree with non-ascii filenames

Bug #128496 reported by Wouter van Heyst
Affects                   Status        Importance  Assigned to        Milestone
Bazaar                    Fix Released  Undecided   Martin von Gagern
Bazaar Subversion Plugin  Fix Released  Medium      Jelmer Vernooij
subversion                Invalid       Undecided   Unassigned
bzr-svn (Ubuntu)          Fix Released  Undecided   Unassigned

Bug Description

On feisty:

decoy%~/work/kmx/trunk> bzr st
version of bzr-svn is experimental; output may change between revisions
bzr: ERROR: libsvn._core.SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

Traceback (most recent call last):
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/commands.py", line 729, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/commands.py", line 691, in run_bzr
    ret = run(*run_argv)
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/commands.py", line 389, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/commands.py", line 701, in ignore_pipe
    result = func(*args, **kwargs)
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/builtins.py", line 183, in run
    tree, file_list = tree_files(file_list)
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/builtins.py", line 70, in tree_files
    return internal_tree_files(file_list, default_branch)
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/builtins.py", line 94, in internal_tree_files
    return WorkingTree.open_containing(default_branch)[0], file_list
  File "/home/wouter/src/bzr/bzr.dev/bzrlib/workingtree.py", line 340, in open_containing
    return control.open_workingtree(), relpath
  File "/home/wouter/.bazaar/plugins/svn/workingtree.py", line 728, in open_workingtree
    return SvnWorkingTree(self, self.local_path, self.open_branch())
  File "/home/wouter/.bazaar/plugins/svn/workingtree.py", line 80, in __init__
    status = svn.wc.revision_status(self.basedir, None, True, None, None)
  File "/var/lib/python-support/python2.5/libsvn/wc.py", line 1577, in svn_wc_revision_status
    return apply(_wc.svn_wc_revision_status, args)
SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

bzr 0.19.0dev0 on python 2.5.1.final.0 (linux2)
arguments: ['/home/wouter/bin/bzr', 'st']

Revision history for this message
Jelmer Vernooij (jelmer) wrote : Re: [Bug 128496] bzr status in an svn checkout fails with SubversionException

Can you perhaps provide a test case that demonstrates this bug? I at
least can't reproduce it with a simple checkout.

Revision history for this message
Wouter van Heyst (larstiq) wrote : Re: bzr status in an svn checkout fails with SubversionException

That repo fails on multiple machines that handle others fine, I'll see if I can narrow it down.

Jelmer Vernooij (jelmer)
Changed in bzr-svn:
assignee: nobody → jelmer
status: New → Incomplete
Revision history for this message
Jelmer Vernooij (jelmer) wrote : Re: [Bug 128496] Re: bzr status in an svn checkout fails with SubversionException

Ok, I can reproduce this now.

  summary "Unable to open native working tree with non-ascii filenames"
  status triaged
  importance medium

Thanks for the bug report.

--
Jelmer Vernooij <email address hidden> - http://samba.org/~jelmer/
Jabber: <email address hidden>

Changed in bzr-svn:
importance: Undecided → Medium
status: Incomplete → Triaged
Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Looks like svn.wc.revision_status() should be avoided because it can't deal with non-ASCII stuff.

Jelmer Vernooij (jelmer)
Changed in bzr-svn:
milestone: none → 0.4.7
Revision history for this message
Jelmer Vernooij (jelmer) wrote :

According to a thread on subversion-dev, this is not a bug but rather a problem caused by not being able to convert from the file system encoding to utf-8.

I guess this means the only thing bzr-svn can do is catch the exception and print a clear error.

Revision history for this message
Wesley J. Landaker (wjl) wrote :

I think you are referring to a thread that I actually started. The conclusion that it's an encoding conversion issue is NOT true (I haven't gotten back to that thread yet to comment, due to being ill for a few days). The locale is UTF-8, and the filename is valid in UTF-8, and it's trying to convert that to UTF-8. It doesn't fail when libsvn does it, it only fails in the python bindings.

But I do believe the conclusion is correct that this is not a bzr-svn bug, but a bug somewhere in the svn python bindings.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Yes, that's the thread I'm referring to.

I don't think this is specific to the Python bindings though, as it could be reproduced with svnversion, which doesn't use Python at all.

Revision history for this message
Wesley J. Landaker (wjl) wrote :

Not with my reproduction recipe. The person who claimed that it could be reproduced with svnversion showed it failing on a UTF-8 name using a non-UTF-8 locale, which is supposed to fail.

If you are using a UTF-8 locale, you should be able to validate this yourself: it will work with svnversion, but fail with python svn. If you use my tarball with a non UTF-8 locale, it will always fail, as tar does not transcode filenames, and the tarball I made contains UTF-8 names. (This was the original demonstration I noted on the list. The counter example someone else posted I believe is flawed.)

Anyway, I'll take the rest of this discussion to the SVN mailing list, as at this point I am sure that this is not a bzr-svn bug. I don't know if it's useful to continue parallel discussion on this linked bug (but we can if someone else thinks it is helpful -- I just want to help get this fixed one way or another! =).

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

I noticed it seems like this is fixable by setting the locale appropriately from python:

import locale
locale.setlocale(locale.LC_ALL, "en_US.UTF-8")

There are two catches, though:

 - this is valid for the complete process, so I can't just use this in bzr-svn
 - I don't know what the file system encoding is
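
On the second point, Python can at least report which encodings it currently associates with the process. A minimal sketch using only the standard library; the helper name `encoding_report` is made up for illustration, and the values depend entirely on the environment it runs in:

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python currently associates with the process."""
    return {
        # Encoding Python uses to convert file names to and from text.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding suggested by the locale, without touching the locale itself.
        "preferred": locale.getpreferredencoding(False),
        # Current LC_CTYPE setting; passing None only queries it.
        "lc_ctype": locale.setlocale(locale.LC_CTYPE, None),
    }


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print(name, "=", value)
```

Newer Python versions may adopt the environment's LC_CTYPE at startup, so the output can differ from what the Python 2.5 interpreters in this report would have shown.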

Revision history for this message
Martin von Gagern (gagern) wrote : Backtrace and debugging for non-ascii filenames

OK, I hit this as well. Did some heavy debugging. First the backtrace of where I am, mixed Python and C, the latter re-ordered to match most recent call last order.

bzr: ERROR: svn.core.SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 797, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 499, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 807, in ignore_pipe
    result = func(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 173, in run
    tree, file_list = tree_files(file_list)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 64, in tree_files
    return internal_tree_files(file_list, default_branch)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 88, in internal_tree_files
    return WorkingTree.open_containing(default_branch)[0], file_list
  File "/usr/lib/python2.5/site-packages/bzrlib/workingtree.py", line 325, in open_containing
    return control.open_workingtree(), relpath
  File "~/.bazaar/plugins/svn/workingtree.py", line 743, in open_workingtree
    return SvnWorkingTree(self, self.local_path, self.open_branch())
  File "~/.bazaar/plugins/svn/workingtree.py", line 88, in __init__
    status = svn.wc.revision_status(self.basedir, None, True, None, None)
  File "/usr/lib/svn-python/libsvn/wc.py", line 2310, in svn_wc_revision_status
SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

#11 0xb7725c76 in _wrap_svn_wc_revision_status ()
   from /usr/lib/python2.5/site-packages/libsvn/_wc.so
#10 0xb79083f3 in svn_wc_revision_status (result_p=0xbfe173a0,
    wc_path=0xa10e184 "/home/mvg/src/java/ornament", trail_url=0x0,
    committed=1, cancel_func=0xb7ade7fb <svn_swig_py_cancel_func>,
    cancel_baton=0x4e5168c0, pool=0xa27b108)
    at subversion/libsvn_wc/revision_status.c:123
#9 0xb7acc313 in close_edit (edit_baton=0xa30ad18, pool=0xa27b108)
    at subversion/libsvn_delta/cancel.c:334
#8 0xb790ba4b in close_edit (edit_baton=0xa30a6f0, pool=0xa27b108)
    at subversion/libsvn_wc/status.c:2033
#7 0xb79092e2 in get_dir_status (eb=0xa30a6f0, parent_entry=0x0,
    adm_access=0xa27b1d8, entry=0x0, ignore_patterns=0xa30a7a0,
    depth=svn_depth_infinity, get_all=1, no_ignore=0, skip_this_dir=0,
    status_func=0xb79080d0 <analyze_status>, status_baton=0xbfe17334,
    cancel_func=0xb7ade7fb <svn_swig_py_cancel_func>, cancel_baton=0x4e5168c0,
    pool=0xa27b108) at subversion/libsvn_wc/status.c:828
#6 0xb7976164 in svn_io_get_dirents2 (dirents=0xbfe17168,
    path=0xa27b200 "/home/mvg/src/java/ornament", pool=0xa2870a0)
    at subversion/libsvn_subr/io.c:1976
#5 0xb798263b in svn_path_cstring_to_utf8 (path_utf8=0xbfe17048,
    path_apr=0xa287840 "debugSym_4-z\303\244hlige Drehung_Tile.png",
    pool=0xa2870a0) at subversion/libsvn_subr/path.c:13...

Revision history for this message
Martin von Gagern (gagern) wrote :

The source locale in APR is specified using apr_os_locale_encoding which in turn uses nl_langinfo which in turn bases its result on the locale currently enabled for the application.

Two important breakpoints are "setlocale" and "apr_os_locale_encoding". With them in place from the beginning I get this sequence:

setlocale(LC_CTYPE, NULL) // query only
setlocale(LC_CTYPE, "") // set according to environment
setlocale(LC_CTYPE, "C") // set to US-ASCII default
setlocale(LC_CTYPE, NULL)
setlocale(LC_CTYPE, "")
setlocale(LC_CTYPE, "C")
apr_os_locale_encoding // repeated

So it looks like Python does not apply any locale settings by default, in order to provide maximum compatibility for non-locale-aware applications. bzr itself seems to have no call to setlocale. I would suggest this line somewhere in the initialization code of bzr:
locale.setlocale(locale.LC_ALL, '')

So this is neither a bug in subversion nor a bug in bzr-subversion but rather a bug in bzr itself, imo. I'll associate a branch containing a suggested fix with this bug here. It's against bzr.dev, so it won't work directly with bzr-svn, but I'd hope the bzr people merge it to bzr 1.5 as well.
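
The dependency of nl_langinfo on the active locale can be observed from Python as well. The following is only a sketch: `codeset_for` is a hypothetical helper, `locale.nl_langinfo` is available on Unix-like systems only, and the strings it returns vary between C libraries.

```python
import locale


def codeset_for(ctype_locale):
    """Return nl_langinfo(CODESET) as seen with LC_CTYPE set to ctype_locale.

    The previous LC_CTYPE setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE, None)  # query only
    try:
        locale.setlocale(locale.LC_CTYPE, ctype_locale)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # "C" is the state the debugger trace above ends in.
    print("C locale:   ", codeset_for("C"))
    try:
        # "" means: take the locale from the environment (LANG, LC_*).
        print("environment:", codeset_for(""))
    except locale.Error as exc:
        print("environment locale not usable:", exc)
```

With a UTF-8 locale in the environment, the two lines would be expected to differ, which is the mismatch the breakpoint sequence points at.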

Revision history for this message
Wesley J. Landaker (wjl) wrote :

I can confirm that patching this into bzr fixes the problems I personally was having in relation to this bug.

import locale
locale.setlocale(locale.LC_ALL, '')

Revision history for this message
Martin von Gagern (gagern) wrote :

I found out that bzr contains quite a bit of code dedicated to locale issues. There is even a test suite, blackbox.test_locale, dealing with this. However, bzr seems to do things in a pythonish way, using tools provided by the python library. And those functions seem to be designed to touch the current locale setting as little as possible, and rather rely on corresponding environment settings. I can think of three possible approaches:

1. Have bzr-svn wrap all its access to libsvn in a function temporarily setting the locale according to environment settings.
+ No modifications to bzr at large
- Might need to be replicated for other applications as well
- If another application which imports bzr called setlocale, that setting gets ignored

2. Have bzr call setlocale on a global level if it is run as an application. Leave everything else alone. That's the current approach of my setlocale branch.
+ Consistent locale-aware behaviour for all plugins
+ Locale-specific formatting for dates and so on
- I guess the python functions still use environment settings over this locale, so behaviour remains inconsistent if imported in an application that did setlocale but did not modify the environment

3. Try to make Python-specific functions honour locale as set by setlocale. I haven't tried any of this, but I guess this would require determining locale settings and adjusting the environment accordingly.
+ Consistent behaviour even when imported in another application
- Maybe less locale support when run from within a non-locale aware application
- Modifying the environment when run inside an application might be a bad idea

Any input from people with more knowledge about Python and locales highly appreciated.

Revision history for this message
Wesley J. Landaker (wjl) wrote :

Well, almost. When adding setlocale to bzr, it does fix a lot of cases, but it looks like it then exposes [an]other problem[s] in bzr-svn. For example:

# before adding setlocale
$ bzr info
bzr: ERROR: libsvn._core.SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 797, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 499, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 807, in ignore_pipe
    result = func(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 1130, in run
    verbose=noise_level, outfile=self.outf)
  File "/usr/lib/python2.5/site-packages/bzrlib/info.py", line 315, in show_bzrdir_info
    recommend_upgrade=False)
  File "/home/wjlanda/.bazaar/plugins/svn/workingtree.py", line 743, in open_workingtree
    return SvnWorkingTree(self, self.local_path, self.open_branch())
  File "/home/wjlanda/.bazaar/plugins/svn/workingtree.py", line 88, in __init__
    status = svn.wc.revision_status(self.basedir, None, True, None, None)
  File "/var/lib/python-support/python2.5/libsvn/wc.py", line 1577, in svn_wc_revision_status
    return apply(_wc.svn_wc_revision_status, args)
SubversionException: ("Can't convert string from native encoding to 'UTF-8':", 22)

bzr 1.5 on python 2.5.2 (linux2)
arguments: ['/usr/bin/bzr', 'info']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'en_US.UTF-8'
plugins:
  bisect /home/wjlanda/.bazaar/plugins/bisect [1.1.0pre0]
  bzrtools /usr/lib/python2.5/site-packages/bzrlib/plugins/bzrtools [1.5.0]
  cvsps /home/wjlanda/.bazaar/plugins/cvsps [unknown]
  gtk /home/wjlanda/.bazaar/plugins/gtk [0.95.0dev1]
  launchpad /usr/lib/python2.5/site-packages/bzrlib/plugins/launchpad [unknown]
  loom /home/wjlanda/.bazaar/plugins/loom [1.4.0dev0]
  rebase /home/wjlanda/.bazaar/plugins/rebase [0.4.0dev0]
  stats /home/wjlanda/.bazaar/plugins/stats [unknown]
  svn /home/wjlanda/.bazaar/plugins/svn [0.4.11dev0]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.

# after adding setlocale
$ bzr info
bzr: ERROR: exceptions.KeyError: 'doc/I\xc2\xb2C'

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 797, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 499, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 807, in ignore_pipe
    result = func(*args...

Revision history for this message
Martin von Gagern (gagern) wrote :

OK, with the modifications in the ~gagern/bzr-svn/bug128496 branch in addition to the setlocale in bzr I got bzr info working. I am not sure, however, that this is how things should work, and I guess there might be other places needing such adjustments as well.

Revision history for this message
Wesley J. Landaker (wjl) wrote :

With the latest bug128496 branch merged, and with setlocale in place, I now get yet another error. I hope this is helpful:

$ bzr info
bzr: ERROR: exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xb2' in position 34: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 846, in run_bzr_catch_errors
    return run_bzr(argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 797, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 499, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "/usr/lib/python2.5/site-packages/bzrlib/commands.py", line 807, in ignore_pipe
    result = func(*args, **kwargs)
  File "/usr/lib/python2.5/site-packages/bzrlib/builtins.py", line 1130, in run
    verbose=noise_level, outfile=self.outf)
  File "/usr/lib/python2.5/site-packages/bzrlib/info.py", line 315, in show_bzrdir_info
    recommend_upgrade=False)
  File "/home/wjlanda/.bazaar/plugins/svn/workingtree.py", line 743, in open_workingtree
    return SvnWorkingTree(self, self.local_path, self.open_branch())
  File "/home/wjlanda/.bazaar/plugins/svn/workingtree.py", line 90, in __init__
    self.base_tree = SvnBasisTree(self)
  File "/home/wjlanda/.bazaar/plugins/svn/tree.py", line 348, in __init__
    add_dir_to_inv(u"", wc, None)
  File "/home/wjlanda/.bazaar/plugins/svn/tree.py", line 338, in add_dir_to_inv
    add_dir_to_inv(subrelpath, subwc, id)
  File "/home/wjlanda/.bazaar/plugins/svn/tree.py", line 336, in add_dir_to_inv
    False, 0, None)
  File "/var/lib/python-support/python2.5/libsvn/wc.py", line 58, in svn_wc_adm_open3
    return apply(_wc.svn_wc_adm_open3, args)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb2' in position 34: ordinal not in range(128)

bzr 1.5 on python 2.5.2 (linux2)
arguments: ['/usr/bin/bzr', 'info']
encoding: 'UTF-8', fsenc: 'UTF-8', lang: 'en_US.UTF-8'
plugins:
  bisect /home/wjlanda/.bazaar/plugins/bisect [1.1.0pre0]
  bzrtools /usr/lib/python2.5/site-packages/bzrlib/plugins/bzrtools [1.5.0]
  cvsps /home/wjlanda/.bazaar/plugins/cvsps [unknown]
  gtk /home/wjlanda/.bazaar/plugins/gtk [0.95.0dev1]
  launchpad /usr/lib/python2.5/site-packages/bzrlib/plugins/launchpad [unknown]
  loom /home/wjlanda/.bazaar/plugins/loom [1.4.0dev0]
  rebase /home/wjlanda/.bazaar/plugins/rebase [0.4.0dev0]
  stats /home/wjlanda/.bazaar/plugins/stats [unknown]
  svn /home/wjlanda/.bazaar/plugins/svn [0.4.11dev0]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.
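
The error message itself is what an ASCII encode of a path containing u'\xb2' produces. The sketch below reproduces that and shows an explicit UTF-8 encode instead. It is an illustration only: `encode_path` is a hypothetical helper, and the assumption that the path reached the binding as a unicode string is inferred from the traceback, not confirmed in this report.

```python
def encode_path(path, encoding="utf-8"):
    """Encode a text path explicitly instead of relying on an implicit encode."""
    if isinstance(path, bytes):
        return path  # already a byte string, pass through unchanged
    return path.encode(encoding)


# Same non-ASCII character (superscript two) as in the traceback.
path = u"doc/I\xb2C"

try:
    path.encode("ascii")
except UnicodeEncodeError as exc:
    print("ASCII encode fails:", exc)

# Explicit UTF-8 yields the byte sequence seen in the earlier KeyError.
assert encode_path(path) == b"doc/I\xc2\xb2C"
```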

Revision history for this message
Martin von Gagern (gagern) wrote :

Now I also added tests to my branch, and got problems in the Japanese and Russian locales, the only non-Latin locales available on my system.
It seems that osutils.format_date doesn't return a unicode string, but rather a byte sequence.
Fixed one instance in log.LongLogFormatter.log_revision but there might be others affected as well.
osutils.format_date(...).decode() didn't work as I would have expected.
osutils.format_date(...).decode("utf-8") did work, but I'm not sure how it would do in a non-latin1 non-utf8 locale, like some of the pre-unicode asian locales. Someone care to check?

Some possible solutions:
1. Make osutils.format_date use POSIX locale. This is current behaviour in bzr.dev but I suppose it's bad for users who don't instantly understand English weekday names. Would reduce the worth of the whole setlocale attempt.
2. Check every occurrence of osutils.format_date and figure out whether the caller needs a unicode string, UTF-8 string or a string in current default or terminal encoding. Probably it would be best if there was a test case for each of these invocations, so one would have to figure out command line commands that use them.

I'll need some developer opinion on this before I put any more work into something that might already be doomed.
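
To illustrate the decoding question: a bare .decode() in Python 2 used the ASCII default, and a hard-coded "utf-8" only fits UTF-8 locales. One option is to decode with whatever encoding the locale reports. This is a sketch under that assumption, not what bzr does; `to_unicode` is a hypothetical helper and osutils.format_date is not called here.

```python
import locale


def to_unicode(value, encoding=None):
    """Return text for value, decoding byte strings with the locale's encoding.

    encoding defaults to locale.getpreferredencoding(False), i.e. whatever
    the current locale settings report, rather than a hard-coded "utf-8".
    """
    if not isinstance(value, bytes):
        return value  # already text
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return value.decode(encoding)


# The same month name as bytes in two different locale encodings.
assert to_unicode(b"M\xc3\xa4rz", "utf-8") == u"M\xe4rz"
assert to_unicode(b"M\xe4rz", "latin-1") == u"M\xe4rz"
```

Whether the reported encoding matches the bytes actually produced in the pre-Unicode Asian locales is exactly the open question raised above.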

Revision history for this message
Martin von Gagern (gagern) wrote : Automatic conversion between byte and unicode strings

I investigated the automatic conversion between byte and unicode strings, and found a really interesting thread on the bazaar mailing list called "About encoding issues": http://thread.gmane.org/gmane.comp.version-control.bazaar-ng.general/10908

There is a function called sys.setdefaultencoding to set the encoding used for such implicit transformations. Unfortunately it usually gets removed by site.py, and should only be called before that happens, so it's a bit tricky to use, and it will affect all modules. By writing your own character encoding, it is possible to trace automatic conversions without throwing an exception each time one happens.

I realized that idea as a proof of concept in my branch https://code.launchpad.net/~gagern/bzr/str-unicode but I hope for the bzr development community to extend this further, as I can't possibly investigate all automatic character conversions in bazaar all by myself.

Specifying a regular expression using the STR_UNICODE environment variable, the output can be restricted to bzr-svn, but even there the number of automatic conversions is astronomical, and some more efficient log format is required.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Now that we control the bindings, it may actually be possible to fix this there. I wonder what the most appropriate solution here is. Wrapping all bzr-svn calls by setlocale() may be the best solution here. Unless there's some other way to tell apr the encoding of everything it receives is UTF-8...

Revision history for this message
Martin von Gagern (gagern) wrote :

Jelmer Vernooij wrote:
> Wrapping all bzr-svn calls by setlocale() may be the best solution here.

Ugly, as setlocale is not guaranteed to be thread-safe, and can be quite inefficient as well.

Quoting from http://docs.python.org/lib/node745.html
"It is generally a bad idea to call setlocale() in some library routine, since as a side effect it affects the entire program. Saving and restoring it is almost as bad: it is expensive and affects other threads that happen to run before the settings have been restored."

> Unless there's some other way to tell apr the encoding of everything it
> receives is UTF-8...

I can see no alternative route through the sequence of library calls. One possibility might be some interposer library that overrides one of these functions, in order to relay it to the original implementation unless the current thread requested a fixed return value for the next call. Ugly, probably difficult to get right, and I guess none too portable either.

Two more options, both of which I would deem superior to those mentioned above:

Leave it to bzr. I have hopes to get some setlocale support merged soon: http://bundlebuggy.aaronbentley.com/request/%3C4863D8D1.405%40gmx.net%3E
I'm against ugly code in one project just to work around the deficiencies of code in another project, as this tends to make code quite unreadable. So you could try to get that fix backported in the next bzr 1.5 release, if there will be such a thing, and otherwise wait for 1.6.

Have bzr-svn modify bzrlib itself. As you can see, the fix above contains modifications to three functions, and a bit of top level code. Maybe bzr-svn can introduce this fix into bzr versions that don't provide it themselves. I don't know enough Python to be sure, but I would have thought that it should be possible to redefine these functions. Either there is a reliable way to get the source and apply the patch, or you rely on the current implementation being appropriate for all previous implementations as well, or you have a list of different implementations to choose from for different versions of bzr. The top level code could be executed during bzr-svn initialization, under the condition that the locale is still unset (i.e. set to "C"), the bzr version is known not to contain the fix, and bzrlib was actually loaded by the bzr command line tool, to stay consistent with a fix in bzr.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

When wrapping, it shouldn't affect other parts of bzr, since we would change the locale before calling a svn function and then change it back afterwards. We also don't use threads.

Personally, I think that's a lot cleaner than monkeypatching bzr, which may have potential side-effects in other parts of the code.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

I've merged ~gagern/bzr-svn/bug128496 btw, since it seemed useful no matter how we resolve the other issues. Thanks!

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Setting the locale is only relevant for the paths specified to the wc module afaik, the rest expects utf8. Subversion expects the paths there to be in the file system encoding, which it seems to expect is the same thing as the locale.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Hmm, I misunderstood the meaning of setlocale(LC_ALL, "") I think. I'll just wait for your patch to be merged upstream.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

in other words, (2)

Revision history for this message
Martin von Gagern (gagern) wrote :

> When wrapping, it shouldn't affect other parts of bzr, since we would change the locale before calling a svn function and then change it back afterwards. We also don't use threads.

While bzr the command line tool might not be using threads, there might well be multithreaded client applications using bzrlib. However, those should call setlocale themselves, if they can cope with it, or bzr shouldn't use locales either. At least in the C world that's the way things would be. Maybe you can achieve the setlocale wrapping with a decorator that does nothing if the locale has been set to something different than C, the POSIX default value.

> Hmm, I misunderstood the meaning of setlocale(LC_ALL, "") I think. I'll just wait for your patch to be merged upstream.

You don't want to use LC_ALL, because you can't reliably restore that to its old value. Only use LC_CTYPE. setlocale(LC_CTYPE, "") sets the current encoding according to environment settings. setlocale(LC_CTYPE, None) simply queries the current value. A call that sets the locale returns the new value, so the old value has to be queried before changing it.

Together you could use this to build a decorator like this (untested):

from locale import LC_CTYPE, setlocale

# Wrapping only helps if the process is still in the default "C" locale
# and the environment asks for something else.
old_locale = setlocale(LC_CTYPE, None)  # query only
if old_locale != 'C':
    need_setlocale = False
else:
    need_setlocale = (setlocale(LC_CTYPE, '') != 'C')
    setlocale(LC_CTYPE, old_locale)  # put the previous setting back

def setlocale_wrapper(unbound):
    if not need_setlocale:
        return unbound
    def wrapped(*args, **kwargs):
        # Remember the current setting, then switch to the environment's.
        old_locale = setlocale(LC_CTYPE, None)
        setlocale(LC_CTYPE, '')
        try:
            return unbound(*args, **kwargs)
        finally:
            setlocale(LC_CTYPE, old_locale)
    wrapped.__doc__ = unbound.__doc__
    wrapped.__name__ = unbound.__name__
    return wrapped
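
One detail worth checking when adapting this idea: Python's locale.setlocale, when given a locale argument, returns the new setting, so the previous value has to be queried first with setlocale(LC_CTYPE, None). The same save-and-restore pattern can also be written as a context manager. This is a sketch, not code from bzr or bzr-svn, and `temporary_ctype` is a hypothetical name:

```python
import contextlib
import locale


@contextlib.contextmanager
def temporary_ctype(name=""):
    """Temporarily switch LC_CTYPE to name ("" means: use the environment)."""
    # Query the current setting first; setlocale() with an argument
    # would return the new value, not the old one.
    saved = locale.setlocale(locale.LC_CTYPE, None)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        yield
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    with temporary_ctype("C"):
        print("inside:", locale.setlocale(locale.LC_CTYPE, None))
    print("after: ", locale.setlocale(locale.LC_CTYPE, None))
```

The caveat quoted earlier from the Python documentation still applies: the change is process-wide while the block runs, so other threads would see it too.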

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Should this be considered closed now that the setlocale() fixes have made it into bzr 1.6?

The bzr-svn changes attached to this bug report have already made it into 0.4.

Revision history for this message
Martin von Gagern (gagern) wrote :

Yes, I guess this can be closed for bzr-svn.

Jelmer Vernooij (jelmer)
Changed in bzr-svn:
milestone: 0.4.7 → 0.4.11
status: Triaged → Fix Committed
Revision history for this message
Martin von Gagern (gagern) wrote :
Changed in bzr:
assignee: nobody → gagern
status: New → Fix Committed
Revision history for this message
Martin von Gagern (gagern) wrote :

This issue is better addressed using setlocale in the application, not by some hack in svn or apr libs. Therefore not to be addressed in subversion.

Changed in subversion:
status: New → Invalid
Rolf Leggewie (r0lf)
Changed in bzr-svn:
status: New → Confirmed
Jelmer Vernooij (jelmer)
Changed in bzr:
status: Fix Committed → Fix Released
Jelmer Vernooij (jelmer)
Changed in bzr-svn:
status: Fix Committed → Fix Released
Jelmer Vernooij (jelmer)
Changed in bzr-svn:
status: Confirmed → Fix Released
Revision history for this message
Martin Pool (mbp) wrote :

Martin von Gagern posted a patch <http://bundlebuggy.aaronbentley.com/project/bzr/request/%3C4863D8C0.10305%40gmx.net%3E> that adds the setlocale call to bzr and then fixes up various issues following on from there.

That particular patch is still in bb but I'm going to mark it superseded because he submitted follow-on patches that address most aspects of it.
