bzr svn-import fails on windows with UnicodeEncodeError

Bug #262923 reported by Bronislav Gabrhelik
Affects: Bazaar Subversion Plugin
Status: Fix Released
Importance: Low
Assigned to: Jelmer Vernooij

Bug Description

I am trying to convert an svn repository to bzr. The svn repository is local. Platform: Windows XP SP2, with Python 2.5 and Bazaar 1.6 installed. I entered "bzr svn-import file:///C:/data/svn/ecos-full", but it failed with a Unicode error. I tried changing the default encoding in "Python25\Lib\site.py" from "ascii" to "utf8", with no success.

...>bzr svn-import file:///C:/data/svn/ecos-full
bzr: ERROR: exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 49: ordinal not in range(128)

Traceback (most recent call last):
  File "bzrlib\commands.pyo", line 857, in run_bzr_catch_errors
  File "bzrlib\commands.pyo", line 797, in run_bzr
  File "bzrlib\commands.pyo", line 499, in run_argv_aliases
  File "bzrlib\commands.pyo", line 818, in ignore_pipe
  File "C:/Program Files/Bazaar/plugins\svn\__init__.py", line 262, in run
  File "C:/Program Files/Bazaar/plugins\svn\remote.py", line 72, in open_repository
  File "C:/Program Files/Bazaar/plugins\svn\repository.py", line 205, in __init__
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 49: ordinal not in range(128)

bzr 1.6 on python 2.5.2 (win32)
arguments: ['bzr', 'svn-import', 'file:///C:/data/svn/ecos-full']
encoding: 'cp1250', fsenc: 'mbcs', lang: None
plugins:
  bzrtools C:\Program Files\Bazaar\plugins\bzrtools [1.6.0]
  launchpad C:\Program Files\Bazaar\plugins\launchpad [unknown]
  qbzr C:\Program Files\Bazaar\plugins\qbzr [0.9.3]
  svn C:\Program Files\Bazaar\plugins\svn [0.4.11]
*** Bazaar has encountered an internal error.

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

What's your username on that system? It appears this is related to the path of the cache file, which can't be encoded.

repository.py line 205 here is:

                cachedbs[cache_file] = sqlite3.connect(cache_file)

is that right?

Jelmer Vernooij (jelmer)
Changed in bzr-svn:
status: New → Incomplete
Revision history for this message
Bronislav Gabrhelik (bgabrhelik) wrote :

I am bgabrhelik on that machine. The same name is in svn repository. Bazaar showed me this user name:

c:\usr\convert>bzr whoami
bgabrhelik <bgabrhelik@BGNBDEV>

An important piece of information is probably that the OS is the Czech version of XP, so some paths are localized. That's the problem!

I am a newbie in Python, but I modified the script and printed out the path set in the connect_file variable. It contains:
"C:/Documents and Settings/bgabrhelik/Data aplikací/bazaar/2.0\svn-cache\f6d051f5-4fe7-e246-b4cf-f19aecd70f77\cache-v4"

Revision history for this message
Bronislav Gabrhelik (bgabrhelik) wrote :

The variable in my last comment is named cache_file not connect_file.

Revision history for this message
Bronislav Gabrhelik (bgabrhelik) wrote :

The locale.getdefaultlocale()[1] returns 'cp1250'

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Looks like the í is the culprit here.
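A minimal sketch of why that character matters, written for Python 3 as an illustration (the thread itself runs Python 2.5, where the implicit unicode-to-str conversion uses the 'ascii' codec). The helper name first_unencodable is hypothetical; the path is the one printed in the comment above.

```python
# Python 3 illustration only; first_unencodable is a hypothetical helper.
path = "C:/Documents and Settings/bgabrhelik/Data aplikac\u00ed/bazaar/2.0"


def first_unencodable(text, encoding="ascii"):
    """Return (index, character) of the first character that cannot be
    encoded with the given codec, or None if the whole string encodes."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.start, text[exc.start]
    return None


print(first_unencodable(path))            # (49, 'í')
print(first_unencodable(path, "cp1250"))  # None
```

The index matches the "position 49" in the reported traceback, and the same string encodes without error under cp1250, the console encoding reported for this machine.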

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

What sort of error do you get if you change that line to just run "sqlite3.connect(cache_file)" (no assignment)?

Also, can you try adding "print type(cache_file)" and pasting the output here?

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Fixed in the 0.4 branch.

Changed in bzr-svn:
assignee: nobody → jelmer
milestone: none → 0.4.12
status: Incomplete → Fix Committed
Jelmer Vernooij (jelmer)
Changed in bzr-svn:
status: Fix Committed → Fix Released
Revision history for this message
Bronislav Gabrhelik (bgabrhelik) wrote :

I am sorry for the late feedback. I found that this fix doesn't work for me.

Below is the output from bzr svn-import. Your fix encodes the filename passed to sqlite. The encoding name is taken from osutils._fs_enc, which returns 'mbcs'. I tried the hardcoded value "utf8" instead of osutils._fs_enc and it actually works fine. It seems that sqlite expects utf8 encoding.
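A rough Python 3 sketch of the difference between the two encodings discussed here. It assumes cp1250 as a stand-in for 'mbcs' (the 'mbcs' codec exists only on Windows and resolves to the system ANSI code page, which the logs above suggest is cp1250 on this machine). The helper is_valid_utf8 is hypothetical.

```python
# Assumption: cp1250 stands in for 'mbcs' on a Czech Windows system.
name = "Data aplikac\u00ed"

ansi_bytes = name.encode("cp1250")  # what an mbcs-encoded path would contain
utf8_bytes = name.encode("utf-8")   # what a UTF-8 encoded path would contain


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(ansi_bytes, is_valid_utf8(ansi_bytes))
print(utf8_bytes, is_valid_utf8(utf8_bytes))
```

The ANSI bytes are not well-formed UTF-8, so a library that interprets the filename as UTF-8 would not see the intended directory name. That is consistent with, though not proof of, the "unable to open database file" error reported here.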

Here is the error output from before I changed the encoding to "utf8":

C:\temp\convert\ecos-full>bzr svn-import file:///C:/data/svn/ecos-full
Initialising Subversion metadata cache in C:/Documents and Settings/bgabrhelik/Data aplikací/bazaar/2.0\svn-cache\f6d051f5-4fe7-e246-b4cf-f19aecd70f77
bzr: ERROR: sqlite3.OperationalError: unable to open database file

Traceback (most recent call last):
  File "bzrlib\commands.pyo", line 893, in run_bzr_catch_errors
  File "bzrlib\commands.pyo", line 839, in run_bzr
  File "bzrlib\commands.pyo", line 539, in run_argv_aliases
  File "bzrlib\commands.pyo", line 853, in ignore_pipe
  File "C:/Program Files/Bazaar/plugins\svn\__init__.py", line 244, in run
  File "C:/Program Files/Bazaar/plugins\svn\remote.py", line 72, in open_repository
  File "C:/Program Files/Bazaar/plugins\svn\repository.py", line 205, in __init__
OperationalError: unable to open database file

bzr 1.9 on python 2.5.2 (win32)
arguments: ['bzr', 'svn-import', 'file:///C:/data/svn/ecos-full']
encoding: 'cp1250', fsenc: 'mbcs', lang: None
plugins:
  bzrtools C:\Program Files\Bazaar\plugins\bzrtools [1.9.1]
  launchpad C:\Program Files\Bazaar\plugins\launchpad [unknown]
  qbzr C:\Program Files\Bazaar\plugins\qbzr [0.9.5]
  svn C:\Program Files\Bazaar\plugins\svn [0.4.14]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.

Changed in bzr-svn:
status: Fix Released → New
Revision history for this message
Bronislav Gabrhelik (bgabrhelik) wrote :

Maybe the problem is that bzr is an ANSI console application rather than a Unicode one. Is the reason compatibility with Win9x/WinMe systems?

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

This seems more like a sqlite issue to me now, at least the fact that it's having problems opening that file.

Can you reproduce this problem on a non-Windows '95/ME system?

Revision history for this message
Bronislav Gabrhelik (bgabrhelik) wrote :

>This seems more like a sqlite issue to me now, at least the fact that it's having problems opening that file.

The string under discussion originates in the Windows shell API SHGetSpecialFolderPath(). There are two variants of this API, one with an A suffix and one with a W suffix, and each returns the string in a different encoding. SHGetSpecialFolderPathA() uses the ANSI encoding, which can be customized on NT systems; on Win9x/Me systems it is hardcoded and cannot be changed. SHGetSpecialFolderPathA() is present in WinNT for backward compatibility. SHGetSpecialFolderPathW() returns the string in Unicode (to be exact, UTF-16LE). The question now is which of the two is used by Bazaar.
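To illustrate the A/W distinction without calling the Windows API, here is a small Python 3 sketch that encodes the same folder name the way each variant would represent it. The choice of cp1250 and cp1251 as example ANSI code pages is an assumption made for illustration; the real ANSI code page depends on the system settings.

```python
name = "Data aplikac\u00ed"

# W variant: UTF-16LE, two bytes per character here, independent of locale.
wide = name.encode("utf-16-le")

# A variant: bytes depend on the active ANSI code page (assumed examples).
ansi_czech = name.encode("cp1250")
ansi_cyrillic = name.encode("cp1251", errors="replace")

print(len(name), len(wide))  # 13 26
print(ansi_czech)            # b'Data aplikac\xed'
print(ansi_cyrillic)         # b'Data aplikac?'
```

The wide form round-trips on any system, whereas the ANSI form is only meaningful together with the code page that produced it and can lose characters that the code page lacks.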

I downloaded the sources and tracked down the origin of the value...

It originates in the function get_local_appdata_location() (bzrlib\win32api.py). The comment for this function says:

  Returned value can be unicode or plain string.
    To convert plain string to unicode use
    s.decode(bzrlib.user_encoding)
    (XXX - but see bug 262874, which asserts the correct encoding is 'mbcs')

Is there a property on the string class that says which encoding it contains, so you can tell whether the returned string was plain or unicode? If not, I think the conversion should be done in the low-level functions.
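On the question above: a byte string carries no record of its encoding, but its type can be checked (in Python 2, `str` versus `unicode`; in Python 3, `bytes` versus `str`). A minimal Python 3 sketch of normalizing such a "unicode or plain string" return value follows; to_text is a hypothetical helper, not part of bzrlib, and the caller still has to know which encoding applies.

```python
def to_text(value, encoding):
    """Return value as text, decoding only when it is a byte string.

    The encoding cannot be detected from the bytes themselves; it must
    be supplied by the caller (for example the ANSI code page).
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


print(to_text(b"Data aplikac\xed", "cp1250"))  # decoded from cp1250 bytes
print(to_text("Data aplikac\u00ed", "cp1250"))  # already text, returned as is
```

Doing this once at the lowest level, as suggested in the comment, means every caller receives text and never has to guess.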

Looking at _get_sh_special_folder_path(), it uses SHGetSpecialFolderPathW(), so the result should be Unicode, but it seems that the encoding is ANSI (CP1250). I don't know what goes on behind the scenes...

Here in the office I have no Czech version of the OS, so I will test the recommended decoding, s.decode(bzrlib.user_encoding), later from home.

>Can you reproduce this problem on a non-Windows '95/ME system?
I don't understand what "non-Windows '95/ME system" means. Anyway, I have no Win9x available. But according to the code in create_cache_dir(), it uses a different path there.

Regards
Bronislav Gabrhelik

Revision history for this message
Jelmer Vernooij (jelmer) wrote :

I think this problem is specific to Windows '95/'9x, so it would be useful if you could e.g. try to reproduce it on a newer system such as Windows 2000 or XP.

Can you try changing cache_file.encode to cache_file.decode in repository.py?

Revision history for this message
Bronislav Gabrhelik (bgabrhelik) wrote : Re: [Bug 262923] Re: bzr svn-import fails on windows with UnicodeEncodeError

>I think this problem is specific to Windows '95/'9x, so it would be
>useful if you could e.g. try to reproduce it on a newer system such as
>Windows 2000 or XP.

Oops. It is a misunderstanding. I just mentioned Win9x/WinMe as a reason for
using the ANSI version of the Windows API. The problem occurs on Win XP Professional
- Czech version. It is mentioned in the first comment.

Revision history for this message
Bronislav Gabrhelik (bgabrhelik) wrote : Fwd: [Bug 262923] Re: bzr svn-import fails on windows with UnicodeEncodeError

>Can you try changing cache_file.encode to cache_file.decode in
>repository.py ?

I tried it, and also all combinations of encode/decode and
osutils._fs_enc/osutils.get_user_encoding().

In most cases I get the original error, UnicodeEncodeError.

print type(cache_file) dumps <type 'unicode'>, so cache_file is
unicode string.

It is an issue with the contract between bzr-svn and sqlite: the sqlite connect
string is <type 'str'> and must be encoded in "utf8".

Take a look at the following links:

sqlite3 docs should mention utf8 requirement
http://bugs.python.org/issue2127

Opening A New Database Connection
http://www.sqlite.org/c3ref/open.html
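A small runnable sketch of the scenario in this comment, written for Python 3, where sqlite3.connect() accepts a text path directly. In the Python 2.5 setup of this thread, the workaround reported above was to pass cache_file.encode("utf8") instead. The helper open_cache, the temporary directory, and the table name revinfo are all hypothetical and are not taken from bzr-svn.

```python
import os
import sqlite3
import tempfile


def open_cache(directory, filename="cache-v4"):
    """Create the directory if needed and open a SQLite database in it."""
    os.makedirs(directory, exist_ok=True)
    return sqlite3.connect(os.path.join(directory, filename))


# A throwaway directory whose name contains the same non-ASCII character.
base = tempfile.mkdtemp()
cache_dir = os.path.join(base, "Data aplikac\u00ed", "svn-cache")

conn = open_cache(cache_dir)
conn.execute("CREATE TABLE revinfo (revno INTEGER)")  # hypothetical table
conn.commit()
conn.close()

created = os.path.exists(os.path.join(cache_dir, "cache-v4"))
print(created)
```

Running this on a system with a localized profile path is a quick way to check whether a given Python and SQLite combination can open a database under a non-ASCII directory.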

Jelmer Vernooij (jelmer)
Changed in bzr-svn:
status: New → Incomplete
Revision history for this message
Jelmer Vernooij (jelmer) wrote :

Thanks for those links. It looks like there are two bugs we have to work around here (one in bzrlib, one in the Python sqlite3 bindings). I think I've done that in the 0.5 branch now (r2246). Please let me know if this fixes the bug.

Changed in bzr-svn:
importance: Undecided → Low
milestone: 0.4.12 → 0.5.0
status: Incomplete → Fix Released
Revision history for this message
Bronislav Gabrhelik (bgabrhelik) wrote :

Jelmer,
Branch 0.5 is still not part of bzr1.1, so I cannot test it. I am not a Python guy, so I don't know how to build the bzr-svn plugin. I tried to follow the instructions in the INSTALL file, but I was unsuccessful. I installed the Windows package bzr-setup-1.11rc1-3.exe and then replaced plugins\svn with the content I got from bzr branch lp:bzr-svn. It complains that the subvertpy library is missing.

I don't know how to build the subvertpy library. I want to build it for the native Windows platform (not cygwin). Is that supported? Where can I download the Subversion library? I cannot see any library on the site http://subversion.tigris.org/. More detailed instructions in subvert\INSTALL would be great.

Thanks
Bronislav Gabrhelik
