>>>>> "Kamil" == Kamil Szot writes: Kamil> By examining the changes you made Kamil> http://bazaar.launchpad.net/~vila/bzr/472161-ftp-utf8/revision/4936 I Kamil> doubt this goes anywhere near fixing my trouble. I realized the approach is different, that's why I asked you to test it :) Sorry for not having commented on your changes more clearly, see my comments below. You started on the assumption that the client and the server will use the same encoding which is less likely to work than using UTF8 (see RFC2640). Kamil> FTP servers are usually encoding agnostic. Not really, they have to obey whatever file system is used underneath. Kamil> They treat file names as built of 8-bit characters Kamil> without any specific encoding. They don't support any Kamil> encoding explicitly, Many file systems will refuse to create files with arbitrary 8-bits characters. Using UTF8 as an intermediate representation guarantees that clients and servers talk about the same paths (again, have a look at RFC2640). And some ftp servers will just not support arbitrary 8-bits paths (medusa for one). Kamil> Please take a look at my changes. They should fix the Kamil> problem even if your system uses some other encoding Kamil> for filenames (for example iso-8859-2). Your changes only impact the client side, the problem is on the server side. RFC2640 says that the clients should use UTF8 and that servers should handle UTF8 and *then* do what is needed on their side. If the clients uses iso-8859-1, mac-roman or utf8 and the server uses iso-8859-2, then using your changes can break but using utf8 should work. Now, it depends on whether the server is handling utf8. If it doesn't, then if (but only if) the server and all the clients use the same fs encoding, your changes will work. If it does, your changes will break but using utf8 will work. So the proposed changes may not be the final answer but it's more likely to work under more configurations. And that's why I wanted to know if it worked for *you* ! Kamil> The root of my problem seems to come from the fact Kamil> that python os.path functions appear not to be aware Kamil> of the fact that OS might use different encodings for Kamil> file names. They just return string and if it is Kamil> implicitly converted to unicode at some point then the Kamil> 'ascii' encoding is used. True and you re-discover what is done in bzrlib.osutils (_fsenc, get_terminal_encoding and get_user_encoding, but mainly the first one). The ftp transport receives paths that have been processed by higher layers, it should not worry about what the file system encoding is, the higher layers did. Kamil> I wonder how the core of bzr deals with this. By using unicode internally and decoding the paths respecting the file system encoding when needed (outside the scope or both your changes and mine). Kamil> Some people get around such bugs by setting default Kamil> encoding for unicode<->string conversions by calling Kamil> os.setdefaultencoding() like complained about here Kamil> http://tarekziade.wordpress.com/2008/01/08/syssetdefaultencoding- Kamil> is-evil/ Kamil> I have not made merge proposal yet because I wanted to Kamil> hear your opinion first and test if it breaks Kamil> testsuite for myself. But merge proposals are easier to deal with even at that point as they present a diff of your changes without the need to drill down into each revision or grab a local copy of your branch. They also allows us to discuss there instead of here :) Kamil> I'm using bazaar with python 2.6.4, I'll try to Kamil> install medusa or pyftpdlib, run the tests the way you Kamil> described and report my findings. If you use 2.6.4 then install pyftpdlib. You can have a look at medusa but the bugs there are really hard to understand and bzr will avoid trying to use it (so you'll have to revert the check in bzrlib/tests/ftp_server/__init__.py first).