...
> This indicates that the filenames are in OEM encoding (cp1251)
>
>> No, OEM encoding is cp866. cp1251 is ANSI encoding. ANSI encoding is used
>> for all non-Unicode Win32 API functions, e.g. CreateProcessA (vs
>> CreateProcessW, which uses Unicode).
>
> cp1251 is Windows' Cyrillic ANSI code page, the counterpart of latin-1
> (cp1252) for Western scripts.
> especially given the fact that you are doing:
>
> f1 = 'Тест.txt'
>
> Which means that it isn't even a Unicode string.
>
>> Yes, this is the point of my test.
>
> I would certainly expect that to work. The question is whether it
> supports paths which are valid Unicode but do not have an OEM
> representation on your filesystem.

I'll note that:

>>> x = u'\u0422\u0435\u0441\u0442.txt' # 'Тест.txt'
>>> x.encode('cp1251')
'\xd2\xe5\xf1\xf2.txt'
>>> x.encode('cp866')
'\x92\xa5\xe1\xe2.txt'

I don't know what you would get for

>>> x.encode('mbcs')

as it depends on the machine's code page settings. Here it fails, silently
replacing every character it can't represent with '?':

>>> x.encode('mbcs')
'????.txt'

because 'Тест.txt' cannot be represented in my filesystem encoding. ('mbcs'
is Python's name for the current ANSI code page, not the OEM one, which is
why you can't know what it is until you actually evaluate it on a given
machine.)

My guess is that on your system you will get:

>>> x.encode('mbcs')
'\xd2\xe5\xf1\xf2.txt'

or, put another way:

>>> x.encode('cp1251') == x.encode('mbcs')
True

The issue is that there is sys.getfilesystemencoding() (which on Windows
always returns 'mbcs', a codec whose actual code page only Windows knows),
and there is locale.getpreferredencoding(). The former is the encoding for
8-bit paths on disk; the latter is the default encoding for the *content* of
files.

In your test case, you explicitly declare the source encoding of the file as
cp1251, so the byte-string literal is stored as cp1251 bytes, which very
likely match the filesystem encoding exactly. That is why things "just work".
WinMerge doesn't need to be Unicode aware, and your Python Popen call doesn't
need to be Unicode aware, because they are just passing 8-bit strings to each
other. Or perhaps I should say "they are passing MBCS strings" to each other.
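To make the "does this path have an 8-bit form?" question concrete, here is a small Python 3 sketch. It is not bzrlib code; encode_or_none is a made-up helper for illustration. It encodes with errors='strict', so an unrepresentable name gives None instead of a silent '????.txt'. (In Python 3.3 and later the mbcs codec honours 'strict' and raises, unlike the Python 2 session above; and since Python 3.6, per PEP 529, sys.getfilesystemencoding() returns 'utf-8' on Windows rather than 'mbcs'.) On Windows it also asks kernel32 for the ANSI and OEM code page numbers via GetACP() and GetOEMCP().

```python
import locale
import sys


def encode_or_none(name, encoding):
    """Hypothetical helper: return `name` encoded with `encoding`, or None.

    errors='strict' makes unrepresentable characters raise rather than be
    replaced with '?', so None means "no 8-bit form in this code page".
    """
    try:
        return name.encode(encoding, 'strict')
    except UnicodeEncodeError:
        return None


x = '\u0422\u0435\u0441\u0442.txt'  # 'Тест.txt'

print(encode_or_none(x, 'cp1251'))  # Cyrillic ANSI code page -> b'\xd2\xe5\xf1\xf2.txt'
print(encode_or_none(x, 'cp866'))   # Cyrillic OEM code page  -> b'\x92\xa5\xe1\xe2.txt'
print(encode_or_none(x, 'cp1252'))  # Western ANSI code page  -> None

# Encoding for 8-bit paths vs. the default encoding for file contents.
print(sys.getfilesystemencoding(), locale.getpreferredencoding())

if sys.platform == 'win32':
    import ctypes
    kernel32 = ctypes.windll.kernel32
    # GetACP/GetOEMCP report this machine's ANSI and OEM code page numbers.
    print('ANSI code page: cp%d' % kernel32.GetACP())
    print('OEM code page:  cp%d' % kernel32.GetOEMCP())
    # 'mbcs' exists only on Windows and encodes with the ANSI code page.
    print(encode_or_none(x, 'mbcs'))
```

On a Cyrillic Windows box I would expect the ANSI line to say cp1251 and the OEM line cp866, which is exactly the split discussed above.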
Here MBCS has a machine-specific definition based on a Windows setting
(which I don't currently know how to change). That is why I like testing
with:

u'\u062c\u0648\u062c\u0648.txt' # جوجو.txt

because, unless someone is running an Arabic code page, it is never going to
be valid in the MBCS / OEM / sys.getfilesystemencoding() /
locale.getpreferredencoding() / bzrlib.osutils.get_user_encoding() etc.
encoding. But it *is* a valid Win32 filesystem (UTF-16) path.

...
>
>> This may require writing a custom C extension. It seems even pywin32 does
>> not provide a Unicode API for CreateProcess.

I was actually pretty surprised at this. Looking here:

http://msdn.microsoft.com/en-us/library/ms682425(VS.85).aspx

CreateProcess is part of the Kernel32.lib API, but there is no plain
ctypes.windll.kernel32.CreateProcess. That is because CreateProcess is a C
preprocessor macro; kernel32.dll actually exports CreateProcessA and
CreateProcessW. I'm pretty sure you can declare your own foreign function
with ctypes, even when nobody has already set up the argument types and
structures for you (it is just easier when that has been done), so we
wouldn't *have* to wrap it in C.

I also see that subprocess itself does:

from win32process import CreateProcess, STARTUPINFO

and also has

from _subprocess import *

So it seems that win32all (pywin32) does, indeed, expose CreateProcess.
Though looking here:

http://docs.activestate.com/activepython/2.4/pywin32/win32process__CreateProcess_meth.html

it seems that it only exports CreateProcessA, not CreateProcessW.

Looking at the PC/_subprocess.c code, I see it calling:

    result = CreateProcess(application_name, // char *
                           command_line,     // char *

which I'm a little bit surprised at, given that I thought CreateProcess would
always be #defined to CreateProcessW on Unicode platforms (the macro expands
to CreateProcessW when UNICODE is defined at build time), which would mean
the char * is interpreted as a wchar_t *...

I'm curious whether doing something evil like:

subprocess.Popen(
    u'python -c "Unicode command line \xb5"'.encode('UTF-16'))

wouldn't actually "just work" and cause the command line to be interpreted
as a wide-char string.
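For the ctypes route, here is a rough Python 3 sketch of declaring CreateProcessW by hand instead of writing a C extension. The structure layouts follow the Win32 STARTUPINFOW and PROCESS_INFORMATION definitions on MSDN; create_process_w is a hypothetical helper name, error handling is minimal, and it is a sketch rather than a drop-in Popen replacement (no handle redirection, environment, or creation flags). For what it's worth, Python 3's own subprocess already goes through CreateProcessW, so this is mainly an illustration of the ctypes approach.

```python
import ctypes
import sys
from ctypes import (POINTER, Structure, byref, c_int, c_ulong, c_ushort,
                    c_void_p, c_wchar_p, sizeof)

DWORD = c_ulong      # 32-bit on Windows (LLP64)
HANDLE = c_void_p
INFINITE = 0xFFFFFFFF
WAIT_FAILED = 0xFFFFFFFF


class STARTUPINFOW(Structure):
    # Field order and types follow the Win32 STARTUPINFOW definition.
    _fields_ = [
        ('cb', DWORD), ('lpReserved', c_wchar_p), ('lpDesktop', c_wchar_p),
        ('lpTitle', c_wchar_p), ('dwX', DWORD), ('dwY', DWORD),
        ('dwXSize', DWORD), ('dwYSize', DWORD), ('dwXCountChars', DWORD),
        ('dwYCountChars', DWORD), ('dwFillAttribute', DWORD),
        ('dwFlags', DWORD), ('wShowWindow', c_ushort),
        ('cbReserved2', c_ushort), ('lpReserved2', c_void_p),
        ('hStdInput', HANDLE), ('hStdOutput', HANDLE), ('hStdError', HANDLE),
    ]


class PROCESS_INFORMATION(Structure):
    _fields_ = [('hProcess', HANDLE), ('hThread', HANDLE),
                ('dwProcessId', DWORD), ('dwThreadId', DWORD)]


def create_process_w(command_line, cwd=None):
    """Run `command_line` (a str) via CreateProcessW, wait, return the exit code."""
    if sys.platform != 'win32':
        raise OSError('CreateProcessW is only available on Windows')
    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)

    # Declare argument/return types so 64-bit handles aren't truncated.
    create = kernel32.CreateProcessW
    create.argtypes = [c_wchar_p, c_wchar_p, c_void_p, c_void_p, c_int, DWORD,
                       c_void_p, c_wchar_p, POINTER(STARTUPINFOW),
                       POINTER(PROCESS_INFORMATION)]
    create.restype = c_int
    kernel32.WaitForSingleObject.argtypes = [HANDLE, DWORD]
    kernel32.WaitForSingleObject.restype = DWORD
    kernel32.GetExitCodeProcess.argtypes = [HANDLE, POINTER(DWORD)]
    kernel32.GetExitCodeProcess.restype = c_int
    kernel32.CloseHandle.argtypes = [HANDLE]
    kernel32.CloseHandle.restype = c_int

    si = STARTUPINFOW()
    si.cb = sizeof(si)
    pi = PROCESS_INFORMATION()
    # CreateProcessW may modify lpCommandLine, so hand it a writable buffer.
    cmd_buf = ctypes.create_unicode_buffer(command_line)
    if not create(None, cmd_buf, None, None, False, 0, None, cwd,
                  byref(si), byref(pi)):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        if kernel32.WaitForSingleObject(pi.hProcess, INFINITE) == WAIT_FAILED:
            raise ctypes.WinError(ctypes.get_last_error())
        code = DWORD()
        if not kernel32.GetExitCodeProcess(pi.hProcess, byref(code)):
            raise ctypes.WinError(ctypes.get_last_error())
        return code.value
    finally:
        kernel32.CloseHandle(pi.hThread)
        kernel32.CloseHandle(pi.hProcess)
```

To exercise the Arabic filename case on Windows, something like create_process_w('"%s" -c "import sys; raise SystemExit(len(sys.argv[1]))" \u062c\u0648\u062c\u0648.txt' % sys.executable) makes the child exit with len(sys.argv[1]), so a result of 8 would indicate the wide argument arrived intact.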
(I'm wondering whether the CreateProcessW function has some special tricks
inside it to cope with the first and second LPCTSTR parameters not being
proper wchar_t * strings; with the trick above we would instead force them
to be wchar_t strings, which is what they were supposed to be in the first
place.)

Just a thought.

John
=:->