-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


...

> This indicates that the filenames are in OEM encoding (cp1251)
> 
>> No, OEM encoding is cp866. cp1251 is ANSI encoding. ANSI encoding used 
>> for all non-unicode Win32 API functions, e.g. CreateProcessA (vs 
>> CreateProcessW that uses unicode).
> 

cp1251 is Windows' Cyrillic codepage (cp1252 is the Windows version of latin-1).

> especially given the fact that you are doing:
> 
>   f1 = 'Тест.txt'
> 
> Which means that it isn't even a Unicode string.
> 
>> Yes, this is the point of my test.
> 
> I would certainly expect that to work. The question is whether it
> supports paths which are valid Unicode but do not have an OEM
> representation on your filesystem.

I'll note that:
>>> x = u'\u0422\u0435\u0441\u0442.txt'  # 'Тест.txt'
>>> x.encode('cp1251')
'\xd2\xe5\xf1\xf2.txt'
>>> x.encode('cp866')
'\x92\xa5\xe1\xe2.txt'

I don't know what you would get for
>>> x.encode('mbcs')

As it depends on the given filesystem. It fails here with:
>>> x.encode('mbcs')
'????.txt'

As 'Тест.txt' is not valid in my filesystem encoding. (I'm pretty sure
that 'mbcs' == OEM encoding, which is unknown until you actually
evaluate it.)

I'm guessing, but I would guess that on your system you will get:
>>> x.encode('mbcs')
'\xd2\xe5\xf1\xf2.txt'

or put another way
>>> x.encode('cp1251') == x.encode('mbcs')
True

The issue is that there is sys.getfilesystemencoding() (which on Windows
always returns 'mbcs', and only Windows knows which codepage that
actually maps to), and there is locale.getpreferredencoding().

The former is the encoding for 8-bit paths on disk. The latter is the
encoding for the *content* of files.
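
To see the two values side by side on a given machine, here is a minimal
sketch. It is written for Python 3, which is my assumption (the session
above is Python 2), and report_encodings is a hypothetical helper name.
Note that newer Python 3 releases on Windows report 'utf-8' rather than
'mbcs' for the first call.

```python
import locale
import sys


def report_encodings():
    """Return the two encodings discussed above as a dict."""
    return {
        # Encoding Python uses to convert between text and bytes paths.
        "filesystem": sys.getfilesystemencoding(),
        # The user's preferred encoding for text data, per the locale.
        "preferred": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in sorted(report_encodings().items()):
        print("%-10s %s" % (name, value))
```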

In your test case, you explicitly define the encoding of the file as
cp1251, which means that it likely exactly matches the filesystem
encoding, which is why things "just work".

WinMerge doesn't need to be Unicode aware, and your python Popen doesn't
need to be Unicode aware, because they are just passing OEM strings to
each other. Or perhaps I should say "they are passing MBCS strings to
each other", where MBCS has a custom definition based on a Windows
setting (which I don't, ATM, know how to modify).

That is why I like testing with:
  u'\u062c\u0648\u062c\u0648.txt' # جوجو.txt

Because I know that it is never going to be a valid MBCS / OEM /
sys.getfilesystemencoding() / locale.getpreferredencoding() /
bzrlib.osutils.get_user_encoding() etc

But it *is* a valid Win32 filesystem (UTF-16) path.
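
A quick check of that against the two codepages from this thread, as a
sketch in Python 3 syntax (my assumption). can_encode is a hypothetical
helper, not something from bzrlib, and the check only covers the
codepages listed, not every possible MBCS setting.

```python
CYRILLIC = u'\u0422\u0435\u0441\u0442.txt'
ARABIC = u'\u062c\u0648\u062c\u0648.txt'


def can_encode(name, encoding):
    """Return True if name can be represented losslessly in encoding."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # One row per encoding: can it hold the Cyrillic name, the Arabic name?
    for enc in ('cp1251', 'cp866', 'utf-16-le'):
        print(enc, can_encode(CYRILLIC, enc), can_encode(ARABIC, enc))
```

The Cyrillic name fits in both 8-bit codepages, while the Arabic name
only survives the UTF-16 encoding.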

...

> 
>> This may require to write own C-extension. It seems even pywin32 does 
>> not provide unicode API for Create Process.
> 
> John
> =:->

I was actually pretty surprised at this. Looking here:
http://msdn.microsoft.com/en-us/library/ms682425(VS.85).aspx

It looks like CreateProcess should be part of the Kernel32.lib api, but
it certainly isn't exported via ctypes.windll.kernel32.

I'm pretty sure you can define your own foreign function with ctypes,
without needing it to already be exposed via ctypes.windll (it is just
easier when it has already been done for you). So we wouldn't *have* to
wrap it.
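
A minimal sketch of what that could look like, assuming Python 3 and a
Windows host. kernel32.dll exports the suffixed names CreateProcessA and
CreateProcessW (plain CreateProcess is only a header macro), so the
sketch looks up the W variant. load_create_process_w is a hypothetical
helper name; the argument types follow the prototype on the MSDN page
above, with the structure pointers left as plain LPVOID for brevity. On
other platforms it just returns None.

```python
import sys


def load_create_process_w():
    """Return kernel32's CreateProcessW with a declared prototype,
    or None when not running on Windows."""
    if sys.platform != "win32":
        return None
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.CreateProcessW
    func.restype = wintypes.BOOL
    func.argtypes = [
        wintypes.LPCWSTR,  # lpApplicationName
        wintypes.LPWSTR,   # lpCommandLine (must be a writable buffer)
        wintypes.LPVOID,   # lpProcessAttributes
        wintypes.LPVOID,   # lpThreadAttributes
        wintypes.BOOL,     # bInheritHandles
        wintypes.DWORD,    # dwCreationFlags
        wintypes.LPVOID,   # lpEnvironment
        wintypes.LPCWSTR,  # lpCurrentDirectory
        wintypes.LPVOID,   # lpStartupInfo (STARTUPINFOW *)
        wintypes.LPVOID,   # lpProcessInformation (PROCESS_INFORMATION *)
    ]
    return func


if __name__ == "__main__":
    print(load_create_process_w())
```

Actually calling it would still need STARTUPINFOW and
PROCESS_INFORMATION structures defined with ctypes.Structure, which are
left out here.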

I also see that subprocess itself does:

from win32process import CreateProcess, STARTUPINFO

And you also have
from _subprocess import *

So it seems that win32all does, indeed, expose CreateProcess. Though
looking here:
http://docs.activestate.com/activepython/2.4/pywin32/win32process__CreateProcess_meth.html

It seems that it is only exporting CreateProcessA, not CreateProcessW.

Looking at the PC/_subprocess.c code, I see it calling
	result = CreateProcess(application_name, // char *
			       command_line,	 // char *

Which I'm a little bit surprised at, given that I thought CreateProcess
would always be a #define to CreateProcessW on unicode platforms, which
means the char* would be interpreted as a wchar_t *...

I'm curious if you do something evil like:

subprocess.Popen(
  u'python -c "Unicode command line \xb5"'.encode('UTF-16'))

if that wouldn't actually 'just work' and cause the command line to be
properly interpreted as a wide char string.

(I'm wondering whether CreateProcessW has some special handling for the
case where the first and second LPCTSTR parameters aren't proper
wchar_t * strings. If it does, passing UTF-16 bytes would simply force
them to be the wchar_t * data they were supposed to be in the first
place.)
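
One way to see what the UTF-16 trick would actually hand over is to look
at the bytes. This is a sketch in Python 3 syntax (my assumption; the
snippet above is Python 2), and as_c_string is a hypothetical helper
that mimics how a char * consumer stops at the first NUL byte.

```python
import codecs

COMMAND = u'python -c "Unicode command line \xb5"'


def as_c_string(data):
    """Return the part of data a NUL-terminated char * reader would see."""
    return data.split(b'\x00', 1)[0]


# The plain 'UTF-16' codec prepends a byte order mark (BOM).
encoded = COMMAND.encode('UTF-16')

if __name__ == "__main__":
    print(repr(encoded[:8]))
    print(repr(as_c_string(encoded)))
```

Every ASCII character carries a zero byte in UTF-16, so any code that
treats the buffer as a char * would see at most the BOM plus one
character. Whether the bytes get reinterpreted as wchar_t further down
is exactly the open question.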

Just a thought.

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkoufIEACgkQJdeBCYSNAAPT7QCglYQxWOTxI419z3Xzer6iKjbk
K2UAoIR/7U6IeDiKWWsGjTUi6K18bDqN
=JfVU
-----END PGP SIGNATURE-----