Comment 3 for bug 832028

John A Meinel (jameinel) wrote : Re: [Bug 832028] Re: environment variables are not decoded properly

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 8/24/2011 9:04 AM, Vincent Ladeuil wrote:
>> * Python 2 doesn't give access to the Unicode environment block on
>> Windows, only the 'ANSI' compatibility APIs.
>
> What does that mean in practice ? Is there at least a way for the
> user to specify unicode paths in *some* encoding (mbcs ?)

There are GetEnvironmentVariableW-style APIs, where you set and retrieve
env variables in UCS-2/UTF-16. There are similar wide-char APIs such as
CreateProcessW, etc.

However, *many* Windows programs aren't wide-char safe, so there are
something like 3-5 different encodings a program might use (OEM, ANSI,
MBCS, ...).
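
As a rough sketch of what going through the wide-char API could look
like from Python, the snippet below calls GetEnvironmentVariableW via
ctypes. The helper name get_unicode_env is hypothetical, and the
non-Windows fallback to os.environ is only there so the sketch runs
everywhere; it is not what bzr does.

```python
import ctypes
import os
import sys


def get_unicode_env(name, default=None):
    """Return an environment variable as a Unicode string (hypothetical helper)."""
    if sys.platform != "win32":
        # Assumption: off Windows, just defer to os.environ.
        return os.environ.get(name, default)

    get_env = ctypes.windll.kernel32.GetEnvironmentVariableW
    get_env.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32]
    get_env.restype = ctypes.c_uint32

    # With a zero-sized buffer the call reports the required size in
    # characters, including the terminating NUL; 0 means "not found"
    # (or an empty value, which this sketch treats the same way).
    size = get_env(name, None, 0)
    if size == 0:
        return default
    buf = ctypes.create_unicode_buffer(size)
    get_env(name, buf, size)
    return buf.value
```

Because the value never passes through the 'ANSI' code page, characters
outside that code page should survive the round trip.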

>
>> * Both the environment and paths can be arbitrary bytes on nix, so
>> decoding to unicode is never fully correct.
>
> As long as we can trap invalid values, we can at least define
> which subset we can support (and report proper errors for the rest)
> no ?
>

The issue (at least partly) is that if internally we say "X is a
Unicode string", then we have trouble when on Nix it is 8-bit data in
some arbitrary encoding that is often, but not always, UTF-8. We can't
always decode it into a Unicode string, and it isn't safe to leave it as
"str", because in Python 2 "\xb5" + u"Unicode" implicitly decodes the
byte string as ASCII and raises UnicodeDecodeError.
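
To make that failure concrete, here is a small sketch that spells out
the implicit step Python 2 takes when a byte string meets a Unicode
string. The function name concat_like_python2 is made up for
illustration; the explicit ASCII decode stands in for Python 2's
default-encoding coercion.

```python
def concat_like_python2(raw, text):
    """Mimic Python 2's `str + unicode`: decode the bytes as ASCII first."""
    return raw.decode("ascii") + text


# Pure ASCII bytes go through without trouble.
print(concat_like_python2(b"abc", u"Unicode"))

# A single non-ASCII byte is enough to make the concatenation fail.
try:
    concat_like_python2(b"\xb5", u"Unicode")
except UnicodeDecodeError as exc:
    print("blows up: %s" % exc)
```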

So yes, we could trap things we can't decode. I think we just need a
better story than we currently have around that.
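
One possible shape for "trap what we can't decode", purely as a sketch:
decode strictly at the boundary and turn a failure into an error that
names the variable. UndecodableEnvValue and decode_env_value are
hypothetical names, and the UTF-8 default is an assumption, not
something bzr defines.

```python
class UndecodableEnvValue(ValueError):
    """Raised when an environment value is not valid in the expected encoding."""


def decode_env_value(name, raw, encoding="utf-8"):
    """Decode raw bytes strictly, reporting which variable was bad."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        raise UndecodableEnvValue(
            "environment variable %s=%r is not valid %s" % (name, raw, encoding))


# A well-formed UTF-8 value decodes to a Unicode string.
print(repr(decode_env_value("BZR_HOME", b"/home/caf\xc3\xa9")))

# A stray byte is reported up front instead of failing later in a concatenation.
try:
    decode_env_value("BZR_HOME", b"/home/\xb5")
except UndecodableEnvValue as exc:
    print(exc)
```

This defines the supported subset as "whatever decodes cleanly in the
chosen encoding" and leaves the question of which encoding to pick open.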

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk5VHzgACgkQJdeBCYSNAANRpQCgxlvFmE63su0HqmJ+Ld+BzjHr
R80AmwQ5rdf6wnPU80aqX2Wpf3VT/t6b
=pxZ4
-----END PGP SIGNATURE-----