Python 2.6 @ win32: print non-ascii characters to console produce borked output

Bug #631350 reported by Alexander Belchenko
This bug affects 2 people
Affects: Bazaar
Status: Fix Released
Importance: Critical
Assigned to: Martin Packman

Bug Description

This is a regression in bzr.exe 2.2 relative to bzr.exe 2.1. It's not really a bzr problem but a problem in the underlying platform (Python 2.6 + MSVC 2008 runtime); bzr merely triggers it.

The reason (per Martin gz): calling setlocale(LC_ALL, "") forces the C runtime's "puts" function to re-encode the output we write to stdout.
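The re-encoding described above can be imitated in pure Python. This is only a sketch of the suspected mechanism, not the CRT's actual code: it assumes the runtime reads the already-encoded console bytes as if they were in the locale's ANSI code page and then converts them back to the console code page. The function name `simulate_crt_reencode` and the `BEST_FIT` table are hypothetical; the table was picked to match the look-alike substitutions visible in the transcripts in this report. Python 3 syntax.

```python
# Hypothetical best-fit table: approximates the look-alike substitutions
# Windows appears to make when a character is missing from the target
# code page. This is an assumption chosen to match the transcripts.
BEST_FIT = {"\u2019": "'", "\u00e1": "a", "\u00e2": "a"}


def simulate_crt_reencode(text, console_cp="cp866", locale_cp="cp1251"):
    """Imitate the suspected mangling of console output.

    Bytes that are already correct for the console code page are
    misread as locale code page text and then converted back to the
    console code page.
    """
    raw = text.encode(console_cp)  # what bzr writes (correct bytes)
    misread = raw.decode(locale_cp, errors="replace")  # wrong interpretation
    fitted = "".join(BEST_FIT.get(ch, ch) for ch in misread)
    # Characters with no equivalent in the console code page become "?".
    return fitted.encode(console_cp, errors="replace").decode(console_cp)
```

Under these assumptions, `simulate_crt_reencode("Тест", locale_cp="cp1251")` returns `'?бв`, and the same call with `locale_cp="cp1252"` returns `'?aa`.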

C:\Temp>bzr init 5
Created a standalone tree (format: 1.9)

C:\Temp>cd 5
C:\Temp\5>

C:\Temp\5>bzr mkdir Тест
added '?бв

C:\Temp\5>bzr st
added:
  '?бв/

C:\Temp\5>bzr ci -m "committing Тест"
Committing to: C:/Temp/5/
added '?бв
Committed revision 1.

C:\Temp\5>bzr log -v
------------------------------------------------------------
revno: 1
committer: Alexander Belchenko <email address hidden>
branch nick: 5
timestamp: Mon 2010-09-06 10:35:45 +0300
message:
  committing Тест
added:
  Тест/

C:\Temp\5>bzr uncommit
    1 Alexander Belchenko 2010-09-06
      committing '?бв

The above revision(s) will be removed.
Are you sure? [y/n]: n
Canceled

So a lot of commands now produce broken output on the console.

Alexander Belchenko (bialix) wrote :

It also affects qcommit in QBzr

Martin Packman (gz) wrote :

Any idea what caused this, Alexander? The good news is tests will now do non-ascii properly and babune is almost... blue... so we should be able to avoid this kind of regression in future.

Alexander Belchenko (bialix) wrote :

I think we should look at how stdout is wrapped now. IIRC poolie has moved the outf creation code into the ui subpackage, so it could be that he just forgot to call get_terminal_encoding() there.

Alexander Belchenko (bialix) wrote :

This effect was observed with bzr.exe from the standalone installer (bzr-2.2.0-setup.exe).

Alexander Belchenko (bialix) wrote :

This bug is not present in the lp:bzr/2.2 branch.

Martin Packman (gz) wrote :

So, this is specific to the way the 2.2 windows installer is built. Since it switched from Python 2.5 to 2.6 it also needed to bundle the msvc 9 runtime libraries. For some reason the encoding is being mangled when trying to write with the bundled dll whereas my system dll (which also gets linked) is fine:

(Pdb) os.write(sys.stdout.fileno(), "Тест")
'?aa4
(Pdb) ctypes.cdll.msvcr90._write(sys.stdout.fileno(), "Тест", 4)
'?aa4
(Pdb) ctypes.cdll.msvcrt._write(sys.stdout.fileno(), "Тест", 4)
Тест4

affects: bzr → bzr-windows-installers
Changed in bzr-windows-installers:
milestone: 2.2.1 → none
Alexander Belchenko (bialix) wrote :

Heh... Next time someone asks me why I prefer Python 2.5, I'll know what to answer.

Alexander Belchenko (bialix) wrote :

From IRC:

[16:37] <mgz> okay, this is kinda funky.
[16:40] <mgz> the bazaar ui object contains a codecs wrapper, contains stdout
[16:40] <mgz> the codecs wrapper and stdout both have the encoding cp866 set
[16:41] <mgz> and the wrapper correctly encodes the unicode russian to a cp866 bytestring, and writes it to the python file
[16:41] <mgz> ...which somehow is coming out mangled as-if it's being decoded as... something random latin-ish then encoded as mbcs or similar
[16:42] <mgz> sys.version is 2.6.4 ... can we build an installer with something else?
[16:50] <mgz> ...not that I can find any likely upstream bug
[16:54] <GaryvdM> mgz: The 2.1 installers have python 2.5 - I don't know if that help?
[16:54] <mgz> I'll try one, pretty sure this isn't related to bzrlib code changes
[16:56] <mgz> py2exe seems most likely, to be honest
[16:59] <mgz> (Pdb) ctypes.cdll.msvcrt.printf("\x92\xa5\xe1\xe2")
[16:59] <mgz> Тест4
[16:59] <mgz> (Pdb) os.write(sys.stdout.fileno(), "\x92\xa5\xe1\xe2")
[16:59] <mgz> '?aa4
[17:05] <mgz> so, certainly a build problem, printing through msvcr90.dll is borked, and worked through msvcrt.dll both of which are linked
[17:07] <mgz> unfortunately that means I'm not entirely sure what the right fix is.
[17:10] <maxb> Sounds a bit wrong to have 2 CRTs linked
[17:11] <mgz> it's not as wrong as it sounds from nix perspective, but it's possibly indicative of something
[17:12] <mgz> as I understand it, starting with 2.6 you have to ship some vc 9 dlls with python
[17:12] <mgz> the older threaded runtime is from my system.
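The "codecs wrapper contains stdout" arrangement mentioned in the log can be sketched as follows. This is a minimal illustration, not bzrlib's actual code: `make_console_writer` is a hypothetical helper, an in-memory `io.BytesIO` stands in for the real stdout, and the syntax is Python 3. It shows that the wrapper hands the stream the cp866 bytes quoted in the log, so any mangling has to happen after this point.

```python
import codecs
import io


def make_console_writer(stream, terminal_encoding):
    """Wrap a byte stream so unicode text is encoded for the terminal."""
    return codecs.getwriter(terminal_encoding)(stream, errors="replace")


fake_stdout = io.BytesIO()  # stand-in for the real stdout byte stream
out = make_console_writer(fake_stdout, "cp866")
out.write("Тест")

# The bytes handed to the stream: the same four bytes used in the
# printf and os.write experiments above.
raw = fake_stdout.getvalue()
assert raw == b"\x92\xa5\xe1\xe2"
```

Encoding the same text with the cp1251 user encoding instead would give different bytes (`b"\xd2\xe5\xf1\xf2"`), which is why the choice between terminal encoding and user encoding matters for console output.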

Alexander Belchenko (bialix) wrote :

OK, it could be a problem in the installer script. For Python 2.6, py2exe should create a manifest file for bzr.exe and either link it into the exe itself or put it in the same directory as the exe. Also, msvcr90.dll should, IIRC, be in the SxS system directory. That's from memory; we have to check the py2exe docs regarding 2.6 compatibility. As I remember, MS made big changes in VS 2008 to how DLLs should be installed, and all these changes sound like needless complications. Or maybe it's just too complex compared to the old way.

Alexander Belchenko (bialix) wrote : Re: [Bug 631350] Re: bzr 2.2 @ win32: print non-ascii characters to console made in user_encoding() instead of terminal_encoding()

http://www.py2exe.org/index.cgi/Tutorial#Step52

Also google for py2exe and manifest and sxs and you'll find more
interesting things to read.

Alexander Belchenko (bialix) wrote : Re: bzr 2.2 @ win32: print non-ascii characters to console made in user_encoding() instead of terminal_encoding()

Martin gz, I was WRONG!

With Python 2.5 and bzr/2.2 sources I get correct output:

C:\Temp\5>C:\Python25\python C:\work\Bazaar\bzr-2a\2.2\bzr ls
Тест/
bzr: warning: some compiled extensions could not be loaded; see <https://answers.launchpad.net/bzr/+faq/703>

With Python 2.6 and bzr/2.2 sources I get borked output:

C:\Temp\5>C:\Python26\python C:\work\Bazaar\bzr-2a\2.2\bzr ls
'?бв/
bzr: warning: some compiled extensions could not be loaded; see <https://answers.launchpad.net/bzr/+faq/703>

I have

C:\Temp\5>C:\Python26\python -V
Python 2.6.5

While bzr.exe has

C:\Temp\5>bzr version
Bazaar (bzr) 2.2.0
  Python interpreter: C:\Program Files\Bazaar\python26.dll 2.6.4

So this SMELLS like either a bug in the Python interpreter OR a bug in our compatibility with Python 2.6.

affects: bzr-windows-installers → bzr
Alexander Belchenko (bialix) wrote :

C:\Temp\5>bzr ls
'?бв/

Alexander Belchenko (bialix) wrote :

When I said bzr/2.2 sources I meant

C:\work\Bazaar\bzr-2a\2.2>bzr log -l1
------------------------------------------------------------
revno: 5082 [merge]
committer: Canonical.com Patch Queue Manager <email address hidden>
branch nick: 2.2
timestamp: Fri 2010-08-20 04:53:57 +0100
message:
  (mbp) typo fixes and additional summary into 2.2 whatsnew (Martin Pool)

tags: added: python2.6
Alexander Belchenko (bialix) wrote :

Martin gz's experiments:

Python 2.7 (trunk, Jul 19 2010, 18:02:53) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print 'Тест'
Тест
>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'English_United Kingdom.1252'
>>> print 'Тест'
'?aa

Alexander Belchenko (bialix) wrote :

And for completeness:

Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> print 'Тест'
Тест
>>> locale.setlocale(locale.LC_ALL,"")
'Russian_Russia.1251'
>>> print 'Тест'
Тест

Alexander Belchenko (bialix) wrote :

Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import locale
>>> print 'Тест'
Тест
>>> locale.setlocale(locale.LC_ALL,"")
'Russian_Russia.1251'
>>> print 'Тест'
'?бв

summary: - bzr 2.2 @ win32: print non-ascii characters to console made in
- user_encoding() instead of terminal_encoding()
+ Python 2.6 @ win32: print non-ascii characters to console produce borked
+ output
Alexander Belchenko (bialix) wrote :

NOTE for myself: this is all tested on English Windows XP with Russian locale settings. I should test the same on native Russian Windows XP and ensure it behaves the same there.
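For comparing machines (English Windows with Russian locale settings versus native Russian Windows), a small report of the encodings in play may help. This is a hedged sketch in Python 3 syntax; `encoding_report` is a hypothetical helper name, and it only queries the current settings without changing the locale.

```python
import locale
import sys


def encoding_report():
    """Collect the encoding-related settings relevant to console output."""
    return {
        "python": sys.version.split()[0],
        # Encoding Python believes the console uses (may be None).
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # Encoding derived from the user's locale settings.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Calling setlocale without a locale argument only queries
        # the current setting.
        "lc_ctype": locale.setlocale(locale.LC_CTYPE),
    }


for key, value in sorted(encoding_report().items()):
    print("%s: %s" % (key, value))
```

Running it before and after `locale.setlocale(locale.LC_ALL, "")` on each machine would show whether the locale settings differ between the two setups.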

description: updated
Gary van der Merwe (garyvdm) wrote : Re: [Bug 631350] Re: Python 2.6 @ win32: print non-ascii characters to console produce borked output

On 07/09/2010 13:43, Launchpad Bug Tracker wrote:
> ** Branch linked: lp:~gz/bzr/setlocale_on_posix_only_631350

I'll build an installer tonight.

Martin Pool (mbp)
Changed in bzr:
status: Confirmed → Fix Released
Martin Pool (mbp) wrote :

Marking as released on the grounds that the branch has landed.
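The name of the linked branch (setlocale_on_posix_only) suggests the approach taken: skip the setlocale call on Windows. The landed change itself is not shown in this report, so the following is only a minimal sketch of that idea, with a hypothetical function name `init_locale` and a `platform` parameter added purely to make it testable.

```python
import locale
import sys


def init_locale(platform=sys.platform):
    """Apply the user's locale settings, except on Windows.

    On win32 the call is skipped, because setlocale(LC_ALL, "") is what
    triggers the re-encoding of console output described in this report.
    """
    if platform == "win32":
        return None
    try:
        return locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # An unsupported locale in the environment should not be fatal.
        return None
```

With this guard, `init_locale("win32")` returns None without touching the C runtime locale, while POSIX systems keep the previous behaviour.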

Changed in bzr:
assignee: nobody → Martin [gz] (gz)
milestone: none → 2.2.1