Comment 18 for bug 128496

Martin von Gagern (gagern) wrote : Automatic conversion between byte and unicode strings

I investigated the automatic conversion between byte and unicode strings, and found a really interesting thread on the bazaar mailing list called "About encoding issues": http://thread.gmane.org/gmane.comp.version-control.bazaar-ng.general/10908

There is a function called sys.setdefaultencoding that sets the encoding used for such implicit conversions. Unfortunately site.py usually removes it from the sys module, and it should only be called before that happens, so it is a bit tricky to use, and the setting affects all modules. By writing your own character encoding and setting it as the default, it is possible to trace automatic conversions without throwing an exception each time one happens.
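To illustrate the tracing idea, here is a minimal sketch of a codec that behaves like ASCII but records where each conversion was requested. It is not the code from my branch; the names tracing_ascii, TRACE_LOG and _record are made up for this example. It is written so that it runs on current Python, where conversions are always explicit, so it only shows the codec half of the trick.

```python
import codecs
import traceback

CODEC_NAME = "tracing_ascii"  # hypothetical codec name, chosen for this sketch
TRACE_LOG = []                # hypothetical in-memory log of conversions


def _record(operation):
    # The last three frames are: the code that asked for the conversion,
    # the codec function below, and this helper. Keep the first of them.
    caller = traceback.extract_stack(limit=3)[0]
    TRACE_LOG.append((operation, caller.filename, caller.lineno))


def traced_decode(data, errors="strict"):
    # Log the call site, then delegate to the plain ASCII decoder.
    _record("decode")
    return codecs.ascii_decode(data, errors)


def traced_encode(text, errors="strict"):
    # Log the call site, then delegate to the plain ASCII encoder.
    _record("encode")
    return codecs.ascii_encode(text, errors)


def _search(name):
    # Codec search function: answer only for our own codec name.
    if name == CODEC_NAME:
        return codecs.CodecInfo(
            encode=traced_encode, decode=traced_decode, name=CODEC_NAME
        )
    return None


codecs.register(_search)
```

On Python 2 the remaining step would presumably be to pass the codec name to sys.setdefaultencoding before site.py removes that function, so that implicit conversions go through the traced functions. That step is left out here because it cannot be run on current Python.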

I implemented that idea as a proof of concept in my branch https://code.launchpad.net/~gagern/bzr/str-unicode but I hope the bzr development community will extend this further, as I can't possibly investigate all automatic character conversions in bazaar all by myself.

By specifying a regular expression in the STR_UNICODE environment variable, the output can be restricted, for example to bzr-svn. Even there the number of automatic conversions is astronomical, so a more efficient log format is required.
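A rough sketch of how such filtering could be combined with a more compact log, counting conversions per call site instead of printing each one. This is only an assumption about one possible design, not what the branch does: in particular, matching the regular expression against the file path of the calling code is my guess for this example, and make_filter and summarize are hypothetical helper names.

```python
import os
import re
from collections import Counter


def make_filter(environ=os.environ):
    # Read the regular expression from STR_UNICODE; with no pattern set,
    # every call site is kept. Matching against the caller's file path
    # is an assumption made for this sketch.
    pattern = environ.get("STR_UNICODE")
    if not pattern:
        return lambda path: True
    regex = re.compile(pattern)
    return lambda path: regex.search(path) is not None


def summarize(call_sites, keep):
    # Collapse individual conversions into one count per (path, line)
    # call site, dropping call sites whose path the filter rejects.
    return Counter(
        (path, line) for path, line in call_sites if keep(path)
    )
```

With a summary like this, a call site that triggers thousands of conversions shows up as a single entry with a large count rather than thousands of log lines.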