Comment 41 for bug 1069019

Julian Andres Klode (juliank) wrote :

Attaching a better approach than the currently broken one.

Instead of using unicode in Python 2, keep bytes there. This always works, because those objects do not care about encoding at all. An additional unicode() method is provided to decode an entry, treating it as UTF-8.
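A minimal sketch of the Python 2 idea, written in Python 3 syntax so it still runs today: Python 3 `bytes` plays the role of Python 2 `str`. The entry keeps the raw bytes it was parsed from and decodes them only on request. The class name `BytesEntry`, its constructor and the sample value are hypothetical; only the `unicode()` method name mirrors the comment above.

```python
class BytesEntry(object):
    """Hypothetical entry that stores raw bytes, as in the Python 2 approach."""

    def __init__(self, raw):
        # Keep the value exactly as read from the file. Nothing is decoded,
        # so the current locale never gets a chance to interfere.
        if not isinstance(raw, bytes):
            raise TypeError("expected bytes, got %r" % type(raw).__name__)
        self.raw = raw

    def unicode(self):
        # Decode on demand, treating the stored bytes as UTF-8.
        return self.raw.decode("utf-8")


# Hypothetical sample value containing a non-ASCII character (ü in UTF-8).
entry = BytesEntry(b"Maintainer: J\xc3\xbcrgen <jr@example.org>")
text = entry.unicode()
```

Because decoding happens only inside `unicode()`, code that never asks for text never fails, whatever the locale.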

In Python 3, open the files as UTF-8, and provide a bytes() method to encode entries back to UTF-8.
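A rough sketch of the Python 3 side, assuming a simple line-per-entry file: the file is opened with an explicit `encoding="utf-8"` rather than the locale default, and each entry can be turned back into UTF-8 bytes. `TextEntry` and `read_entries` are hypothetical names; the `bytes()` conversion is implemented here via `__bytes__` so the builtin `bytes(entry)` works, which may differ from how the attached patch spells it.

```python
import io
import os
import tempfile


class TextEntry(object):
    """Hypothetical entry holding text decoded from a UTF-8 file."""

    def __init__(self, text):
        self.text = text

    def __bytes__(self):
        # Encode back to UTF-8 so callers can recover the on-disk bytes.
        return self.text.encode("utf-8")


def read_entries(path):
    # Pass the encoding explicitly instead of relying on the locale default,
    # which is ASCII under the plain C locale on older Python 3 releases.
    with io.open(path, "r", encoding="utf-8") as f:
        return [TextEntry(line.rstrip("\n")) for line in f]


# Usage: write a small UTF-8 file (é encoded as C3 A9) and read it back.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"Description: caf\xc3\xa9\n")
try:
    entries = read_entries(path)
finally:
    os.remove(path)
```

Reading works the same regardless of `LANG`/`LC_ALL`, because the codec is fixed in the `open()` call.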

Please note that this bug only occurs when programs run in the non-Unicode C locale. If you want to work with UTF-8 files in a language-agnostic environment, use the C.UTF-8 locale; other applications break under the plain C locale just as well.
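To illustrate the failure mode, the snippet below decodes the same UTF-8 bytes twice: once with ASCII, which is roughly what Python 3 releases of this era fall back to for `open()` without an explicit encoding in the plain C locale, and once with UTF-8, as under C.UTF-8. It does not change any locale itself; it only simulates the two codecs. (Python 3.7 and later change the C locale default via locale coercion and UTF-8 mode, PEP 538 and PEP 540.) The helper `decode_as` is hypothetical.

```python
# UTF-8 encoded sample containing a non-ASCII character (é).
data = "Description: caf\u00e9\n".encode("utf-8")


def decode_as(encoding):
    # Mimic reading the file with the given codec; return the exception
    # instead of raising so both outcomes can be compared.
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


c_locale = decode_as("ascii")       # plain C locale: fails on the é byte
c_utf8_locale = decode_as("utf-8")  # C.UTF-8 locale: decodes cleanly
```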