Zim

Bug #572805
Comment #4

Jaap Karssenberg (jaap.karssenberg) wrote on 2010-05-04:

To summarize: this change is based on the fact that most Python filesystem functions already do the encoding themselves. All we need to do is decode back to unicode after reading the files. It seems a good idea to at least try decoding as UTF-8 when decoding in the preferred encoding fails (see bug #561121). Any file that cannot be decoded even after the fallback should be treated as invalid (?).
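
A minimal sketch of the decode-with-fallback idea, written for Python 3 byte strings. The function name `decode_with_fallback` and the choice to return `None` for invalid input are assumptions made for illustration, not code from zim:

```python
import locale


def decode_with_fallback(raw, preferred=None):
    """Decode bytes with the preferred encoding, then fall back to UTF-8.

    Returns None when neither encoding works, so the caller can treat
    the file as invalid. (Hypothetical helper, not part of zim.)
    """
    if preferred is None:
        # Ask the locale which encoding the system prefers.
        preferred = locale.getpreferredencoding(False)
    for encoding in (preferred, "utf-8"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    return None
```

For example, `decode_with_fallback("é".encode("utf-8"), preferred="ascii")` fails on the ASCII attempt and succeeds on the UTF-8 fallback. Whether an undecodable file should be skipped, reported, or raised as an error is the open question marked with "(?)" above.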

To avoid unicode encoding errors we should also do the encoding ourselves and handle the errors. I propose applying UTF-8 + URL encoding to any characters that could not be encoded.
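
One possible reading of this proposal, sketched in Python 3. The helper name `encode_with_fallback` is hypothetical, and the sketch assumes the target encoding is ASCII-compatible so that the `%XX` escapes can be written as plain ASCII bytes:

```python
from urllib.parse import quote


def encode_with_fallback(text, encoding):
    """Encode text, URL-encoding the UTF-8 bytes of unencodable chars.

    Hypothetical helper, not part of zim. Assumes `encoding` is
    ASCII-compatible.
    """
    try:
        # Fast path: the whole string fits the target encoding.
        return text.encode(encoding)
    except UnicodeEncodeError:
        pass
    out = bytearray()
    for char in text:
        try:
            out += char.encode(encoding)
        except UnicodeEncodeError:
            # Fall back to UTF-8 and percent-escape every byte.
            out += quote(char.encode("utf-8"), safe="").encode("ascii")
    return bytes(out)
```

With an ASCII target, `encode_with_fallback("café", "ascii")` yields `b"caf%C3%A9"`. Note that this sketch does not escape a literal `%` in the input, so the result is not unambiguously reversible; a real implementation would have to decide how to handle that.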

An exception to this rule is win32, where the API is slightly different and prefers unicode strings.

See this page for some details: http://kofoto.rosdahl.net/wiki/UnicodeInPython