Comment 6 for bug 254570

Revision history for this message
Dieter Maurer (d.maurer) wrote :

Here is a detailed critique on this so called "heuristics".

From Zope 2.10 on, Zope uses the unicode based page template implementation
of Zope 3. While Zope 3 consistently uses unicode to represent text,
this is not the case for Zope 2. As a result, Zope 2 page templates
have a high probablity to have to handle text represented by the Python
"str" datatype in some encoding.
It uses a (mis-named) so called "IUnicodeEncodingConflictResolver"
utility to handle conversion of such text into unicode.

By default, a "PreferredCharsetResolver" is registered as this utility.
"PreferredCharsetResolver" uses a completely insane approach to
determine the unknown encoding used by the server: it looks
at the current request and its "Accept-Charset" header to determine
a list of candidate encodings -- and uses the first one that does
not cause a "UnicodeDecodeError".
This is insane for the following reasons:

  * The server side used encoding is (almost) completely unrelated
     to the client's accepted charsets.

     The approach therefore uses an arbitrary set of encodings
     to guess the encoding used on the server side.

  * 8-bit encodings can decode almost any "str" string -- but
     the probability that they do the right thing is small
     unless it is the true encoding used by the string at hand.

     Cheching for the absence of "UnicodeDecodeError"s
     is thus an extremely week check -- even an impractical check

  * As different clients have different "Accept-Charset",
     the decoding behaviour get apparently non-deterministic:
     some clients will get the correct one, others will see
     wrong characters -- an analysis nightmare, unless one knows
     about this insanity

  * The broken implementation requires access to "REQUEST" via
     acquisition. As a consequence, it does not work in
     situations without "REQUEST" -- e.g. in scripting situations.

  * "five.localsitemanager" tries hard to remove "REQUEST" from
     the acquisition context for local utilities looked up by its
     registries. The broken implementation therefore can fail
     on these utilities and on objects wrapped in their acquisition
     context.