UnicodeDecodeError when using IE, Safari

Bug #530620 reported by Ole Christian Helset on 2010-03-02
This bug affects 2 people

Bug Description

Using Zope 2.11.5 with default-zpublisher-encoding set to utf-8, rendering fails in IE and Safari when the content contains a UTF-8 encoded string, because these browsers (at the time of writing) do not send the Accept-Charset header.

In http.py (zope/publisher/http.py), the HTTPCharsets.getPreferredCharsets() method returns an empty list in that case, which causes a UnicodeDecodeError in Zope when a tal:content string contains a UTF-8 encoded string with, for example, Norwegian characters (ø > \xc3\xb8).
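The byte sequence mentioned above can be checked outside Zope. The sketch below is plain Python 3, not Zope code, and it assumes the error comes from UTF-8 bytes being decoded with a codec that cannot represent them (ASCII is used here as an example). The report does not include the traceback, so that part is a guess; the helper name try_decode is made up for illustration.

```python
# Stand-alone illustration (Python 3), not Zope code.
title = "Pølse"
utf8_bytes = title.encode("utf-8")  # 'ø' becomes the two bytes \xc3\xb8


def try_decode(data, codec):
    """Return the decoded text, or the name of the exception raised."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# Assumption: the failing step behaves like an ASCII decode of UTF-8 bytes.
ascii_result = try_decode(utf8_bytes, "ascii")
# Decoding with the codec the bytes were written in round-trips cleanly.
utf8_result = try_decode(utf8_bytes, "utf-8")

print(repr(utf8_bytes))
print(ascii_result)
print(utf8_result)
```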

I made a simple test: a default page template with a title containing such a character (e.g. Pølse):
    <meta http-equiv="content-type" content="text/html;charset=utf-8">
    <tal:block content="python:repr(template.title)" /><br />
    <tal:block content="python:repr(template.title.encode('latin-1'))" /><br />
    <tal:block content="python:repr(template.title.encode('utf-8'))" /><br />
    <tal:block content="python:title" define="title python:template.title" /><br />
    <tal:block content="python:title" define="title python:template.title.encode('utf-8')" /><br />

In Firefox the output is fine.

In IE and Safari it raises a UnicodeDecodeError.

If HTTPCharsets.getPreferredCharsets() returns ['utf-8'], it works fine in IE and Safari as well.

My changes to http.py:
from zope.publisher.base import RequestDataGetter
+from ZPublisher import Converters


        # Quoting RFC 2616, $14.2: If no "*" is present in an Accept-Charset
        # field, then all character sets not explicitly mentioned get a
        # quality value of 0, except for ISO-8859-1, which gets a quality
        # value of 1 if not explicitly mentioned.
        # And quoting RFC 2616, $14.2: "If no Accept-Charset header is
        # present, the default is that any character set is acceptable."
        if not sawstar and not sawiso88591 and header_present:
-           charsets.append((1.0, 'iso-8859-1'))
+           charsets.append((1.0, Converters.default_encoding))
        # UTF-8 is **always** preferred over anything else.
        # Reason: UTF-8 is not specific and can encode the entire unicode
        # range , unlike many other encodings. Since Zope can easily use very
        # different ranges, like providing a French-Chinese dictionary, it is
        # always good to use UTF-8.
        charsets = [charset for quality, charset in charsets]
-       if sawstar and 'utf-8' not in charsets:
+       if not sawstar and 'utf-8' not in charsets: # IS THIS BAD, TO FORCE IN UTF-8???
            charsets.insert(0, 'utf-8')

The question, then: is it a problem to force utf-8 (or the default-zpublisher-encoding) here when HTTP_ACCEPT_CHARSET is missing from the request?
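To make the effect of the two changed lines easier to reason about, here is a simplified stand-alone sketch of the charset selection. It is not the real getPreferredCharsets(): the header parsing and the sort order are my own simplification (utf-8 first, following the comment in the diff; the ordering of the rest is an assumption), and the names preferred_charsets, patched and default_encoding are hypothetical. Only the two conditions marked below are taken from the diff.

```python
def preferred_charsets(accept_charset, patched=False, default_encoding="utf-8"):
    """Simplified stand-in for the logic shown in the diff (Python 3).

    accept_charset: raw header value, or None when the browser sent none.
    patched: False follows the '-' lines, True follows the '+' lines.
    default_encoding: stands in for Converters.default_encoding.
    """
    header_present = bool(accept_charset)
    charsets = []
    sawstar = sawiso88591 = False

    # Simplified parsing of "name;q=value" items (an assumption, not Zope's code).
    for item in (accept_charset or "").split(","):
        item = item.strip().lower()
        if not item:
            continue
        name, _, params = item.partition(";")
        name = name.strip()
        params = params.strip()
        quality = 1.0
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                continue
        if quality == 0.0:
            continue
        sawstar = sawstar or name == "*"
        sawiso88591 = sawiso88591 or name == "iso-8859-1"
        charsets.append((quality, name))

    # First changed line of the diff: which charset is appended by default.
    if not sawstar and not sawiso88591 and header_present:
        charsets.append((1.0, default_encoding if patched else "iso-8859-1"))

    # Assumed ordering: utf-8 first, then by descending quality.
    charsets.sort(key=lambda pair: (pair[1] != "utf-8", -pair[0]))
    names = [name for _quality, name in charsets]

    # Second changed line of the diff: when utf-8 is forced to the front.
    force_utf8 = (not sawstar) if patched else sawstar
    if force_utf8 and "utf-8" not in names:
        names.insert(0, "utf-8")
    return names


no_header_original = preferred_charsets(None)
no_header_patched = preferred_charsets(None, patched=True)
latin1_only_original = preferred_charsets("iso-8859-1")
latin1_only_patched = preferred_charsets("iso-8859-1", patched=True)
print(no_header_original, no_header_patched)
print(latin1_only_original, latin1_only_patched)
```

In this sketch the patched condition fixes the missing-header case, but it also puts utf-8 in front for a client that asked only for iso-8859-1, which is the kind of side effect the question above is about. Whether the real method behaves the same way would need to be checked against the actual source.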

affects: zope2 → zope.publisher
Roberto Maurizzi (r-maurizzi) wrote :

Any news on this one? IE9 doesn't set this header either, so the problem won't go away (unless it's solved in some Zope > 2.11, and upgrading is usually not an option if you depend on some 'peculiar' Zope Product).

I can't help wondering where all the UTF-8s I wrote everywhere (zope.conf, ZPT encoding field, <?xml>, <head>, python.SetEncoding...) went...

As a workaround, which I find quite ugly and which is Apache-dependent, you can insert an Accept-Charset for utf-8 using mod_headers:

RequestHeader merge Accept-Charset utf-8

This adds utf-8 to the existing header, or adds the whole header if it is missing.
It could also be made conditional with mod_setenvif, but after a quick test it doesn't seem to cause problems for Firefox or Chrome, and it fixes IE rendering errors (pre 2.10.8) or 500s.
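For readers who want to see what the directive does to the header value, here is a rough Python model of the merge action as the mod_headers documentation describes it (append unless the value already appears in the comma-delimited list). It is only an illustration: merge_header is a made-up name, and the exact whitespace and matching rules inside Apache may differ from this sketch.

```python
def merge_header(existing, value):
    """Rough model of mod_headers' `merge` action for a single header.

    existing: current header value, or None if the request has no such header.
    value: the value to merge in (here "utf-8").
    """
    if not existing:
        # No header at all: the merged value becomes the whole header.
        return value
    present = [item.strip() for item in existing.split(",")]
    if value in present:
        # Already listed as its own item: leave the header untouched.
        return existing
    # Otherwise append, separated by a comma (spacing is an assumption).
    return existing + ", " + value


missing = merge_header(None, "utf-8")
with_quality = merge_header("ISO-8859-1,utf-8;q=0.7,*;q=0.7", "utf-8")
already_there = merge_header("utf-8, iso-8859-1", "utf-8")
print(missing)
print(with_quality)
print(already_there)
```

Note that in this model an item such as utf-8;q=0.7 does not count as already containing utf-8, so the value is still appended to that kind of header.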

Colin Watson (cjwatson) wrote :

The zope.publisher project on Launchpad has been archived at the request of the Zope developers (see https://answers.launchpad.net/launchpad/+question/683589 and https://answers.launchpad.net/launchpad/+question/685285). If this bug is still relevant, please refile it at https://github.com/zopefoundation/zope.publisher.

Changed in zope.publisher:
status: New → Invalid