UnicodeDecodeError when using IE, Safari

Bug #530620 reported by Ole Christian Helset
This bug affects 2 people
Affects: zope.publisher
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Using Zope 2.11.5 with default-zpublisher-encoding set to utf-8, rendering fails in IE and Safari when the content contains a UTF-8 encoded string, because these browsers (at the time of writing) do not send the Accept-Charset header.

In zope/publisher/http.py, the HTTPCharsets.getPreferredCharsets() method returns an empty list, which causes a UnicodeDecodeError in Zope when a tal:content string contains UTF-8 encoded text with, for instance, Norwegian characters (ø → \xc3\xb8).

I made a simple test: a default page template with a title containing such a character (for instance, Pølse):
<html>
  <head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8">
  </head>
  <body>
    <tal:block content="python:repr(template.title)" /><br />
    <tal:block content="python:repr(template.title.encode('latin-1'))" /><br />
    <tal:block content="python:repr(template.title.encode('utf-8'))" /><br />
    <tal:block content="python:title" define="title python:template.title" /><br />
    <tal:block content="python:title" define="title python:template.title.encode('utf-8')" /><br />
  </body>
</html>

In Firefox the output is fine:
u'P\xf8lse'
'P\xf8lse'
'P\xc3\xb8lse'
Pølse
Pølse

In IE and Safari it raises a UnicodeDecodeError.
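
For reference, the byte sequences in the Firefox output and the kind of error reported can be reproduced in plain Python. This is only a sketch in Python 3 syntax; Zope 2.11 runs on Python 2, where mixing a UTF-8 byte string with a unicode string triggers an implicit ASCII decode. That this implicit decode is what fails inside the page template machinery is an assumption, not something confirmed in this report, and `decode_as_ascii` is a hypothetical helper.

```python
# The template title from the test above, written with an escape so the
# snippet does not depend on the source file encoding.
title = "P\u00f8lse"

latin1_bytes = title.encode("latin-1")  # single byte 0xf8 for the o-slash
utf8_bytes = title.encode("utf-8")      # two bytes 0xc3 0xb8 for the o-slash


def decode_as_ascii(data):
    """Mimic Python 2's implicit str -> unicode coercion, which uses ASCII.

    Hypothetical helper for illustration only.
    """
    try:
        return data.decode("ascii")
    except UnicodeDecodeError as exc:
        return "UnicodeDecodeError: %s" % exc.reason


# Decoding the UTF-8 bytes with the right codec round-trips cleanly,
# while the ASCII decode fails on the first non-ASCII byte.
utf8_result = utf8_bytes.decode("utf-8")
ascii_result = decode_as_ascii(utf8_bytes)
```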

If HTTPCharsets.getPreferredCharsets() returns ['utf-8'], it works fine in IE and Safari as well.

My changes to http.py:
from zope.publisher.base import RequestDataGetter
+from ZPublisher import Converters

...

        # Quoting RFC 2616, $14.2: If no "*" is present in an Accept-Charset
        # field, then all character sets not explicitly mentioned get a
        # quality value of 0, except for ISO-8859-1, which gets a quality
        # value of 1 if not explicitly mentioned.
        # And quoting RFC 2616, $14.2: "If no Accept-Charset header is
        # present, the default is that any character set is acceptable."
        if not sawstar and not sawiso88591 and header_present:
-           charsets.append((1.0, 'iso-8859-1'))
+           charsets.append((1.0, Converters.default_encoding))
        # UTF-8 is **always** preferred over anything else.
        # Reason: UTF-8 is not specific and can encode the entire unicode
        # range , unlike many other encodings. Since Zope can easily use very
        # different ranges, like providing a French-Chinese dictionary, it is
        # always good to use UTF-8.
        charsets.sort(sort_charsets)
        charsets = [charset for quality, charset in charsets]
-       if sawstar and 'utf-8' not in charsets:
+       if not sawstar and 'utf-8' not in charsets: # IS THIS BAD, TO FORCE IN UTF-8???
            charsets.insert(0, 'utf-8')

The question, then: is it a problem to force utf-8 (or the default-zpublisher-encoding) here when HTTP_ACCEPT_CHARSET is missing from the request?
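
To make the behaviour under discussion concrete, here is a minimal standalone sketch of Accept-Charset negotiation with a configurable fallback. It is not the zope.publisher implementation: `preferred_charsets` and its `default_encoding` parameter are hypothetical names, and the parsing is deliberately simplified (it only understands a bare `q=` parameter).

```python
def preferred_charsets(accept_charset, default_encoding="utf-8"):
    """Return charset names in preference order (hypothetical helper).

    accept_charset is the raw header value, or None if the header is missing.
    """
    if accept_charset is None:
        # RFC 2616, 14.2: with no header, any charset is acceptable,
        # so fall back to the configured default instead of an empty list.
        return [default_encoding]

    charsets = []
    saw_star = False
    saw_iso88591 = False
    for item in accept_charset.split(","):
        name, _, params = item.strip().partition(";")
        name = name.strip().lower()
        if not name:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                continue  # skip entries with a malformed quality value
        if name == "*":
            saw_star = True
            continue
        if name == "iso-8859-1":
            saw_iso88591 = True
        if quality == 0.0:
            continue  # explicitly refused by the client
        charsets.append((quality, name))

    # RFC 2616, 14.2: without "*", ISO-8859-1 gets quality 1 unless mentioned.
    if not saw_star and not saw_iso88591:
        charsets.append((1.0, "iso-8859-1"))

    # Stable sort, highest quality first.
    charsets.sort(key=lambda pair: -pair[0])
    ordered = [name for _, name in charsets]

    # "*" means the default encoding is acceptable too, so prefer it.
    if saw_star and default_encoding not in ordered:
        ordered.insert(0, default_encoding)
    return ordered
```

With this shape, a request without the header yields the configured default rather than an empty list, while requests that do send the header are still negotiated according to its contents.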

affects: zope2 → zope.publisher
Roberto Maurizzi (r-maurizzi) wrote :

Any news on this one? IE9 doesn't set this header either, so the problem won't go away (unless it's solved in some Zope > 2.11, and upgrading is usually not an option if you depend on some 'peculiar' Zope Product).

I can't help wondering where all the UTF-8s I wrote everywhere (zope.conf, ZPT encoding field, <?xml>, <head>, python.SetEncoding...) went...

As a workaround, which I find quite ugly and which is Apache-dependent, you can insert an Accept-Charset header for utf-8 using mod_headers:

RequestHeader merge Accept-Charset utf-8

This adds utf-8 to the existing header, or sets the whole header if it is missing.
It could also be made conditional with mod_setenvif, but in a quick test it doesn't seem to cause problems for Firefox or Chrome, and it fixes IE rendering errors (pre 2.10.8) or 500s.
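
The effect of the merge described above can be approximated in a few lines of Python. This is only a sketch of the semantics as described here (append the value unless it is already in the comma-separated list, or create the header if it is absent), not Apache's actual code; `merge_accept_charset` is a hypothetical name.

```python
def merge_accept_charset(header_value, charset="utf-8"):
    """Approximate what the merge does to Accept-Charset (hypothetical helper).

    header_value is the incoming header value, or None if the header is missing.
    """
    if header_value is None or not header_value.strip():
        # Header missing: the merged value becomes the whole header.
        return charset
    existing = [item.strip() for item in header_value.split(",")]
    if charset in existing:
        # Already listed as an exact entry: leave the header untouched.
        return header_value
    return header_value + ", " + charset
```

Note that an entry such as `utf-8;q=0.7` is not an exact match for `utf-8` in this sketch, so a plain `utf-8` would still be appended.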

Colin Watson (cjwatson) wrote :

The zope.publisher project on Launchpad has been archived at the request of the Zope developers (see https://answers.launchpad.net/launchpad/+question/683589 and https://answers.launchpad.net/launchpad/+question/685285). If this bug is still relevant, please refile it at https://github.com/zopefoundation/zope.publisher.

Changed in zope.publisher:
status: New → Invalid