Default IUserPreferredCharsets' use of Zope 2's request problematic

Bug #160968 reported by Daniel Nouri
Affects: Zope 2
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

The IUserPreferredCharsets implementation of Zope 3 found in zope.publisher.http.HTTPCharsets has the following condition in it to check if the HTTP_ACCEPT_CHARSET header is available:

    header_present = 'HTTP_ACCEPT_CHARSET' in self.request

However, Zope 2's request returns '' (the empty string) for any header that starts with 'HTTP_'; see ZPublisher.HTTPRequest.HTTPRequest.get. As a result, the check above succeeds even when the browser did not send the header.

Ultimately, this causes HTTPCharsets.getPreferredCharsets to return ['iso-8859-1'] when it should really return 'UTF-8'.
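
A minimal sketch of the mismatch described above. FakeZope2Request is a hypothetical stand-in, not the real ZPublisher class, and deriving containment from get() is an assumption made only for illustration:

```python
class FakeZope2Request(object):
    """Hypothetical stand-in mimicking the behaviour described in this report."""

    def __init__(self, environ):
        self.environ = environ

    def get(self, key, default=None):
        if key in self.environ:
            return self.environ[key]
        # Described behaviour: unknown HTTP_* headers come back as ''.
        if key.startswith('HTTP_'):
            return ''
        return default

    def __contains__(self, key):
        # Assumption: containment is derived from get().
        return self.get(key, None) is not None


request = FakeZope2Request({})  # the browser sent no Accept-Charset header

# Membership test, as in the condition quoted above.
header_present_by_membership = 'HTTP_ACCEPT_CHARSET' in request
# Testing the value instead treats '' as "not present".
header_present_by_value = bool(request.get('HTTP_ACCEPT_CHARSET'))

print(header_present_by_membership)  # True, although no header was sent
print(header_present_by_value)       # False
```

Checking the truthiness of the value rather than key membership is one way to make the two request implementations agree for a missing header.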

To understand this problem better, look at Products.Five.browser.decode.processInputs, which uses the negotiator to find out which charset to use to convert form variables. For browsers that do not send the 'HTTP_ACCEPT_CHARSET' header, this results in wrongly decoded form values. To reproduce this, enter Chinese characters into any Five formlib form with Internet Explorer 6.0. Since Firefox sends HTTP_ACCEPT_CHARSET, it's not a problem there.
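
The effect on form values can be sketched in plain Python, without Zope. The sample text and the assumption that the browser submits the form as UTF-8 are illustrative only:

```python
# Two Chinese characters, as a user might type them into a form field.
text = u'\u4e2d\u6587'

# Assumption: the page is served as UTF-8, so the browser submits UTF-8 bytes.
raw = text.encode('utf-8')

# Decoding with iso-8859-1 never raises, it silently yields the wrong text.
wrong = raw.decode('iso-8859-1')
# Decoding with the charset that was actually used recovers the input.
right = raw.decode('utf-8')

print(right == text)  # True
print(wrong == text)  # False
print(len(wrong))     # 6: one character per byte instead of two characters
```

Because iso-8859-1 maps every byte to some character, the wrong charset does not produce an error at this point, only garbled values.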

Shimizukawa (shimizukawa) wrote :

This problem came to light with Plone 3.

zope.formlib needs field values decoded to unicode, and Products.Five.browser.decode.processInputs provides a unicode-converted request.form. The charset is provided by IUserPreferredCharsets.getPreferredCharsets(), which decides on the charset from HTTP_ACCEPT_CHARSET. If HTTP_ACCEPT_CHARSET was not sent by the client browser (IE 6 and 7, Safari), getPreferredCharsets() returns iso-8859-1.

I think the default-zpublisher-encoding value should be used if HTTP_ACCEPT_CHARSET was not provided.
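
A rough sketch of that suggestion. preferred_charsets is a hypothetical helper, not the zope.publisher implementation, and the default_encoding parameter stands in for the configured default-zpublisher-encoding:

```python
def preferred_charsets(accept_charset, default_encoding='utf-8'):
    """Hypothetical helper: return charset names in order of preference.

    Simplified for illustration; it does not reproduce every rule of the
    real negotiation code.
    """
    if not accept_charset:
        # Header missing or empty: fall back to the configured default.
        return [default_encoding]
    weighted = []
    for item in accept_charset.split(','):
        item = item.strip().lower()
        if not item:
            continue
        name, _, params = item.partition(';')
        params = params.strip()
        quality = 1.0
        if params.startswith('q='):
            try:
                quality = float(params[2:])
            except ValueError:
                continue  # skip entries with a malformed q-value
        if quality > 0.0:
            weighted.append((quality, name.strip()))
    # Highest quality first; the sort is stable, so ties keep header order.
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in weighted] or [default_encoding]


print(preferred_charsets(''))                           # no header sent
print(preferred_charsets('iso-8859-1;q=0.5, utf-8'))    # header sent
```

With this shape, a browser that omits the header gets the site's configured encoding instead of an implicit iso-8859-1.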

references::

- https://bugs.launchpad.net/zope2/+bug/143873
- http://dev.plone.org/plone/ticket/8185

Malthe Borch (mborch) wrote :

Fwiw, bug 143873 (referenced) was fixed in r84616; however, this seems not to have propagated to a Zope 2 release (at least not 2.10.x).

Ole Christian Helset (ochelset) wrote :

Using Zope 2.11.5 with default-zpublisher-encoding set to utf-8, rendering content that contains a utf-8 string fails in IE and Safari, as they (at the time of writing) don't provide the Accept-Charset header.

In http.py (zope/publisher/http.py), the HTTPCharsets.getPreferredCharsets() method returns an empty list, causing a UnicodeDecodeError in Zope when a tal:content string contains a utf-8 encoded string with, for instance, Norwegian characters (ø > \xc3\xb8).

I made a simple test, just a default page template, giving it a title with such a character (for instance Pølse):
<html>
  <head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8">
  </head>
  <body>
    <tal:block content="python:repr(template.title)" /><br />
    <tal:block content="python:repr(template.title.encode('latin-1'))" /><br />
    <tal:block content="python:repr(template.title.encode('utf-8'))" /><br />
    <tal:block content="python:title" define="title python:template.title" /><br />
    <tal:block content="python:title" define="title python:template.title.encode('utf-8')" /><br />
  </body>
</html>

In Firefox the output is fine:
u'P\xf8lse'
'P\xf8lse'
'P\xc3\xb8lse'
Pølse
Pølse

In IE and Safari it raises a UnicodeDecodeError.

If HTTPCharsets.getPreferredCharsets() returns ['utf-8'], it works fine in IE and Safari as well.
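
The byte values in the Firefox output, and the kind of failure seen in IE and Safari, can be reproduced in plain Python. Which codec is applied in the failing case is not stated in this report; ASCII is used here purely as an assumed example of a mismatched codec:

```python
title = u'P\xf8lse'  # the template title used in the test above

latin1_bytes = title.encode('latin-1')  # one byte for the o-slash
utf8_bytes = title.encode('utf-8')      # two bytes for the o-slash
print(repr(latin1_bytes))
print(repr(utf8_bytes))

# Decoding the UTF-8 bytes with a mismatched codec (assumed: ASCII) fails.
try:
    utf8_bytes.decode('ascii')
except UnicodeDecodeError as exc:
    print('decode failed: %s' % exc.reason)

# Decoding with the matching codec round-trips cleanly.
print(utf8_bytes.decode('utf-8') == title)
```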

My changes to http.py:
from zope.publisher.base import RequestDataGetter
+from ZPublisher import Converters

...

        # Quoting RFC 2616, $14.2: If no "*" is present in an Accept-Charset
        # field, then all character sets not explicitly mentioned get a
        # quality value of 0, except for ISO-8859-1, which gets a quality
        # value of 1 if not explicitly mentioned.
        # And quoting RFC 2616, $14.2: "If no Accept-Charset header is
        # present, the default is that any character set is acceptable."
        if not sawstar and not sawiso88591 and header_present:
-           charsets.append((1.0, 'iso-8859-1'))
+           charsets.append((1.0, Converters.default_encoding))
        # UTF-8 is **always** preferred over anything else.
        # Reason: UTF-8 is not specific and can encode the entire unicode
        # range , unlike many other encodings. Since Zope can easily use very
        # different ranges, like providing a French-Chinese dictionary, it is
        # always good to use UTF-8.
        charsets.sort(sort_charsets)
        charsets = [charset for quality, charset in charsets]
-       if sawstar and 'utf-8' not in charsets:
+       if not sawstar and 'utf-8' not in charsets: # IS THIS BAD, TO FORCE IN UTF-8???
            charsets.insert(0, 'utf-8')

The question, then, is whether it is a problem to force utf-8 here (or the default-zpublisher-encoding) when HTTP_ACCEPT_CHARSET is missing from the request.

Tres Seaver (tseaver) wrote :

AFAICT, this bug is a duplicate of lp:143873, for which we have long since released fixed versions.

Changed in zope2:
status: New → Fix Released