getPreferredCharsets() returns iso-8859-1 and not utf-8 when HTTP_ACCEPT_CHARSET not present in request

Reported by Jostein Leira on 2007-02-16
Affects: Zope 2
Importance: Medium
Assigned to: Unassigned

Bug Description

I'm not sure if I'm submitting this in the right place, or if this is just a local problem that affects no one else, but I have the following problem:

While using Internet Explorer 7 (IE7), the method getPreferredCharsets() in the class HTTPCharsets (http.py) returns 'iso-8859-1' and not 'utf-8' as expected. As far as I know, IE7 does not set the HTTP_ACCEPT_CHARSET in the request. Reading the source I would expect that no HTTP_ACCEPT_CHARSET should result in a return value of 'utf-8'.

At least on my system the line 996 of /lib/python/zope/publisher/http.py

   header_present = 'HTTP_ACCEPT_CHARSET' in self.request

sets header_present = True, even if self.request does not contain 'HTTP_ACCEPT_CHARSET'!

Suspecting a problem with the line, not understanding why, I changed it to:

   header_present = 'HTTP_ACCEPT_CHARSET' in str(self.request)

This resolves my problem.

###################
Other charset related settings I have changed:

Have set sys.setdefaultencoding('utf-8') in /usr/local/lib/python2.4/site.py.
Have set management_page_charset='utf-8' as property of / in ZMI.
Have set default-zpublisher-encoding utf-8 in etc/zope.conf.

Adding for debug in /lib/python/zope/publisher/http.py (around line 1000):

    print type(self.request)

returns:

    <type 'instance'>

Andreas Jung (ajung) wrote :

Status: Pending => Rejected

This belongs in the Zope 3 bugtracker since it addresses an issue in the Zope 3 core.

Philipp von Weitershausen (philikon) wrote :

This might be a problem local to Zope 2, or it may not be a problem at all at this point. Hard to say. Point is:

* str(self.request) is not really a solution. In fact, it's pretty weird and wrong.

* type(self.request) returning <type 'instance'> is normal.

* sys.setdefaultencoding() is evil evil evil. Don't use that.

We need more info to reproduce this issue. Ideally, an HTTP transcript (e.g. using tcpwatch) would be best. Until then the issue remains rejected.

Jostein Leira (jostein-leira) wrote :

> = Comment - Entry #3 by philikon on Feb 16, 2007 11:52 am
> This might be a problem local to Zope 2, or it may not be a problem at
> all at this point. Hard to say. Point is:
>
> * str(self.request) is not really a solution. In fact, it's pretty weird and wrong.

How about changing the line

    header_present = 'HTTP_ACCEPT_CHARSET' in self.request

to

    header_present = 'HTTP_ACCEPT_CHARSET' in self.request.keys()

This seems to work correctly.

> * sys.setdefaultencoding() is evil evil evil. Don't use that.
If I don't change the default encoding I can't save any page templates containing non-ascii characters.

> We need more info to reproduce this issue. Ideally, an HTTP transcript
> (e.g. using tcpwatch) would be best. Until then the issue remains
> rejected.
What is it you want to look at? The request? I am pretty sure it does not contain any HTTP_ACCEPT_CHARSET header. At least it is not present when I print the request from the method in question.

Regards, Jost

Jostein Leira (jostein-leira) wrote :

> = Comment - Entry #4 by jost on Feb 19, 2007 4:52 am
>
> > * sys.setdefaultencoding() is evil evil evil. Don't use that.
> If I don't change the default encoding I can't save any page templates
> containing non-ascii characters.

Correction: I can save page templates containing non-ascii characters, but not like this:

<tal:block tal:content="python:'æøå'"/>

Maciej Wisniowski (pigletto) wrote :

I can confirm that this bug exists in Zope 2.9.6 too.
The problem appears when there is no HTTP_ACCEPT_CHARSET in the request, e.g. when using IE6. The underlying cause is the statement below, which evaluates to true for every string that starts with 'HTTP_':

'HTTP_ACCEPT_CHARSET' in self.request

pdb session at zope/publisher/http.py:

-> header_present = 'HTTP_ACCEPT_CHARSET' in self.request
(Pdb) l
982         def getPreferredCharsets(self):
983             '''See interface IUserPreferredCharsets'''
984             charsets = []
985             sawstar = sawiso88591 = 0
986             import pdb;pdb.set_trace()
987  ->         header_present = 'HTTP_ACCEPT_CHARSET' in self.request
988             for charset in self.request.get('HTTP_ACCEPT_CHARSET', '').split(','):
989                 charset = charset.strip().lower()
990                 if charset:
991                     if ';' in charset:
992                         charset, quality = charset.split(';')

(Pdb) p self.request['HTTP_ACCEPT_CHARSET']
''

(Pdb) 'HTTP_ACCEPT_CHARSET' in self.request
True

(Pdb) 'HTTP_ACCEPT_CHARSET' in self.request.keys()
False

(Pdb) p 'HTTP_ANYTHING' in self.request
True

(Pdb) p self.request
<HTTPRequest, URL=http://localhost:8084/snap/shot/add_sth.html>

(Pdb) p self.request.__class__
<class ZPublisher.HTTPRequest.HTTPRequest at 0x2aaaade9ae90>

(Pdb) p self.request.keys()
['-C', 'ACTUAL_URL', 'AUTHENTICATED_USER', 'AUTHENTICATION_PATH', 'BASE1', 'BASE2', 'BASE3', 'BASE4', 'BASE5', 'BASE6', 'GATEWAY_INTERFACE', 'HTTP_ACCEPT', 'HTTP_ACCEPT_ENCODING', 'HTTP_ACCEPT_LANGUAGE', 'HTTP_COOKIE', 'HTTP_HOST', 'HTTP_USER_AGENT', 'PARENTS', 'PATH_INFO', 'PATH_TRANSLATED', 'PUBLISHED', 'REMOTE_ADDR', 'REQUEST_METHOD', 'RESPONSE', 'SCRIPT_NAME', 'SERVER_NAME', 'SERVER_PORT', 'SERVER_PROTOCOL', 'SERVER_SOFTWARE', 'SERVER_URL', 'SESSION', 'TraversalRequestNameStack', 'URL', 'URL1', 'URL2', 'URL3', 'URL4', 'URL5', '_ZopeId', '__ac', 'areYourCookiesEnabled', 'disable_border']
(Pdb)

Tres Seaver (tseaver) wrote :

In Zope2's HTTPRequest, any key starting with 'HTTP_' will be
returned as having a default empty string value if the key
is not actually present. The following might be a better bridge::

  header_present = bool(request.get('HTTP_ACCEPT_CHARSET'))

We'll need to add tests for this, as well.
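
A minimal sketch of the behaviour described above may help when writing those tests. It assumes a simplified stand-in class: `FakeZope2Request` is hypothetical and only mimics the "any 'HTTP_' key yields an empty string" rule, it is not the real ZPublisher.HTTPRequest.

```python
_marker = object()


class FakeZope2Request(object):
    """Hypothetical, simplified stand-in for a Zope 2 request (assumption)."""

    def __init__(self, environ):
        self.environ = dict(environ)

    def get(self, key, default=None):
        if key in self.environ:
            return self.environ[key]
        if key.startswith('HTTP_'):
            # The rule described in this thread: missing 'HTTP_' keys
            # come back as an empty string instead of the default.
            return ''
        return default

    def __contains__(self, key):
        # Membership is answered through get(), so it inherits the rule above.
        return self.get(key, _marker) is not _marker

    def keys(self):
        return sorted(self.environ)


# A request without an Accept-Charset header.
request = FakeZope2Request({'HTTP_HOST': 'localhost:8084'})

naive = 'HTTP_ACCEPT_CHARSET' in request                 # misleading
via_keys = 'HTTP_ACCEPT_CHARSET' in request.keys()       # only real keys
via_get = bool(request.get('HTTP_ACCEPT_CHARSET'))       # empty string is falsy
via_environ = 'HTTP_ACCEPT_CHARSET' in request.environ   # raw header mapping

print(naive, via_keys, via_get, via_environ)
```

Note that the `bool(request.get(...))` form also treats a header that is present but empty as absent, which may or may not be the desired behaviour.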

Tres Seaver (tseaver) wrote :

Status: Rejected => Pending

Andreas Jung (ajung) wrote :

"""

I'm not sure if I'm submitting this in the right place, or if this is just a local problem that affects no one else, but I have the following problem:

While using Internet Explorer 7 (IE7), the method getPreferredCharsets() in the class HTTPCharsets (http.py) returns 'iso-8859-1' and not 'utf-8' as expected. As far as I know, IE7 does not set the HTTP_ACCEPT_CHARSET in the request. Reading the source I would expect that no HTTP_ACCEPT_CHARSET should result in a return value of 'utf-8'.
"""

getPreferredCharsets() returns an empty list if HTTP_ACCEPT_CHARSET is not present in the request. There is a dedicated test for this case in test_httpcharsets.py. I can't see how it can return 'iso-8859-15' or even 'utf-8'?!

Andreas Jung (ajung) wrote :

Changes: submitter email, edited transcript

Tres Seaver (tseaver) wrote :

> = Comment - Entry #9 by ajung on May 26, 2007 9:43 pm

> getPreferredCharsets() returns an empty list if HTTP_ACCEPT_CHARSET is
> not present in the request. There is a dedicated test for this case in
> test_httpcharsets.py. I can't see how it can return 'iso-8859-15' or even
> 'utf-8'?!

A Zope2 request *always* says that it has *any* header starting with
'HTTP_', which defeats that test.

Malthe Borch (mborch) wrote :

Fixed in r84616.

Andreas Jung (ajung) on 2008-03-12
Changed in zope2:
status: New → Fix Committed

Tres Seaver (tseaver) wrote :

Per:

 http://svn.zope.org/zope.publisher/trunk/CHANGES.txt?view=markup

That fix has been propagated only to 3.5.1 and later versions of the zope.publisher
package. Here are the versions currently used by the Zope2 branches:

- Zope 2.9 branch: zope.publisher 3.2.3 (in the Zope3 tree)

- Zope 2.10 branch: zope.publisher 3.3.2 (in the Zope3 tree)

- Zope 2.11 branch and trunk: zope.publisher 3.4.2

Closing this bug will require preparing and releasing new versions of zope.publisher
for all desired Zope2 branches. An alternative would be to point the svn:externals
for those Zope2 branches to the zope.publisher 3.5.2 release.

Note that bug 160698 depends on getting the fix propagated.

Artur Zaprzała (arturz) wrote :

I'm currently using Zope 2.11.1 and I'm affected by this bug. My solution (patch attached) is to replace self.request with self.request.environ in getPreferredCharsets() (HTTP headers belong to the request.environ)

Why is it so important for getPreferredCharsets() to return an empty list when there is no HTTP_ACCEPT_CHARSET?
 - Products.PageTemplates.unicodeconflictresolver.PreferredCharsetResolver.resolve appends to the list the value of context.management_page_charset and sys.getdefaultencoding() (in my case both are 'utf-8')
 - Because getPreferredCharsets() returned an empty list, the first item on the list is now 'utf-8'
 - Now PageTemplates can correctly decode non-unicode strings.
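
The effect of the list order can be sketched roughly as follows. This is not the actual PreferredCharsetResolver code: `resolve_text` is a hypothetical helper that simply tries the candidate charsets in order and returns the first successful decode.

```python
def resolve_text(raw, preferred_charsets, fallback_charsets):
    """Decode raw bytes with the first candidate charset that succeeds.

    Hypothetical helper for illustration only (assumption).
    """
    for charset in list(preferred_charsets) + list(fallback_charsets):
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue
    # Last resort: never raise, replace undecodable bytes.
    return raw.decode('ascii', 'replace')


raw = u'\xe6\xf8\xe5'.encode('utf-8')  # the UTF-8 bytes of 'æøå'

# Bug case: 'iso-8859-1' is wrongly reported as preferred. It can decode
# any byte sequence, so it wins and produces mojibake.
wrong = resolve_text(raw, ['iso-8859-1'], ['utf-8'])

# Fixed case: the preferred list is empty, so 'utf-8' is tried first.
right = resolve_text(raw, [], ['utf-8'])

print(repr(wrong))
print(repr(right))
```

Because iso-8859-1 maps every byte to a character, it never fails, which is why a spurious 'iso-8859-1' at the head of the list hides the correct fallback.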

Artur Zaprzała (arturz) wrote :

Philipp von Weitershausen wrote that sys.setdefaultencoding() is evil. I call sys.setdefaultencoding() from sitecustomize.py. This allows correct implicit charset conversion when mixing unicode and non-unicode strings with the + and % operators. I hit this problem hard a few years ago, and I don't see another solution until Zope runs on Python 3.0 in an all-unicode world.
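
For comparison, a small sketch of the explicit alternative: decoding byte strings at the boundary instead of relying on the interpreter-wide default. In Python 2 the implicit conversion uses the default encoding ('ascii' unless changed), which is what fails on non-ASCII bytes. The variable names here are illustrative only.

```python
raw = u'\xe6\xf8\xe5'.encode('utf-8')  # UTF-8 bytes of 'æøå', e.g. from a template

# What an ASCII default encoding effectively attempts when mixing types.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Explicit decode with the known charset, then mix freely with unicode text.
text = raw.decode('utf-8')
label = u'Value: %s' % text

print(ascii_ok, label == u'Value: \xe6\xf8\xe5')
```

This only works when the charset of the incoming bytes is known at the point of decoding, which is the part the default-encoding workaround sidesteps.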

Andreas Jung (ajung) wrote :

Changing the default encoding of Python is unsupported and not best practice.

Hanno Schlichting (hannosch) wrote :

The fix has been backported to zope.publisher 3.3.3 and 3.4.3. It is released in Zope 2.11.2 and will be part of Zope 2.10.8.

I can confirm that the fix works for me. In a client project where we had some problems with a search form when using IE it helped to update Zope 2.10.5 to 2.10.8. (Well, we went to 2.10.9 immediately, but the fix is in 2.10.8.)
So this bug can be marked as 'fix committed'. (I'll try that now.)

Changed in zope2:
status: Fix Committed → Fix Released

I meant to say: this bug can be marked as 'fix released' now. And apparently I am allowed to do that. :)

Charlie_X (charlie) wrote :

I don't think sending back [] is of any use to man or beast. From the W3C copy of the HTTP/1.1 specification:

"""If no Accept-Charset header is present, the default is that any character set is acceptable. If an Accept-Charset header is present, and if the server cannot send a response which is acceptable according to the Accept-Charset header, then the server SHOULD send an error response with the 406 (not acceptable) status code, though the sending of an unacceptable response is also allowed.""" (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html)

So, we have two choices:

1) send default_zpublisher_encode, which defaults to ISO-8859-15 if not set explicitly

2) explicitly set UTF-8 as the comments in http.py propose for other situations

While I would like default_zpublisher_encode to default to UTF-8 and to use that, I can imagine such a change causing problems for ZMI work, so it should be handled separately. I propose, therefore, to return UTF-8 where the client does not set this header.
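
A rough sketch of that proposal, not the zope.publisher implementation: `preferred_charsets` is a hypothetical function that falls back to UTF-8 when the header is unset or empty, and otherwise orders the listed charsets by their q-value. It deliberately ignores the '*' wildcard and the implicit iso-8859-1 rule from RFC 2616.

```python
def preferred_charsets(accept_charset):
    """Return charset names in order of preference (hypothetical helper)."""
    if not accept_charset or not accept_charset.strip():
        # Header unset or empty: any charset is acceptable, so pick UTF-8.
        return ['utf-8']

    weighted = []
    for position, item in enumerate(accept_charset.split(',')):
        name, _, params = item.strip().lower().partition(';')
        name = name.strip()
        if not name:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith('q='):
            try:
                quality = float(params[2:])
            except ValueError:
                continue  # skip entries with a malformed q-value
        if quality > 0:
            # Sort by descending quality, then by original position.
            weighted.append((-quality, position, name))

    weighted.sort()
    return [name for _, _, name in weighted]


print(preferred_charsets(None))
print(preferred_charsets('utf-8;q=0.5, iso-8859-1'))
```

Callers would pass the raw header value taken from the request environment, or None when the client sent no such header.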

Charlie_X (charlie) wrote :

Unset or empty ACCEPT_CHARSET header will get UTF-8 as per #117902

Changed in zope2:
status: Fix Released → Fix Committed

Tres Seaver (tseaver) on 2013-02-04
Changed in zope2:
status: Fix Committed → Fix Released