Unicode error in PageTemplate

Bug #143431 reported by Yury Don
Affects: Zope 2
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned
Milestone: none listed

Bug Description

I have the management_page_charset property set to "koi8-r". With Zope 2.8 I can't create or edit a Page Template that contains Russian characters in koi8-r encoding. When I try to save such a Page Template, I get an error:
Error Type: UnicodeDecodeError
Error Value: 'ascii' codec can't decode byte 0xf2 in position 199: ordinal not in range(128)

If it's a new object (created in Zope 2.8), the traceback is:
Module ZPublisher.Publish, line 113, in publish
Module ZPublisher.mapply, line 88, in mapply
Module ZPublisher.Publish, line 40, in call_object
Module Products.PageTemplates.ZopePageTemplate, line 142, in pt_editAction
Module Shared.DC.Scripts.Bindings, line 311, in __call__
Module Shared.DC.Scripts.Bindings, line 348, in _bindAndExec
Module Products.PageTemplates.PageTemplateFile, line 110, in _exec
Module Products.PageTemplates.PageTemplate, line 103, in pt_render
<PageTemplateFile at /rft/adv/ll/pt_editForm>
Module StringIO, line 203, in getvalue

If it's an old object (created in Zope 2.7), the traceback is:
Module ZPublisher.Publish, line 113, in publish
Module ZPublisher.mapply, line 88, in mapply
Module ZPublisher.Publish, line 40, in call_object
Module Products.PageTemplates.ZopePageTemplate, line 135, in pt_editAction
Module Products.PageTemplates.PageTemplate, line 66, in pt_edit
Module Products.PageTemplates.ZopePageTemplate, line 224, in write
Module Products.PageTemplates.PageTemplate, line 144, in write

In Zope 2.7 everything was OK.
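
The error above is what Python raises when koi8-r bytes are decoded with the ASCII codec. Below is a minimal sketch of that failure, written in Python 3 with explicit decode calls (Python 2, which Zope 2.8 ran on, performed the ASCII decode implicitly when byte strings and unicode objects were mixed). The sample word and the helper name decode_as are assumptions for illustration only.

```python
# Illustrative sketch only; the sample text is hypothetical, not from the report.
sample = "Привет"                      # Russian text
koi8_bytes = sample.encode("koi8-r")   # bytes as posted from a koi8-r page


def decode_as(data, charset):
    """Decode bytes with the given charset; return (text, error message)."""
    try:
        return data.decode(charset), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# Decoding with the page charset works and round-trips.
text, err = decode_as(koi8_bytes, "koi8-r")

# Decoding with ASCII fails on the first byte >= 0x80.
ascii_text, ascii_err = decode_as(koi8_bytes, "ascii")
print(ascii_err)

# The byte named in the reported error is an ordinary koi8-r letter.
reported_char, _ = decode_as(b"\xf2", "koi8-r")
```

Nothing is wrong with the bytes themselves; the failure only depends on which codec is used to interpret them.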

Tags: bug zope
Andreas Jung (ajung) wrote :

Changes: submitter email, importance (critical => medium)

Andreas Jung (ajung) wrote :

Could you try to figure out where the problem is? I've no idea how to reproduce this behaviour with utf8 or so...

Alexander Naberenkov (anaber) wrote :

I have exactly the same problem.
To fix it, I tried inserting "sys.setdefaultencoding('cp1251')" into sitecustomize.py. After that I could upload a file with Russian characters in the management console, but when I tried to view it, I saw '?' symbols instead of letters.
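
One way '?' can end up in place of letters is an encode step that replaces characters the target charset cannot represent. Whether that is what happened here is an assumption; the report does not confirm it. A minimal Python 3 sketch with a hypothetical sample string:

```python
# Hypothetical sample; shows only how "?" substitution arises in general.
sample = "Тест"  # Cyrillic text

# Encoding to a charset that lacks these characters, with errors="replace",
# substitutes "?" for each one instead of raising an error.
replaced = sample.encode("ascii", errors="replace")

# Encoding to a charset that covers Cyrillic keeps the text intact.
kept = sample.encode("cp1251").decode("cp1251")
```

The replacement is lossy: once the text has been turned into '?' bytes, the original letters cannot be recovered.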

Christian Schildwaechter (faduci) wrote :

I have the same problems with German umlauts and the utf8 setting. If I set management_page_charset as a property of the root folder to 'utf8' and try to store a ZPT that contains umlauts, I get an error:

Error Type: UnicodeDecodeError
Error Value: 'ascii' codec can't decode byte 0xXX in position YYY: ordinal not in range(128)

If I try to save the exact same object in Zope 2.7.4 with the same setting for management_page_charset everything works fine (to my astonishment).

If I remove management_page_charset and force the browser to display the ZMI as utf8 before I paste the source of the ZPT, it works fine also.

So for me the problem seems to be that management_page_charset does not seem to work as it did in 2.7.4.

Florent Guillaume (efge) wrote :

This is probably due to broken browsers that don't post back content with the encoding specified in the page. What browser are you using? It works for me here with Safari.

Maybe an accept-charset should be used on the input form for these broken browsers.
See http://www.w3.org/TR/html40/interact/forms.html#h-17.3 for the spec.

Yury Don (gercon) wrote :

Because of a server crash I was unable to install Zope for testing until now. I have installed 2.8.1 and tested on it.
The second error (saving old page templates) was solved by changing the method PageTemplate.write in PageTemplate.py.
Old page templates (created in Zope 2.7) now save normally unless they contain Python expressions with non-ASCII (e.g. Cyrillic) characters:
    def write(self, text):
        assert type(text) in types.StringTypes
        if text[:len(self._error_start)] == self._error_start:
            errend = text.find('-->')
            if errend >= 0:
                text = text[errend + 4:]

        #Added by Don
        charset = getattr(self, 'management_page_charset', None)
        if charset and type(self._text) == types.StringType:
            try:
                unicode(self._text,'us-ascii')
            except UnicodeDecodeError:
                self._text = unicode(self._text, charset)
        #end Don's addition

        if self._text != text:
            self._text = text
        self._cook()

Now the problem persists only when I try to save a page template containing a Python expression with non-ASCII characters.
For example, I created a page template with one line:
<tal:block replace="python:request.get('Here word with cyrillic characters')" />
When I save it, I get an error:
Compilation failed
exceptions.UnicodeEncodeError: 'ascii' codec can't encode characters in position 32-34: ordinal not in range(128)
I commented out some lines in the _cook() method of the PageTemplate class:
        #try:
        parser.parseString(self._text)
        self._v_program, self._v_macros = parser.getCode()
        #except:
        # self._v_errors = ["Compilation failed",
        # "%s: %s" % sys.exc_info()[:2]]
Now I get a Zope error with the following traceback:

Module ZPublisher.Publish, line 113, in publish
Module ZPublisher.mapply, line 88, in mapply
Module ZPublisher.Publish, line 40, in call_object
Module Products.PageTemplates.ZopePageTemplate, line 135, in pt_editAction
Module Products.PageTemplates.PageTemplate, line 66, in pt_edit
Module Products.PageTemplates.ZopePageTemplate, line 224, in write
Module Products.PageTemplates.PageTemplate, line 156, in write
Module Products.PageTemplates.PageTemplate, line 194, in _cook
Module TAL.HTMLTALParser, line 126, in parseString
Module TAL.HTMLParser, line 114, in feed
Module TAL.HTMLParser, line 159, in goahead
Module TAL.HTMLParser, line 349, in parse_endtag
Module TAL.HTMLTALParser, line 178, in handle_endtag
Module TAL.TALGenerator, line 801, in emitEndElement
Module TAL.TALGenerator, line 298, in emitCondition
Module TAL.TALGenerator, line 208, in compileExpression
Module Products.PageTemplates.TALES, line 135, in compile
Module Products.PageTemplates.ZRPythonExpr, line 35, in __init__
UnicodeEncodeError: 'ascii' codec can't encode characters in position 32-34: ordinal not in range(128)
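
The position range in this UnicodeEncodeError refers to character offsets in the text being encoded. The Python 3 sketch below uses a made-up expression string and encodes it to ASCII explicitly (in Python 2 the same encode happens implicitly, for example when a unicode object is passed to str()), then reads the offsets from the exception. The helper name ascii_failure_span is hypothetical.

```python
# Made-up expression text for illustration; not the template from the report.
expr = "request.get('слово')"


def ascii_failure_span(text):
    """Return (start, end) of the first run ASCII cannot encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        # exc.start is inclusive, exc.end is exclusive.
        return exc.start, exc.end
    return None


span = ascii_failure_span(expr)
offending = expr[span[0]:span[1]] if span else ""
```

Slicing the source with the reported offsets is a quick way to see which characters triggered the failure.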

Retsu (tyam) wrote :

Uploaded: 051116.gif

I hope our experience helps to fix this problem. It looks like the same problem.

Non-ASCII characters (e.g. Japanese Kanji) in a page template cannot be displayed properly. Any non-ASCII characters in the page template are shown as "?".
This happens with Zope 2.8 because Zope 2.8 started to use Unicode for page templates by default. The following patch is being applied in Japan as a tentative workaround. I would like this problem to be fixed in the next release with this patch or better code. The attached image shows the page template code and the displayed characters.

==================================================
Products/PageTemplateHotfix/__init__.py :
      import sys, types

      def pt_edit(self, text, content_type):
          if content_type:
              self.content_type = str(content_type)
          if hasattr(text, 'read'):
              text = text.read()
          self.write(text)

      def write(self, text):
          assert type(text) in types.StringTypes
          if text[:len(self._error_start)] == self._error_start:
              errend = text.find('-->')
              if errend >= 0:
                  text = text[errend + 4:]
          if self._text != text:
              self._text = text
          self._cook()

      def initialize(context):
          from Products.PageTemplates.PageTemplate import PageTemplate
          PageTemplate.write = write
          PageTemplate.pt_edit = pt_edit

==================================================

Yury Don (gercon) wrote :

This bug persists in Zope 2.9.2 :(
When I write a Python expression containing non-Latin characters I can't even save the page template; I get an error:
Error Type: UnicodeDecodeError
Error Value: 'ascii' codec can't decode byte 0xd0 in position 42: ordinal not in range(128)
When I write a TAL expression containing non-Latin characters (e.g. <tal:block replace="request/word_with_characters_in_koi8-r | nothing" />), I can save the page template, but I get the same error when I try to execute it.

Maciej Wisniowski (pigletto) wrote :

The problem seems to appear when a ZPT is already saved as a utf-8 encoded string and we then try to save it as unicode. The previous fix solves this problem, but it caused other ZPTs (in products) to stop working for me. My fix first compares types and then sources, because a line like this:

u'aał'=='aał'

causes UnicodeDecodeError.

import types

def write(self, text):
    assert type(text) in types.StringTypes
    if text[:len(self._error_start)] == self._error_start:
        errend = text.find('-->')
        if errend >= 0:
            text = text[errend + 4:]
    if type(self._text)!=type(text) or self._text != text:
        self._text = text
    self._cook()

def initialize(context):
    from Products.PageTemplates.PageTemplate import PageTemplate
    PageTemplate.write = write
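
The idea of comparing types before values can be sketched as a standalone helper. This is a Python 3 illustration with a hypothetical function name, needs_update. In Python 3 comparing bytes with str does not raise, the two are simply never equal, but the type check still makes the intent explicit and avoids the cross-type comparison entirely.

```python
def needs_update(stored, incoming):
    """Return True when the stored source should be replaced."""
    # Different types (bytes vs. text) always count as a change, so the
    # values are only compared when both sides have the same type.
    if type(stored) is not type(incoming):
        return True
    return stored != incoming


stored_bytes = "aał".encode("utf-8")   # source kept as a UTF-8 byte string
incoming_text = "aał"                  # same source arriving as text

changed = needs_update(stored_bytes, incoming_text)
```

Because the type check comes first and the expression short-circuits, the value comparison never sees a bytes/text pair.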

Evgeny Prigorodov (eprigorodov) wrote :

We have the same problem with the simple 'latin-1' encoding. PageTemplate is designed in such a way that it cannot handle both Unicode and plain strings with non-ASCII characters in the output of the same page. Here is the simplest way to reproduce the error in Zope 2.9:

<html>
  <body>
    <p tal:content="python: 'Non-ASCII: \xe9'" />
    <p tal:content="python: u'Unicode: \u9053'" />
  </body>
</html>

Here is our monkey-patch. It can handle different encodings in different parts of a Zope site. A separate encoding can be forced in any particular Folder by setting its "zpt_output_encoding" string property. The patch also accepts existing "management_page_charset" properties. Without any encoding set, the patch reproduces the original behavior and raises the same UnicodeDecodeError in the same place.

Put it into something like "Products/ZPTHotfix/__init__.py" to activate:

"""Allow Unicode in the PageTemplates output"""

import new

from Products.PageTemplates.PageTemplate import FasterStringIO, PageTemplate
from Products.PageTemplates.ZopePageTemplate import ZopePageTemplate

class UnicodeAwareStringIO(FasterStringIO):

    def __init__(self, encoding):
        self.encoding = encoding
        FasterStringIO.__init__(self)

    def write(self, s):
        if isinstance(s, unicode):
            s = s.encode(self.encoding, 'xmlcharrefreplace')
        return FasterStringIO.write(self, s)

def patched_StringIO(self):
    # Comment from the original PageTemplate.StringIO():
        # Third-party products wishing to provide a full Unicode-aware
        # StringIO can do so by monkey-patching this method.
    encoding = getattr(self, 'zpt_output_encoding', None) \
               or getattr(self, 'management_page_charset', None)
    if encoding:
        return UnicodeAwareStringIO(encoding)
    else:
        return PageTemplate.StringIO(self)

def patch_zpt():
    # apply monkey-patch
    ZopePageTemplate._originalStringIO = ZopePageTemplate.StringIO
    ZopePageTemplate.StringIO = new.instancemethod(patched_StringIO, None,
                                                   ZopePageTemplate)

def revert_zpt():
    # restore original method
    original_method = getattr(ZopePageTemplate, '_originalStringIO', None)
    if original_method and ZopePageTemplate.StringIO != original_method:
        ZopePageTemplate.StringIO = original_method

patch_zpt()
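
The core of the patch is the write() override, which encodes unicode chunks before buffering them. A rough Python 3 analogue built on io.BytesIO shows the effect of 'xmlcharrefreplace' on the two values from the reproduction template. The class name EncodingBuffer is hypothetical, and this sketch does not use Zope's FasterStringIO.

```python
import io


class EncodingBuffer(io.BytesIO):
    """Byte buffer that accepts both text and bytes chunks."""

    def __init__(self, encoding):
        super().__init__()
        self.encoding = encoding

    def write(self, chunk):
        if isinstance(chunk, str):
            # Characters the target charset lacks become &#NNNN; references.
            chunk = chunk.encode(self.encoding, "xmlcharrefreplace")
        return super().write(chunk)


out = EncodingBuffer("latin-1")
out.write(b"Non-ASCII: \xe9\n")   # already-encoded latin-1 bytes
out.write("Unicode: \u9053\n")    # text with a character outside latin-1
result = out.getvalue()
```

Using character references keeps the output valid in the chosen charset while still letting a browser render the original character.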

Andreas Jung (ajung) wrote :

Since we maintain only Zope versions >= 2.10, this issue will likely remain unfixed.

Hanno Schlichting (hannosch) wrote :

In Zope 2.10+ page templates and TAL only handle Unicode internally, so this issue doesn't apply any longer.

Changed in zope2:
status: New → Won't Fix