Cannot mix unicode strings and non-latin-1 characters in DTML

Bug #142290 reported by Bug Importer
Affects: Zope 2
Status: Opinion
Importance: Medium
Assigned to: Andreas Jung

Bug Description

I can't mix unicode strings and non-latin-1 (CJK) characters in DTML.

Example:

<dtml-in "u'Japanese String','Japanese String'">

Running this code raises a UnicodeError.

I found the join_unicode function in cDocumentTemplate.c.
This function decodes non-ASCII plain strings as latin-1,
but when the string is not actually latin-1, an error occurs.

Tags: bug zope
Andreas Jung (ajung) wrote:

Status: Pending => Accepted

 Supporters added: ajung

from types import StringType

def join_unicode(rendered):
    """Join a list of plain strings into a single plain string,
    a list of unicode strings into a single unicode string,
    or a list containing a mix into a single unicode string with
    the plain strings converted from latin-1.
    """
    try:
        # Succeeds for all-plain or all-unicode lists, and for mixes whose
        # plain strings are pure ASCII (Python 2 coerces them with the
        # default ASCII codec).
        return ''.join(rendered)
    except UnicodeError:
        # A mix of unicode strings and non-ASCII plain strings.
        # Fix up the list, treating plain strings as latin-1.
        rendered = list(rendered)
        for i in range(len(rendered)):
            if type(rendered[i]) is StringType:
                rendered[i] = unicode(rendered[i], 'latin-1')
        return u''.join(rendered)

Wouldn't it be better to use sys.getdefaultencoding()
instead of 'latin-1'? This would allow users to specify
their encoding site-wide. Using 'latin-1' is definitely
a bad choice.

-aj
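To make the fallback concrete, here is a minimal Python 3 model of the join_unicode() function quoted above (the real code is Python 2 and C inside Zope). It assumes that Python 3 bytes stands in for a Python 2 plain string and Python 3 str for a Python 2 unicode string; join_strict is a hypothetical name for Python 2's implicit ASCII coercion inside ''.join(), and Shift_JIS is only an example CJK encoding. The strict join raises on non-ASCII bytes, while the latin-1 retry can never fail to decode, so it silently returns the wrong characters for CJK bytes.

```python
# Python 3 model of join_unicode(); NOT the Zope source.
# Assumption: bytes ~ Python 2 plain str, str ~ Python 2 unicode.


def join_strict(rendered):
    """Hypothetical model of Python 2 ''.join(): plain strings in a mix are
    coerced with the ASCII codec, so non-ASCII bytes raise UnicodeDecodeError."""
    if all(isinstance(part, bytes) for part in rendered):
        return b"".join(rendered)
    return "".join(
        part.decode("ascii") if isinstance(part, bytes) else part
        for part in rendered
    )


def join_unicode(rendered):
    """Model of join_unicode(): on failure, retry with bytes decoded as latin-1."""
    try:
        return join_strict(rendered)
    except UnicodeError:
        # latin-1 maps every byte to a code point, so this never raises.
        return "".join(
            part.decode("latin-1") if isinstance(part, bytes) else part
            for part in rendered
        )


if __name__ == "__main__":
    japanese = "日本語"
    encoded = japanese.encode("shift_jis")  # a pre-encoded CJK plain string

    # Strict coercion (the Zope 2.5 situation) raises.
    try:
        join_strict([japanese, encoded])
    except UnicodeDecodeError as exc:
        print("strict join failed:", exc.reason)

    # The latin-1 fallback succeeds but yields the wrong characters.
    print(join_unicode([japanese, encoded]) == japanese + japanese)  # False
```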

Andreas Jung (ajung) wrote:

Supporters added: htrd

Toby Dickenson (htrd) wrote:

The current behaviour is by design. I thought the documentation changes had been merged, but I can't find anything online right now. The best references I have are:
http://www.zope.org/Members/htrd/howto/unicode-zdg-changes
http://www.zope.org/Members/htrd/howto/unicode

> Wouldn't it be better to use sys.getdefaultencoding()

sys.getdefaultencoding (and its converse, setdefaultencoding) is not designed for application use. This is a hacking tool added during Python's early unicode development, and is likely to disappear in a future python revision.

As for some other configuration parameter, no. The current design rule was to provide a consistent behaviour when combining these different string representations in dtml. This is essential for writing robust Products.

(Imagine what would happen if 1+"hello" sometimes had a different behaviour, depending on some global configuration parameter. That would be no better than programming in C ;-)

The answer to the original poster's problem is that he should explicitly encode the unicode string into the appropriate character encoding. DTML will happily combine 8-bit strings the same way it always has, provided it does not encounter a unicode string in the same page.

<dtml-var "'japanese string'">
<dtml-var "u'japanese string'.encode('japanese encoding')">
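The "encode explicitly" approach can be sketched in Python 3 as follows, assuming bytes stands in for DTML's 8-bit strings. render_encoded is a hypothetical helper, and Shift_JIS is an assumed page encoding; substitute whatever encoding the site actually serves. Because every part ends up as bytes in a single encoding, the join never has to guess an encoding.

```python
# Sketch of the "encode everything to the page encoding" workaround.
# render_encoded is hypothetical; Shift_JIS is an assumed page encoding.


def render_encoded(parts, encoding):
    """Encode any text parts with `encoding`, then join everything as bytes."""
    return b"".join(
        part.encode(encoding) if isinstance(part, str) else part
        for part in parts
    )


if __name__ == "__main__":
    page_encoding = "shift_jis"
    preencoded = "日本".encode(page_encoding)  # like 'japanese string'
    text = "語"                                 # like u'japanese string'
    page = render_encoded([preencoded, text], page_encoding)
    print(page.decode(page_encoding))
```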

Alternatively, if you want to work entirely in unicode:

<dtml-var "unicode('japanese string','japanese encoding')">
<dtml-var "u'japanese string'">
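The all-unicode alternative is the mirror image: decode every pre-encoded string with its known encoding before joining. Below is a minimal Python 3 sketch; render_unicode is a hypothetical helper, and Shift_JIS is again only an assumed source encoding.

```python
# Sketch of the "work entirely in unicode" workaround.
# render_unicode is hypothetical; the caller must know each byte string's encoding.


def render_unicode(parts, encoding):
    """Decode any byte parts with `encoding`, then join everything as text."""
    return "".join(
        part.decode(encoding) if isinstance(part, bytes) else part
        for part in parts
    )


if __name__ == "__main__":
    preencoded = "日本".encode("shift_jis")  # like unicode('...', 'japanese encoding')
    print(render_unicode([preencoded, "語"], "shift_jis"))
```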

Essentially, you can't mix pre-encoded strings and unicode string objects inside the same DTML.

This is covered in
http://www.zope.org/Members/htrd/howto/unicode
"Pages That Do Not Expect Unicode"

This behaviour changed between Zope 2.5 and 2.6. In 2.5, the DTML posted by the original poster would have raised a UnicodeError exception. For this reason, this change in 2.6 is not judged to be dangerously incompatible.

Andreas Jung (ajung) wrote:

Status: Accepted => Rejected

Very good description.
Can we add this to the documentation somewhere in the doc folder
of Zope?

-aj

Thimo Kraemer (thimo-kraemer) wrote:

Wouldn't it be better to decode plain strings as UTF-8 first and fall back to Latin-1 only on error?
This would support at least two commonly used encodings.

Workaround:
The attached monkeypatch modifies the function string.join, which cDocumentTemplate.c uses later on.
But all this should be implemented directly in cDocumentTemplate.c.
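The UTF-8-first strategy suggested above can be sketched in Python 3 as follows. This is not the attached monkeypatch (its contents are not shown here); decode_plain and join_unicode_utf8_first are hypothetical names, and bytes stands in for a Python 2 plain string.

```python
# Sketch of "try UTF-8, fall back to latin-1" for plain strings.
# Hypothetical helpers; not the attached monkeypatch or Zope code.


def decode_plain(data):
    """Decode a plain (byte) string as UTF-8, falling back to latin-1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a code point, so this cannot fail.
        return data.decode("latin-1")


def join_unicode_utf8_first(rendered):
    """Join like join_unicode(), but decode plain strings with decode_plain()."""
    if all(isinstance(part, bytes) for part in rendered):
        return b"".join(rendered)  # all plain: keep the 8-bit result
    return "".join(
        decode_plain(part) if isinstance(part, bytes) else part
        for part in rendered
    )


if __name__ == "__main__":
    print(join_unicode_utf8_first(["日本", "語".encode("utf-8")]))
    print(decode_plain("café".encode("latin-1")))
```

The fallback is still a guess: a latin-1 byte sequence that happens to be valid UTF-8 (for example the bytes of "Ã©") is read as UTF-8, and pages in other encodings such as Shift_JIS would still come out wrong.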

Changed in zope2:
status: Invalid → Opinion