UnicodeDecodeError crack in doctest

Bug #69988 reported by Francis J. Lacoste
This bug affects 1 person
Affects           Status        Importance  Assigned to       Milestone
Launchpad itself  Fix Released  High        Unassigned
zope.testing      Fix Released  High        Christian Theune

Bug Description

Putting this in a doctest will result in a UnicodeDecodeError when running it:

    Unicode crack:

        >>> print u'abc'
        abc

        >>> print u'\xe9'.encode('utf-8')
        Ã©

Traceback (most recent call last):
  File "/usr/lib/python2.4/unittest.py", line 260, in run
    testMethod()
  File "/home/francis/Launchpad/tt-localized-requests-notifications/lib/zope/testing/doctest.py", line 2190, in runTest
    failures, tries = runner.run(
  File "/home/francis/Launchpad/tt-localized-requests-notifications/lib/zope/testing/doctest.py", line 1389, in run
    return self.__run(test, compileflags, out)
  File "/home/francis/Launchpad/tt-localized-requests-notifications/lib/zope/testing/doctest.py", line 1265, in __run
    got = self._fakeout.getvalue() # the actual output
  File "/home/francis/Launchpad/tt-localized-requests-notifications/lib/zope/testing/doctest.py", line 263, in getvalue
    result = StringIO.getvalue(self)
  File "/usr/lib/python2.4/StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

This is crack because if you remove either one of the two tests, the doctest runs just fine.

Also, if you replace the ur'' with a regular r'', it will also run fine. You can place the ur'' print below the other one and you'll also get the error.

What is frustrating is that you have no idea where this error comes from, since it aborts the whole output printing.

The moral here is that once 8-bit characters have been printed to the doctest stdout, a UnicodeDecodeError time bomb is set that will go off as soon as any unicode string is printed as well. And trust me, you can have a hard time figuring out which print statement generated the 8-bit non-unicode string.
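
The failing line in the traceback is `self.buf += ''.join(self.buflist)`. A minimal sketch of why mixing the two kinds of string blows up there, written for Python 3 where the implicit coercion no longer exists: `py2_style_join` is a hypothetical helper that only imitates the Python 2 rule (decode byte strings with the ASCII codec as soon as one unicode chunk is present). It is not code from doctest.py or StringIO.py.

```python
def py2_style_join(chunks):
    """Imitate how Python 2 joined a list mixing str (bytes) and unicode.

    If every chunk is a byte string, the result is a byte string.
    As soon as one chunk is unicode text, each byte string is implicitly
    decoded with the ASCII codec, which fails on any byte >= 0x80.
    """
    if all(isinstance(chunk, bytes) for chunk in chunks):
        return b"".join(chunks)
    return "".join(
        chunk.decode("ascii") if isinstance(chunk, bytes) else chunk
        for chunk in chunks
    )


# Either kind of output on its own is harmless.
only_bytes = py2_style_join([b"\xc3\xa9\n"])  # print u'\xe9'.encode('utf-8')
only_text = py2_style_join(["abc\n"])         # print u'abc'

# Both together trigger the error, whatever the order.
try:
    py2_style_join(["abc\n", b"\xc3\xa9\n"])
    mixed_error = None
except UnicodeDecodeError as exc:
    mixed_error = str(exc)
```

This matches the observation that removing either test makes the error disappear: the buffer only fails when it holds both a unicode chunk and a non-ASCII byte string.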

Stuart Bishop (stub) wrote :

This is a Zope or Python bug, isn't it?

Francis J. Lacoste (flacoste) wrote :

Yes, the problem is in doctest.py

I would expect doctest.py to catch that kind of error and still output meaningful results for the other tests.

I should probably post it to the Python and/or Zope collector. I first posted it here because I wanted to have a place to document this tricky problem, on which you can waste a lot of time.
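
As an illustration of the kind of diagnostic that would save that time, here is a small Python 3 sketch that points at the offending chunks in a captured output buffer. `find_eight_bit_chunks` is a hypothetical helper, not something doctest.py provides, and the list of chunks stands in for the `buflist` seen in the traceback.

```python
def find_eight_bit_chunks(chunks):
    """Return (index, chunk) pairs for byte strings holding non-ASCII bytes."""
    return [
        (index, chunk)
        for index, chunk in enumerate(chunks)
        # Iterating over bytes yields integers in Python 3.
        if isinstance(chunk, bytes) and any(byte >= 0x80 for byte in chunk)
    ]


# Hypothetical captured output: one unicode chunk, one plain ASCII byte
# string, and one UTF-8 encoded byte string.
captured_chunks = ["abc\n", b"plain\n", b"\xc3\xa9\n"]
offenders = find_eight_bit_chunks(captured_chunks)
```

Only the last chunk is reported, which is the one a print of an encoded string would have produced.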

Guilherme Salgado (salgado) wrote : Re: [Bug 69988] UnicodeDecodeError crack in doctest

We had some discussion about this on the launchpad mailing list back in June;
you can see it on the "Problem with unicode on new-style pagetests" thread.

Francis J. Lacoste (flacoste) wrote :

The thread mentioned by salgado is archived here: https://lists.ubuntu.com/mailman/private/launchpad/2006-June/009612.html

Francis J. Lacoste (flacoste) wrote :

See also http://wiki.python.org/moin/PrintFails for some possible work-arounds.

Barry Warsaw (barry) wrote :

What about making this change to lib/zope/testing/doctest.py, 'round about
line 1141:

=== modified file 'src/zope/testing/doctest.py'
--- src/zope/testing/doctest.py 2006-04-11 14:08:08 +0000
+++ src/zope/testing/doctest.py 2007-09-24 20:32:49 +0000
@@ -1138,7 +1138,8 @@
         self._name2ft = {}

         # Create a fake output target for capturing doctest output.
-        self._fakeout = _SpoofOut()
+        import codecs
+        self._fakeout = codecs.getwriter('utf-8')(_SpoofOut())

     #/////////////////////////////////////////////////////////////////
     # Reporting methods

This way, doctest's print goes to a utf-8 compatible 'terminal'.
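
The effect of the wrapper can be tried outside doctest.py. A minimal Python 3 sketch, assuming `io.BytesIO` as a stand-in for `_SpoofOut` (the real class is not reproduced here): text written through the `codecs.getwriter('utf-8')` wrapper reaches the underlying buffer as UTF-8 bytes, so the buffer only ever holds one kind of string.

```python
import codecs
import io

# Stand-in for the captured-output target; the patch wraps _SpoofOut instead.
raw_buffer = io.BytesIO()
fake_out = codecs.getwriter("utf-8")(raw_buffer)

# Unicode text is encoded on the way in, so no mixed buffer builds up.
fake_out.write(u"abc\n")
fake_out.write(u"\xe9\n")

captured = raw_buffer.getvalue()
```

This only shows the encoding step of the wrapper. How 8-bit strings written to it behave under Python 2 is not covered by this sketch.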

Barry Warsaw (barry) wrote :
Changed in launchpad:
importance: Undecided → Medium
status: New → Triaged
David Allouche (ddaa) wrote :

This bug is currently blocking me from fixing bug 146302 properly.

The proper fix for bug 146302 involves using smartquote (at least with the format string sabdfl asked for), which causes this UnicodeDecodeError crack to be triggered all over the place.

I understand that we generally want to avoid diverging from Zope upstream, but we also generally do not want infrastructure bugs to block us from doing The Right Thing™.

Francis J. Lacoste (flacoste) wrote :

Raising to high since this is affecting more and more developers.

Changed in launchpad:
assignee: nobody → flacoste
importance: Medium → High
Changed in launchpad:
milestone: 1.1.10 → 1.1.11
Changed in launchpad:
milestone: 1.1.11 → none
Muharem Hrnjadovic (al-maisan) wrote :

I am sorry if this sounds too harsh, but the fact that this bug has been open for almost two years now is a disgrace (even though a patch was kindly contributed by barry in September of *last* year).

I lost many hours due to this bug and suffered frustration on a level that causes physical pain, because two of my tests would fail in the most mysterious and obscure ways.

Last night (in a post-midnight, "last day of week 3" hacking session) I consulted Barry since the code I was working on happened to be related to sending emails and he might know something.

To cut a long story short: he pointed me to this bug and I applied his patch. The effects were immediate and "miraculous": one of the tests in question worked immediately; in the case of the other I got a *meaningful* error message and was able to resolve the error in a couple of minutes.

I have had only a few interactions with Barry so far but have to say that he's as good as gold.

Barry, I am already starting to admire you :) and should you happen to read these lines: please do come to London. I would love to meet you in person :)

Changed in launchpad-foundations:
assignee: flacoste → nobody
Jonathan Lange (jml)
tags: added: build-infrastructure
removed: infrastructure test-system
Christian Theune (ctheune) wrote :

I've been seeing this too, lately. I'll check for Barry's fix.

Changed in zope.testing:
assignee: nobody → Christian Theune (ct-gocept)
status: New → Confirmed
importance: Undecided → High
JC Brand (jcbrand) wrote :

Cillian de Róiste and I took a look at this at the DZUG Conf sprint.

We added the following doctest, to try and reproduce the reported bug:

----------------------------------------------------------------------------------------
    >>> print u'abc'
    abc

    >>> print u'\xe9'.encode('utf-8')
    é

    >>> print u'\xe9'.encode('utf-8')
    Ã©
----------------------------------------------------------------------------------------

We received the following output: (which seems to suggest that the bug is no longer present, or at least not reproducible by us.)

File "/home/jc/work/zope/zope.testing/trunk/src/zope/testing/testrunner/testrunner-unicode.txt", line 13, in testrunner-unicode.txt
Failed example:
    print u'\xe9'.encode('utf-8')
Expected:
    Ã©
Got:
    é

As Christian mentioned and as also mentioned here http://bugs.python.org/issue1293741, when error messages of nonmatching output containing extended utf-8 encoded characters are displayed, the UnicodeDecodeError occurs.

But as seen from the above example, this wasn't the case for us.

We used the 'test-ztk-zope.testing' test in Jim's branch: 'zopetoolkit/branches/jim-kgs/kgs'.

We used python 2.4.6 and python 2.6.2 on Debian and Ubuntu, and both times the test passed successfully.

We then wanted to test with python 2.3.7 to see if it was broken there, but it seems that python 2.3 is no longer supported by zope.testing 3.8.0, so we left it.

Has this bug perhaps been fixed upstream? Or is this perhaps related to the system locale (or some other system setting?)

We ran the test with the following locales: en_IE.UTF-8, en_US.UTF-8 and en_GB.ISO8859-1 .
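
For anyone wanting to repeat a similar check without the zope.testing testrunner, here is a rough Python 3 sketch using the standard library `doctest` module. It is an assumption-laden approximation (different doctest implementation, Python 3 `print()`, the names `DOC` and `unicode_demo` are made up) and says nothing about the original Python 2 behaviour; it only shows a mismatch on non-ASCII output being reported rather than aborting the run.

```python
import doctest

# Three examples modelled on the ones above; the last expectation is
# deliberately wrong so that a failure report has to be rendered.
DOC = (
    ">>> print(u'abc')\n"
    "abc\n"
    "\n"
    ">>> print(u'\\xe9')\n"
    "\u00e9\n"
    "\n"
    ">>> print(u'\\xe9')\n"
    "\u00c3\u00a9\n"
)

parser = doctest.DocTestParser()
test = parser.get_doctest(DOC, {}, "unicode_demo", None, 0)

# Collect the report in a list instead of writing it to sys.stdout.
report = []
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test, out=report.append)
report_text = "".join(report)
```

`results.failed` and `results.attempted` give the counts, and `report_text` holds the "Expected / Got" message for the mismatching example.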

Curtis Hovey (sinzui)
tags: added: tech-debt
Tres Seaver (tseaver) wrote :

I concur with JC Brand's report that the bug is not reproducible against the current zope.testing trunk. If there is a version which Launchpad relies on that *does* show the bug, please re-open if you would like the fix applied and a release made from that branch.

Changed in zope.testing:
status: Confirmed → Incomplete
Tres Seaver (tseaver) wrote :

I have just added a test which shows unicode (non-ASCII) rendering cleanly in the trunk:

  http://svn.zope.org/zope.testing/trunk/?rev=110640&view=rev

Adam Groszer (agroszer)
tags: added: bugday20100424
Gary Poster (gary) wrote :

I'll say that this is fixed in Foundations, then, since we are using zope.testing 3.9.4, which was released after Tres added the test.

Changed in launchpad-foundations:
status: Triaged → Fix Released
status: Fix Released → Fix Committed
milestone: none → 10.04
Tres Seaver (tseaver)
Changed in zope.testing:
status: Incomplete → Fix Released
Curtis Hovey (sinzui)
Changed in launchpad-foundations:
status: Fix Committed → Fix Released