UnicodeDecodeError crack in doctest

Bug #69988 reported by Francis J. Lacoste on 2006-11-03

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Launchpad itself	Fix Released	High	Unassigned	Launchpad itself 10.04
	zope.testing	Fix Released	High	Christian Theune

Bug Description

Putting this in a doctest will result in a UnicodeDecodeError when running it:

Unicode crack:

>>> print u'abc'
abc

>>> print u'\xe9'.encode('utf-8')
Ã©

Traceback (most recent call last):
  File "/usr/lib/python2.4/unittest.py", line 260, in run
    testMethod()
  File "/home/francis/Launchpad/tt-localized-requests-notifications/lib/zope/testing/doctest.py", line 2190, in runTest
    failures, tries = runner.run(
  File "/home/francis/Launchpad/tt-localized-requests-notifications/lib/zope/testing/doctest.py", line 1389, in run
    return self.__run(test, compileflags, out)
  File "/home/francis/Launchpad/tt-localized-requests-notifications/lib/zope/testing/doctest.py", line 1265, in __run
    got = self._fakeout.getvalue() # the actual output
  File "/home/francis/Launchpad/tt-localized-requests-notifications/lib/zope/testing/doctest.py", line 263, in getvalue
    result = StringIO.getvalue(self)
  File "/usr/lib/python2.4/StringIO.py", line 271, in getvalue
    self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

This is crack because if you remove either one of the two tests, the test runs just fine.

Also, if you replace the ur'' with a regular r'', it also runs fine. You can place the ur'' print below the other one and you'll still get the error.

What is frustrating is that you have no idea where this error comes from, since it aborts printing of the whole output.

The moral here is that once 8-bit characters are printed on the doctest stdout, they set a UnicodeDecodeError time bomb that goes off as soon as any unicode string is printed. And trust me, you can have a hard time figuring out which print statement generated the 8-bit non-unicode string.
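To make the "time bomb" concrete, here is a minimal Python 3 simulation of the failure mode described above. It assumes, as the traceback suggests, that the Python 2 StringIO keeps written fragments in a list and only joins them in getvalue(). `Py2StyleBuffer` is a hypothetical class written for illustration; it is not zope.testing's `_SpoofOut`. Python 3 `str` stands in for Python 2 `unicode`, and `bytes` stands in for a Python 2 8-bit `str`.

```python
# Hypothetical Python 3 simulation of Python 2 StringIO buffering.
# Python 3 `str` plays Python 2 `unicode`; `bytes` plays an 8-bit `str`.

class Py2StyleBuffer:
    """Hypothetical stand-in for Python 2's StringIO.StringIO buffering."""

    def __init__(self):
        # Fragments are stored as written; nothing is checked or converted yet.
        self.buflist = []

    def write(self, fragment):
        # Writing never fails, so the offending print goes unnoticed.
        self.buflist.append(fragment)

    def getvalue(self):
        # Python 2 implicitly decoded every 8-bit fragment with the ASCII codec
        # as soon as one unicode fragment took part in the join.
        if any(isinstance(f, str) for f in self.buflist):
            return ''.join(
                f.decode('ascii') if isinstance(f, bytes) else f
                for f in self.buflist
            )
        # Only 8-bit fragments: a plain byte join, which always succeeds.
        return b''.join(self.buflist)


buf = Py2StyleBuffer()
buf.write('abc\n')                         # like: print u'abc'
buf.write('\xe9'.encode('utf-8') + b'\n')  # like: print u'\xe9'.encode('utf-8')

try:
    buf.getvalue()
except UnicodeDecodeError as exc:
    # The error only surfaces here, far away from the print that caused it.
    print(exc)
```

Dropping either of the two writes makes getvalue() succeed, which matches the observation that removing either test makes the error disappear.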

Tags:

Stuart Bishop (stub) wrote on 2006-11-03:

This is a Zope or Python bug, isn't it?

Francis J. Lacoste (flacoste) wrote on 2006-11-03:

Yes, the problem is in doctest.py

I would expect doctest.py to catch that kind of error and still output meaningful results for the other tests.

I should probably post it to the Python and/or Zope collector. I first posted it here because I wanted a place to document this tricky problem, on which you can waste a lot of time.

Guilherme Salgado (salgado) wrote on 2006-11-05: Re: [Bug 69988] UnicodeDecodeError crack in doctest

We had some discussion about this on the launchpad mailing list back in June;
you can see it on the "Problem with unicode on new-style pagetests" thread.

Francis J. Lacoste (flacoste) wrote on 2006-11-06:

The thread mentioned by salgado is archived there: https://lists.ubuntu.com/mailman/private/launchpad/2006-June/009612.html

Francis J. Lacoste (flacoste) wrote on 2007-09-24:

See also http://wiki.python.org/moin/PrintFails for some possible work-arounds.

Barry Warsaw (barry) wrote on 2007-09-24:

What about making this change to lib/zope/testing/doctest.py, 'round about
line 1141:

=== modified file 'src/zope/testing/doctest.py'
--- src/zope/testing/doctest.py 2006-04-11 14:08:08 +0000
+++ src/zope/testing/doctest.py 2007-09-24 20:32:49 +0000
@@ -1138,7 +1138,8 @@
         self._name2ft = {}
 
         # Create a fake output target for capturing doctest output.
-        self._fakeout = _SpoofOut()
+        import codecs
+        self._fakeout = codecs.getwriter('utf-8')(_SpoofOut())
 
     #/////////////////////////////////////////////////////////////////
     # Reporting methods

This way, doctest's print goes to a utf-8 compatible 'terminal'.
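A minimal Python 3 sketch of the idea behind the patch: wrapping the capture stream in the UTF-8 StreamWriter returned by codecs.getwriter, so that text is encoded at write time and the buffer underneath only ever holds bytes. Here io.BytesIO stands in for _SpoofOut; this is an illustration of the technique, not the patched zope.testing code, and it only covers text (unicode) writes.

```python
import codecs
import io

# io.BytesIO is an assumed stand-in for doctest's _SpoofOut capture buffer.
raw = io.BytesIO()
fakeout = codecs.getwriter('utf-8')(raw)

fakeout.write('abc\n')   # text is encoded to UTF-8 immediately
fakeout.write('\xe9\n')  # non-ASCII text is encoded the same way

# The buffer holds only bytes, so no implicit ASCII decode happens when the
# captured output is read back.
captured = raw.getvalue()
print(captured.decode('utf-8'))
```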

Barry Warsaw (barry) wrote on 2007-09-24:

See bug 144569

Francis J. Lacoste (flacoste) on 2007-09-26

Changed in launchpad:
importance:	Undecided → Medium
status:	New → Triaged

David Allouche (ddaa) wrote on 2007-10-02:

This bug is currently blocking me from fixing bug 146302 properly.

The proper fix for bug 146302 involves using smartquote (at least with the format string sabdfl asked for), which causes this UnicodeDecodeError crack to be triggered all over the place.

I understand that we generally want to avoid diverging from Zope upstream, but we also generally do not want infrastructure bugs to block us from doing The Right Thing™.

Francis J. Lacoste (flacoste) wrote on 2007-10-02:

Raising to high since this is affecting more and more developers.

Changed in launchpad:
assignee:	nobody → flacoste
importance:	Medium → High

Francis J. Lacoste (flacoste) on 2007-10-13

Changed in launchpad:
milestone:	1.1.10 → 1.1.11

Francis J. Lacoste (flacoste) on 2007-11-19

Changed in launchpad:
milestone:	1.1.11 → none

Muharem Hrnjadovic (al-maisan) wrote on 2008-09-13:

#10

I am sorry if this sounds too harsh but the fact that this bug is open for almost two years now is a disgrace (even though a patch was kindly contributed by barry in September of *last* year).

I lost many hours due to this bug and suffered frustration on a level that causes physical pain because two of my tests would fail in most mysterious and obscure ways.

Last night (in a post-midnight, "last day of week 3" hacking session) I consulted Barry since the code I was working on happened to be related to sending emails and he might know something.

To cut the story short: he pointed me to this bug and I applied his patch. The effects were immediate and "miraculous": one of the tests in question worked immediately, in case of the other I got a *meaningful* error message and was able to resolve the error in a couple of minutes.

I have had only a few interactions with Barry so far but have to say that he's as good as gold.

Barry, I am already starting to admire you :) and should you happen to read these lines: please do come to London. I would love to meet you in person :)

Francis J. Lacoste (flacoste) on 2008-09-15

Changed in launchpad-foundations:
assignee:	flacoste → nobody

Jonathan Lange (jml) on 2009-09-02

tags:

added: build-infrastructure
removed: infrastructure test-system

Christian Theune (ctheune) wrote on 2009-09-10:

#11

I've been seeing this too, lately. I'll check for Barry's fix.

Changed in zope.testing:
assignee:	nobody → Christian Theune (ct-gocept)
status:	New → Confirmed
importance:	Undecided → High

JC Brand (jcbrand) wrote on 2009-09-13:

#12

Cillian de Róiste and I took a look at this at the DZUG Conf sprint.

We added the following doctest, to try and reproduce the reported bug:

----------------------------------------------------------------------------------------
>>> print u'abc'
abc

>>> print u'\xe9'.encode('utf-8')
é

>>> print u'\xe9'.encode('utf-8')
Ã©
----------------------------------------------------------------------------------------

We received the following output, which seems to suggest that the bug is no longer present, or at least that we cannot reproduce it:

File "/home/jc/work/zope/zope.testing/trunk/src/zope/testing/testrunner/testrunner-unicode.txt", line 13, in testrunner-unicode.txt
Failed example:
    print u'\xe9'.encode('utf-8')
Expected:
    Ã©
Got:
    é

As Christian mentioned, and as also noted at http://bugs.python.org/issue1293741, the UnicodeDecodeError occurs when the error message for non-matching output containing non-ASCII UTF-8 encoded characters is displayed.

But as seen from the above example, this wasn't the case for us.

We used the 'test-ztk-zope.testing' test in Jim's branch: 'zopetoolkit/branches/jim-kgs/kgs'.

We used Python 2.4.6 and Python 2.6.2 on Debian and Ubuntu, and both times the test passed successfully.

We then wanted to test with Python 2.3.7 to see if it was broken there, but it seems that Python 2.3 is no longer supported by zope.testing 3.8.0, so we left it at that.

Has this bug perhaps been fixed upstream? Or is it related to the system locale (or some other system setting)?

We ran the test with the following locales: en_IE.UTF-8, en_US.UTF-8 and en_GB.ISO8859-1.

Curtis Hovey (sinzui) on 2009-10-09

tags:

added: tech-debt

Tres Seaver (tseaver) wrote on 2010-04-08:

#13

I concur with JC Brand's report that the bug is not reproducible against the current zope.testing trunk. If there is a version which Launchpad relies on that *does* show the bug, please re-open if you would like the fix applied and a release made from that branch.

Changed in zope.testing:
status:	Confirmed → Incomplete

Tres Seaver (tseaver) wrote on 2010-04-08:

#14

I have just added a test which shows unicode (non-ASCII) rendering cleanly in the trunk:

http://svn.zope.org/zope.testing/trunk/?rev=110640&view=rev
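For comparison, a sketch using the Python 3 standard-library doctest module (not zope.testing, and not the test added in the revision above): the reproduction from the report, ported to Python 3 print syntax, runs cleanly because Python 3 doctest captures output as text. The name `demo` and the `SOURCE` string are illustrative only.

```python
import doctest

# The reproduction from the report in Python 3 syntax; u'\xe9' is just '\xe9'.
# The doubled backslash keeps the escape inside the example source itself.
SOURCE = """
>>> print('abc')
abc

>>> print('\\xe9')
é
"""

parser = doctest.DocTestParser()
demo = parser.get_doctest(SOURCE, {}, 'unicode_demo', None, 0)

runner = doctest.DocTestRunner(verbose=False)
results = runner.run(demo)  # failures, if any, are reported on stdout
print(results.failed, results.attempted)
```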

Adam Groszer (agroszer) on 2010-04-21

tags:

added: bugday20100424

Gary Poster (gary) wrote on 2010-04-23:

#15

I'll say that this is fixed in Foundations, then, since we are using zope.testing 3.9.4, which was released after Tres added the test.

Changed in launchpad-foundations:
status:	Triaged → Fix Released
status:	Fix Released → Fix Committed
milestone:	none → 10.04

Tres Seaver (tseaver) on 2010-04-24

Changed in zope.testing:
status:	Incomplete → Fix Released

Curtis Hovey (sinzui) on 2010-06-02

Changed in launchpad-foundations:
status:	Fix Committed → Fix Released

Remote bug watches

python-roundup #1293741

Bug watches keep track of this bug in other bug trackers.