UnicodeDecodeError: 'utf8' codec can't decode bytes

Reported by Zooko Wilcox-O'Hearn on 2009-09-22
This bug affects 2 people

Affects            Status   Importance  Assigned to  Milestone
Tahoe-LAFS         New      Unknown
pyOpenSSL          New      Undecided   Unassigned
python-setuptools  Unknown  Unknown

Bug Description

Two different users of Tahoe-LAFS have reported the following symptoms:

User #1, David Abrahams a.k.a. bewst, wrote:

  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 442, in createCertificate
    132)
  File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 539, in signCertificateRequest
    hlreq = CertificateRequest.load(requestData, requestFormat)
  File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 310, in load
    dn._copyFrom(req.get_subject())
  File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 64, in _copyFrom
    value = getattr(x509name, name, None)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-5: unsupported Unicode code range

on this ticket:

http://allmydata.org/trac/tahoe/ticket/704

User #2, midnightmagic wrote:

  File "/v/tahoe/allmydata-tahoe-1.4.1/Twisted-8.2.0-py2.5-netbsd-3.99.23-i386.egg/twisted/internet/_sslverify.py", line 539, in signCertificateRequest
    hlreq = CertificateRequest.load(requestData, requestFormat)
  File "/v/tahoe/allmydata-tahoe-1.4.1/Twisted-8.2.0-py2.5-netbsd-3.99.23-i386.egg/twisted/internet/_sslverify.py", line 310, in load
    dn._copyFrom(req.get_subject())
  File "/v/tahoe/allmydata-tahoe-1.4.1/Twisted-8.2.0-py2.5-netbsd-3.99.23-i386.egg/twisted/internet/_sslverify.py", line 64, in _copyFrom
    value = getattr(x509name, name, None)
exceptions.UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-5: unsupported Unicode code range

On this ticket:

http://bugs.python.org/setuptools/issue78

bewst said that the problem stopped happening when he stopped using one pyOpenSSL package and started using another. I don't know what the status of midnightmagic's problem is.

I guess the next step is to ask bewst for more details about which package of pyOpenSSL fails and which one works.

Also, could you run the following steps to generate a new certificate and
then examine it to see what the "Subject" names are?

{{{
% python
>>> from foolscap import Tub
>>> t = Tub(certFile="dummy.pem")
>>> (Control-D)
% ls dummy.pem
dummy.pem
% openssl x509 -in dummy.pem -text
}}}

On my OS-X system, I see "Subject: CN=newpb_thingy". Do you get the same? It
might also help us if you could attach that dummy.pem file to this ticket
(but of course don't use it for anything else).

My current hunch is that the Foolscap-generated x509 certificates are either
being created with corrupt (i.e. non-UTF-8) subject-name strings, or they're
somehow being corrupted afterwards.

We're waiting for more information from the original bug reporter, bewst.

Replying to [comment:1 warner]:
> Interesting.. I see a lot of unicode decode exceptions while trying to parse the "subject" of an X.509 certificate (must be some underlying SSL thing, since foolscap doesn't care about fields like that).
>
> Does your system perhaps have a non-ascii hostname?

Nope. The {{{hostname}}} command yields: “zreba.local”

> Could you run the Foolscap unit tests (see http://foolscap.lothar.com/trac to download a tarball directly) and see if they complain about the same sort of thing?

Looks like it does. See attached foolscap.log.

> What version of Twisted are these tests using? ("twistd --version" is probably the easiest way to get it, although "./bin/tahoe --version" from your Tahoe tree will give even more information if it works)

Hmm,

{{{
$ twistd --version
twistd (the Twisted daemon) 2.5.0
Copyright (c) 2001-2006 Twisted Matrix Laboratories.
See LICENSE for details.
$ # err, OK, that was the one installed with the system's python (2.4)
$ twistd2.5 --version
twistd (the Twisted daemon) 8.2.0
Copyright (c) 2001-2008 Twisted Matrix Laboratories.
See LICENSE for details.
$ ./bin/tahoe --version
allmydata-tahoe: 1.4.1, foolscap: 0.3.2, pycryptopp: 0.5.10, zfec: 1.4.2, Twisted: 8.2.0, Nevow: 0.9.32, zope.interface: 3.3.0, python: 2.5.4, platform: Darwin-9.7.0-i386-32bit, simplejson: 2.0.1, argparse: 0.8.0, pyOpenSSL: 0.7, pyutil: 1.3.28, zbase32: 1.1.1, setuptools: 0.6c12dev
}}}

> Also, please check to see what Python's default encodings are.. here's how I look at them on my system:

<schnipp>

Looks the same as yours:

{{{
 python2.5
Python 2.5.4 (r254:67916, May 6 2009, 18:40:46)
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'utf-8'
>>>
}}}

Replying to [comment:2 warner]:
> Also, could you run the following steps to generate a new certificate and
> then examine it to see what the "Subject" names are?
<schnipp>

> On my OS-X system, I see "Subject: CN=newpb_thingy". Do you get the same? It
> might also help us if you could attach that dummy.pem file to this ticket
> (but of course don't use it for anything else).
>
> My current hunch is that the Foolscap-generated x509 certificates are either
> being created with corrupt (i.e. non-UTF-8) subject-name strings, or they're
> somehow being corrupted afterwards.

Looks like things are going wrong much earlier:
{{{
#!python
$ python2.5
Python 2.5.4 (r254:67916, May 6 2009, 18:40:46)
[GCC 4.0.1 (Apple Inc. build 5490)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from foolscap import Tub
>>> t = Tub(certFile="dummy.pem")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 222, in __init__
    self.setupEncryptionFile(certFile)
  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 234, in setupEncryptionFile
    self.setupEncryption(certData)
  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 249, in setupEncryption
    cert = self.createCertificate()
  File "/opt/local/lib/python2.5/site-packages/foolscap-0.3.2-py2.5.egg/foolscap/pb.py", line 442, in createCertificate
    132)
  File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 539, in signCertificateRequest
    hlreq = CertificateRequest.load(requestData, requestFormat)
  File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 310, in load
    dn._copyFrom(req.get_subject())
  File "/opt/local/lib/python2.5/site-packages/twisted/internet/_sslverify.py", line 64, in _copyFrom
    value = getattr(x509name, name, None)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-5: unsupported Unicode code range
>>>
}}}

I don't know if this is any help, but pdb is showing me this:

{{{
(Pdb) p x509name
<X509Name object '/CN=\xFD\xAE\x99\x97\x9D\xB0\xFD\xA2\x97\xB7\x91\xA8\xFD\xA9\x9B\xA6\x9D\xB9'>
}}}

Problem solved, I guess. I mean, it's still a mystery how this could have happened, but I had a pyOpenSSL egg installed that was causing the problem... and it masked the py25-openssl package that I subsequently installed with macports. Everything started working once I had removed the original egg. My strong suspicion is that it was built with a different Python2.5, with a UCS4 setting.
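
For anyone hitting the same masking problem, it may help to check which file an import actually resolves to before removing anything. The following is only a sketch for a modern Python 3 interpreter (the thread itself used Python 2.5, where `import OpenSSL; print OpenSSL.__file__` would be the rough equivalent); `module_origin` is a hypothetical helper name, not part of pyOpenSSL or setuptools.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level import of `name` would load, or None."""
    # find_spec consults sys.path in order, so an egg listed earlier on the
    # path shadows a package installed later by another tool.
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Prints the path of the pyOpenSSL package that would win the import,
# or None if no package named OpenSSL is installed.
print(module_origin("OpenSSL"))
```

If the printed path points into a stale egg rather than the package manager's site-packages directory, that would match the situation described above.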

My current Python says:
{{{
$ python -c "import sys;print(sys.maxunicode<66000)and'UCS2'or'UCS4'"
UCS2
}}}
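
The same check can be written more readably as a small function. This is a sketch, not something from the original ticket; `unicode_build` is a hypothetical name. Note that on Python 3.3 and later the narrow/wide distinction no longer exists and `sys.maxunicode` is always 0x10FFFF, so the check is only informative on older interpreters.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Classify an interpreter as a narrow (UCS2) or wide (UCS4) build."""
    # Narrow builds cap sys.maxunicode at 0xFFFF; wide builds report 0x10FFFF.
    return "UCS2" if maxunicode <= 0xFFFF else "UCS4"


print(unicode_build())
```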

[http://www.egenix.com/products/python/pyOpenSSL/ This page] put me onto that possibility.

Wow, that's wacky. My OS-X box also reports UCS2, while my linux box reports UCS4. I wonder if that means the pyopenssl library is doing naive string conversion: interpreting some underlying openssl field as a unicode string, and hoping that openssl is using the same representation as python is using.

Anyways, thanks for tracking this down! I'm sure others will run into this problem again in the future, and it's great to have a searchable page that explains how to fix it.

I opened a ticket for setuptools:

http://bugs.python.org/setuptools/issue78 # egg platform names don't reflect unicode variant (UCS2, UCS4)

Thanks for tracking this one down, bewst.

Zooko, what are you waiting for me to do/answer? I don't see it above.

There was no request for you outstanding, so this should have been unassigned from you. However, just recently I started a discussion on the python-dev list, and referenced this ticket, and they said that the symptoms that we observed are not the symptoms they would expect from having an inconsistency of internal unicode format between Python interpreter and Python module. If that were the problem, we should have seen something like "undefined symbol: PyUnicodeUCS4_FromUnicode", not the utf-8 decode error that we saw.

Here is the comment on python-dev to that effect:

http://mail.python.org/pipermail/python-dev/2009-September/091943.html

So, now there ''is'' something you could do to help: see if you still have that pyOpenSSL library that you mentioned, the removal of which fixed this problem for you, so we can try to see what was wrong with it.

By the way, over on http://bugs.python.org/setuptools/issue78 midnightmagic says that he had the same symptoms. Maybe he could help us diagnose it.

Zooko Wilcox-O'Hearn (zooko) wrote :

I opened a bug report with the pyOpenSSL project: https://bugs.launchpad.net/setuptools/+bug/434411 . pyOpenSSL uses launchpad as its issue tracker, and launchpad has a nice quality of integrating with other issue trackers in order to track issues which span multiple projects. launchpad bug 434411 is currently linked to pyOpenSSL, Tahoe-LAFS, and setuptools, although it may turn out that this issue is independent of the setuptools issue, which has to do with whether your python packages use UCS4 or UCS2 internal unicode encoding.

Changed in allmydata.org:
status: Unknown → New
Zooko Wilcox-O'Hearn (zooko) wrote :

Rhamphoryncus does a bit of analysis on this bug report: http://mail.python.org/pipermail/python-dev/2009-October/092745.html

Jean-Paul Calderone (exarkun) wrote :

As far as pyOpenSSL goes, the cause of this error is simple. All name fields are treated as UTF-8 encoded. However, the format allows any sequence of bytes. Any sequence of bytes which is not valid UTF-8 will trigger a decoding error when pyOpenSSL attempts to make the information available as a unicode string to Python.
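
To illustrate the failure mode, here is a sketch that feeds the subject bytes from the pdb output quoted earlier in this report to a strict UTF-8 decode, and then to a tolerant one. It is written for Python 3.5 or later and does not use pyOpenSSL at all; `decode_name_field` is a hypothetical helper, not an existing or proposed pyOpenSSL API. The exact error message differs from the Python 2.5 one in the tracebacks. My guess, and it is only an inference, is that 0xFD was read as the lead byte of a six-byte sequence under the older UTF-8 rules, which would be consistent with "position 0-5".

```python
# Subject CN bytes copied from the pdb output quoted in this report.
RAW_CN = (b"\xFD\xAE\x99\x97\x9D\xB0\xFD\xA2\x97\xB7\x91\xA8"
          b"\xFD\xA9\x9B\xA6\x9D\xB9")


def decode_name_field(raw):
    """Return (text, was_valid_utf8) for a raw name field.

    Hypothetical helper: a strict decode is tried first, and bytes that
    are not valid UTF-8 are rendered as \\xNN escapes instead of raising.
    """
    try:
        return raw.decode("utf-8"), True
    except UnicodeDecodeError:
        # backslashreplace keeps every offending byte visible in the result.
        return raw.decode("utf-8", errors="backslashreplace"), False


text, ok = decode_name_field(RAW_CN)
print(ok, text)
```

Returning the raw bytes unchanged would be another option; the escape rendering is used here only so the result stays printable.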

The fix is less clear. It probably involves deprecating the existing APIs and introducing new ones which do not try to treat arbitrary bytes as unicode.

Zooko Wilcox-O'Hearn (zooko) wrote :

But why did the problem with Tahoe-LAFS and foolscap and pyOpenSSL go away when bewst installed a different distribution of pyOpenSSL? I'll ask bewst to comment on this ticket.

Zooko Wilcox-O'Hearn (zooko) wrote :

I wrote:

> Do you have a copy of the pyOpenSSL package which caused this problem? Or do you
> remember anything else about where you got it? Please comment on this ticket.
> Thanks.

bewst wrote:

> Sorry, I'm afraid all the information I have is in
> http://allmydata.org/trac/tahoe/ticket/704

midnightmagic (midnightlaunch) wrote :

I must apologise I'm afraid. :( I do not recall how or under what circumstances this problem revealed itself. Forever and from now on I promise I'll write down much more detail when I participate in bug-tracking. I should really know better. From my end, unfortunately I have no more details about it; however, the software is now working basically as written.

Zooko Wilcox-O'Hearn (zooko) wrote :

I'm thinking we should probably close this ticket as "irreproducible" and next time we'll make sure to keep copies of all the relevant files, write down all the observed behavior, etc. :-(

Zooko Wilcox-O'Hearn (zooko) wrote :

Kevin Reid encountered this same problem. Here is the stack trace that he got:

/Applications/allmydata-tahoe-1.6.1/support/lib/python2.5/site-packages/zope.interface-3.5.2-py2.5-macosx-10.5-i386.egg/zope/__init__.py:3: UserWarning: Module twisted was already imported from /Applications/allmydata-tahoe-1.6.1/support/lib/python2.5/site-packages/Twisted-8.2.0-py2.5-macosx-10.5-i386.egg/twisted/__init__.pyc, but /Applications/allmydata-tahoe-1.6.1/support/lib/python2.5/site-packages/Nevow-0.9.33_r17222-py2.5.egg is being added to sys.path
  import pkg_resources
Traceback (most recent call last):
  File "/Applications/allmydata-tahoe-1.6.1/Twisted-8.2.0-py2.5-macosx-10.5-i386.egg/twisted/application/app.py", line 694, in run
    runApp(config)
  File "/Applications/allmydata-tahoe-1.6.1/Twisted-8.2.0-py2.5-macosx-10.5-i386.egg/twisted/scripts/twistd.py", line 23, in runApp
    _SomeApplicationRunner(config).run()
  File "/Applications/allmydata-tahoe-1.6.1/Twisted-8.2.0-py2.5-macosx-10.5-i386.egg/twisted/application/app.py", line 411, in run
    self.application = self.createOrGetApplication()
  File "/Applications/allmydata-tahoe-1.6.1/Twisted-8.2.0-py2.5-macosx-10.5-i386.egg/twisted/application/app.py", line 494, in createOrGetApplication
    application = getApplication(self.config, passphrase)
--- <exception caught here> ---
  File "/Applications/allmydata-tahoe-1.6.1/Twisted-8.2.0-py2.5-macosx-10.5-i386.egg/twisted/application/app.py", line 505, in getApplication
    application = service.loadApplication(filename, style, passphrase)
  File "/Applications/allmydata-tahoe-1.6.1/Twisted-8.2.0-py2.5-macosx-10.5-i386.egg/twisted/application/service.py", line 390, in loadApplication
    application = sob.loadValueFromFile(filename, 'application', passphrase)
  File "/Applications/allmydata-tahoe-1.6.1/Twisted-8.2.0-py2.5-macosx-10.5-i386.egg/twisted/persisted/sob.py", line 214, in loadValueFromFile
    exec fileObj in d, d
  File "tahoe-client.tac", line 10, in <module>
    c = client.Client()
  File "/Applications/allmydata-tahoe-1.6.1/src/allmydata/client.py", line 122, in __init__
    node.Node.__init__(self, basedir)
  File "/Applications/allmydata-tahoe-1.6.1/src/allmydata/node.py", line 68, in __init__
    self.create_tub()
  File "/Applications/allmydata-tahoe-1.6.1/src/allmydata/node.py", line 142, in create_tub
    self.tub = Tub(certFile=certfile)
  File "/Applications/allmydata-tahoe-1.6.1/support/lib/python2.5/site-packages/foolscap-0.4.2-py2.5.egg/foolscap/pb.py", line 225, in __init__
    self.setupEncryptionFile(certFile)
  File "/Applications/allmydata-tahoe-1.6.1/support/lib/python2.5/site-packages/foolscap-0.4.2-py2.5.egg/foolscap/pb.py", line 237, in setupEncryptionFile
    self.setupEncryption(certData)
  File "/Applications/allmydata-tahoe-1.6.1/support/lib/python2.5/site-packages/foolscap-0.4.2-py2.5.egg/foolscap/pb.py", line 252, in setupEncryption
    cert = self.createCertificate()
  File "/Applications/allmydata-tahoe-1.6.1/support/lib/python2.5/site-packages/foolscap-0.4.2-py2.5.egg/foolscap/pb.py", line 452, in createCertificate
    132)
  File "/Applications/allmydata-tahoe-1.6.1/Twisted-8.2.0-py2.5-macosx-10.5-i386.eg...


Zooko Wilcox-O'Hearn (zooko) wrote :

The version of pyOpenSSL that Kevin was using was built from source by the Macports version of Python 2.5.4 on his Mac OS 10.5.8. The same error happened with pyOpenSSL-0.9 and with pyOpenSSL-0.10. Then he switched to pyOpenSSL-0.7 from Macports, which is also a build-from-source, and the issue went away. Here is the binary egg that was built from source by distutils that gave this error, attached.

Zooko Wilcox-O'Hearn (zooko) wrote :

Sorry, that may have been unclear -- all three versions of pyOpenSSL were built by the Macports version of Python 2.5.4. The two failing ones (-0.9 and -0.10) were built by distutils, i.e. the same result as "python setup.py build". The succeeding one (-0.7) was built by the Macports port file.

Zooko Wilcox-O'Hearn (zooko) wrote :

This is not reproducible as far as I know. Perhaps we should close this ticket.

I can reproduce it on Mac OS 10.5. The certfile being passed is, on my system, /Users/Frank/allmydata-tahoe-1.10.0/_trial_temp/test_client.Basic.test_create_drop_uploader1/private/node.pem, which does not exist.

This is not the node.pem file I referenced earlier but the file to which test_client.py is trying to write immediately before cutting out.

Something occurs to me. I am using pyOpenSSL 0.13 from MacPorts instead of the included 0.12. Does this make a difference?

Zooko Wilcox-O'Hearn (zooko) wrote :

Frank: I think we need the contents of the node.pem file, and possibly also of the pyOpenSSL shared library or egg that you used.

I think one or both of the people who earlier reported this problem were also using Macports, so maybe it is a bug in Macports or in pyOpenSSL when it is built by Macports.

That's the thing. It looks like it doesn't get created. In /Users/Frank/allmydata-tahoe-1.10.0/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/pb.py, line 249, "certData = open(certFile, "rb").read()" is attempted. It seems, since the file doesn't exist, the invocation fails.
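
One observation that may be relevant: the tracebacks in this report pass through setupEncryptionFile() and setupEncryption() and then reach createCertificate(), which suggests that a missing certFile is tolerated and a fresh certificate is generated instead. A minimal sketch of that load-or-create pattern follows. It is not foolscap's actual code; `load_or_create` and its `create` callback are hypothetical names, and it is written for Python 3.

```python
import os
import tempfile


def load_or_create(path, create):
    """Return the bytes stored at `path`, creating them if the file is missing."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        # No file yet: generate the data, persist it, and carry on.
        data = create()
        with open(path, "wb") as f:
            f.write(data)
        return data


# Usage: the first call creates the file, the second one reads it back.
demo_path = os.path.join(tempfile.mkdtemp(), "node.pem")
print(load_or_create(demo_path, lambda: b"placeholder certificate bytes"))
```

If the real code follows this shape, the nonexistent node.pem would be expected on a first run, and the failure would lie in the certificate generation step rather than in the open() call.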

Zooko Wilcox-O'Hearn (zooko) wrote :

Frank: on the tahoe-dev mailing list (https://tahoe-lafs.org/pipermail/tahoe-dev/2013-August/008664.html), you posted this stack trace:

allmydata.test.test_client
  Basic
    test_create_drop_uploader ... Traceback (most recent call last):
  File "/Users/Frank/allmydata-tahoe-1.10.0/support/lib/python2.7/site-packages/mock-1.0.1-py2.7.egg/mock.py", line 1201, in patched
    return func(*args, **keywargs)
  File "/Users/Frank/allmydata-tahoe-1.10.0/src/allmydata/test/test_client.py", line 235, in test_create_drop_uploader
    self.failUnlessRaises(MissingConfigEntry, client.Client, basedir1)
  File "/Users/Frank/allmydata-tahoe-1.10.0/support/lib/python2.7/site-packages/Twisted-13.0.0-py2.7-macosx-10.5-i386.egg/twisted/trial/_synctest.py", line 335, in assertRaises
    failure.Failure().getTraceback()))
twisted.trial.unittest.FailTest: <type 'exceptions.UnicodeDecodeError'> raised instead of MissingConfigEntry:
 Traceback (most recent call last):
  File "/Users/Frank/allmydata-tahoe-1.10.0/support/lib/python2.7/site-packages/Twisted-13.0.0-py2.7-macosx-10.5-i386.egg/twisted/internet/defer.py", line 137, in maybeDeferred
    result = f(*args, **kw)
  File "/Users/Frank/allmydata-tahoe-1.10.0/support/lib/python2.7/site-packages/Twisted-13.0.0-py2.7-macosx-10.5-i386.egg/twisted/internet/_utilspy3.py", line 37, in runWithWarningsSuppressed
    result = f(*a, **kw)
  File "/Users/Frank/allmydata-tahoe-1.10.0/support/lib/python2.7/site-packages/mock-1.0.1-py2.7.egg/mock.py", line 1201, in patched
    return func(*args, **keywargs)
  File "/Users/Frank/allmydata-tahoe-1.10.0/src/allmydata/test/test_client.py", line 235, in test_create_drop_uploader
    self.failUnlessRaises(MissingConfigEntry, client.Client, basedir1)
--- <exception caught here> ---
  File "/Users/Frank/allmydata-tahoe-1.10.0/support/lib/python2.7/site-packages/Twisted-13.0.0-py2.7-macosx-10.5-i386.egg/twisted/trial/_synctest.py", line 328, in assertRaises
    result = f(*args, **kwargs)
  File "/Users/Frank/allmydata-tahoe-1.10.0/src/allmydata/client.py", line 130, in __init__
    node.Node.__init__(self, basedir)
  File "/Users/Frank/allmydata-tahoe-1.10.0/src/allmydata/node.py", line 82, in __init__
    self.create_tub()
  File "/Users/Frank/allmydata-tahoe-1.10.0/src/allmydata/node.py", line 174, in create_tub
    self.tub = Tub(certFile=certfile)
  File "/Users/Frank/allmydata-tahoe-1.10.0/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/pb.py", line 240, in __init__
    self.setupEncryptionFile(certFile)
  File "/Users/Frank/allmydata-tahoe-1.10.0/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/pb.py", line 252, in setupEncryptionFile
    self.setupEncryption(certData)
  File "/Users/Frank/allmydata-tahoe-1.10.0/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/pb.py", line 267, in setupEncryption
    cert = self.createCertificate()
  File "/Users/Frank/allmydata-tahoe-1.10.0/support/lib/python2.7/site-packages/foolscap-0.6.4-py2.7.egg/foolscap/pb.py", line 476, in createCertificate
    132)
  File "/Users/Frank/allmydata-tahoe-1.10.0/support/lib/python2.7/site-packages/Twisted-13.0.0-py2.7-macosx-10.5-i386.egg/twi...
