selftests fail assertActivitiesMatch for pycurl compiled against openssl

Bug #614713 reported by Martin von Gagern on 2010-08-07
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Bazaar
Medium
Vincent Ladeuil
2.0
Medium
Vincent Ladeuil
2.1
Medium
Vincent Ladeuil
2.2
Medium
Vincent Ladeuil
2.3
Medium
Vincent Ladeuil
2.4
Medium
Vincent Ladeuil
Gentoo Linux
Confirmed
Undecided
Unassigned

Bug Description

Running the selftest suite on a Gentoo system with curl compiled against openssl, 8 tests of the selftest suite fail:
FAIL: bzrlib.tests.test_http.TestActivity.test_get(pycurl,https,HTTP/1.0)
FAIL: bzrlib.tests.test_http.TestActivity.test_get(pycurl,https,HTTP/1.1)
FAIL: bzrlib.tests.test_http.TestActivity.test_has(pycurl,https,HTTP/1.0)
FAIL: bzrlib.tests.test_http.TestActivity.test_has(pycurl,https,HTTP/1.1)
FAIL: bzrlib.tests.test_http.TestActivity.test_post(pycurl,https,HTTP/1.0)
FAIL: bzrlib.tests.test_http.TestActivity.test_post(pycurl,https,HTTP/1.1)
FAIL: bzrlib.tests.test_http.TestActivity.test_readv(pycurl,https,HTTP/1.0)
FAIL: bzrlib.tests.test_http.TestActivity.test_readv(pycurl,https,HTTP/1.1)

The issue can be reproduced both in bzr.dev and in the 2.2 branch. I've attached the full selftest output, but the central part of these selftests always looks like this:

    self.assertActivitiesMatch()
  File "/home/mvg/src/up/bzr/bzr.dev/bzrlib/tests/test_http.py", line 1971, in assertActivitiesMatch
    self.activities.get('write', 0), 'written bytes')
AssertionError: written bytes
not equal:

Of the numbers a and b printed after this, a is always much smaller than b:
test_get a=219 b=958
test_has a=220 b=957
test_post a=312 b=1051
test_readv a=249 b=988

Related branches

Martin von Gagern (gagern) wrote :
John A Meinel (jameinel) wrote :

Sounds like pycurl + ssl is either recording a different number of bytes, or actually writing a different number.

Changed in bzr:
importance: Undecided → Medium
status: New → Confirmed
Vincent Ladeuil (vila) wrote :

This sounds vaguely familiar but I already have some fixes that may be related in my leaking-tests loom, so I'll ping you again when it lands for more feedback.

tags: added: https pycurl
Vincent Ladeuil (vila) wrote :

Hmm, looks like I forgot to ping. The changes I had in mind have landed long ago now. Do you still encounter the issue ? For which bzr version ?

The babune slave is quite happy to run these tests http://babune.ladeuil.net:24842/view/%20High/job/selftest-gentoo/ for quite some time now.

Martin von Gagern (gagern) wrote :

With current bzr.dev (r6072), the same 8 tests fail, but with a different error message and backtrace:

ERROR: test_http.TestActivity.test_get(pycurl,https,HTTP/1.0)
    Text attachment: log
------------
2.477 got pycurl error: 60, Peer certificate cannot be authenticated with given CA certificates, (60, 'Peer certificate cannot be authenticated with given CA certificates'), url: https://127.0.0.1:36486/foo/bar
------------
Text attachment: traceback
------------
  File "/usr/lib64/python2.7/site-packages/testtools/runtest.py", line 169, in _run_user
    return fn(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/testtools/testcase.py", line 540, in _run_test_method
    return self._get_test_method()()
  File "bzr.dev/bzrlib/tests/test_http.py", line 2145, in test_get
    t.get('foo/bar').read())
  File "bzr.dev/bzrlib/transport/http/__init__.py", line 126, in get
    code, response_file = self._get(relpath, None)
  File "bzr.dev/bzrlib/transport/http/_pycurl.py", line 171, in _get
    return self._get_full(relpath)
  File "bzr.dev/bzrlib/transport/http/_pycurl.py", line 206, in _get_full
    self._curl_perform(curl, header)
  File "bzrlib/transport/http/_pycurl.py", line 401, in _curl_perform
    'curl connection error (%s)\non %s' % (e[1], url))
ConnectionError: Connection error: curl connection error (Peer certificate cannot be authenticated with given CA certificates)
on https://127.0.0.1:36486/foo/bar
------------
Text attachment: traceback-1
------------
Traceback (most recent call last):
  File "/usr/lib64/python2.7/site-packages/testtools/runtest.py", line 169, in _run_user
    return fn(*args, **kwargs)
  File "bzr.dev/bzrlib/tests/test_server.py", line 564, in stop_server
    self._server_thread.join()
  File "bzr.dev/bzrlib/tests/test_server.py", line 254, in join
    super(TestThread, self).join(timeout)
  File "bzr.dev/bzrlib/cethread.py", line 126, in run
    super(CatchingExceptionThread, self).run()
  File "/usr/lib64/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "bzr.dev/bzrlib/tests/test_server.py", line 527, in run_server
    self.server.serve()
  File "bzr.dev/bzrlib/tests/test_server.py", line 297, in serve
    self.handle_request()
  File "bzr.dev/bzrlib/tests/test_server.py", line 310, in handle_request
    if self.verify_request(request, client_address):
  File "bzr.dev/bzrlib/tests/https_server.py", line 52, in verify_request
    request.do_handshake()
  File "/usr/lib64/python2.7/ssl.py", line 296, in do_handshake
    self._sslobj.do_handshake()
SSLError: [Errno 1] _ssl.c:503: error:14094418:SSL routines:SSL3_READ_BYTES:tlsv1 alert unknown ca
------------

The other 7 errors look a lot like this one.

Changed in gentoo:
status: New → Invalid
status: Invalid → Incomplete
Martin von Gagern (gagern) wrote :

Ran a full selftest. A large number of tests error out with the same kind of error as well. They usually are designated as (HTTPS_pycurl_transport,HTTPSServer_PyCurl). I assume almost every test that relies on pycurl handling of https will be affected by this new manifestation of this bug.

Martin von Gagern (gagern) wrote :

OK, it turns out my curl is currently configured to use nss not openssl, in contrast to when I first reported this issue. Reason is that mail-client/thunderbird-bin-5.0 requires it this way.

After enabling curl debug symbols in https://bugs.gentoo.org/379533 I got a backtrace of the curl location where this error code is first stored:
#0 Curl_nss_connect at nss.c:1386
#1 Curl_ssl_connect at sslgen.c:199
#2 Curl_http_connect at http.c:1307
#3 Curl_protocol_connect at url.c:3328
#4 Curl_setup_conn at url.c:5057
#5 Curl_async_resolved at hostasyn.c:145
#6 connect_host at transfer.c:1992
#7 Curl_do_perform at transfer.c:2122
#8 do_curl_perform at src/pycurl.c:1024
#9 call_function at Python/ceval.c:3997

See https://github.com/bagder/curl/blob/curl-7_21_7/lib/nss.c#L1386 for source code.

Will try to build an instance of curl linked against openssl, just to see what happens, but probably won't keep that installed by default.

Martin von Gagern (gagern) wrote :

OK, with openssl I still get the same errors as the original report indicates, except the lengths for test_post are now a = 303 and b = 1042.

Martin von Gagern (gagern) wrote :

For the sake of completeness: when using gnutls, the third possible ssl backend gentoo has to offer for curl, then the selftests succeed. Could it be your babune slave is using that config?

Vincent Ladeuil (vila) wrote :

It seems so:

[D] net-misc/curl
     Available versions: 7.19.4 7.19.6 {ares gnutls idn ipv6 kerberos ldap libssh2 nss ssl test}
     Installed versions: 7.21.4(16:03:11 08/17/11)(gnutls ipv6 ldap ssl -ares -idn -kerberos -libssh2 -nss -static-libs -test -threads)
     Homepage: http://curl.haxx.se/ http://curl.planetmirror.com
     Description: A Client that groks URLs

if you can guide me in reproducing your config there (speak slowly, gentoo is not my native OS ;) I could try to reproduce the issue.

Keep in mind though, that these tests are a bit pedantic and doesn't indicate an issue in the data transferred and as such shouldn't be seen as blockers to deploy bzr on gentoo.

Martin von Gagern (gagern) wrote :

Reinstall using OpenSSL:
# USE="-gnutls" emerge --oneshot curl

Reinstall using NSS:
# USE="-gnutls nss" emerge --oneshot curl

If you want your settings to persist, particularly when upgrading curl, you'd write its USE flags to a file:
# echo "net-misc/curl -gnutls nss" >> /etc/portage/package.use
This is for NSS, for OpenSSL you'd obviously omit the nss. If the command fails because package.use is a directory, write to an arbitrarily named file within that directory.

I'm not percieving the test failures as a blocker, and neither do gentoo bzr maintainers. But they are an annoying piece of white noise, as I cannot simply run the complete selftest suite when working on a fix, but will have to either explicitely disable some tests or actually read the results to see if the failures can safely be ignored.

Vincent Ladeuil (vila) wrote :

Wow, thanks !

> I'm not percieving the test failures as a blocker

Good. Thanks for explaining.

> I cannot simply run the complete selftest suite

I understand the pain. I'll have a look at it tomorrow and will report here.

I can reproduce the issue locally after running USE="-gnutls" emerge --oneshot curl

Vincent Ladeuil (vila) on 2011-08-18
Changed in gentoo:
status: Incomplete → Confirmed
Changed in bzr:
status: Confirmed → In Progress
assignee: nobody → Vincent Ladeuil (vila)
Vincent Ladeuil (vila) wrote :

With:
# USE="-gnutls nss" emerge --oneshot curl

   ./bzr selftest -s bt.test_http.TestActivity

succeeds.

Vincent Ladeuil (vila) wrote :

Ok, I've got a fix.

The issue is that we use a debug callback to track the bytes read/written. This callback is called with a kind and a text.

Some curl implementations calls our callback for INFOTYPE_SSL_DATA_OUT and INFOTYPE_SSL_DATA_IN (i.e. the encrypted text), some don't.

The bug is that we should never take these kinds into account anyway but until I could reproduce the issue I didn't realize that.

Thanks for your reproducing instructions !!

Martin von Gagern (gagern) wrote :

Thanks for fixing the original issue!

What about the NSS problem, particularly as you fail to reproduce it? Shall I open a new bug report for that one? How is verification of the CA intended to work?

I've dug a bit deeper into the code and got this backtrace:
#0 CERT_FindCertIssuer at certvfy.c:276
#1 cert_VerifyCertChainOld at certvfy.c:640
#2 cert_VerifyCertChain at certvfy.c:885
#3 CERT_VerifyCertChain at certvfy.c:894
#4 CERT_VerifyCert at certvfy.c:1490
#5 CERT_VerifyCertNow at certvfy.c:1541
#6 SSL_AuthCertificate at sslauth.c:261
#7 nss_auth_cert_hook at nss.c:635
#8 ssl3_HandleCertificate at ssl3con.c:7902
#9 ssl3_HandleHandshakeMessage at ssl3con.c:8601
#10 ssl3_HandleHandshake at ssl3con.c:8725
#11 ssl3_HandleRecord at ssl3con.c:9064
#12 ssl3_GatherCompleteHandshake at ssl3gthr.c:209
#13 ssl_GatherRecord1stHandshake at sslcon.c:1258
#14 ssl_Do1stHandshake at sslsecur.c:151
#15 SSL_ForceHandshake at sslsecur.c:407
#16 Curl_nss_connect at nss.c:1382
#17 Curl_ssl_connect at sslgen.c:199
#18 Curl_http_connect at http.c:1307
#19 Curl_protocol_connect at url.c:3328
#20 Curl_setup_conn at url.c:5057
#21 Curl_async_resolved at hostasyn.c:145
#22 connect_host at transfer.c:1992
#23 Curl_do_perform at transfer.c:2122
#24 Curl_perform at transfer.c:2268
#25 curl_easy_perform at easy.c:553
#26 do_curl_perform at src/pycurl.c:1024
#27 call_function at Python/ceval.c:3997

That's the place where NSS decides that it does not know the issuer and therefore won't trust the cert either.
http://mxr.mozilla.org/mozilla-central/source/security/nss/lib/certhigh/certvfy.c#276
At that point the certificate in question (CERTCertificate *cert) contains the following data:
subjectName="E=https_server@locahost,CN=127.0.0.1,OU=https server,O=Testing Ltd,L=LocalHost,ST=Internet,C=LH",
<email address hidden>,CN=Master of certificates,OU=VCS,O=Distributed,L=Bazaar,ST=Internet,C=BZ"
Using strace, I see no access to any list of trusted CA certificates on disk, but I might have somehow missed that.

It would be nice to know if your code reaches the location indicated by the backtrace even if you fail to reproduce this issue with nss. If the call to CERT_FindCertIssuer succeeds for you (i.e. doesn't return NULL), we would have to investigate behaviour inside that function more closely. If your function returns NULL as well, the difference must be somewhere after that, and if your system doesn't even reach that function, we would have to look for differences somewhere deeper (i.e. called earlier) in the call stack.

Vincent Ladeuil (vila) wrote :

What does nss means by "doesn't know" the issuer ? bzr provides its own ! :-}

It's quite hard to debug without being able to reproduce but if you want to give it a try here are some hints:

- for tests, pycurl uses a special (transport, server) permutations (from get_tests_permutations() in bzrlib/transport/http/_pycurl.py and add a specific 'ca.crt' (generated for tests in bzrlib/tests/ssl_certs)

- the https server is defined in bzrlib/tests/https_server.py, all https test servers use a special certificate and and key (also from ssl_certs)

http://curl.haxx.se/libcurl/c/curl_easy_setopt.html seems doesn't match my memory, may be things have changed since I wrote.maintained the code.

The code use self.cabundle to set pycurl.CAINFO (originally for windows, later on for tests too). But reading the doc now, I wonder if CURLOPT_ISSUERCERT (aka pycurl.ISSUERCERT) should be used instead for *tests* (I think windows still needs to use CAINFO but I may be wrong).

I think the issue is different enough to file a new bug to continue the discussion.

We may also try to chat on IRC to compare our configs and see if I can reproduce the issue locally.

Vincent Ladeuil (vila) on 2011-08-19
Changed in bzr:
milestone: none → 2.5b1
Vincent Ladeuil (vila) wrote :

https://bugs.launchpad.net/bzr/+bug/829333 filed to track the remaining issue.

Christian Faulhammer (fauli) wrote :

A word from the Bazaar Gentoo maintainer: I disabled the tests in Gentoo, the rest is fine, so 2.4 is the first version in a long time that can be emerged with tests.

Vincent, I would like to see the test runs done within the Portage system because that is the most common case where Bazaar users will see the test suite in action. There is a live ebuild in the Gentoo Bazaar overlay which always fetches the current code. Should I open a new issue for that?

>>>>> Christian Faulhammer writes:

    > A word from the Bazaar Gentoo maintainer:

You're very warmly welcome !

    > I disabled the tests

All the tests or just these ones ? How did you disable them ?

    > in Gentoo, the rest is fine, so 2.4 is the first version in a long
    > time that can be emerged with tests.

\o/

    > Vincent, I would like to see the test runs done within the Portage
    > system because that is the most common case where Bazaar users
    > will see the test suite in action.

I read that as the same goal we're trying to achieve in Ubuntu where the
full test suite is run as part of the package build for our stable
series.

I'm all for achieving this goal on as many supported platforms as is
humanly possible.

    > There is a live ebuild in the Gentoo Bazaar overlay which always
    > fetches the current code.

And by that you mean lp:bzr or lp:bzr/2.4 ? I.e. do you track the trunk
or the latest stable series ?

    > Should I open a new issue for that?

Yes, please do ! And let's try to align what is done on your side with
what is done on
http://babune.ladeuil.net:24842/view/%20High/job/selftest-gentoo/ where
bzr.dev (lp:bzr) runs the full test suite daily with only sporadic
failures.

I.e.: I'd really like to address the distortion between the gentoo
install where *all* tests pass every *day* without any significant
failure and your install (probably not the right word nor concept[1])
where there seem to be far too much failures for you to enable all the
tests.

Feel free to contact me privately if you want to discuss this or let's
do that on a new bug, both are fine with me.

[1]: I'm only trying to do a minimal amount of administration on the
babune's gentoo slave while maintaining a "standard" gentoo install. If
several configurations needs to be tracked (openssl/nss/you name it),
I'd like to understand which ones are needed and how we can maintain
them.

Vincent Ladeuil (vila) on 2011-08-21
Changed in bzr:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments