enlisting of nodes: seed_random fails due to self signed certificate

Bug #1424549 reported by Martin Nowack on 2015-02-23
This bug affects 1 person
Affects      Importance   Assigned to
MAAS         Medium       Unassigned
cloud-init   Undecided    Unassigned

Bug Description

Using MAAS 1.7.1 on trusty, the following error message appears in the MAAS-provided ephemeral image when the pollinate step is executed:

curl: SSL certificate problem: self signed certificate in certificate chain.

As a result, the random number generator is not seeded correctly.

affects: maas (Ubuntu) → maas
Blake Rouse (blake-rouse) wrote :

Does your node have full access to the internet?

Can you provide the dmesg output for this error?

Changed in maas:
status: New → Incomplete
Martin Nowack (martin-nowack) wrote :

Yes, the node has full access to the internet:

By explicitly executing: sudo pollinate

I get:
Feb 26 14:35:23 stream1 pollinate[26489]: client sent challenge to [https://entropy.ubuntu.com/]
Feb 26 14:35:24 stream1 pollinate[26513]: ERROR: Network communication failed [60]\n14:35:24.298494 * Hostname was NOT found in DNS cache
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 014:35:24.299301 * Trying 91.189.94.50...
14:35:24.324169 * Connected to entropy.ubuntu.com (91.189.94.50) port 443 (#0)
14:35:24.325201 * successfully set certificate verify locations:
14:35:24.325254 * CAfile: /etc/pollinate/entropy.ubuntu.com.pem
  CApath: /dev/null
14:35:24.325410 * SSLv3, TLS handshake, Client hello (1):
14:35:24.325460 } [data not shown]
14:35:24.350528 * SSLv3, TLS handshake, Server hello (2):
14:35:24.350592 { [data not shown]
14:35:24.363801 * SSLv3, TLS handshake, CERT (11):
14:35:24.363852 { [data not shown]
14:35:24.364434 * SSLv3, TLS alert, Server hello (2):
14:35:24.364486 } [data not shown]
14:35:24.364643 * SSL certificate problem: self signed certificate in certificate chain
  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
14:35:24.364884 * Closing connection 0
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

Martin Nowack (martin-nowack) wrote :

One side remark: this was now executed from the deployed image.
So the problem is not limited to enlisting.

Launchpad Janitor (janitor) wrote :

[Expired for MAAS because there has been no activity for 60 days.]

Changed in maas:
status: Incomplete → Expired
Guy Halfon (s-gh) wrote :

I'm getting the same behavior in Vivid.

Karl (karl-martin2) wrote :

This is still happening, and I can't seem to find a workaround:

backdoor@maas-enlisting-node:~$ sudo pollinate
sudo: unable to resolve host maas-enlisting-node
Oct 14 14:06:51 maas-enlisting-node pollinate[1776]: client sent challenge to [https://entropy.ubuntu.com/]
Oct 14 14:06:51 maas-enlisting-node pollinate[1800]: ERROR: Network communication failed [60]\n14:06:51.133088 * Hostname was NOT found in DNS cache
  % Total % Received % Xferd Average Speed Time Time Time Current
                                 Dload Upload Total Spent Left Speed
  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 014:06:51.137568 * Trying 91.189.94.53...
14:06:51.271750 * Connected to entropy.ubuntu.com (91.189.94.53) port 443 (#0)
14:06:51.272710 * successfully set certificate verify locations:
14:06:51.272731 * CAfile: /etc/pollinate/entropy.ubuntu.com.pem
  CApath: /dev/null
14:06:51.272849 * SSLv3, TLS handshake, Client hello (1):
14:06:51.272884 } [data not shown]
14:06:51.404391 * SSLv3, TLS handshake, Server hello (2):
14:06:51.404432 { [data not shown]
14:06:51.417184 * SSLv3, TLS handshake, CERT (11):
14:06:51.417235 { [data not shown]
14:06:51.417754 * SSLv3, TLS alert, Server hello (2):
14:06:51.417776 } [data not shown]
14:06:51.417853 * SSL certificate problem: self signed certificate in certificate chain
  0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
14:06:51.417928 * Closing connection 0
curl: (60) SSL certificate problem: self signed certificate in certificate chain
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

Changed in maas:
status: Expired → In Progress
Mike Pontillo (mpontillo) wrote :

Are you behind an HTTP proxy, or a network security device that could be substituting officially-signed X.509 certificates with X.509 certificates signed by your organization?

From a machine on the same network as the node being deployed, can you pastebin the output of the following:

openssl s_client -connect entropy.ubuntu.com:443 \
                 -showcerts \
                 -CApath /etc/ssl/certs < /dev/null

Changed in maas:
status: In Progress → Incomplete
Karl (karl-martin2) wrote :

Here you go.

Karl (karl-martin2) wrote :

I am not behind an HTTP proxy - the node being deployed accesses the internet through a pfSense router. The nodes being deployed are VMs.

Mike Pontillo (mpontillo) wrote :

Hmm, strange. The certificates you posted in the openssl trace match the ones in /etc/pollinate/entropy.ubuntu.com.pem. For it to not validate, I would have expected them to be different.

Is it possible that the system time is incorrect on the VM, which in turn causes the certificates to not validate for some reason? (from what I've seen in your debug output, it's probably correct, but I'm running out of theories now.)

From the same node where you ran 'openssl s_client', I'm curious if there is a difference between the output of the following two commands:

pollinate -t > /dev/null
pollinate -i -t > /dev/null

Are you certain that the pfsense router is not acting as a man-in-the-middle for some types of traffic? (Again, though - if it is, I'm just not sure why we wouldn't have seen signs of that in the OpenSSL output.)

Karl (karl-martin2) wrote :

I attached the output. While troubleshooting earlier this morning, I noticed that pollinate seeds correctly when I use its insecure flag. The pfSense router shouldn't be modifying anything; I went through the configs a few times and could not find anything. It is a minimal installation that only routes traffic to the internet, but I am still researching whether it could be modifying anything. The openssl s_client commands were run through it as well.

Changed in maas:
status: Incomplete → Confirmed
Karl (karl-martin2) wrote :

I ensured the MAAS and VM times match and both are set to UTC. I noticed that when the image boots, one service fails, and it has to do with entropy seeding on first boot. It scrolled by too fast to catch it all and I don't see it in the logs, but it said it failed to start the pseudo random number generator. Could something be going wrong that affects entropy?

Mike Pontillo (mpontillo) wrote :

OK. I'll set this bug to "Confirmed" since it affects multiple users. But we won't be able to move forward on this bug unless we can triage this enough to determine why [in some circumstances] pollinate can't validate what appears to be a perfectly good certificate.

I suppose it could be a 'curl' issue. Perhaps if we can find out the exact 'curl' command pollinate is running, we can narrow it down. From what I understood from the logs, could you please try comparing the output from the following two commands:

curl -v --cacert /etc/pollinate/entropy.ubuntu.com.pem \
    --capath /dev/null https://entropy.ubuntu.com/

curl --insecure -v --cacert /etc/pollinate/entropy.ubuntu.com.pem \
    --capath /dev/null https://entropy.ubuntu.com/

It could be useful to get a packet capture from curl and/or pollinate (so we can see the certificates present in the TLS headers), if we have reason to believe they would be different from your OpenSSL output.

The other question to ask is: which image URL are you using, and which subset of images are you working with? (I assumed you were using the default URL and deploying amd64 images.)

I can only reproduce this bug if I edit /etc/pollinate/entropy.ubuntu.com.pem and remove a subset of the trusted certificates.

But here's the other curious part [in my output]:

* Server certificate:
* subject: OU=Domain Control Validated; CN=entropy.ubuntu.com
* start date: 2014-10-14 23:21:25 GMT
* expire date: 2015-10-15 16:10:53 GMT
* subjectAltName: entropy.ubuntu.com matched
* issuer: C=US; ST=Arizona; L=Scottsdale; O=GoDaddy.com, Inc.; OU=http://certs.godaddy.com/repository/; CN=Go Daddy Secure Certificate Authority - G2
* SSL certificate verify ok.

It looks like the certificate is due to expire *tomorrow*. Which might mean two things:
 - A clock skew could cause the certificate not to validate
 - If the certificate has been updated, perhaps the trusted CA in /etc/pollinate/entropy.ubuntu.com.pem needs to be updated as well.
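
The expiry timing noted above can be checked numerically. The following is a minimal sketch, assuming GNU date (as shipped on Ubuntu), which can parse the date strings that OpenSSL and curl print. seconds_between is a hypothetical helper name, and the two timestamps in the usage comment are taken from the curl output quoted in this thread.

```shell
#!/bin/sh
# Sketch: how many seconds lie between two timestamps, e.g. between "now"
# and a certificate's expiry date. Assumes GNU date for the -d option.
# seconds_between is a hypothetical helper name.
seconds_between() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $((end - start))
}

# Usage (not run here):
#   seconds_between "Oct 14 20:06:51 2015 GMT" "Oct 15 16:10:53 2015 GMT"
# The expiry date of a certificate file can be read with:
#   openssl x509 -in cert.pem -noout -enddate | cut -d= -f2
#   seconds_between now "$(openssl x509 -in cert.pem -noout -enddate | cut -d= -f2)"
```

A negative result would mean the second timestamp is already in the past relative to the first, which is the situation a skewed clock or an expired pinned certificate would produce.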

Karl (karl-martin2) wrote :

MAAS Version 1.8.2+bzr4041-0ubuntu1 (trusty1) - This is a stock install from the repos, nothing extra.

I am using the 14.04 LTS image - AMD64
http://archive.ubuntu.com/ubuntu
http://ports.ubuntu.com/ubuntu-ports
http://maas.ubuntu.com/images/ephemeral-v2/releases/

I took dumps of the curl commands, normal vs. insecure. I really didn't see anything in Wireshark that stood out.

Let me know if you want me to gather any other info.

Mike Pontillo (mpontillo) wrote :

Sorry for all the questions. I looked at the Wireshark output, and it looks okay. (it confirms that there is no clock skew, and that the certificates have the expected validity dates, but I didn't dig any deeper than that.) Two more questions:

(1) What are the contents of /etc/pollinate/entropy.ubuntu.com.pem on your system? (perhaps yours has been updated in advance of a pending certificate change, and mine hasn't? ideally you'd want there to be some overlap for a certain time, to prevent this race condition, if that's true.)

(2) What is the output of the following command:
curl -v --capath /etc/ssl/certs https://entropy.ubuntu.com

To explain, (1) above is a sanity check to determine whether you're using the same trust roots that I'm seeing locally. (2) checks whether using the default trust roots on the system leads to success. (Perhaps that should be the default, since the way we've "pinned" this certificate seems to be problematic.)

Karl (karl-martin2) wrote :

backdoor@os-test:~$ curl -v --capath /etc/ssl/certs https://entropy.ubuntu.com
* Rebuilt URL to: https://entropy.ubuntu.com/
* Hostname was NOT found in DNS cache
* Trying 91.189.94.50...
* Connected to entropy.ubuntu.com (91.189.94.50) port 443 (#0)
* successfully set certificate verify locations:
* CAfile: none
  CApath: /etc/ssl/certs
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
* SSLv3, TLS handshake, Server finished (14):
* SSLv3, TLS handshake, Client key exchange (16):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSLv3, TLS change cipher, Client hello (1):
* SSLv3, TLS handshake, Finished (20):
* SSL connection using DHE-RSA-AES128-GCM-SHA256
* Server certificate:
* subject: OU=Domain Control Validated; CN=entropy.ubuntu.com
* start date: 2014-10-14 23:21:25 GMT
* expire date: 2015-10-15 16:10:53 GMT
* subjectAltName: entropy.ubuntu.com matched
* issuer: C=US; ST=Arizona; L=Scottsdale; O=GoDaddy.com, Inc.; OU=http://certs.godaddy.com/repository/; CN=Go Daddy Secure Certificate Authority - G2
* SSL certificate verify ok.
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: entropy.ubuntu.com
> Accept: */*
>
* HTTP 1.0, assume close after body
< HTTP/1.0 400 Bad Request
< Content-Type: text/plain; charset=utf-8
< Date: Wed, 14 Oct 2015 20:06:51 GMT
< Content-Length: 162
< X-Cache: MISS from localhost
< X-Cache-Lookup: MISS from localhost:3128
< Via: 1.0 localhost (squid/3.1.19)
* HTTP/1.0 connection set to keep alive!
< Connection: keep-alive
<
Please use the pollinate client. 'sudo apt-get install pollinate' or download from: https://bazaar.launchpad.net/~pollinate/pollinate/trunk/view/head:/pollinate
* Connection #0 to host entropy.ubuntu.com left intact

Mike Pontillo (mpontillo) wrote :

It seems that the problem is (1). (but it isn't quite what I expected) The certificates in your file are completely different from what I would expect, in order to properly validate. The leaf certificate in your file (per "openssl x509 -inform pem -in <file> -text", after placing the individual certificate into <file>) is the following:

        Issuer: C=US, ST=Arizona, L=Scottsdale, O=Starfield Technologies, Inc., OU=http://certs.starfieldtech.com/repository/, CN=Starfield Secure Certificate Authority - G2
        Validity
            Not Before: Apr 8 08:26:03 2014 GMT
            Not After : Oct 15 16:10:53 2014 GMT
        Subject: OU=Domain Control Validated, CN=entropy.ubuntu.com

The remainder of the certificates in the file are the CA and intermediate certificates.
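
The per-certificate inspection described above can be sketched as follows. This is only an illustration, assuming a POSIX shell and awk; split_pem_bundle and the /tmp/pinned output directory are hypothetical names, not part of pollinate or MAAS.

```shell
#!/bin/sh
# Sketch: split a PEM bundle into one file per certificate so that each one
# can be inspected on its own. split_pem_bundle is a hypothetical helper name.
# Prints the number of certificates found.
split_pem_bundle() {
    bundle=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /-----BEGIN CERTIFICATE-----/ { n++; out = sprintf("%s/cert-%02d.pem", dir, n) }
        out != ""                     { print > out }
        /-----END CERTIFICATE-----/   { if (out != "") close(out); out = "" }
        END                           { print n + 0 }
    ' "$bundle"
}

# Usage (not run here):
#   split_pem_bundle /etc/pollinate/entropy.ubuntu.com.pem /tmp/pinned
#   for c in /tmp/pinned/cert-*.pem; do
#       openssl x509 -inform pem -in "$c" -noout -subject -issuer -dates
#   done
```

Printing only the subject, issuer and validity dates makes it easier to spot a leaf certificate whose "Not After" date has already passed.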

Maybe out of date MAAS images are at fault? (though if the packages get updated, you shouldn't see this problem, since you'll get a new "pinned" certificate chain.) You could try updating the MAAS images, or even try using the 'daily' URL (which is updated for security updates and/or every couple of weeks with the latest updated packages):

https://maas.ubuntu.com/images/ephemeral-v2/daily/

Perhaps the daily images contain the appropriate certificates. And I hope that's still the case in 20 hours. ;-) I just checked, and the following certificate is actually in *my* pinned trust store:

        Issuer: C=US, O=DigiCert Inc, CN=DigiCert SHA2 Secure Server CA
        Validity
            Not Before: Aug 7 00:00:00 2015 GMT
            Not After : Aug 11 12:00:00 2016 GMT
        Subject: C=GB, ST=Southwark, L=London, O=Canonical Group Ltd, CN=entropy.ubuntu.com

So my conclusion is that everything should work fine, provided that you have the most up-to-date MAAS images.

Karl (karl-martin2) wrote :

OK, the curious thing is how I would have received a cert from 2014 when I installed the MAAS server fresh on 10/10/2015.

I just now ran: maas admin node-groups import-boot-images after switching to the daily URL that you have provided, but when I restarted the node the certificate had not changed?

These are the repos I added when performing the installation:
sudo add-apt-repository ppa:juju/stable
sudo add-apt-repository ppa:maas-maintainers/stable
sudo add-apt-repository ppa:cloud-installer/stable
sudo apt update

Source:
http://www.ubuntu.com/download/cloud/install-ubuntu-openstack

Karl (karl-martin2) wrote :

The auto-updater for the ephemeral images doesn't seem to be updating every 60 minutes.

I went to: https://maas.ubuntu.com/images/ephemeral-v2/daily/trusty/amd64/20150930/
and downloaded them to:
/var/lib/maas/boot-resources/current/ubuntu/amd64/generic/trusty/release

Updated these files from ubuntu:
-rw-r--r-- 1 maas maas 1.4G Oct 5 23:09 root-image
-rw-r--r-- 1 maas maas 5.6M Oct 5 23:09 boot-kernel
-rw-r--r-- 1 maas maas 24M Oct 5 23:09 boot-initrd

After the image booted up the cert still read:
 Validity
            Not Before: Apr 8 08:26:03 2014 GMT
            Not After : Oct 15 16:10:53 2014 GMT

Are those the correct files to update and the correct location?

Mike Pontillo (mpontillo) wrote :

Rather than downloading the images and replacing them in /var, can you change your image sync URL to https://maas.ubuntu.com/images/ephemeral-v2/daily/ and redeploy the node?

Images are kept in the database in the region and periodically synchronized to the clusters; manually changing them on the cluster is not supported.

Mike Pontillo (mpontillo) wrote :

Also (just curious) - which hypervisor are you using?

Karl (karl-martin2) wrote :

Mike,

I am using ESXi 6 and have a fresh install with the image URL pointing to dailys.

MAAS Version 1.8.2+bzr4041-0ubuntu1 (trusty1)

Thanks

Andres Rodriguez (andreserl) wrote :

Quick questions:

1. Are you using MAAS DNS?
2. If you are using MAAS DNS, are you using an upstream DNS server?
3. Are you enabling DNSSEC validation?

Thanks

Andres Rodriguez (andreserl) wrote :

Also, the big question is... why is this running at all? Why would this cause a failure? When working in offline environments, where we don't have access to the internet at all, this doesn't represent an issue.

Andres Rodriguez (andreserl) wrote :

Ok, so looking into this further, this may be because of using an older version of pollinate. This, however, shouldn't really represent an issue at all.

The latest pollinate [1], might shed some light. So, the big question now is, have you tried the latest image? (changing from 'releases' to 'daily' for the streams)?

[1]: https://launchpad.net/ubuntu/+source/pollinate/4.7-0ubuntu1.4

Changed in maas:
status: Confirmed → Incomplete
Karl (karl-martin2) wrote :

I changed the URL from releases to daily, and it now boots up with 14.04.3 LTS, but I am unable to ssh into the box. Eventually I see connection failures to archive.ubuntu.com, so I am guessing it is still not working.

It was my understanding from the docs, that after enlisting and shutting down, you commission the node, it boots up and needs to communicate to archive.ubuntu.com to download packages and install them prior to finishing the commissioning of the nodes to ready state? How does this work for you in an environment that does not have internet connection?

Previously, I had used the code below to create a backdoor login account. But since re-installing MAAS, I wanted to leave it stock and was not sure if this code modified the image, because when I had changed from releases to daily on the previous build it didn't seem like the images updated.

=-=-=-=-
sudo apt-get install --assume-yes bzr
bzr branch lp:~maas-maintainers/maas/backdoor-image backdoor-image

imgs=$(echo /var/lib/maas/boot-resources/*/*/*/*/*/*/root-image)
for img in $imgs; do
    [ -f "$img.dist" ] || sudo cp -a --sparse=always "$img" "$img.dist"
done

for img in $imgs; do
    sudo ./backdoor-image/backdoor-image -v --user=backdoor --password-auth --password=ubuntu "$img"
done
=-=-=-=-

Mike Pontillo (mpontillo) wrote :

After looking at the code, I believe the commissioning is failing for a different reason. Reasoning:

(1) cloud-init specifies "required=False" for the random_seed configuration option when calling pollinate [1]

(2) MAAS does not currently send the random_seed option when calling cloud-init.[2]

So while I think it's true that we could handle this better, I think the symptom in this bug may be a red herring.

[1]:
https://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/view/head:/cloudinit/config/cc_seed_random.py

[2]:
https://bazaar.launchpad.net/~maas-committers/maas/1.8/view/head:/src/maasserver/compose_preseed.py

Changed in maas:
status: Incomplete → Invalid
status: Invalid → Triaged
importance: Undecided → Medium
Mike Pontillo (mpontillo) wrote :

Actually, I'll go ahead and mark this "Triaged"; it *is* a real bug, it just isn't as critical as we assumed.

To fix this bug, we should configure cloud-init to NOT call pollinate during enlistment (to avoid this spurious error).

As a follow-on fix, it might be a good idea for cloud-init to fall back to 'insecure' mode (or simply use the public CA roots in /etc/ssl/certs rather than a pinned chain) and log this as a warning, if the pinned certificate could not be validated.
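
The fallback idea could look roughly like the sketch below. It is an assumption-laden illustration, not pollinate's or cloud-init's actual code: fetch_with_ca_fallback is a hypothetical helper name, the two paths are the ones quoted in this thread, and the request is a plain GET rather than the challenge that pollinate really sends.

```shell
#!/bin/sh
# Sketch of a pinned-chain check with a fallback to the system CA roots.
# Not the actual pollinate or cloud-init implementation.
PINNED_CA=${PINNED_CA:-/etc/pollinate/entropy.ubuntu.com.pem}
SYSTEM_CAPATH=${SYSTEM_CAPATH:-/etc/ssl/certs}

# fetch_with_ca_fallback is a hypothetical helper name.
fetch_with_ca_fallback() {
    url=$1
    # First attempt: trust only the pinned chain.
    curl --silent --show-error --cacert "$PINNED_CA" --capath /dev/null "$url"
    status=$?
    # curl exit code 60 means the peer certificate could not be verified.
    if [ "$status" -eq 60 ]; then
        echo "WARNING: pinned chain did not validate $url; retrying with $SYSTEM_CAPATH" >&2
        curl --silent --show-error --capath "$SYSTEM_CAPATH" "$url"
        status=$?
    fi
    return "$status"
}

# Usage (not run here):
#   fetch_with_ca_fallback https://entropy.ubuntu.com/
```

Falling back to the public roots keeps certificate verification on, unlike 'insecure' mode, but it gives up the extra assurance that pinning was meant to provide, which is why the sketch logs a warning.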

Karl (karl-martin2) wrote :

To get this working in my environment are there any suggestions to move forward?

Scott Moser (smoser) wrote :

well, this will probably mess things up, but i'll attempt to explain a few things.
The summary is that I'm almost certain this does not affect your maas enlistment or commissioning.

a.) maas images in 'released' are old.
   this is quite unfortunate, but the images there are out of date and need updating. We're looking into ways we can produce up to date images without risk of regression to users.

   since this is old, 'pollinate' inside is old. And since it uses its own certificate, that fails to work. Any unpatched ubuntu image will show that error. It is "just" a warning though.
    This is why updating to daily got rid of the red-herring problem for you.

b.) cloud-init really has nothing to do with pollinate. It just calls it, is all. MAAS can instruct it *not* to call pollinate, but that may defeat the purpose that pollinate is serving. Note, a seed is sometimes believed to be more useful in VMs, which have less entropy, and maas is targeting hardware. So, in the case where maas is pointed at "real hardware", disabling pollinate may be less harmful. (note, i'm not speaking as a qualified security engineer here).

c.) we should probably add to maas metadata service some random seed. This would alleviate 'b' as then we *would* be getting a random seed from somewhere.

d.) I've submitted merge proposal to document random_seed better at https://code.launchpad.net/~smoser/cloud-init/trunk.doc-seedrandom/+merge/275062

Andres Rodriguez (andreserl) wrote :

Hi Karl,

In one of your comments I see "eventually I see connection failures to archive.ubuntu.com". It seems the reason it may be failing is actually that it cannot connect to the archive.

1. Are you sure maas-proxy is running? Logs are in /var/log/maas/proxy/*.log.
2. The reason it may be failing to access the proxy is a missing upstream DNS. Have you set an upstream DNS?

Andres Rodriguez (andreserl) wrote :

I believe that this is no longer an issue. I'm going to mark this bug as Invalid. Please re-open if you believe the issue still exists or file a new one.

Changed in maas:
status: Triaged → Invalid
Changed in cloud-init:
status: New → Invalid