Undercloud installation fails intermittently when using SSL

Bug #1712377 reported by Alfredo Moralejo
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Juan Antonio Osorio Robles

Bug Description

Sometimes, when SSL is enabled in the undercloud, the undercloud installation fails with the following error in undercloud_install.log [1]:

2017-08-22 14:34:54 | 2017-08-22 14:34:54,849 INFO: Error: /Stage[main]/Keystone::Roles::Admin/Keystone_user[admin]: Could not evaluate: Execution of '/bin/openstack token issue --format value' returned 1: Unable to establish connection to https://192.168.24.2:13000/v3/auth/tokens: HTTPSConnectionPool(host='192.168.24.2', port=13000): Max retries exceeded with url: /v3/auth/tokens (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x3f77f90>: Failed to establish a new connection: [Errno 111] Connection refused',)) (tried 34, for a total of 170 seconds)

Digging through the logs, I found the following errors in the messages file [2] at the moment haproxy starts:

Aug 22 14:29:16 undercloud systemd: Started HAProxy Load Balancer.
Aug 22 14:29:16 undercloud systemd: Starting HAProxy Load Balancer...
Aug 22 14:29:16 undercloud certmonger: Certificate in file "/etc/pki/tls/certs/undercloud-front.crt" issued by CA and saved.
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: [ALERT] 233/142916 (1279) : parsing [/etc/haproxy/haproxy.cfg:107] : 'bind 192.168.24.2:13050' : unable to load SSL private key from PEM file '/etc/pki/tls/certs/undercloud-192.168.24.2.pem'.
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: [ALERT] 233/142916 (1279) : parsing [/etc/haproxy/haproxy.cfg:123] : 'bind 192.168.24.2:13000' : unable to load SSL private key from PEM file '/etc/pki/tls/certs/undercloud-192.168.24.2.pem'.
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: [ALERT] 233/142916 (1279) : parsing [/etc/haproxy/haproxy.cfg:134] : 'bind 192.168.24.2:13989' : unable to load SSL private key from PEM file '/etc/pki/tls/certs/undercloud-192.168.24.2.pem'.
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: [ALERT] 233/142916 (1279) : Error(s) found in configuration file : /etc/haproxy/haproxy.cfg
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: [WARNING] 233/142916 (1279) : config : missing timeouts for proxy 'rabbitmq'.
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: | While not properly invalid, you will certainly encounter various problems
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: | with such a configuration. To fix this, please ensure that all following
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: | timeouts are set to a non-zero value: 'client', 'connect', 'server'.
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: [WARNING] 233/142916 (1279) : Setting tune.ssl.default-dh-param to 1024 by default, if your workload permits it you should set it to at least 2048. Please set a value >= 1024 to make this warning disappear.
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: [ALERT] 233/142916 (1279) : Proxy 'ironic-inspector': no SSL certificate specified for bind '192.168.24.2:13050' at [/etc/haproxy/haproxy.cfg:107] (use 'crt').
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: [ALERT] 233/142916 (1279) : Proxy 'keystone_public': no SSL certificate specified for bind '192.168.24.2:13000' at [/etc/haproxy/haproxy.cfg:123] (use 'crt').
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: [ALERT] 233/142916 (1279) : Proxy 'mistral': no SSL certificate specified for bind '192.168.24.2:13989' at [/etc/haproxy/haproxy.cfg:134] (use 'crt').
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: [ALERT] 233/142916 (1279) : Fatal errors found in configuration.
Aug 22 14:29:16 undercloud haproxy-systemd-wrapper: haproxy-systemd-wrapper: exit, haproxy RC=1
Aug 22 14:29:16 undercloud systemd: haproxy.service: main process exited, code=exited, status=1/FAILURE
Aug 22 14:29:16 undercloud systemd: Unit haproxy.service entered failed state.
Aug 22 14:29:16 undercloud systemd: haproxy.service failed.
Aug 22 14:29:16 undercloud systemd: Unit haproxy.service cannot be reloaded because it is inactive.

This issue is intermittent, so my guess is that it is a race condition between certificate creation and the haproxy restart.

[1] https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-256/undercloud/home/stack/undercloud_install.log.gz
[2] https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-current-tripleo-delorean-minimal-256/undercloud/var/log/messages.gz
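
One way to check the race hypothesis on an affected host is to inspect the PEM bundle named in the haproxy alerts and see whether it already contains a private key block. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the bundle is a concatenated certificate plus key in PEM text form, which the logs above suggest but do not confirm.

```python
import re

# Matches the opening armor line of a PEM private key block, for example
# "-----BEGIN PRIVATE KEY-----" or "-----BEGIN RSA PRIVATE KEY-----".
_KEY_HEADER = re.compile(
    r"^-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----\s*$", re.MULTILINE
)
_CERT_HEADER = re.compile(r"^-----BEGIN CERTIFICATE-----\s*$", re.MULTILINE)


def pem_bundle_status(pem_text):
    """Return (has_certificate, has_private_key) for a combined PEM bundle."""
    return (
        bool(_CERT_HEADER.search(pem_text)),
        bool(_KEY_HEADER.search(pem_text)),
    )


def read_bundle_status(path):
    """Read a PEM bundle from disk; a missing file counts as (False, False)."""
    try:
        with open(path, "r") as handle:
            return pem_bundle_status(handle.read())
    except FileNotFoundError:
        return (False, False)
```

Calling read_bundle_status("/etc/pki/tls/certs/undercloud-192.168.24.2.pem") right after a failed run would show whether the key was missing from the bundle. A bundle that is complete by the time it is inspected does not rule out the race, since the file may have been finished after haproxy read it.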

OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (master)

Fix proposed to branch: master
Review: https://review.openstack.org/496501

Changed in tripleo:
assignee: nobody → Juan Antonio Osorio Robles (juan-osorio-robles)
status: New → In Progress
Changed in tripleo:
importance: Undecided → High
milestone: none → pike-rc1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to instack-undercloud (master)

Fix proposed to branch: master
Review: https://review.openstack.org/496564

Changed in tripleo:
milestone: pike-rc1 → pike-rc2
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.openstack.org/496501
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=351ab932514f13d7a139b0b41fdc4f6f7e990c8f
Submitter: Jenkins
Branch: master

commit 351ab932514f13d7a139b0b41fdc4f6f7e990c8f
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Wed Aug 23 09:01:53 2017 +0300

    Certmonger: Only attempt to reload haproxy is it's active

    Previously, certmonger tried to reload haproxy every time after a
    certificate is requested. This is useful for certificate resubmits or
    renewals. However, it turned out problematic on installation, when
    haproxy is not yet active, as it would try many times and end up having
    a race-condition with puppet.

    This checks if haproxy is active and only then will it attempt to reload
    it.

    Change-Id: I51f9cccb5d1518a9647778e7bf6f9426a02ceb60
    Closes-Bug: #1712377
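
The guard described in this commit message can be sketched as follows. This is not the merged puppet-tripleo change, only a minimal Python illustration of the same idea: ask systemd whether the service is active and skip the reload otherwise. The function name and the injectable `run` parameter are hypothetical and exist only to make the sketch testable.

```python
import subprocess


def reload_if_active(service="haproxy", run=subprocess.run):
    """Reload `service` only when systemd reports it as active.

    Returns True if a reload was issued, False if it was skipped.
    """
    # `systemctl is-active --quiet` exits 0 only when the unit is active.
    probe = run(["systemctl", "is-active", "--quiet", service])
    if probe.returncode != 0:
        # Service not running yet (e.g. during installation): do nothing
        # and let the configuration tool start it later.
        return False
    run(["systemctl", "reload", service], check=True)
    return True
```

During installation haproxy is not yet active, so a post-save hook built this way would return without touching the service, while renewals on a running system would still trigger the reload.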

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/498295

OpenStack Infra (hudson-openstack) wrote : Change abandoned on puppet-tripleo (stable/pike)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/pike
Review: https://review.openstack.org/498295
Reason: I need to purge the gate because the TripleO CI gate has critical issues right now. I'll make sure this patch goes to the gate.

OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/pike)

Reviewed: https://review.openstack.org/498295
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=aaeace8c72ad7e9ea540c7055f0e16e2ed797f58
Submitter: Jenkins
Branch: stable/pike

commit aaeace8c72ad7e9ea540c7055f0e16e2ed797f58
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Wed Aug 23 09:01:53 2017 +0300

    Certmonger: Only attempt to reload haproxy is it's active

    Previously, certmonger tried to reload haproxy every time after a
    certificate is requested. This is useful for certificate resubmits or
    renewals. However, it turned out problematic on installation, when
    haproxy is not yet active, as it would try many times and end up having
    a race-condition with puppet.

    This checks if haproxy is active and only then will it attempt to reload
    it.

    Change-Id: I51f9cccb5d1518a9647778e7bf6f9426a02ceb60
    Closes-Bug: #1712377
    (cherry picked from commit 351ab932514f13d7a139b0b41fdc4f6f7e990c8f)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix merged to instack-undercloud (master)

Reviewed: https://review.openstack.org/496564
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=fe25c53fe9c43e42c2148cd0b192917568570a8c
Submitter: Jenkins
Branch: master

commit fe25c53fe9c43e42c2148cd0b192917568570a8c
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Wed Aug 23 12:02:43 2017 +0300

    Undercloud/Certmonger: Only attempt to reload haproxy is it's active

    Previously, certmonger tried to reload haproxy every time after a
    certificate is requested. This is useful for certificate resubmits or
    renewals. However, it turned out problematic on installation, when
    haproxy is not yet active, as it would try many times and end up having
    a race-condition with puppet.

    This checks if haproxy is active and only then will it attempt to reload
    it.

    Closes-Bug: #1712377
    Change-Id: I4edd42b888a0bbbb8eb0e71f5c17750bac46c2ce

OpenStack Infra (hudson-openstack) wrote : Fix proposed to instack-undercloud (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/500248

OpenStack Infra (hudson-openstack) wrote : Fix merged to instack-undercloud (stable/pike)

Reviewed: https://review.openstack.org/500248
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=830554939be83690b5268c014b79633e349a7c22
Submitter: Jenkins
Branch: stable/pike

commit 830554939be83690b5268c014b79633e349a7c22
Author: Juan Antonio Osorio Robles <email address hidden>
Date: Wed Aug 23 12:02:43 2017 +0300

    Undercloud/Certmonger: Only attempt to reload haproxy is it's active

    Previously, certmonger tried to reload haproxy every time after a
    certificate is requested. This is useful for certificate resubmits or
    renewals. However, it turned out problematic on installation, when
    haproxy is not yet active, as it would try many times and end up having
    a race-condition with puppet.

    This checks if haproxy is active and only then will it attempt to reload
    it.

    Closes-Bug: #1712377
    Change-Id: I4edd42b888a0bbbb8eb0e71f5c17750bac46c2ce
    (cherry picked from commit fe25c53fe9c43e42c2148cd0b192917568570a8c)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/instack-undercloud 7.4.0

This issue was fixed in the openstack/instack-undercloud 7.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 7.4.0

This issue was fixed in the openstack/puppet-tripleo 7.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/instack-undercloud 8.0.0

This issue was fixed in the openstack/instack-undercloud 8.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 8.0.0

This issue was fixed in the openstack/puppet-tripleo 8.0.0 release.
