Upgrades: haproxy fails swact due to missing admin-ep-cert.pem

Bug #1876378 reported by John Kung
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Carmen Rata

Bug Description

Brief Description
-----------------
During an upgrade of the SystemController, on host-swact to the N+1 controller, haproxy fails to start up due
to the missing /etc/ssl/private/admin-ep-cert.pem

Severity
--------
Major: System/Feature is usable but degraded

Steps to Reproduce
------------------
Perform an upgrade to N+1 on a DC SystemController (with the admin-ep-cert feature enabled for SSL on admin endpoints), then host-swact.

Expected Behavior
------------------
controller-1 should take activity with haproxy starting up.

Actual Behavior
----------------
controller-1 fails to take activity because haproxy fails to start up; the suspected cause is the missing
/etc/ssl/private/admin-ep-cert.pem

Reproducibility
---------------
Reproducible.

System Configuration
--------------------
AIO-DX System Controller

Branch/Pull Time/Commit
-----------------------
2020-04-30_20-00-00

Last Pass
---------
This passed a few days earlier, before the integration of admin-ep.

Timestamp/Logs
--------------
2020-05-01T19:29:31.960

| 2020-05-01T19:29:31.960 | 298 | service-scn | haproxy | enabling | disabling | enable failed
| 2020-05-01T19:29:32.318 | 309 | service-scn | haproxy | enabling | disabling | enable failed
| 2020-05-01T19:29:32.786 | 317 | service-scn | haproxy | disabled | disabled-failed | enabled-active state requested
| 2020-05-01T19:29:32.786 | 318 | service-group-scn | oam-services | go-active | go-active-failed | haproxy(disabled, failed)
| 2020-05-01T19:29:35.418 | 332 | service-group-scn | oam-services | go-active-failed | disabling-failed | haproxy(disabled, failed)
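For triage, log lines in the pipe-delimited shape shown above can be filtered for failed services with a short script. This is a minimal sketch: the field names (timestamp, sequence, log type, entity, from-state, to-state, reason) are assumptions inferred from the excerpt, not a documented schema, and parse_scn_line and failed_services are hypothetical helpers.

```python
from typing import List, NamedTuple, Optional


class ScnEntry(NamedTuple):
    # Field names are assumed from the log excerpt, not from a documented schema.
    timestamp: str
    seq: int
    log_type: str
    entity: str
    from_state: str
    to_state: str
    reason: str


def parse_scn_line(line: str) -> Optional[ScnEntry]:
    """Parse one pipe-delimited state-change line; return None if it does not fit."""
    fields = [f.strip() for f in line.strip().strip("|").split("|")]
    if len(fields) != 7 or not fields[1].isdigit():
        return None
    ts, seq, log_type, entity, from_state, to_state, reason = fields
    return ScnEntry(ts, int(seq), log_type, entity, from_state, to_state, reason)


def failed_services(lines: List[str]) -> List[ScnEntry]:
    """Return service-scn entries whose target state or reason mentions a failure."""
    result = []
    for line in lines:
        entry = parse_scn_line(line)
        if entry is None or entry.log_type != "service-scn":
            continue
        if "failed" in entry.to_state or "failed" in entry.reason:
            result.append(entry)
    return result
```

Run against the five lines above, this would report only the haproxy service entries and skip the oam-services service-group entries.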

controller-1 is missing this file:
[root@controller-0 private(keystone_admin)]# ls -altr
total 36
-r--------. 1 root root 1539 Apr 20 15:49 self-signed-server-cert.pem
drwxr-xr-x. 3 root root 4096 May 1 14:46 ..

Test Activity
-------------
Feature Testing

Workaround
----------
N/A

Revision history for this message
Ghada Khalil (gkhalil) wrote :

stx.4.0 - issue introduced by recent code submissions for enabling https endpoints for DC systems

Changed in starlingx:
assignee: nobody → Bin Qian (bqian20)
tags: added: stx.4.0 stx.update
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
John Kung (john-kung) wrote :

There appear to be the following alternatives for resolving this issue:

1) Ensure that secure_system.yaml is generated for the N+1 controller.
   This can be done on the host-unlock of controller-1, e.g. in sysinv/conductor/manager.py::_configure_controller_host(),
   via self._puppet.update_host_config_upgrade() or a new hook for update_secure_system_config().

2) Ensure the pem file is copied to /opt/platform/config so that the N+1 controller can copy it.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/725150

Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (master)

Change abandoned by Bin Qian (<email address hidden>) on branch: master
Review: https://review.opendev.org/725150
Reason: abandon this review, fix will be enabling access kubernetes from N+1 controller

Revision history for this message
Bin Qian (bqian20) wrote :

It has been agreed that this issue should be fixed as suggested by Bart in a more generic way:
"a possible solution would be to give controller-1 access to the kubernetes API during the upgrade, so it can retrieve any data it needs and write it into its own hiera data as part of the existing code that generates hiera data. I think the solution might look something like this:

When upgrade is started, controller-0 will make a copy of the /etc/kubernetes/admin.conf file that controller-1 can use during its upgrade. In prepare_upgrade (controllerconfig/upgrades/management.py) copy the admin.conf file into /opt/platform/config/N+1/kubernetes.
Before controller-1 generates its "regular" hiera data (from upgrade_controller in controllerconfig/upgrades/controller.py), it can copy the admin.conf file from /opt/platform to /etc/kubernetes/admin.conf. That way the puppet plugins that need to access the kubernetes API should all work without modification.
This change would probably also eliminate the need for the special call to self._config_update_hosts above as we could also generate the join command from controller-1."
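The two copy steps in the proposal above can be sketched as follows. This is an illustration under stated assumptions, not the merged change: stage_kube_admin_conf and restore_kube_admin_conf are hypothetical names, and the directories are passed in as parameters rather than hard-coded, so the real paths (/etc/kubernetes/admin.conf and /opt/platform/config/N+1/kubernetes) would be supplied by the caller.

```python
import os
import shutil


def stage_kube_admin_conf(src_conf: str, shared_config_dir: str, to_release: str) -> str:
    """On the active controller: copy admin.conf into the shared N+1 config area.

    Hypothetical helper; in the proposal this would happen in prepare_upgrade.
    """
    dest_dir = os.path.join(shared_config_dir, to_release, "kubernetes")
    os.makedirs(dest_dir, exist_ok=True)
    dest = os.path.join(dest_dir, "admin.conf")
    shutil.copy2(src_conf, dest)  # copy2 also preserves mode and timestamps
    return dest


def restore_kube_admin_conf(shared_config_dir: str, to_release: str, dest_conf: str) -> str:
    """On the upgrading controller: copy the staged admin.conf into place.

    Hypothetical helper; in the proposal this would run before hiera data is generated.
    """
    src = os.path.join(shared_config_dir, to_release, "kubernetes", "admin.conf")
    if not os.path.isfile(src):
        raise FileNotFoundError(src)
    os.makedirs(os.path.dirname(dest_conf), exist_ok=True)
    shutil.copy2(src, dest_conf)
    return dest_conf
```

Failing loudly when the staged file is missing is a deliberate choice in this sketch: a silent skip would reproduce the original symptom, where the missing data only surfaces later as a service that cannot start.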

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: Bin Qian (bqian20) → Carmen Rata (crata)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/729297

Revision history for this message
John Kung (john-kung) wrote :

When testing with only the admin.conf copy, the join_cmd creation appears to fail.

Upon investigation, controller-1 does not have access to the keyring prior to the host-unlock: keyring.get_password('kubernetes', 'certificate-key')

This key data is already stored in /opt/platform/puppet/<version>/hieradata/secure_static.yaml under kubernetes::kubeadm::certificate-key, so one option is to retrieve it from there during an upgrade on controller-1.
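A minimal sketch of that lookup, assuming the hieradata file stores the key as a flat, single-line "name: value" entry. read_hiera_value is a hypothetical helper that avoids a YAML dependency for illustration; real code would more likely use a proper YAML parser.

```python
from typing import Optional


def read_hiera_value(path: str, key: str) -> Optional[str]:
    """Return the scalar stored under `key` in a flat hieradata file, or None.

    Assumes one "name: value" pair per line; nested or multi-line YAML
    values are not handled by this sketch.
    """
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or line == "---":
                continue
            # Hiera keys contain "::", so split on the first colon-plus-space.
            name, sep, value = line.partition(": ")
            if sep and name == key:
                return value.strip().strip("'\"")
    return None
```

A caller would pass the secure_static.yaml path for the target version and the key kubernetes::kubeadm::certificate-key, and treat a None result as "not available".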

Revision history for this message
John Kung (john-kung) wrote :

Please ignore comment #7; the keyring appears to have been inaccessible only as a side effect of some interim investigation.

In fact, in the hieradata generated by puppet on host-upgrade, the join_cmd has a certificate-key that is not None and is correct.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/729297
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=8fea59914e08f96203a2d59c0a8e60cd806b51b9
Submitter: Zuul
Branch: master

commit 8fea59914e08f96203a2d59c0a8e60cd806b51b9
Author: Carmen Rata <email address hidden>
Date: Tue May 19 11:15:38 2020 -0400

    Fix SystemController upgrade's host-swact failure

    This commit gives controller-1 access to the kubernetes API during
    the upgrade, so it can retrieve any data it needs and write it
    into its own hiera data as part of the existing code that generates
    hiera data.
    This is accomplished by copying /etc/kubernetes/admin.conf file.

    Test:
    Executed the upgrade procedure to completion.
    Verified that openstack admin endpoints use https.
    Verified that haproxy service is "enabled-active".
    Checked that "admin-ep-cert.pem" exists in /etc/ssl/private.
    Configured and ran stx-monitor.
    Verified that the stx-monitor is working by bringing up the
    Kibana dashboard.

    Closes-Bug: 1876378

    Change-Id: I847e246e4d1e915cebf1c9c2cb05e82a47ab28bd
    Signed-off-by: Carmen Rata <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released