Old etcd certs/keys are copied when standby controller is locked/unlocked

Bug #2018317 reported by Andy
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  High        Andy

Bug Description

Brief Description
-----------------
When the standby controller is locked and then unlocked, the etcd certs/keys in the /opt/platform/config/<release> directory are copied to the /etc/etcd directory, overwriting the new certs/keys rotated by the kubernetes certificate rotation cron job.

Severity
--------
Critical. If the certs in the /opt/platform/config/<release> directory are expired when they are copied to the /etc/etcd directory, the cluster will stop working because etcd is then using expired certs.

Steps to Reproduce
------------------
- Generate and replace etcd certs/keys with a validity period shorter than 15 days.
- Wait for /usr/bin/kube-cert-rotation.sh to rotate the etcd certs/keys (happens at midnight)
- Lock and then unlock standby controller
- Check certs/keys in /etc/etcd directory on standby controller
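
Before waiting for the cron job, it can help to confirm that the replacement cert really falls inside the 15-day window from the first step. The following is a minimal sketch, not taken from kube-cert-rotation.sh: the function names are hypothetical, and treating 15 days as the rotation threshold is an assumption based on the step above. It expects the notAfter string in the form printed by `openssl x509 -enddate -noout` (without the `notAfter=` prefix).

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Return the days between `now` (epoch seconds) and a cert's notAfter time.

    `not_after` uses the certificate time format, e.g. "May 11 00:00:00 2023 GMT".
    """
    if now is None:
        now = time.time()
    expiry = ssl.cert_time_to_seconds(not_after)
    return (expiry - now) / SECONDS_PER_DAY


def within_rotation_window(not_after, window_days=15, now=None):
    """True if the cert expires in fewer than `window_days` days.

    The 15-day default is an assumption taken from this report's repro steps.
    """
    return days_until_expiry(not_after, now) < window_days
```

A negative result from `days_until_expiry` means the cert has already expired.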

Expected Behavior
-----------------
/etc/etcd contains the newly rotated certs/keys

Actual Behavior
---------------
/etc/etcd contains the pre-rotation certs/keys (copied from the /opt/platform/config/<release> directory).
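
One way to confirm this state is to compare the files in /etc/etcd against the copies in the shared config directory by content hash. The sketch below is illustrative only: `file_digest` and `stale_files` are hypothetical helper names, and the directories are passed in as arguments because the <release> component depends on the system.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def stale_files(live_dir, shared_dir):
    """Return names of files present in both directories whose contents differ.

    Files that exist in only one of the two directories are ignored.
    """
    live = Path(live_dir)
    shared = Path(shared_dir)
    differing = []
    for live_file in sorted(live.iterdir()):
        if not live_file.is_file():
            continue
        shared_file = shared / live_file.name
        if shared_file.is_file() and file_digest(live_file) != file_digest(shared_file):
            differing.append(live_file.name)
    return differing
```

Run on the active controller after rotation, a non-empty result would indicate that the shared copies were not updated. Run on the standby controller after an unlock, an empty result would indicate that /etc/etcd was populated from the shared copies.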

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
STX master

Last Pass
---------
Unknown

Timestamp/Logs
--------------
Refer to "Steps to Reproduce"

Test Activity
-------------
Developer Testing

Workaround
----------
Manually copy the newly rotated certs/keys from the active controller to the standby controller.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/882095

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/882095
Committed: https://opendev.org/starlingx/config/commit/ce1056716b9e4137f9fdb85d7a7b8df6d742010a
Submitter: "Zuul (22348)"
Branch: master

commit ce1056716b9e4137f9fdb85d7a7b8df6d742010a
Author: Andy Ning <email address hidden>
Date: Mon May 1 13:10:47 2023 -0400

    Update etcd certs/keys shared with standby controller

    Currently the kubernetes certificate rotation cron job doesn't update
    the etcd certs/keys in /opt/platform/config/<release> directory when
    it rotates k8s certificates and keys.

    When standby controller is locked then unlocked, the etcd certs/keys
    in /opt/platform/config/<release> directory are copied to /etc/etcd
    directory, overwritten the newly rotated certs/keys. If the certs in
    /opt/platform/config/<release> directory are expired, the cluster will
    stop working since etcd is using expired certs.

    This change fixed this issue by updating the etcd cert/key copies in
    /opt/platform/config/<release> directory when rotating k8s certs.

    Test Plan:
    PASS: On a DX system, generate and replace etcd certs/keys with
          validation time shorter than 15 days, run the cert rotation
          scripts, verify etcd certs/keys in /etc/etcd directory are
          renewed, and the copies in /opt/platform/config/<release>
          directory are the same as the ones in active controller.
    PASS: Lock and unlock standby controller, verify that the etcd
          certs/keys are the same as the ones in active controller.
    PASS: Verify etcd is working fine with "kubectl get pod", then swact
          to standby controller, verify again etcd is working fine by
          "kubectl get pod".

    Closes-Bug: 2018317
    Signed-off-by: Andy Ning <email address hidden>
    Change-Id: I6e9fa8e5d862e7b5a3ae437d003ef65acdef77e6

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Andy (andy.wrs)
importance: Undecided → High
tags: added: stx.9.0 stx.security