cert-rotation cron job doesn't renew certs in 3 conf files

Bug #1937288 reported by Andy
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  High        Andy

Bug Description

Brief Description
-----------------
The cert-rotation cron job doesn't renew the certs in admin.conf, scheduler.conf, and controller-manager.conf as it does for other certificates. As a result, the certs in these conf files can expire, at which point kubectl stops working.

Severity
--------
<Critical: System/Feature is not usable due to the defect>

Steps to Reproduce
------------------
- Update apiserver.crt so that its expiry date is within 90 days (10 days, for example)
- Let the system run overnight (the kube cert rotation cron job runs every day at midnight)
- Check kube certificate expiration with
  kubeadm alpha certs check-expiration
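
The conf files can also be inspected directly, because a kubeadm-generated kubeconfig normally embeds its client certificate as base64 under the client-certificate-data key. The following is a minimal sketch under that assumption; the function names are hypothetical, the /etc/kubernetes/admin.conf path is an assumed default, and openssl must be installed.

```shell
#!/bin/bash
# Sketch only: assumes the kubeconfig stores its client certificate inline
# as base64 under the "client-certificate-data" key.

# Print the base64 payload of the first client-certificate-data entry in file $1.
conf_cert_data() {
    awk '$1 == "client-certificate-data:" {print $2; exit}' "$1"
}

# Decode the embedded certificate and print its notAfter (expiry) date.
conf_cert_enddate() {
    conf_cert_data "$1" | base64 -d | openssl x509 -noout -enddate
}

# Example call (path is an assumption; adjust for your system):
# conf_cert_enddate /etc/kubernetes/admin.conf
```

Running the same check against scheduler.conf and controller-manager.conf would show whether those files still carry the original expiry date.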

Expected Behavior
------------------
The cert in admin.conf should be renewed and have 364 days before expiration.

Actual Behavior
----------------
The cert in admin.conf doesn't get renewed and still has the original expiry date.

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
stx master

Last Pass
---------
Unknown

Timestamp/Logs
--------------
/var/log/cron.log
2021-04-16T00:10:01.000 controller-1 CROND[3875739]: info (root) CMD (/usr/bin/kube-cert-rotation.sh)
2021-04-16T00:10:01.000 controller-1 CROND[3875840]: info (root) CMD (/usr/lib64/sa/sa1 1 1)
2021-04-16T00:10:01.000 controller-1 CROND[3875572]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2021-04-16T00:10:01.000 controller-1 CROND[3875572]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2021-04-16T00:10:01.000 controller-1 CROND[3875572]: info (root) CMDOUT ()
2021-04-16T00:10:02.000 controller-1 CROND[3875572]: info (root) CMDOUT (certificate for serving the Kubernetes API renewed)
2021-04-16T00:10:02.000 controller-1 CROND[3875572]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2021-04-16T00:10:02.000 controller-1 CROND[3875572]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2021-04-16T00:10:02.000 controller-1 CROND[3875572]: info (root) CMDOUT ()
2021-04-16T00:10:02.000 controller-1 CROND[3875572]: info (root) CMDOUT (certificate for the API server to connect to kubelet renewed)
2021-04-16T00:10:02.000 controller-1 CROND[3875572]: info (root) CMDOUT ([renew] Reading configuration from the cluster...)
2021-04-16T00:10:02.000 controller-1 CROND[3875572]: info (root) CMDOUT ([renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml')
2021-04-16T00:10:02.000 controller-1 CROND[3875572]: info (root) CMDOUT ()
2021-04-16T00:10:02.000 controller-1 CROND[3875572]: info (root) CMDOUT (certificate for the front proxy client renewed)
2021-04-16T00:10:02.000 controller-1 CROND[3875572]: info (root) CMDOUT (error: no configuration has been provided, try setting KUBERNETES_MASTER environment variable)
2021-04-16T00:10:02.000 controller-1 CROND[3875572]: info (root) CMDOUT (/usr/bin/kube-cert-rotation.sh: line 147: fmClientCli: command not found)
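
The last log line is typical of cron's restricted environment: cron jobs usually run with a minimal PATH, so a command that resolves in an interactive shell may not be found. One way to guard against this is to resolve the command to an absolute path before using it. The sketch below is illustrative only; resolve_cmd is a hypothetical helper, and the directories to search are supplied by the caller because the actual install location of fmClientCli is not stated in this report.

```shell
#!/bin/bash
# Sketch only: resolve a command name to an absolute path by checking
# caller-supplied directories first, then falling back to the current PATH.
resolve_cmd() {
    local name="$1"; shift
    local dir
    for dir in "$@"; do
        if [ -x "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    # Prints nothing and returns non-zero when the command is not found.
    command -v "$name"
}
```

A script could call this once at startup and exit early with a clear error when it fails, rather than failing partway through a run.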

Test Activity
-------------
Developer Testing

Workaround
----------
Manually renew the certs in admin.conf, scheduler.conf, and controller-manager.conf.

Andy (andy.wrs)
Changed in starlingx:
assignee: nobody → Andy (andy.wrs)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/802610

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fault (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/fault/+/802612

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/803651

OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (master)

Change abandoned by "Andy Ning <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/config/+/803651
Reason: Accidentally submitted this premature review.

Ghada Khalil (gkhalil)
tags: added: stx.config stx.security
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/802610
Committed: https://opendev.org/starlingx/config/commit/a670080e07723cb904f0aea4a6b5c59b48e7cd0a
Submitter: "Zuul (22348)"
Branch: master

commit a670080e07723cb904f0aea4a6b5c59b48e7cd0a
Author: Andy Ning <email address hidden>
Date: Thu Jul 22 12:11:08 2021 -0400

    Fix cert rotation cron job not renewing conf files

    The kube certificate rotation cron job doesn't update
    admin.conf, scheduler.conf, controller-manager.conf as expected.
    This update fixed this issue and made several enhancements:
    - check the expiry date for each of the kubernetes certificates
      to be renewed by "kubeadm alpha certs check-expiration"
    - update the conf files by "kubeadm alpha renew", consistent with
      renewing the cert files.
    - restart sysinv-conductor, cert-mon after admin.conf is renewed
      since they use admin.conf for authentication.
    - specify absolute path for fmClientCli command.
    - restructure the code into functions.
    - added checking/renewal of 3 etcd certificates.

    Change-Id: I8b2ff1b02651600f3a837e9f8a61ad50601ace9d
    Closes-Bug: 1937288
    Signed-off-by: Andy Ning <email address hidden>
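
The check-then-renew flow described in the commit message above can be sketched roughly as follows. This is not the code from kube-cert-rotation.sh: the function names are hypothetical, the 90-day window is taken from the reproduction steps in this report, and GNU date is assumed for parsing the expiry string.

```shell
#!/bin/bash
# Sketch only: decide whether a certificate is close enough to expiry to renew.
# The 90-day window comes from the reproduction steps; names are hypothetical.
RENEW_WINDOW_DAYS=90

# Print whole days between "now" ($2, epoch seconds, defaults to the current
# time) and the expiry date string in $1. Requires GNU date for -d.
days_until() {
    local expiry_epoch now_epoch
    expiry_epoch=$(date -u -d "$1" +%s) || return 1
    now_epoch=${2:-$(date -u +%s)}
    echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# Exit 0 when the certificate expiring at $1 falls inside the renewal window,
# 1 when it does not, and 2 when the date could not be parsed.
needs_renewal() {
    local days
    days=$(days_until "$1" "$2") || return 2
    [ "$days" -lt "$RENEW_WINDOW_DAYS" ]
}
```

A caller could loop over admin.conf, scheduler.conf, and controller-manager.conf, feed each expiry date to needs_renewal, and run the renewal command only for the entries that qualify. The exact kubeadm subcommand depends on the kubeadm version, so it is left out here.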

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.6.0
Changed in starlingx:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to fault (master)

Reviewed: https://review.opendev.org/c/starlingx/fault/+/802612
Committed: https://opendev.org/starlingx/fault/commit/4ab96ad4114735366fc7841377465ea70e6e50b7
Submitter: "Zuul (22348)"
Branch: master

commit 4ab96ad4114735366fc7841377465ea70e6e50b7
Author: Andy Ning <email address hidden>
Date: Fri Jul 23 22:00:48 2021 -0400

    Update event.yaml for alarm 250.003

    This change updated 250.003 definition in events.yaml so
    it's consistent with the alarm actually raised in
    kube-cert-rotation.sh.

    Depends-On: https://review.opendev.org/c/starlingx/config/+/802610
    Closes-Bug: 1937288
    Signed-off-by: Andy Ning <email address hidden>
    Change-Id: I13404483c22ded4d35c613576f1d2020b31c1797

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/804933

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/804933
Committed: https://opendev.org/starlingx/config/commit/6b2e0e44cb64fabefe839f979d68aa106afc6198
Submitter: "Zuul (22348)"
Branch: master

commit 6b2e0e44cb64fabefe839f979d68aa106afc6198
Author: Andy Ning <email address hidden>
Date: Tue Aug 17 20:45:50 2021 -0400

    Skip renewal if the cert file doesn't exist

    Added checking to not attempt to renew an etcd certificate if it
    doesn't exist. This is especially necessary during system upgrade,
    when the cron job could run while upgrade hasn't generated and copied
    etcd certificates into /etc/etcd. Otherwise the cron job will fail and
    raise an alarm.

    This update also removed the extra "/" from the certificate directory.

    Closes-Bug: 1937288
    Signed-off-by: Andy Ning <email address hidden>
    Change-Id: I758d270d4d675ff1e949b51dc10af6d52ff347d7
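
The existence check described in this commit can be illustrated with a short sketch. It is an assumption-laden example rather than the merged code: renew_if_present and the file names passed to it are hypothetical, the renewal step is a placeholder, and only the /etc/etcd directory comes from the commit message.

```shell
#!/bin/bash
# Sketch only: skip certificates that are not on disk yet (for example,
# mid-upgrade). CERT_DIR has no trailing slash, so joined paths avoid "//".
CERT_DIR="${CERT_DIR:-/etc/etcd}"

# Report whether the certificate named $1 under CERT_DIR would be renewed.
renew_if_present() {
    local cert_file="$CERT_DIR/$1"
    if [ ! -f "$cert_file" ]; then
        # A missing file is not an error: return success so no alarm is raised.
        echo "skip: $cert_file not found"
        return 0
    fi
    echo "renew: $cert_file"
    # Placeholder: the actual renewal command would run here.
}
```

Returning success for a missing file keeps the cron job from failing during the window in which the upgrade has not yet copied the etcd certificates into place.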
