kube cert rotation cronjob failed for kubernetes v1.21.3

Bug #1948719 reported by Reinildes Oliveira
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Reinildes Oliveira

Bug Description

Brief Description
-------------------
The kube-cert-rotation cronjob fails when the Kubernetes version is v1.21.3, raising alarm 250.003 (Kubernetes certificates renewal failed.)

Severity
-------------------
Major: System/Feature is usable but degraded

Steps to Reproduce
-------------------
Install a fresh StarlingX load and ensure the Kubernetes version is v1.21.3.

Let the system run overnight so that the kube-cert-rotation cronjob runs at midnight.

Expected Behavior
-------------------
The cronjob runs successfully and no 250.003 alarm is raised.

Actual Behavior
-------------------
The cronjob fails and alarm 250.003 is raised.

Reproducibility
-------------------
100%

System Configuration
-------------------
AIO DX with Kubernetes v1.21.3

Branch/Pull Time/Commit
-------------------
SW_VERSION="21.12"
BUILD_DATE="2021-10-22 00:08:06 -0400"

Last Pass
-------------------
Passes with Kubernetes v1.18

Timestamp/Logs
-------------------
[root@controller-0 changes(keystone_admin)]# fm alarm-list
+----------+-----------------------------------------+-------------------+----------+----------------------------+
| Alarm ID | Reason Text                             | Entity ID         | Severity | Time Stamp                 |
+----------+-----------------------------------------+-------------------+----------+----------------------------+
| 250.003  | Kubernetes certificates renewal failed. | host=controller-0 | major    | 2021-10-25T15:11:10.131002 |
| 250.003  | Kubernetes certificates renewal failed. | host=controller-1 | major    | 2021-10-25T00:10:01.845339 |
+----------+-----------------------------------------+-------------------+----------+----------------------------+

Alarms
-------------------
Alarm 250.003 is raised.

Test Activity
-------------------
Developer Testing

Workaround
-------------------
On both controllers, edit /usr/bin/kube-cert-rotation.sh and remove "alpha" from the kubeadm certs commands, so that "kubeadm alpha certs" becomes "kubeadm certs".
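
The manual edit can also be scripted. The following is a minimal sketch, not a tested procedure: the helper name strip_kubeadm_alpha is made up for illustration, and it assumes the script spells the command as "kubeadm alpha certs" with single spaces. A .bak copy of the original file is kept next to it.

```shell
#!/bin/bash
# Hypothetical helper (not part of StarlingX): rewrite every
# "kubeadm alpha certs" in the given script to "kubeadm certs".
# Assumes the words are separated by single spaces.
strip_kubeadm_alpha() {
    local script="$1"
    # -i.bak edits in place and keeps the original as <script>.bak
    sed -i.bak 's/kubeadm alpha certs/kubeadm certs/g' "$script"
}

# Intended use on each controller, as root (path from the workaround above):
#   strip_kubeadm_alpha /usr/bin/kube-cert-rotation.sh
```

Review the diff against the .bak copy before the next cronjob run.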

Changed in starlingx:
assignee: nobody → Reinildes Oliveira (rjosemat)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/815383

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.6.0 stx.containers
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/815383
Committed: https://opendev.org/starlingx/config/commit/c7923a5b33eb0235b6e1c263014c23f2b16bc07d
Submitter: "Zuul (22348)"
Branch: master

commit c7923a5b33eb0235b6e1c263014c23f2b16bc07d
Author: Rei Oliveira <email address hidden>
Date: Mon Oct 25 17:53:14 2021 -0300

    kubeadm alpha command causing script to fail

    kube-cert-rotation cronjob fails if kubernetes version is v1.21.3,
    generating alarm 250.003 (Kubernetes certificates renewal failed.)

    Script is failing because "kubeadm alpha certs" is no longer available.
    In Kubernetes v1.21.3 the 'kubeadm certs' command went from alpha to GA.

    This change replaces 'kubeadm alpha certs' with 'kubeadm certs',
    while keeping 'kubeadm alpha certs' as failover.

    It also adds a check if alarm exists before attempting to delete it.

    Test plan:

    PASS: Verify that the script runs without fail
    PASS: Verify that after the overnight job no alarm 250.003 is raised

    Closes-Bug: 1948719
    Signed-off-by: Rei Oliveira <email address hidden>
    Change-Id: I3d6bfb0ade8a0d23a4393663ffde0a87e9dafd58
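
For readers without access to the review, the two behaviours described in the commit message (prefer 'kubeadm certs' with 'kubeadm alpha certs' as failover, and check that the alarm exists before clearing it) can be sketched roughly as below. This is an illustration only, not the code from the merged change: renew_cert and alarm_is_raised are hypothetical names, and the alarm check assumes that matching the alarm ID in the 'fm alarm-list' output is sufficient.

```shell
#!/bin/bash
# Sketch only; function names are hypothetical, not from kube-cert-rotation.sh.

# Renew one certificate, preferring the GA command and falling back to
# the alpha form for kubeadm releases that only provide the latter.
renew_cert() {
    local cert_name="$1"
    if kubeadm certs renew "$cert_name"; then
        return 0
    fi
    # Failover: older kubeadm releases only know the alpha subcommand.
    kubeadm alpha certs renew "$cert_name"
}

# Return success only if the given alarm ID appears in 'fm alarm-list'.
# Assumption: a fixed-string match on the ID is good enough here.
alarm_is_raised() {
    local alarm_id="$1"
    fm alarm-list | grep -q -F -- "$alarm_id"
}

# Intended use (clear_alarm stands in for whatever clears the alarm):
#   renew_cert apiserver || echo "renewal failed" >&2
#   if alarm_is_raised 250.003; then clear_alarm; fi
```

Trying the GA form first means the failover costs one extra failed call on older releases.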

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/832871

OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (master)

Change abandoned by "Reinildes Oliveira <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/config/+/832871
Reason: duplicate
