kubernetes certificates renewal failure on controller-1

Bug #2029506 reported by Reinildes Oliveira
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
StarlingX
Fix Released
High
Reinildes Oliveira

Bug Description

Brief Description
----------------------------------------

Alarm 250.003, kubernetes certificates renewal failed is raised for controller-1 on multiple labs at the exact same time. This is causing multiple failures over many automated suites.

Severity
----------------------------------------

Major: System is usable but degraded

Steps to Reproduce
----------------------------------------

Install master debian build 2023-07-27_18-00-35

wait for k8s certificates to expire on controller-1

Expected Behavior
----------------------------------------

all k8s certificates should get renewed without issues

Actual Behavior
----------------------------------------

k8s certificates are fail to be renewed on controller-1

Reproducibility
----------------------------------------

Seen once

System Configuration
----------------------------------------

Issue seen on:

dx, multi-node

Load info (eg: 2022-03-10_20-00-07)

all mentioned labs shared same load

SW_VERSION="23.09"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="2023-07-27_18-00-35"

Last Pass
----------------------------------------

Not applicable. Alarm was not seen during previous weekend regression

Alarms
----------------------------------------

[sysadmin@controller-1 ~(keystone_admin)]$ fm alarm-list
+----------+-------------------------------------------------------------------------+-------------------+----------+---------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+-------------------------------------------------------------------------+-------------------+----------+---------------+
| 250.001 | controller-0 Configuration is out-of-date. (applied: 58060d78-56be- | host=controller-0 | major | 2023-07-31T19 |
| | 46e1-9044-b3cac15cf0cb target: 6bec1cde-e23a-430b-b6c2-a919c4889614) | | | :17:26.857075 |
| | | | | |
| 250.003 | Kubernetes certificates renewal failed. | host=controller-1 | major | 2023-07-31T00 |
| | | | | :10:01.454623 |
| | | | | |
+----------+-------------------------------------------------------------------------+-------------------+----------+---------------+
Test Activity
----------------------------------------

Regression Testing

Workaround
----------------------------------------

No workaround known so far

Changed in starlingx:
assignee: nobody → Reinildes Oliveira (rjosemat)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/890442

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/890442
Committed: https://opendev.org/starlingx/config/commit/8170a07bca3e7e5c425240174c89cc1354d96696
Submitter: "Zuul (22348)"
Branch: master

commit 8170a07bca3e7e5c425240174c89cc1354d96696
Author: Rei Oliveira <email address hidden>
Date: Thu Aug 3 14:07:17 2023 -0300

    Fix error when running k8s cert rotation on c1

    Review https://review.opendev.org/c/starlingx/config/+/884627
    introduced parameter --config /etc/kubernetes/kubeadm.yaml for
    kubeadm certs check-expiration, which generates a better output
    without warnings, that are print when run without.

    File /etc/kubernetes/kubeadm.yaml, however, is not present on
    controller-1 so this is not a good solution. This commit removes the
    usage of the parameter so that the script can work on both environments.

    This script runs as a cron job only on controller nodes.

    PASS: Trigger execution of script kube-cert-rotation.sh on c0, verify
          that it runs without error. Run 'fm alarm-list' and verify no
          'Kubernetes certificates renewal failed are present'
    PASS: Trigger execution of script kube-cert-rotation.sh on c1, verify
          that it runs without error. Run 'fm alarm-list' and verify no
          'Kubernetes certificates renewal failed are present'

    Closes-Bug: 2029506
    Closes-Bug: 2029378

    Signed-off-by: Rei Oliveira <email address hidden>
    Change-Id: I1ea458163dc4250a3bc3550eaaa68d314224023e

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.9.0 stx.fault stx.security
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.