K8S runtime - Add valid parameter with invalid argument is crashing cluster

Bug #1992207 reported by Jorge Saffe
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Jorge Saffe
Milestone: (not set)

Bug Description

Brief Description
------
K8S runtime - Adding a valid parameter with an invalid argument crashes the cluster.

Severity
-----
Critical: System/Feature is not usable after the defect.

Steps to Reproduce
-----
1) Install latest build
2) system service-parameter-add kubernetes kube_apiserver audit-log-compress=22
3) system service-parameter-list | grep kube_apiserver
4) system service-parameter-apply kubernetes

Expected Behavior
-----
Kubernetes should discard that change and continue running without any problem.
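
One way to picture the expected behavior is a check that rejects the value before it is ever applied. The following is only a minimal sketch, not the StarlingX implementation: `is_bool_value`, `validate_param` and the `BOOLEAN_PARAMS` list are hypothetical names introduced here. It relies on kube-apiserver documenting `--audit-log-compress` as a boolean flag, which is why `22` is not a usable argument. The check is deliberately strict and accepts only `true` or `false`, although the real flag parser may accept other spellings.

```shell
# Hypothetical pre-apply check; not part of the StarlingX CLI.

# Succeeds only for the two literal values this sketch treats as boolean.
is_bool_value() {
    case "$1" in
        true|false) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed, incomplete list of kube-apiserver parameters that take a boolean.
BOOLEAN_PARAMS="audit-log-compress"

# Usage: validate_param NAME=VALUE
# Returns 1 and prints a message when a known boolean parameter
# is given a non-boolean value; returns 0 otherwise.
validate_param() {
    local name="${1%%=*}"
    local value="${1#*=}"
    local p
    for p in $BOOLEAN_PARAMS; do
        if [ "$name" = "$p" ] && ! is_bool_value "$value"; then
            echo "rejected: $name expects true or false, got '$value'" >&2
            return 1
        fi
    done
    return 0
}
```

With these functions loaded, `validate_param audit-log-compress=22` fails, so a wrapper could stop before calling `system service-parameter-apply kubernetes`.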

Actual Behavior
------
The Kubernetes cluster goes down and never recovers.

[sysadmin@controller-0 ~(keystone_admin)]$ system service-parameter-add kubernetes kube_apiserver audit-log-compress=22
+-------------+--------------------------------------+
| Property    | Value                                |
+-------------+--------------------------------------+
| uuid        | 2b5142de-f482-4c2c-a4d5-755ba704e2a1 |
| service     | kubernetes                           |
| section     | kube_apiserver                       |
| name        | audit-log-compress                   |
| value       | 22                                   |
| personality | None                                 |
| resource    | None                                 |
+-------------+--------------------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$ system service-parameter-list | grep kube_apiserver

| 2b5142de-f482-4c2c-a4d5-755ba704e2a1 | kubernetes | kube_apiserver | audit-log-compress  | 22                                  | None | None |
| 88188d56-8a16-43ea-a658-b409cc9ea2df | kubernetes | kube_apiserver | audit-log-maxage    | 3                                   | None | None |
| df564b78-dd2d-4f15-8339-8281bd14bcca | kubernetes | kube_apiserver | audit-log-maxbackup | 10                                  | None | None |
| c3fc1f94-eaff-47de-a2be-d0cbe93dad4e | kubernetes | kube_apiserver | audit-log-maxsize   | 100                                 | None | None |
| 8d5d7149-49e5-422d-946c-be443f8b67b7 | kubernetes | kube_apiserver | audit-log-path      | /var/log/kubernetes/audit/audit.log | None | None |
[sysadmin@controller-0 ~(keystone_admin)]$ system service-parameter-apply kubernetes
Applying kubernetes service parameters
[sysadmin@controller-0 ~(keystone_admin)]$ date
mié sep 21 10:57:32 UTC 2022
[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get pod -A
The connection to the server 192.168.206.1:6443 was refused - did you specify the right host or port?
[sysadmin@controller-0 ~(keystone_admin)]$ date
mié sep 21 11:02:14 UTC 2022
[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get pod -A
The connection to the server 192.168.206.1:6443 was refused - did you specify the right host or port?
[sysadmin@controller-0 ~(keystone_admin)]$ date
mié sep 21 11:03:29 UTC 2022
[sysadmin@controller-0 ~(keystone_admin)]$ kubectl get pod -A
The connection to the server 192.168.206.1:6443 was refused - did you specify the right host or port?
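
The manual `date` / `kubectl` checks above can be scripted as a polling loop with a deadline. This is a rough sketch only: `wait_until` is a hypothetical helper, and the timeout and interval in the usage note are arbitrary values, not figures from this report.

```shell
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or roughly TIMEOUT_SECONDS have passed.
wait_until() {
    local timeout="$1"
    local interval="$2"
    shift 2
    local waited=0
    # The probe's own output is discarded; only its exit status matters.
    while ! "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "still failing after ${waited}s: $*" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "succeeded after ~${waited}s: $*"
}
```

For example, `wait_until 600 15 kubectl get --raw /readyz` would probe the API server every 15 seconds for up to 10 minutes. In the failure shown above, such a loop would be expected to time out, since the connection was still refused after several minutes.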

Reproducibility
-----
Reproducible.

System Configuration
-----
AIO-SX

Alarms
----
NTR

Test Activity
----
Feature Testing.

Workaround
-----
NTR

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/860752

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
tags: added: stx.config stx.containers
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/860752
Committed: https://opendev.org/starlingx/stx-puppet/commit/1da873a6b116ad1de4f3ee9cc31a03c4c4644c01
Submitter: "Zuul (22348)"
Branch: master

commit 1da873a6b116ad1de4f3ee9cc31a03c4c4644c01
Author: Jorge Saffe <email address hidden>
Date: Fri Oct 7 17:08:42 2022 -0400

    Fix error add parameter with invalid arg in K8s config

    These changes fix the use case "add a valid
    parameter with an invalid argument" for Kubernetes
    custom configuration support. Currently, after
    applying the changes, the Kubernetes cluster
    crashes and does not recover.

    It is necessary to restart kubelet because, when a wrong
    configuration is set in any of the k8s control plane
    components, kubelet makes a maximum of 5 attempts to
    restart the erroneous component. Once that limit is reached,
    the container does not start again, even if the
    configuration is corrected during the automatic recovery.

    The puppet timeout is increased to account for the
    time needed to update the different k8s control-plane
    components and for the automatic recovery process, if necessary.

    Test Plan:
    * CENTOS and DEBIAN distro:
      - Fresh Install with AIO-SX and DX/STD.
      - Add new valid parameter with invalid arguments.
      - Apply changes on kubernetes service.
      - Verify cluster health and configuration.

    Closes-Bug: 1992207

    Signed-off-by: Jorge Saffe <email address hidden>
    Change-Id: Ieb17f9ff7359813f60b8baffb3d8f53fe1e2d7f8
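
The recovery idea in the commit message (restart kubelet when a control-plane component stays down, then allow time for it to come back) might look roughly like the sketch below. It is an illustration only and not the code merged in stx-puppet: `recover_component` and its arguments are hypothetical, and the health-check and restart commands are passed in as strings so the logic can be tried without touching a real node.

```shell
# Usage: recover_component MAX_CHECKS INTERVAL_SECONDS CHECK_CMD RESTART_CMD
# CHECK_CMD and RESTART_CMD are single strings executed with `sh -c`.
recover_component() {
    local max_checks="$1"
    local interval="$2"
    local check_cmd="$3"
    local restart_cmd="$4"

    # Nothing to do when the component already answers.
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        echo "healthy, no restart needed"
        return 0
    fi

    echo "unhealthy, restarting"
    sh -c "$restart_cmd" || return 1

    # After the restart, re-check a bounded number of times.
    local i=0
    while [ "$i" -lt "$max_checks" ]; do
        if sh -c "$check_cmd" >/dev/null 2>&1; then
            echo "recovered after restart"
            return 0
        fi
        sleep "$interval"
        i=$((i + 1))
    done
    echo "not recovered after $max_checks checks" >&2
    return 1
}
```

On a controller this could be called as `recover_component 20 15 'kubectl get --raw /readyz' 'systemctl restart kubelet'`, where the counts are arbitrary. The bounded re-check loop mirrors the reason given for raising the puppet timeout: the apply step has to wait long enough for the components to restart.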

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Jorge Saffe (jsaffe)
importance: Undecided → Medium
tags: added: stx.8.0