Cannot run cert update through "sw-manager" when there is an active "cert kube-root-ca expiring" alarm
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
StarlingX | Fix Released | Medium | Al Bailey | |
Bug Description
Brief Description
-----------------
Simulate a scenario where the cert expires within 30 days, which causes the system to raise an active alarm. The user then cannot run a cert update through "sw-manager" while the "cert kube-root-ca" alarm is active.
Severity
--------
Minor (breaks this feature but nothing else)
Steps to Reproduce
------------------
1) Create a cert that expires in 30 days, then update the k8s certs using the following:
sw-manager kube-rootca-
sw-manager kube-rootca-
sw-manager kube-rootca-
2) The system will raise the following major alarm:
Every 2.0s: fm alarm-list Thu Oct 21 19:06:04 2021
+------
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+------
| 500.200 | Certificate kubernetes-root-ca is expiring soon on 2021-11-20, 18:09:10 | system.certificate.kubernetes-root-ca | major | 2021-10-21T18:53:45.236210 |
+------
3) Now, if the user tries to update the cert using VIM, the system complains about the active major alarm:
n@controller-1 ~(keystone_admin)]$ watch fm alarm-list create --cert-file ca_with_key
Operation failed: conflict detected
[sysadmin@
Strategy Kubernetes RootCA Update Strategy:
strategy-uuid: 53c9ff3c-
controller-
storage-
worker-
default-
alarm-
current-phase: build
current-
state: build-failed
build-result: failed
build-reason: active alarms present [ 500.200 ]
[sysadmin@
Expected Behavior
------------------
The system should allow the update when the active alarm is about cert expiry.
Actual Behavior
----------------
The system blocks the update.
Reproducibility
---------------
100% (as long as you wait for the alarm to be raised)
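
As a quick check of when the alarm appears, the remaining validity can be computed from the two timestamps in the alarm row above. This is only a sketch: the 30-day threshold is taken from the description, and it assumes both timestamps are in the same time zone.

```python
from datetime import datetime, timedelta

# Values copied from the alarm row in the reproduction steps.
expiry = datetime(2021, 11, 20, 18, 9, 10)
raised = datetime.fromisoformat("2021-10-21T18:53:45.236210")

# Assumption: the alarm is raised once less than 30 days of validity remain.
THRESHOLD = timedelta(days=30)

remaining = expiry - raised
print("remaining validity:", remaining)
print("inside alarm window:", remaining < THRESHOLD)
```

With these values the cert had just under 30 days left when the alarm was raised, which matches the "wait for the alarm to be raised" condition.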
System Configuration
--------------------
Any
Branch/Pull Time/Commit
-----------------------
Oct 19 2021
Last Pass
---------
It would have passed prior to the alarm code merging (Sept 24 2021)
Timestamp/Logs
--------------
ef-ab9b-
2021-10-
2021-10-
2021-10-
2021-10-
2021-10-
2021-10-
2021-10-
2021-10-
2021-10-
2021-10-
2021-10-
Test Activity
-------------
Feature Testing
Workaround
----------
Manually clear the alarm
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.6.0 stx.nfv
The bug is in the NFV code. The two alarm types (certificate expiring and certificate expired) need to be added to the ignore list that the alarm query code uses.
There is no recovery from the 'expired' case, but it probably still makes sense to ignore that alarm and let the sysinv health check report the failure.
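
The idea in the comment above can be sketched roughly as follows. This is not the actual NFV code: the function names and the `CERT_ALARM_IGNORE_LIST` constant are hypothetical, `500.200` is the alarm ID from this report, and `500.210` is only an assumed ID for the 'expired' alarm that should be confirmed against the FM alarm definitions.

```python
# Hypothetical ignore list for the kube-rootca update strategy.
# "500.200" is the expiring-soon alarm from this report;
# "500.210" is an ASSUMED ID for the expired alarm.
CERT_ALARM_IGNORE_LIST = {"500.200", "500.210"}


def blocking_alarms(active_alarm_ids, ignore_list=CERT_ALARM_IGNORE_LIST):
    """Return the active alarm IDs that should still fail the build."""
    return [alarm_id for alarm_id in active_alarm_ids
            if alarm_id not in ignore_list]


def build_result(active_alarm_ids):
    """Mimic the build-result / build-reason pair shown in the strategy output."""
    blocking = blocking_alarms(active_alarm_ids)
    if blocking:
        reason = "active alarms present [ %s ]" % ", ".join(blocking)
        return "failed", reason
    return "success", ""


# Only the cert-expiry alarm is active: the build is no longer blocked.
print(build_result(["500.200"]))
# Any other alarm (ID made up for illustration) still blocks the build.
print(build_result(["500.200", "999.999"]))
```

Filtering at query time keeps every other alarm blocking as before, and leaves an already expired cert to be reported by the sysinv health check, as suggested above.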