Alarm 500.200 "expiring soon" alarm logic does not properly handle case where certificate is renewing quickly (few days)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Low | Karla Felix |
Bug Description
If a TLS SECRET is being renewed quickly (e.g. 3 days ... anything less than 30 days),
the 500.200 Alarm "Certificate namespace=kubevirt, secret=
reports the INCORRECT expiry date in its alarm text.
The expiry date in text appears to be the date of the 'original' time the alarm was raised.
I suspect that the logic to check on status of certificate alarm ONLY checks if the certificate is still "soon to expire" (i.e. < 30 days to expire) ... but forgets to check if the expiry date actually changed and the alarm needs to be cleared and re-generated.
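The suspected gap can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Alarm` record, the `THRESHOLD` constant, and both function names are made up for illustration and are not taken from the StarlingX cert-alarm code. It only contrasts a check that looks at the remaining lifetime with one that also compares the expiry date recorded when the alarm was raised.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "expiring soon" means less than 30 days left, as described above.
THRESHOLD = timedelta(days=30)


@dataclass
class Alarm:
    """Hypothetical record of a raised 500.200 alarm."""
    entity_id: str
    expiry_in_text: datetime  # expiry date embedded in the alarm text


def suspected_check(alarm: Optional[Alarm], not_after: datetime, now: datetime) -> str:
    """Only asks whether the certificate is still expiring soon."""
    expiring_soon = not_after - now < THRESHOLD
    if expiring_soon:
        return "keep" if alarm else "raise"
    return "clear" if alarm else "none"


def expected_check(alarm: Optional[Alarm], not_after: datetime, now: datetime) -> str:
    """Also compares the recorded expiry with the certificate's current one."""
    expiring_soon = not_after - now < THRESHOLD
    if expiring_soon and alarm and alarm.expiry_in_text != not_after:
        # The certificate was renewed but is still inside the alarm window.
        return "clear-and-raise"
    return suspected_check(alarm, not_after, now)


if __name__ == "__main__":
    now = datetime(2022, 8, 29, 14, 8, 45)
    # Placeholder entity ID; the dates follow the formats seen in this report.
    alarm = Alarm("namespace=kubevirt,secret=example", datetime(2022, 7, 2, 23, 35, 59))
    renewed_not_after = datetime(2022, 8, 29, 23, 36, 0)
    print(suspected_check(alarm, renewed_not_after, now))
    print(expected_check(alarm, renewed_not_after, now))
```

With these sample values the first check decides to keep the existing alarm untouched, while the second one asks for the alarm to be cleared and raised again, which is the behavior requested in this report.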
This was seen on cumulus-2 with Kubevirt TLS Secrets.
Kubevirt is internally rotating/renewing its certificates every 3 days.
See details below:
[sysadmin@
Mon Aug 29 14:08:45 UTC 2022
[sysadmin@
---
Alarm ID Reason Text Entity ID Severity Time Stamp
---
800.001 Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details. cluster=ca845a9f- warning 2022-08-19T21:22:
100.104 File System threshold exceeded ; threshold 80.00%, actual 80.26% host=controller
| 500.200 | Certificate namespace=kubevirt, secret=
| | 2022-06-27, 18:48:00 | virt-handler-certs | | 47.806981 |
500.200 Certificate namespace=kubevirt, secret=kubevirt-ca is expiring soon on 2022-07-02, 23:35:59 namespace=
500.200 Certificate namespace=cdi, secret=
33 signer 47.245515
500.200 Certificate namespace=kubevirt, secret=
, 18:48:00 operator-certs 46.442284
500.200 Certificate namespace=kubevirt, secret=
500.200 Certificate namespace=kubevirt, secret=
, 18:47:59 virt-api-certs 45.799556
500.200 Certificate namespace=cdi, secret=
500.200 Certificate namespace=cdi, secret=
500.200 Certificate namespace=kubevirt, secret=
500.200 Certificate namespace=cdi, secret=
, 11:49:05 uploadserver-
---
[sysadmin@
[sysadmin@
Secret: kubevirt / kubevirt-
Certificate:
Data:
Serial Number: 4016837071489997431 (0x37beac08cec8
Signature Algorithm: sha256WithRSAEn
Issuer: CN=kubevirt.io
Not Before: Aug 26 13:59:59 2022 GMT
Not After : Aug 29 23:36:00 2022 GMT
Subject Public Key Info:
X509v3 extensions:
Signature Algorithm: sha256WithRSAEn
[sysadmin@
Severity
<Minor: System/Feature is usable with minor issue>
Steps to Reproduce
You really just need to
create a TLS Secret ... with a certificate expiry in 3 days
wait for expiring soon Alarm (? can't remember how often it checks ... every hour or every day ?)
update TLS Secret with renewed certificate ... still to expire in 3 days
wait for expiring soon Alarm logic to run again ... and verify that the certificate expiry date in the Alarm Text did not get updated
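The verification in the last step can be sketched in Python as below. This is only an illustration under stated assumptions: the helper names are hypothetical, the two regular expressions are derived from the text formats visible in this report ("expiring soon on 2022-07-02, 23:35:59" in the alarm text and "Not After : Aug 29 23:36:00 2022 GMT" in the certificate dump), and both timestamps are treated as being in the same time zone.

```python
import re
from datetime import datetime
from typing import Optional

# Patterns assumed from the alarm text and certificate dump shown in this report.
ALARM_DATE = re.compile(r"expiring soon on (\d{4}-\d{2}-\d{2}, \d{2}:\d{2}:\d{2})")
NOT_AFTER = re.compile(r"Not After\s*:\s*(.+?) GMT")


def expiry_from_alarm_text(text: str) -> Optional[datetime]:
    """Extract the expiry date embedded in a 500.200 alarm text."""
    match = ALARM_DATE.search(text)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d, %H:%M:%S")


def expiry_from_cert_dump(text: str) -> Optional[datetime]:
    """Extract "Not After" from a textual certificate dump.

    %b parsing depends on the locale; an English/C locale is assumed.
    """
    match = NOT_AFTER.search(text)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%b %d %H:%M:%S %Y")


def alarm_text_is_stale(alarm_text: str, cert_dump: str) -> bool:
    """True when both dates are found and they differ."""
    alarm_expiry = expiry_from_alarm_text(alarm_text)
    cert_expiry = expiry_from_cert_dump(cert_dump)
    if alarm_expiry is None or cert_expiry is None:
        raise ValueError("could not find an expiry date in one of the inputs")
    return alarm_expiry != cert_expiry
```

Feeding it the alarm text and the certificate dump of the same Secret after a renewal should return `True` on an affected system and `False` once the alarm text follows the renewed certificate.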
Expected Behavior
Original alarm should be cleared and new alarm should get raised with new expiry date in text
Actual Behavior
Original alarm is left SET ... which is sort of correct as the alarm is still valid, the certificate is still expiring soon ... but the text does not have the correct expiry date.
Reproducibility
100% reproducible
System Configuration
Any/All
Load info (eg: 2022-03-
[sysadmin@
SW_VERSION="21.12"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID=
Last Pass
Probably day one problem.
Timestamp/Logs
See description
Alarms
See description
Test Activity
CUMULUS
Workaround
Could configure annotations on SECRETs to disable alarming of these quickly renewing certificates.
Changed in starlingx: | |
status: | New → In Progress |
Changed in starlingx: | |
assignee: | nobody → Karla Felix (kkarolin) |
importance: | Undecided → Low |
tags: | added: stx.8.0 stx.fault |
Reviewed: https://review.opendev.org/c/starlingx/config/+/864901
Committed: https://opendev.org/starlingx/config/commit/cea00af70d67ec1ffab41e26bcc6219aac996bdf
Submitter: "Zuul (22348)"
Branch: master
commit cea00af70d67ec1ffab41e26bcc6219aac996bdf
Author: Karla Felix <email address hidden>
Date: Thu Nov 17 10:33:44 2022 -0300
Alarm 500.200 "expiring soon" not updating after change
This issue was caused because, when a certificate was renewed,
fields such as "Reason Text" were not updated. To fix it,
this change deletes the expiring soon alarm and, if necessary,
raises a new expiring soon alarm with the info of the new
certificate.
Test Plan:
PASS: Renew a certificate with "expiring soon" alarm, and verify if the
time registered in 'cert-alarm.log' and "Reason Text" in fm-alarm
list match with the time "Not After" in the certificate
information.
PASS: Renew an expired certificate with a certificate with less than 3
days to expire and verify if the expired alarm is deleted
and replaced by the "expiring soon" alarm.
PASS: Delete the certificate of the "expiring soon" alarm and check if
the alarm is deleted.
Closes-Bug: 1997037
Change-Id: I0f724566de10e66ab95f6ba862836f1fc4e49f32
Signed-off-by: Karla Felix <email address hidden>