Alarm 500.200 "expiring soon" alarm logic does not properly handle case where certificate is renewing quickly (few days)

Bug #1997037 reported by Karla Felix
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Low         Karla Felix

Bug Description

If a TLS SECRET is being renewed quickly (e.g. 3 days ... anything less than 30 days),
the 500.200 Alarm "Certificate namespace=kubevirt, secret=kubevirt-virt-handler-certs is expiring soon on 2022-06-27, 18:48:00"
reports the INCORRECT expiry date in its alarm text.

The expiry date in text appears to be the date of the 'original' time the alarm was raised.
I suspect that the logic to check on status of certificate alarm ONLY checks if the certificate is still "soon to expire" (i.e. < 30 days to expire) ... but forgets to check if the expiry date actually changed and the alarm needs to be cleared and re-generated.
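
The suspected gap can be illustrated with a minimal Python sketch. This is not the StarlingX cert-alarm code: the function names are hypothetical, and only the 30-day window and the dates come from this report. It shows that a check which only asks "is the certificate still expiring soon?" stays true across a renewal, so a second comparison against the date stored in the alarm is needed to notice the change.

```python
from datetime import datetime, timedelta

# Threshold quoted in this report; the real value is an assumption here.
EXPIRING_SOON_WINDOW = timedelta(days=30)


def is_expiring_soon(cert_expiry, now, window=EXPIRING_SOON_WINDOW):
    """True when the certificate has not expired yet but will within the window."""
    return now <= cert_expiry < now + window


def alarm_is_stale(alarm_expiry, cert_expiry):
    """True when the expiry date recorded in the alarm differs from the certificate's."""
    return alarm_expiry != cert_expiry


# Dates taken from the output in this report.
now = datetime(2022, 8, 29, 14, 8, 45)
alarm_expiry = datetime(2022, 6, 27, 18, 48, 0)   # date shown in the alarm text
cert_expiry = datetime(2022, 8, 29, 23, 36, 0)    # "Not After" of the current certificate

# The first check alone says "leave the alarm as it is" ...
print(is_expiring_soon(cert_expiry, now))
# ... while the second check shows that the alarm text is out of date.
print(alarm_is_stale(alarm_expiry, cert_expiry))
```

With the dates above both checks are true, which matches the observed behavior: the alarm stays set, but its text describes an older certificate.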

This was seen on cumulus-2 with Kubevirt TLS Secrets.
Kubevirt is internally rotating/renewing its certificates every 3 days.
See details below:

    [sysadmin@controller-1 ~(keystone_admin)]$ date
    Mon Aug 29 14:08:45 UTC 2022
    [sysadmin@controller-1 ~(keystone_admin)]$ fm alarm-list
    | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
    |----------|-------------|-----------|----------|------------|
    | 800.001 | Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details. | cluster=ca845a9f-451e-4470-8af4-c462b756f64c | warning | 2022-08-19T21:22:53.824296 |
    | 100.104 | File System threshold exceeded ; threshold 80.00%, actual 80.26% | host=controller-1.filesystem=/var/lib/docker-distribution | major | 2022-08-08T17:51:00.239616 |
    | 500.200 | Certificate namespace=kubevirt, secret=kubevirt-virt-handler-certs is expiring soon on 2022-06-27, 18:48:00 | namespace=kubevirt.secret=kubevirt-virt-handler-certs | major | 2022-06-26T23:14:47.806981 |
    | 500.200 | Certificate namespace=kubevirt, secret=kubevirt-ca is expiring soon on 2022-07-02, 23:35:59 | namespace=kubevirt.secret=kubevirt-ca | major | 2022-06-26T23:14:47.606443 |
    | 500.200 | Certificate namespace=cdi, secret=cdi-apiserver-signer is expiring soon on 2022-06-27, 23:48:33 | namespace=cdi.secret=cdi-apiserver-signer | major | 2022-06-26T23:14:47.245515 |
    | 500.200 | Certificate namespace=kubevirt, secret=kubevirt-operator-certs is expiring soon on 2022-06-27, 18:48:00 | namespace=kubevirt.secret=kubevirt-operator-certs | major | 2022-06-26T23:14:46.442284 |
    | 500.200 | Certificate namespace=kubevirt, secret=kubevirt-controller-certs is expiring soon on 2022-06-27, 18:47:59 | namespace=kubevirt.secret=kubevirt-controller-certs | major | 2022-06-26T23:14:46.241288 |
    | 500.200 | Certificate namespace=kubevirt, secret=kubevirt-virt-api-certs is expiring soon on 2022-06-27, 18:47:59 | namespace=kubevirt.secret=kubevirt-virt-api-certs | major | 2022-06-26T23:14:45.799556 |
    | 500.200 | Certificate namespace=cdi, secret=cdi-apiserver-server-cert is expiring soon on 2022-06-27, 11:49:04 | namespace=cdi.secret=cdi-apiserver-server-cert | major | 2022-06-26T23:14:45.238404 |
    | 500.200 | Certificate namespace=cdi, secret=cdi-uploadproxy-server-cert is expiring soon on 2022-06-27, 11:49:04 | namespace=cdi.secret=cdi-uploadproxy-server-cert | major | 2022-06-26T23:14:44.795763 |
    | 500.200 | Certificate namespace=kubevirt, secret=kubevirt-virt-handler-server-certs is expiring soon on 2022-06-27, 18:47:59 | namespace=kubevirt.secret=kubevirt-virt-handler-server-certs | major | 2022-06-26T23:14:44.594887 |
    | 500.200 | Certificate namespace=cdi, secret=cdi-uploadserver-client-cert is expiring soon on 2022-06-27, 11:49:05 | namespace=cdi.secret=cdi-uploadserver-client-cert | major | 2022-06-26T23:14:44.394256 |
    [sysadmin@controller-1 ~(keystone_admin)]$

    [sysadmin@controller-1 ~(keystone_admin)]$ ./show-cert.sh -n kubevirt -s kubevirt-virt-handler-certs
    Secret: kubevirt / kubevirt-virt-handler-certs

    Certificate:
        Data:
            Version: 3 (0x2)
            Serial Number: 4016837071489997431 (0x37beac08cec81a77)
        Signature Algorithm: sha256WithRSAEncryption
            Issuer: CN=kubevirt.io
            Validity
                Not Before: Aug 26 13:59:59 2022 GMT
                Not After : Aug 29 23:36:00 2022 GMT
            Subject: CN=kubevirt.io:system:client:virt-handler
            Subject Public Key Info:
                Public Key Algorithm: rsaEncryption
                    Public-Key: (2048 bit)
                    Modulus:
                        00:cb:20:53:4b:ca:6c:34:6d:71:3f:6c:ef:89:86:
                        4b:bb:ce:dc:98:7b:25:a8:48:48:1b:ef:34:77:f2:
                        43:65:a5:18:8b:93:34:a2:17:15:3b:32:4f:e4:bf:
                        da:66:62:c7:2a:32:3f:50:8f:5d:ba:7f:08:66:57:
                        a4:2c:e3:6f:2e:e9:19:97:21:c2:24:61:98:75:69:
                        4b:0c:fd:af:31:ef:b5:70:50:13:f5:a9:9b:54:ae:
                        15:4d:f1:ed:83:48:2b:da:09:77:09:3c:ab:39:70:
                        56:be:db:9c:1c:5c:24:00:1d:8f:4b:f7:77:a9:e1:
                        6c:54:75:09:8d:42:7d:f0:c5:87:62:7e:bd:f7:48:
                        be:a7:f0:34:e3:b9:5b:41:61:4d:c6:44:b9:28:4d:
                        2c:4f:66:37:1c:9d:ff:b7:98:3e:56:d6:20:dd:1b:
                        d2:a0:52:7d:52:7a:2b:0a:17:af:7c:88:75:87:3b:
                        dc:5e:17:26:a7:33:9e:06:2b:29:ee:5d:49:f4:96:
                        f0:f1:66:19:17:55:71:af:0a:36:d4:21:e9:67:30:
                        24:16:8f:5c:16:67:b7:32:4e:f7:87:d7:1d:0d:10:
                        43:0a:00:18:6e:aa:09:05:0a:2a:67:d9:68:df:5c:
                        e2:7f:d1:86:3c:34:b5:e0:11:4e:3a:05:c0:7d:77:
                        91:1d
                    Exponent: 65537 (0x10001)
            X509v3 extensions:
                X509v3 Key Usage: critical
                    Digital Signature, Key Encipherment
                X509v3 Extended Key Usage:
                    TLS Web Client Authentication
                X509v3 Authority Key Identifier:
                    keyid:F0:69:6D:24:D3:D4:A8:75:D7:9E:3F:11:B9:EB:32:87:33:C2:23:B4

        Signature Algorithm: sha256WithRSAEncryption
             69:b4:2b:a8:20:e0:e2:e2:fd:7c:41:6a:05:df:be:37:8a:9c:
             78:6c:3f:58:cb:17:bb:9b:9b:59:b5:6b:84:3b:84:d4:c6:b0:
             ca:be:69:47:35:76:6b:5d:1d:01:50:b1:0b:14:76:9f:c4:a3:
             a0:b8:ed:6d:0d:c9:d3:33:54:3b:36:1f:1e:c8:f8:89:3f:dd:
             7d:ec:42:96:60:ad:9d:f3:ec:d1:f7:8d:dd:ab:0c:73:26:69:
             09:94:a5:23:8c:c7:b7:e4:7c:a6:03:46:cf:a7:52:45:e3:cc:
             1a:12:ea:76:7d:61:3e:55:5f:60:2e:c6:ed:4f:d0:16:60:c8:
             f9:21:87:cd:73:ab:18:68:0d:0c:ec:0c:15:5e:00:e8:6d:45:
             bb:64:2b:d3:eb:ec:4d:71:79:db:fc:57:38:40:f1:32:ba:63:
             ed:94:ce:ee:3c:47:73:69:60:a4:bc:82:0c:b0:25:7b:4d:61:
             94:d3:d1:b1:c4:62:c8:69:63:ad:e5:27:79:7b:89:06:99:26:
             f0:8f:1c:e3:26:de:3b:9a:f7:91:61:90:af:fc:40:1c:af:77:
             9c:b0:97:95:1a:56:f0:44:86:25:ab:15:95:89:01:f0:c6:6d:
             fe:28:ee:08:0b:24:bd:b0:02:e9:8c:5a:59:00:de:32:38:2c:
             7a:f0:f9:44
    [sysadmin@controller-1 ~(keystone_admin)]$
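
The mismatch between the alarm text and the certificate can be quantified with a short Python sketch. It assumes the two text formats seen in the output above (the 500.200 reason text and the openssl "Not After" line) and treats both timestamps as UTC; the helper names are hypothetical and are not part of any StarlingX tool.

```python
import re
from datetime import datetime

# Matches the date and time at the end of the 500.200 reason text shown above.
ALARM_DATE_RE = re.compile(
    r"is expiring soon on (\d{4}-\d{2}-\d{2}), (\d{2}:\d{2}:\d{2})"
)


def expiry_from_alarm_text(reason_text):
    """Extract the expiry date embedded in an 'expiring soon' reason text."""
    match = ALARM_DATE_RE.search(reason_text)
    if match is None:
        raise ValueError("no expiry date found in alarm text")
    return datetime.strptime(
        f"{match.group(1)} {match.group(2)}", "%Y-%m-%d %H:%M:%S"
    )


def expiry_from_openssl(not_after_line):
    """Parse an openssl 'Not After : ...' line (English month names, GMT)."""
    value = not_after_line.split(":", 1)[1].strip()
    return datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")


reason = (
    "Certificate namespace=kubevirt, secret=kubevirt-virt-handler-certs "
    "is expiring soon on 2022-06-27, 18:48:00"
)
not_after = "Not After : Aug 29 23:36:00 2022 GMT"

gap = expiry_from_openssl(not_after) - expiry_from_alarm_text(reason)
print(gap)  # 63 days, 4:48:00
```

A non-zero gap means the alarm text no longer describes the certificate currently stored in the secret.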

Severity

<Minor: System/Feature is usable with minor issue>

Steps to Reproduce

You really just need to:

1. Create a TLS Secret with a certificate that expires in 3 days.
2. Wait for the expiring soon Alarm (? can't remember how often it checks ... every hour or every day ?).
3. Update the TLS Secret with a renewed certificate that still expires in 3 days.
4. Wait for the expiring soon Alarm logic to run again, and verify that the certificate expiry date in the Alarm Text did not get updated.

Expected Behavior

Original alarm should be cleared and new alarm should get raised with new expiry date in text

Actual Behavior

Original alarm is left SET ... which is sort of correct as the alarm is still valid, the certificate is still expiring soon ... but the text does not have the correct expiry date.

Reproducibility

100% reproducible

System Configuration

Any/All

Load info (eg: 2022-03-10_20-00-07)

[sysadmin@controller-1 ~(keystone_admin)]$ cat /etc/build.info

SW_VERSION="21.12"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="2022-01-10_02-00-35"

Last Pass

Probably day one problem.

Timestamp/Logs

See description

Alarms

See description

Test Activity

CUMULUS

Workaround

Could configure annotations on SECRETs to disable alarming of these quickly renewing certificates.

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/864901
Committed: https://opendev.org/starlingx/config/commit/cea00af70d67ec1ffab41e26bcc6219aac996bdf
Submitter: "Zuul (22348)"
Branch: master

commit cea00af70d67ec1ffab41e26bcc6219aac996bdf
Author: Karla Felix <email address hidden>
Date: Thu Nov 17 10:33:44 2022 -0300

    Alarm 500.200 "expiring soon" not updating after change

    This issue was caused because renewing the certificate did not
    update alarm fields such as "Reason Text". To fix it, this change
    deletes the expiring soon alarm and replaces it with the info of
    the new certificate, raising a new expiring soon alarm if
    necessary.

    Test Plan:

    PASS: Renew a certificate with "expiring soon" alarm, and verify if the
          time registered in 'cert-alarm.log' and "Reason Text" in fm-alarm
          list match with the time "Not After" in the certificate
          information.
    PASS: Renew an expired certificate with a certificate with less than 3
          days to expire and verify if the expired alarm is deleted
          and replaced by the "expiring soon" alarm.
    PASS: Delete the certificate of the "expiring soon" alarm and check if
          the alarm is deleted.

    Closes-Bug: 1997037
    Change-Id: I0f724566de10e66ab95f6ba862836f1fc4e49f32
    Signed-off-by: Karla Felix <email address hidden>
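
The delete-and-replace behavior described in the commit message can be sketched in Python against an in-memory alarm table. This is an illustration only, not the merged change: the `reconcile` function, the dictionary layout, the "expired" / "expiring_soon" labels and the 30-day window are assumptions made for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # assumed "expiring soon" threshold


def reconcile(alarms, name, cert_expiry, now):
    """Bring the alarm for one certificate in line with its current state.

    alarms maps a certificate name to a (kind, expiry) tuple.
    cert_expiry is None when the certificate no longer exists.
    Returns "unchanged", "raised", "replaced" or "cleared".
    """
    if cert_expiry is None:
        wanted = None
    elif cert_expiry <= now:
        wanted = ("expired", cert_expiry)
    elif cert_expiry - now < WINDOW:
        wanted = ("expiring_soon", cert_expiry)
    else:
        wanted = None

    current = alarms.get(name)
    if current == wanted:
        return "unchanged"

    # Delete the old alarm first so that stale text can never survive.
    if current is not None:
        del alarms[name]
    if wanted is not None:
        alarms[name] = wanted

    if current is None:
        return "raised"
    return "replaced" if wanted is not None else "cleared"


now = datetime(2022, 8, 29, 14, 8, 45)
alarms = {}
cert = "kubevirt/kubevirt-virt-handler-certs"

print(reconcile(alarms, cert, datetime(2022, 8, 29, 23, 36), now))  # raised
print(reconcile(alarms, cert, datetime(2022, 9, 1, 23, 36), now))   # replaced
print(reconcile(alarms, cert, None, now))                           # cleared
```

The three calls loosely mirror the test plan above: a renewed certificate gets a fresh alarm carrying the new date, and a deleted certificate has its alarm removed.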

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Karla Felix (kkarolin)
importance: Undecided → Low
tags: added: stx.8.0 stx.fault