Alarm 500.200 is raised before the alarm-before window

Bug #2056071 reported by ayyappa
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: ayyappa

Bug Description

Brief Description
-----------------
Alarm 500.200 is raised before the alarm-before window

Severity
--------
major

Steps to Reproduce
------------------
Consider a certificate managed by cert-manager with the following values:

alarm-before = 30 days (default value set by the system for all certificates)
renew-before = 14 days 2 hours
duration = 182 days 12 hours

Because renew-before < alarm-before, the threshold for raising the alarm is set to 14 days (the audit only considers whole days). Note the alarm timestamp in the Timestamp/Logs section, "2024-02-28T06:48:16.388415" (the full audit, which runs at a 24h interval, may have triggered, or the cert-alarm service may have restarted after a node reboot or lock/unlock). For a certificate expiring on "2024-03-13, 23:59:34", the difference is 14 days (exactly 14 days, 17 hours, 11 minutes and 18 seconds, but only days are considered), so the cert-alarm service raised the alarm.

cert-manager renewed the certificate around Wed Feb 28 16:51:48 2024 (21:51:48 UTC), so the alarm was cleared when the next hourly audit ran.

The user therefore sees the alarm from Wed Feb 28 06:48 until Wed Feb 28 16:51 plus the wait for the next scheduled hourly audit, approximately 12 hours on Wed Feb 28.
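
The arithmetic above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the variable names are made up, both timestamps are treated as being in the same time zone (as the report does), and the day-only comparison is an assumed shape of the old check, not the actual cert-alarm code.

```python
from datetime import datetime, timedelta

# Values taken from the report; the names are illustrative only.
alarm_before = timedelta(days=30)
renew_before = timedelta(days=14, hours=2)
expiry = datetime(2024, 3, 13, 23, 59, 34)
audit_time = datetime(2024, 2, 28, 6, 48, 16)

# The smaller of the two windows acts as the alarm threshold.
threshold = min(alarm_before, renew_before)
remaining = expiry - audit_time  # 14 days, 17:11:18

# Day-only comparison (assumed shape of the old check):
# the hours, minutes and seconds of both values are discarded.
raised_days_only = remaining.days <= threshold.days

# Full-precision comparison: hours, minutes and seconds are kept.
raised_full = remaining <= threshold

print(remaining, raised_days_only, raised_full)
```

With whole days only, 14 <= 14 holds and the alarm is raised. With the full values, 14 days 17 hours is still longer than 14 days 2 hours, so no alarm would be raised at that audit.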

Expected Behavior
------------------
The alarm should not be raised before the renew-before time.

Actual Behavior
----------------
The alarm is raised before the renew-before time.

Reproducibility
---------------
100%

System Configuration
--------------------
all lab types
stx 8.0

Branch/Pull Time/Commit
-----------------------
na

Last Pass
---------
na

Timestamp/Logs
--------------
~(keystone_admin)]$ fm alarm-list
+----------+-----------------------------------------------------------------------------------------------+--------------------------------+----------+-------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+-----------------------------------------------------------------------------------------------+--------------------------------+----------+-------------------+
| 500.200 | Certificate namespace=monitor, certificate=mon-elastic-services-extca-crt is expiring soon on | namespace=monitor.certificate= | major | 2024-02-28T06:48: |
| | 2024-03-13, 23:59:34 | mon-elastic-services-extca-crt | | 16.388415 |
| | | | | |
| 500.200 | Certificate namespace=monitor, certificate=mon-elastic-services-ca-crt is expiring soon on | namespace=monitor.certificate= | major | 2024-02-28T06:48: |
| | 2024-03-13, 23:59:34 | mon-elastic-services-ca-crt | | 15.825511 |
| | | | | |
+----------+-----------------------------------------------------------------------------------------------+--------------------------------+----------+-------------------+

Test Activity
-------------
debugging request

Workaround
----------
not required

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/910994

Changed in starlingx:
status: New → In Progress
ayyappa (mantri425)
description: updated
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/910994
Committed: https://opendev.org/starlingx/config/commit/ce7f87aeb0515128cadafa7b5f6d90415222190a
Submitter: "Zuul (22348)"
Branch: master

commit ce7f87aeb0515128cadafa7b5f6d90415222190a
Author: amantri <email address hidden>
Date: Mon Mar 4 14:22:35 2024 -0500

    Change cert-alarm service audit behavior

    Cert-alarm audit only considering days while comparing the alarm_before
    ,renew_before and expiry times this leaves a window for few hours where
    an alarm is raised before the renew_before time of the certificate.
    This change addresses this issue by considering hours,mins
    along with days.

    TestCases:
    PASS: Create a certificate with duration 3hr, renewbefore 2h30min
          now wait for 15mins and run full audit and verify that no alarm
          is raised since expiry(2hr45min)> threshold(2hr30min)
    PASS: Create a certificate with duration 3hr,renewbefore 2h30min.
          delete the issuer which issued the certificate, after 30mins
          the certificate renew fails then the expiry of the certificate
          becomes less than threshold which is 2h30min, restart cert-alarm
          service to run the full audit, notice an alarm 500.200 is raised
          for this certificate, let it expire and notice that 500.200 is
          cleared and 500.210 expired alarm is raised,create the issuer
          and notice that 500.210 alarm cleared when active alarm audit
          runs.
    PASS: Install a ssl_ca certificate which expires in 1 day, notice that
          an alarm 500.200 is raised and let it expire, notice that
          500.210 alarm is raised and 500.200 is cleared on this
          certificate.

    Closes-Bug: 2056071

    Change-Id: I4f1a866d101d0b8d8cb50f1bf5a2e6698511296a
    Signed-off-by: amantri <email address hidden>
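
The behavior described in the commit message above could look roughly like the following Python sketch. It is not the merged code: `expiring_soon` is a hypothetical helper, the issue time is arbitrary, and the expired state (alarm 500.210) is not modeled. The values mirror the first two test cases (duration 3h, renew-before 2h30min).

```python
from datetime import datetime, timedelta
from typing import Optional


def expiring_soon(now: datetime, not_after: datetime,
                  alarm_before: timedelta,
                  renew_before: Optional[timedelta] = None) -> bool:
    """Return True when the remaining lifetime is within the threshold.

    Hypothetical helper: the threshold is the smaller of alarm_before and
    renew_before, and the comparison keeps hours, minutes and seconds.
    Expired certificates are not handled separately here.
    """
    threshold = alarm_before
    if renew_before is not None and renew_before < alarm_before:
        threshold = renew_before
    return not_after - now <= threshold


issued = datetime(2024, 3, 4, 12, 0, 0)  # arbitrary issue time
not_after = issued + timedelta(hours=3)  # duration 3h
alarm_before = timedelta(days=30)
renew_before = timedelta(hours=2, minutes=30)

# 15 minutes after issue: 2h45min remain, above the 2h30min threshold.
after_15_min = expiring_soon(issued + timedelta(minutes=15), not_after,
                             alarm_before, renew_before)
# 45 minutes after issue with renewal failed: 2h15min remain, below threshold.
after_45_min = expiring_soon(issued + timedelta(minutes=45), not_after,
                             alarm_before, renew_before)

print(after_15_min, after_45_min)
```

Comparing `timedelta` values directly avoids any separate handling of days, hours and minutes, since the type already orders durations at full precision.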

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
assignee: nobody → ayyappa (mantri425)
tags: added: stx.9.0 stx.fault stx.security