Certificate expiry alarm updates result in clearing/re-raising the alarm

Bug #2046374 reported by ayyappa
This bug affects 1 person
Affects     Status         Importance   Assigned to   Milestone
StarlingX   Fix Released   Medium       ayyappa

Bug Description

Brief Description
-----------------
Certificate expiry alarm updates result in the alarm being cleared and re-raised.

Severity
--------
major

Steps to Reproduce
------------------
1) Create a cert-manager certificate with duration set to 1h and renewBefore set to 55m.
2) When the cert-alarm audit runs, the system generates a certificate expiry alarm (type 500.200).
3) Every 5 minutes, the alarm is cleared and re-raised instead of being updated.
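
For anyone reproducing step 1, the following is a minimal sketch of such a certificate, built as a Python dict and printed as JSON. The certificate name, issuer name, and namespace are hypothetical placeholders, not values from this report; only duration and renewBefore come from the steps above. cert-manager schedules renewal once the remaining validity drops to renewBefore, so with these values a renewal would be due about five minutes after issuance, which is consistent with the five-minute cycle in step 3.

```python
import json
from datetime import timedelta


def build_certificate(name="alarm-test-cert", issuer="alarm-test-issuer"):
    """Return a cert-manager Certificate manifest as a plain dict.

    The name, issuer and namespace are hypothetical placeholders.
    """
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            "secretName": name + "-tls",
            "commonName": name,
            "duration": "1h",       # value from step 1
            "renewBefore": "55m",   # value from step 1
            "issuerRef": {"name": issuer, "kind": "Issuer"},
        },
    }


def renewal_interval(duration, renew_before):
    """Time after issuance at which renewal becomes due."""
    return duration - renew_before


if __name__ == "__main__":
    # The JSON output can be saved to a file and applied to the cluster.
    print(json.dumps(build_certificate(), indent=2))
    print(renewal_interval(timedelta(hours=1), timedelta(minutes=55)))
```

The issuer referenced here is assumed to exist already in the same namespace.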

Expected Behavior
------------------
An alarm update should not clear and re-raise the alarm.

Actual Behavior
----------------
An alarm update clears and re-raises the alarm.

Reproducibility
---------------
100%

System Configuration
--------------------
all configurations

Branch/Pull Time/Commit
-----------------------
na

Last Pass
---------
na

Timestamp/Logs
--------------
na

Test Activity
-------------
general use

Workaround
----------
not required

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/903617

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fault (master)

Change abandoned by "ayyappa <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/fault/+/903359
Reason: Decided not to go forward with this change.

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/903617
Committed: https://opendev.org/starlingx/config/commit/6ddcaaa0ce72fbea31cd3fa82ce14999ac3d774e
Submitter: "Zuul (22348)"
Branch: master

commit 6ddcaaa0ce72fbea31cd3fa82ce14999ac3d774e
Author: amantri <email address hidden>
Date: Wed Dec 13 14:29:28 2023 -0500

    Set the alarm to update the reason text

    On alarm audit, we clear and re-raise the same alarm to
    update the reason text, which overwhelms the system with
    alarms on every audit. This change addresses the issue by
    sending a set request to update the alarm instead of a
    clear followed by a set, and raises the alarm only if the
    reason text has changed.

    Test Cases:
    PASS: Create a certificate that expires in 30 days, run the
          full audit to raise the alarm, and verify that on the
          active alarm audit the system replaces the existing
          alarm with a new alarm and raises a corresponding
          event when the reason text has changed.
    PASS: Create a certificate that expires within 30 days and
          run the full cert-alarm audit to raise the alarm.
          Then run the hourly audit and verify that the system
          does not raise an alarm if the alarm reason text has
          not changed.
    PASS: Create multiple certificates, one that changes reason
          text periodically and the other never changes its
          reason text, monitor active alarm audit and verify
          that set alarm request is raised only on the
          certificate where the reason text is changed and
          for the other certificate no alarm is raised.
    PASS: Create a certificate that expires within 30 days and
          run the full cert-alarm audit to raise the alarm.
          Then adjust the certificate's duration to >30 days and
          renewBefore to 15 days, and observe that on the active
          alarm audit the alarm is cleared and a corresponding
          clear event is raised.
    PASS: Build the ISO with the changes and install the lab;
          all of the above test cases passed.

    Closes bug: 2046374

    Change-Id: Ia908bc5331b6a2af97d4995017f0b99293093094
    Signed-off-by: amantri <email address hidden>
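
The behaviour described in the commit message can be illustrated with a small, self-contained sketch. This is not the code from the merged change: the Alarm class, InMemoryAlarmStore, and audit_alarm are hypothetical names, and the store is an in-memory stand-in for the fault manager rather than its real API. It shows only the idea of issuing a single set request when the reason text differs and doing nothing when it is unchanged.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    alarm_id: str
    entity_id: str
    reason_text: str


class InMemoryAlarmStore:
    """Hypothetical stand-in for the fault manager; records each request."""

    def __init__(self):
        self.active = {}
        self.requests = []

    def get(self, alarm_id, entity_id):
        return self.active.get((alarm_id, entity_id))

    def set(self, alarm):
        # A set on an existing key overwrites the alarm in place.
        self.requests.append(("set", alarm.entity_id))
        self.active[(alarm.alarm_id, alarm.entity_id)] = alarm

    def clear(self, alarm_id, entity_id):
        self.requests.append(("clear", entity_id))
        self.active.pop((alarm_id, entity_id), None)


def audit_alarm(store, alarm_id, entity_id, new_reason):
    """Update the alarm only when its reason text has changed.

    Returns True if a set request was sent, False if nothing was done.
    """
    existing = store.get(alarm_id, entity_id)
    if existing is not None and existing.reason_text == new_reason:
        return False  # unchanged: no clear, no set, no new event
    store.set(Alarm(alarm_id, entity_id, new_reason))  # no prior clear
    return True


if __name__ == "__main__":
    store = InMemoryAlarmStore()
    audit_alarm(store, "500.200", "cert-a", "expires in 29 days")
    audit_alarm(store, "500.200", "cert-a", "expires in 29 days")
    audit_alarm(store, "500.200", "cert-a", "expires in 28 days")
    print(store.requests)
```

In this sketch the second audit produces no request at all, and the third produces one set request with no preceding clear, which mirrors the second and first test cases listed in the commit message.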

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.fault stx.security
Changed in starlingx:
assignee: nobody → ayyappa (mantri425)
Peng Peng (ppeng) wrote :

The issue was reproduced on
BUILD_ID="20240318T180056Z"
JOB="STX_9.0_build_debian"

TC: test_pvc_enable_additional_storage_classes

[2024-03-27 10:39:27,695] 349 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://10.80.34.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2024-03-27 10:39:27,746] 551 DEBUG MainThread ssh.exec_cmd:: Expecting \[.*@controller\-[01] .*\(keystone_admin\)\]\$ in prompt
[2024-03-27 10:39:30,702] 471 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+----------+----------------------------+
| 441abda5-9f28-4f5a-97b7-7e074333b87d | 500.210 | Certificate 'system certificate-show bdb24022-2489-49fd-939b-c1d7a716a3b5' (mode=docker_registry) expired. | system.certificate.mode=docker_registry.uuid=bdb24022-2489-49fd-939b-c1d7a716a3b5 | critical | 2024-03-27T10:35:44.122897 |
| ac5428aa-bf55-41ff-8c5b-1554469716e9 | 500.210 | Certificate 'system certificate-show 6363a567-e52e-4495-ae8f-801ffb698191' (mode=ssl) expired. | system.certificate.mode=ssl.uuid=6363a567-e52e-4495-ae8f-801ffb698191 | critical | 2024-03-27T10:35:43.620550 |
| 85631f0a-cd13-48c7-b755-6719d7a04e7c | 900.002 | Patch installation failed on the following hosts: controller-0

log
http://auto-dashboard-logs.wrs.com/logs/StarlingX/Regression/wrcp_sx_014/202403251634/case_424_test_pvc_enable_additional_storage_classes/case_424_test_pvc_enable_additional_storage_classes
