Cannot run cert update through "sw-manager" when there is an active "cert kube-root-ca expiring" alarm

Bug #1948673 reported by Al Bailey
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Al Bailey

Bug Description

Brief Description
-----------------
Simulate a scenario where the cert expires in 30 days, so the system raises an active alarm; the user then cannot run a cert update through "sw-manager" while the "cert kube-root-ca" alarm is active.

Severity
--------
Minor (breaks this feature but nothing else)

Steps to Reproduce
------------------
1) Create a cert that expires in 30 days (one way to generate such a cert is sketched after these steps)
2) Now update the k8s certs using the following:

sw-manager kube-rootca-update-strategy create --cert-file ca_with_key.crt
sw-manager kube-rootca-update-strategy apply
sw-manager kube-rootca-update-strategy delete

3) The system will raise the following major alarm:

Every 2.0s: fm alarm-list Thu Oct 21 19:06:04 2021

+----------+-------------------------------------------------------------------------+----------------------------+----------+-------------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+-------------------------------------------------------------------------+----------------------------+----------+-------------------+
| 500.200 | Certificate kubernetes-root-ca is expiring soon on 2021-11-20, 18:09:10 | system.certificate. | major | 2021-10-21T18:53: |
| | | kubernetes-root-ca | | 45.236210 |
| | | | | |
+----------+-------------------------------------------------------------------------+----------------------------+----------+-------------------+

4) Now if the user wants to update the cert using VIM, the system complains about the active major alarm:

[sysadmin@controller-1 ~(keystone_admin)]$ sw-manager kube-rootca-update-strategy create --cert-file ca_with_key.crt
Operation failed: conflict detected
[sysadmin@controller-1 ~(keystone_admin)]$ sw-manager kube-rootca-update-strategy show
Strategy Kubernetes RootCA Update Strategy:
  strategy-uuid: 53c9ff3c-288b-42ef-ab9b-f9eaaec59ef8
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: build
  current-phase-completion: 100%
  state: build-failed
  build-result: failed
  build-reason: active alarms present [ 500.200 ]
[sysadmin@controller-1 ~(keystone_admin)]$
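
For step 1, a minimal sketch of generating a short-lived CA bundle, assuming
the 'cryptography' Python package and assuming --cert-file wants the
certificate and private key concatenated in one file, as the name
ca_with_key.crt suggests (the exact subject/extension requirements for a
kube rootca cert may differ):

import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=4096)
# Illustrative subject; a real kube rootca cert may need a specific CN.
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, u"kubernetes")])
now = datetime.datetime.utcnow()
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)                     # self-signed: issuer == subject
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=30))  # expires in 30 days
    .add_extension(x509.BasicConstraints(ca=True, path_length=None),
                   critical=True)
    .sign(key, hashes.SHA256())
)
with open("ca_with_key.crt", "wb") as f:
    f.write(cert.public_bytes(serialization.Encoding.PEM))
    f.write(key.private_bytes(
        serialization.Encoding.PEM,
        serialization.PrivateFormat.TraditionalOpenSSL,
        serialization.NoEncryption(),
    ))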

Expected Behavior
------------------
The system should allow the update when the active alarm is the cert-expiry alarm itself

Actual Behavior
----------------
The system blocks the update

Reproducibility
---------------
100% (as long as you wait for the alarm to be raised)

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
Oct 19 2021

Last Pass
---------
It would have passed prior to the alarm code merging (Sept 24 2021)

Timestamp/Logs
--------------
ef-ab9b-f9eaaec59ef8, reason=.
2021-10-21T18:59:56.889 controller-1 VIM_Thread[1741049] WARNING _strategy_steps.py.1676 Alarm: 500.200
2021-10-21T18:59:56.889 controller-1 VIM_Thread[1741049] INFO _strategy_steps.py.2816 Step (query-kube-rootca-update) apply.
2021-10-21T18:59:56.909 controller-1 VIM_Thread[1741049] INFO nfvi_infrastructure_api.py.1318 No kube rootca update exists, num=0
2021-10-21T18:59:56.910 controller-1 VIM_Thread[1741049] INFO _strategy_stage.py.235 Stage (kube-rootca-update-query) cleanup called
2021-10-21T18:59:56.912 controller-1 VIM_Thread[1741049] INFO _strategy_phase.py.244 Phase (build) cleanup called
2021-10-21T18:59:56.916 controller-1 VIM_Thread[1741049] WARNING _strategy.py.2895 kube rootca update: Active alarms present [ 500.200 ]
2021-10-21T18:59:56.916 controller-1 VIM_Thread[1741049] WARNING _strategy.py.2607 Strategy Build Failed: active alarms present [ 500.200 ]
2021-10-21T18:59:56.917 controller-1 VIM_Thread[1741049] INFO _kube_rootca_update.py.77 Kubernetes root ca update strategy build complete.
2021-10-21T19:00:07.036 controller-1 VIM_Thread[1741049] INFO _vim_sw_update_api_events.py.193 Apply sw-update strategy: (kube-rootca-update) called.
2021-10-21T19:00:07.039 controller-1 VIM_Event-Log_Thread[1741294] INFO fm.py.517 Generated customer log, fm_uuid=bfcec5eb-836d-4cbf-ad96-a1c8e5fee5d3.
2021-10-21T19:00:07.040 controller-1 VIM_Thread[1741049] INFO _vim_sw_update_api_events.py.173 Apply sw-update strategy callback, uuid=53c9ff3c-288b-42e

Test Activity
-------------
Feature Testing

Workaround
----------
Manually clear the alarm

Revision history for this message
Al Bailey (albailey1974) wrote :

The bug is in the NFV code. The two certificate alarm types need to be added to the ignore list used by the query-alarms code.
There is no recovery from the 'expired' case, but it probably still makes sense to ignore that alarm and let the sysinv health check report the failure.
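
A minimal sketch of that direction (illustrative names, not the verbatim nfv
source): the strategy build queries active alarms and fails unless each one
is on an ignore list, so the certificate alarm IDs get added to that list.
500.200 is confirmed by this report; 500.210 is assumed here to be the
'expired' alarm ID.

CERT_EXPIRING_ALARM = '500.200'   # confirmed above: "expiring soon"
CERT_EXPIRED_ALARM = '500.210'    # assumed ID for the 'expired' alarm

IGNORED_ALARM_IDS = frozenset([CERT_EXPIRING_ALARM, CERT_EXPIRED_ALARM])

def blocking_alarms(active_alarms):
    """Return only the alarms that should still fail the strategy build."""
    return [a for a in active_alarms
            if a['alarm_id'] not in IGNORED_ALARM_IDS]

# The build then fails only when non-ignored alarms remain, e.g.:
#   if blocking_alarms(alarms):
#       fail_build('active alarms present [ ... ]')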

Changed in starlingx:
assignee: nobody → Al Bailey (albailey1974)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nfv (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/nfv/+/815295

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nfv (master)

Reviewed: https://review.opendev.org/c/starlingx/nfv/+/815295
Committed: https://opendev.org/starlingx/nfv/commit/849c386e84911af0ac8d0df815f911347b39f8d6
Submitter: "Zuul (22348)"
Branch: master

commit 849c386e84911af0ac8d0df815f911347b39f8d6
Author: albailey <email address hidden>
Date: Mon Oct 25 07:50:13 2021 -0500

    Kube Root ca update orchestration must ignore cert alarms

    The 'Expiring' and 'Expired' alarms for the certificate need
    to be added to the ignored alarms list when creating an update
    strategy for kube rootca update orchestration.

    The expiring alarm is often used to indicate that an orchestration
    is needed.
    The expired alarm is added for completeness, even though there is
    no automated recovery from it.

    Test Plan:
      Tested strategy can be created/applied when expiring alarm reported.

    Closes-Bug: 1948673
    Signed-off-by: albailey <email address hidden>
    Change-Id: I5348a0011f52db7bf9872ba0de13a8802688929f
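
The test plan maps onto checks like these hypothetical unit tests, written
against the blocking_alarms() sketch from the earlier comment (illustrative,
not the actual nfv test suite):

IGNORED_ALARM_IDS = frozenset(['500.200', '500.210'])

def blocking_alarms(active_alarms):
    return [a for a in active_alarms
            if a['alarm_id'] not in IGNORED_ALARM_IDS]

def test_expiring_alarm_does_not_block_build():
    # The 500.200 'expiring soon' alarm is ignored, so the build proceeds.
    assert blocking_alarms([{'alarm_id': '500.200'}]) == []

def test_unrelated_alarm_still_blocks_build():
    # A non-ignored alarm (e.g. 100.101, a platform alarm) still fails it.
    alarms = [{'alarm_id': '100.101'}]
    assert blocking_alarms(alarms) == alarms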

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.6.0 stx.nfv