system kube-rootca-update-complete does not clear the 900.008 alarm when 500.200 alarm raised

Bug #1949238 reported by Rafael Lucas Camargos
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Rafael Lucas Camargos

Bug Description

Brief Description
-----------------
system kube-rootca-update-complete does not clear the 900.008 alarm when 500.200 alarm raised

(The 500.200 alarm relates to "Certificate kubernetes-root-ca is expiring soon", i.e. within one month from today - on 2021-11-26, 20:13:50.)

Severity
--------
Minor

Steps to Reproduce
------------------
Perform the procedure to update the kubernetes-root-ca certificate, but generate a certificate that is valid yet expires within one month of the current date.

For example, where today is Oct 27th:

system kube-rootca-update-generate-cert --expiry-date="2021-11-26" --subject="C=CA ST=ON L=Ottawa O=Company OU=Blah CN=kubernetes"

Run through the system commands successfully to the point where you are completing the update:

system kube-rootca-update-complete
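
At this point the newly generated root CA is valid but already inside the one-month expiry-warning window, which is what raises the 500.200 alarm. A minimal illustrative check of that condition (not sysinv code; assumes the kubeadm default CA path /etc/kubernetes/pki/ca.crt and the python-cryptography package):

# Illustrative only: is the kubernetes root CA inside the "expiring soon"
# window that raises alarm 500.200? Assumes the kubeadm default CA path.
from datetime import datetime, timedelta

from cryptography import x509

EXPIRY_WARNING_WINDOW = timedelta(days=30)   # "inside of one month from today"

with open("/etc/kubernetes/pki/ca.crt", "rb") as f:
    ca_cert = x509.load_pem_x509_certificate(f.read())

time_left = ca_cert.not_valid_after - datetime.utcnow()
if time_left < EXPIRY_WARNING_WINDOW:
    print("rootca expires in %s -> expect alarm 500.200" % time_left)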

Expected Behavior
------------------
Expected the 500.200 alarm to be raised.

Did not expect to see a management-affecting alarm (i.e. 900.008 should have cleared), and expected system kube-rootca-update-complete to succeed.

Actual Behavior
----------------
The 900.008 alarm is not cleared; the health check fails on the outstanding alarms and the update cannot be completed:

$ system kube-rootca-update-complete
System is not healthy. Run system health-query for more details.

2021-10-27 20:47:25.134 1637808 INFO sysinv.api.controllers.v1.kube_rootca_update [-] Health query failure for kube-rootca-update-complete:
[sysadmin@controller-0 log(keystone_admin)]$ system health-query
System Health:
All hosts are provisioned: [OK]
All hosts are unlocked/enabled: [OK]
All hosts have current configurations: [OK]
All hosts are patch current: [OK]
Ceph Storage Healthy: [OK]
No alarms: [Fail]
[2] alarms found, [1] of which are management affecting
All kubernetes nodes are ready: [OK]
All kubernetes control plane pods are ready: [OK]

Reproducibility
---------------
100%

System Configuration
--------------------
Standard (IP 20-27)

Branch/Pull Time/Commit
-----------------------
2021-10-26_00-00-08

Last Pass
---------
N/A

Timestamp/Logs
--------------
sysinv 2021-10-19 23:40:04.848 330838 INFO sysinv.common.rest_api [-] GET cmd:http://192.168.204.1:5491/v1/query_hosts/ hdr:None payload:None
sysinv 2021-10-19 23:40:04.887 330838 INFO sysinv.common.rest_api [-] Response={u'data': [{u'sw_version': u'21.12', u'hostname': u'controller-0', u'nodetype': u'controller', u'patch_failed': False, u'allow_insvc_patching': True, u'ip': u'192.168.204.2', u'requires_reboot': False, u'installed': {}, u'state': u'idle', u'interim_state': False, u'secs_since_ack': 24, u'missing_pkgs': [], u'subfunctions': [u'controller', u'worker'], u'stale_details': False, u'to_remove': [], u'patch_current': True, u'duplicated_pkgs': {}}]}
sysinv 2021-10-19 23:40:06.704 330838 INFO ceph_client [-] Request params: url=https://controller-0:7999/request?wait=1, json={'prefix': 'status', 'format': 'json'}
sysinv 2021-10-19 23:40:06.718 330838 INFO ceph_client [-] Result: {u'waiting': [], u'has_failed': False, u'state': u'success', u'is_waiting': False, u'running': [], u'failed': [], u'finished': [{u'outb': u'{"fsid":"758115a2-3942-4780-b325-acbd84e4726d","health":{"checks":{},"status":"HEALTH_OK","overall_status":"HEALTH_WARN"},"election_epoch":3,"quorum":[0],"quorum_names":["controller-0"],"monmap":{"epoch":1,"fsid":"758115a2-3942-4780-b325-acbd84e4726d","modified":"2021-10-19 22:47:52.157901","created":"2021-10-19 22:47:52.157901","features":{"persistent":["kraken","luminous","mimic","osdmap-prune"],"optional":[]},"mons":[{"rank":0,"name":"controller-0","addr":"192.168.204.2:6789/0","public_addr":"192.168.204.2:6789/0"}]},"osdmap":{"osdmap":{"epoch":19,"num_osds":1,"num_up_osds":1,"num_in_osds":1,"full":false,"nearfull":false,"num_remapped_pgs":0}},"pgmap":{"pgs_by_state":[{"state_name":"active+clean","count":192}],"num_pgs":192,"num_pools":3,"num_objects":22,"data_bytes":2286,"bytes_used":112676864,"bytes_avail":9287786496,"bytes_total":9400463360},"fsmap":{"epoch":4,"id":1,"up":1,"in":1,"max":1,"by_rank":[{"filesystem_id":1,"rank":0,"name":"controller-0","status":"up:active"}]},"mgrmap":{"epoch":4,"active_gid":4133,"active_name":"controller-0","active_addr":"192.168.204.2:6800/104364","available":true,"standbys":[],"modules":["restful"],"available_modules":[{"name":"balancer","can_run":true,"error_string":""},{"name":"dashboard","can_run":false,"error_string":"Frontend assets not found: incomplete build?"},{"name":"hello","can_run":true,"error_string":""},{"name":"iostat","can_run":true,"error_string":""},{"name":"localpool","can_run":true,"error_string":""},{"name":"prometheus","can_run":true,"error_string":""},{"name":"restful","can_run":true,"error_string":""},{"name":"selftest","can_run":true,"error_string":""},{"name":"smart","can_run":true,"error_string":""},{"name":"status","can_run":true,"error_string":""},{"name":"telegraf","can_run":true,"error_string":""},{"name":"telemetry","can_run":true,"error_string":""},{"name":"zabbix","can_run":true,"error_string":""}],"services":{"restful":"https://controller-0:7999/"}},"servicemap":{"epoch":1,"modified":"0.000000","services":{}}}\n', u'outs': u'', u'command': u'status format=json'}], u'is_finished': True, u'id': u'139914071192016'}
sysinv 2021-10-19 23:40:06.902 331471 INFO sysinv.api.controllers.v1.kube_rootca_update [-] Health query failure for kube-rootca-update-complete:
System Health:
All hosts are provisioned: [OK]
All hosts are unlocked/enabled: [OK]
All hosts have current configurations: [OK]
All hosts are patch current: [OK]
Ceph Storage Healthy: [OK]
No alarms: [Fail]
[1] alarms found, [0] of which are management affecting
All kubernetes nodes are ready: [OK]
All kubernetes control plane pods are ready: [OK]
All kubernetes applications are in a valid state: [OK]
sysinv 2021-10-19 23:40:06.902 331471 WARNING wsme.api [-] Client-side error: System is not healthy. Run system health-query for more details.: ClientSideError: System is not healthy. Run system health-query for more details.

Test Activity
-------------
Feature Testing

Workaround
----------
N/A

Changed in starlingx:
assignee: nobody → Rafael Lucas Camargos (rcamargo)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/816069

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

screening: stx.6.0 / low - issue related to stx.6.0 feature: https://storyboard.openstack.org/#!/story/2008675

Changed in starlingx:
importance: Undecided → Low
tags: added: stx.6.0 stx.config stx.security
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/816069
Committed: https://opendev.org/starlingx/config/commit/947226b935c77b86464b80e189e302bc05feb380
Submitter: "Zuul (22348)"
Branch: master

commit 947226b935c77b86464b80e189e302bc05feb380
Author: Rafael Camargos <email address hidden>
Date: Fri Oct 29 18:18:07 2021 -0300

    Ignore 500.200 alarm on kube rootca update

    Generating a close-to-expiration certificate is a possible scenario of
    the rootca update procedure but it is not handled within the
    `kube-rootca-update-complete` command.

    This adds the 'certificate expiring soon' (500.200) alarm to the
    `kube-rootca-update-start` and `kube-rootca-update-complete` ignore list
    in order to allow starting and completing the update using a certificate
    that has an expiration date below the threshold.

    Note that it is still not possible to start the update if the rootca
    certificate has already expired. The same applies to generating an
    expired certificate during the update.

    Test Plan:

    PASS: Verify that the rootca update can be started if a certificate is
    expiring soon
    PASS: Verify that the rootca update can be completed after generating a
    certificate expiring soon

    Closes-Bug: 1949238
    Signed-off-by: Rafael Camargos <email address hidden>
    Change-Id: I241861890f56abd32b35e6e7b465cfd0515b75d9
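
For reference, the shape of this first change is an alarm ignore list consulted by the health check that gates start/complete. The sketch below uses hypothetical names rather than the actual sysinv API, purely to illustrate why an expiring-soon certificate no longer blocks the operation:

# Hypothetical sketch (not the real sysinv code) of the ignore-list pattern
# described in the commit message above.
ROOTCA_UPDATE_ALARM_IGNORE_LIST = [
    '500.200',   # certificate expiring soon (added by this fix)
    # ...plus whatever the health check already ignored for this operation
]

def blocking_alarms(active_alarms, ignore_list=ROOTCA_UPDATE_ALARM_IGNORE_LIST):
    """Return only the alarms that should still block kube-rootca-update start/complete."""
    return [a for a in active_alarms if a['alarm_id'] not in ignore_list]

# With 500.200 ignored, an expiring-soon root CA alone no longer fails the check:
print(blocking_alarms([{'alarm_id': '500.200'}]))   # -> []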

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/819020

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/819020
Committed: https://opendev.org/starlingx/config/commit/2ef6c9524c5346547872959c68f7741a9c13b280
Submitter: "Zuul (22348)"
Branch: master

commit 2ef6c9524c5346547872959c68f7741a9c13b280
Author: Andy Ning <email address hidden>
Date: Tue Nov 23 13:40:59 2021 -0500

    Remove force option for k8s rootca update complete/abort

    Currently the k8s rootca update complete and abort will be rejected
    if the system has alarms that are not ignored internally. The "-f"
    option can be used to suppress non mgmt_affecting alarms, but
    mgmt_affecting alarms will still prevent complete and abort.

    Blocking complete and abort on alarms is not really necessary, since
    it won't help the user solve the alarms anyway, and at the same time
    it causes confusion for end users.

    This update removes alarm checking (and the force option) from the
    REST API and the system CLI.

    Test Plan:
    PASS: rootca update complete while system has alarm
    PASS: rootca update abort at host-update trust-both-cas while system
          has alarms
    PASS: rootca update abort at host-update update-certs while system
          has alarms
    PASS: rootca update abort at host-update trust-new-ca while system
          has alarms
    PASS: rootca update abort at pods-update trust-both-cas while system
          has alarms

    Closes-Bug: 1949238
    Signed-off-by: Andy Ning <email address hidden>
    Change-Id: I82b922f39d185990704c591f02781d581822b162
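
Taken together, the two changes remove the reported failure mode in two steps: the first ignores 500.200 in the start/complete health check, and the second drops the alarm check (and the force option) from complete/abort entirely. A hedged sketch of the net effect, again with hypothetical names rather than the real sysinv code:

# Hypothetical sketch of how the two changes affect complete/abort gating.
def rootca_complete_allowed(active_alarm_ids, after_second_fix=True):
    if after_second_fix:
        # Second change: complete/abort no longer perform an alarm check,
        # so no force flag is needed either.
        return True
    # First change alone: only non-ignored alarms block the operation.
    ignored = {'500.200'}
    return not (set(active_alarm_ids) - ignored)

print(rootca_complete_allowed({'500.200'}, after_second_fix=False))   # True
print(rootca_complete_allowed({'100.101'}, after_second_fix=False))   # False
print(rootca_complete_allowed({'100.101'}))                           # True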
