Brief Description
During geo-redundancy testing, the cert-mon watchers fire and their events are handled by the standby system controller.
The watchers are Kubernetes 'watch' entities that fire an event when the cert-mon/cert-manager secrets holding the certificates change. Because nothing filters these events on the standby system controller, it handles them. They need to be ignored on the standby.
Severity
Major
Steps to Reproduce
This is a geo-redundant system. The easiest way to see this is to restart the cert-mon service.
Expected Behavior
The cert-mon watches should be ignored for any subcloud that is in an invalid deploy-status.
Actual Behavior
The deploy-status is not checked when handling cert-mon watch events. The system controller attempts to handle the DC_CertWatcher events for subclouds that are standby (secondary) on this system controller.
Reproducibility
Reproducible
System Configuration
Geo-redundant configuration
Load info (eg: 2022-03-10_20-00-07)
Master load
Last Pass
New
Logs
2024-03-28T19:20:07.952 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.watcher [-] DCIntermediateCertRenew check_filter[c336b7f8285f4b909fc980fda57ebfb2]: subcloud is not online
2024-03-28T19:20:07.952 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.utils [-] api_cmd http://[2620:10a:a001:df0::2]:8119/v1.0/subclouds/c367e92af29844f28057792a61de17b3
2024-03-28T19:20:08.232 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.watcher [-] DCIntermediateCertRenew check_filter[c367e92af29844f28057792a61de17b3]: subcloud is not online
2024-03-28T19:20:08.232 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.utils [-] api_cmd http://[2620:10a:a001:df0::2]:8119/v1.0/subclouds/c36e4d851c5b4efd88e0e63de3c74768
2024-03-28T19:20:08.437 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.watcher [-] DCIntermediateCertRenew check_filter[c36e4d851c5b4efd88e0e63de3c74768]: subcloud is not online
2024-03-28T19:20:08.438 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.utils [-] api_cmd http://[2620:10a:a001:df0::2]:8119/v1.0/subclouds/c36f5dad82934246901108c9d1559dd2
2024-03-28T19:20:08.646 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.watcher [-] DCIntermediateCertRenew do_action: action EXISTING (c36f5dad82934246901108c9d1559dd2-adminep-ca-certificate)
hash: ca_crt: 8286e79fb954b8cf9df8ea973458ae89 tls_crt 702b356a74ed896b3f709931dc238fb5 tls_key c5a4e60f7020ea584e00ab4f54404354
created at 2024-03-13 16:29:43 last operation Apply last update at 2024-03-13 16:29:43
2024-03-28T19:20:08.646 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.watcher [-] update_certificate: subcloud c36f5dad82934246901108c9d1559dd2 action EXISTING (c36f5dad82934246901108c9d1559dd2-adminep-ca-certificate)
hash: ca_crt: 8286e79fb954b8cf9df8ea973458ae89 tls_crt 702b356a74ed896b3f709931dc238fb5 tls_key c5a4e60f7020ea584e00ab4f54404354
created at 2024-03-13 16:29:43 last operation Apply last update at 2024-03-13 16:29:43
2024-03-28T19:20:09.134 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.utils [-] Update c36f5dad82934246901108c9d1559dd2 intermediate CA cert request succeed
2024-03-28T19:20:09.134 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.subcloud_audit_queue [-] Enqueued: SubcloudAuditData: {name: c36f5dad82934246901108c9d1559dd2, audit_count: 1}
2024-03-28T19:20:09.135 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.utils [-] api_cmd http://[2620:10a:a001:df0::2]:8119/v1.0/subclouds/c38a5a8dbe0343bba7991a9997493688
2024-03-28T19:20:09.336 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.watcher [-] DCIntermediateCertRenew check_filter[c38a5a8dbe0343bba7991a9997493688]: subcloud is not online
2024-03-28T19:20:09.337 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.utils [-] api_cmd http://[2620:10a:a001:df0::2]:8119/v1.0/subclouds/c3900b6586c749fc9af2e18969971ce2
2024-03-28T19:20:09.439 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.certificate_mon_manager [-] Auditing subcloud c31754a0a12040a396b170d65770f6f4, attempt #1 [qsize: 1]
2024-03-28T19:20:09.439 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.utils [-] api_cmd http://[2620:10a:a001:df0::2]:8119/v1.0/subclouds/c31754a0a12040a396b170d65770f6f4
2024-03-28T19:20:09.440 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.certificate_mon_manager [-] Auditing subcloud c36f5dad82934246901108c9d1559dd2, attempt #1 [qsize: 0]
2024-03-28T19:20:09.440 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.utils [-] api_cmd http://[2620:10a:a001:df0::2]:8119/v1.0/subclouds/c36f5dad82934246901108c9d1559dd2
2024-03-28T19:20:09.828 controller-0 cert-mon: info 3658130 INFO sysinv.cert_mon.watcher [-] DCIntermediateCertRenew check_filter[c3900b6586c749fc9af2e18969971ce2]: subcloud is not online
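When triaging a capture like the one above, it can help to separate the subclouds that the watcher filtered out from those it acted on. The following is a minimal sketch, not part of cert-mon: the function name and regular expressions are hypothetical and assume only the two message shapes visible in this log (`check_filter[<id>]` and `do_action: action <ACTION> (<id>-adminep-ca-certificate)`).

```python
import re

# Hypothetical patterns, derived only from the log lines shown in this report.
FILTERED_RE = re.compile(r"check_filter\[([0-9a-f]{32})\]")
HANDLED_RE = re.compile(
    r"do_action: action \w+ \(([0-9a-f]{32})-adminep-ca-certificate\)"
)


def classify_watch_events(lines):
    """Split subcloud IDs into (filtered, handled) sets from cert-mon log lines."""
    filtered, handled = set(), set()
    for line in lines:
        match = FILTERED_RE.search(line)
        if match:
            # The watcher rejected this subcloud in check_filter.
            filtered.add(match.group(1))
            continue
        match = HANDLED_RE.search(line)
        if match:
            # The watcher went on to act on this subcloud.
            handled.add(match.group(1))
    return filtered, handled
```

Any subcloud that appears in the handled set while being secondary to this system controller is an instance of the behavior reported here.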
Test Activity
Scale testing
Workaround
n/a
Reviewed: https://review.opendev.org/c/starlingx/config/+/914907
Committed: https://opendev.org/starlingx/config/commit/03443ef16c0c47d15631eb9001b413a3b8ea39fc
Submitter: "Zuul (22348)"
Branch: master
commit 03443ef16c0c47d15631eb9001b413a3b8ea39fc
Author: Kyle MacLeod <email address hidden>
Date: Tue Apr 2 11:52:39 2024 -0400
Filter cert-mon for geo-redundancy in audit and DC_CertWatcher
This commit adds a filter for querying all subclouds from dcmanager, to
account for secondary subclouds that should not be audited by cert-mon
for this system controller. The filter is performed against a list of
invalid deploy states that should be considered when querying
the list of subclouds from dcmanager.
Likewise, the DC_CertWatcher -> DCIntermediateCertRenew flow must ensure
that subclouds which are secondary to this system controller are ignored
by the kubernetes watch in place for the DC intermediate cert renewal
detection. Subclouds are filtered by the watch based on their online
state and their deploy-status. A subcloud with invalid deploy state is
ignored by this system controller.
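The two filters described above could look roughly like the sketch below. This is an illustration, not the code from the commit: the function names, the dictionary keys (`availability-status`, `deploy-status`), and the contents of `INVALID_DEPLOY_STATES` are assumptions made for the example. The actual list of invalid deploy states is defined by the fix itself.

```python
# Hypothetical values for illustration; the real list is defined in the fix.
INVALID_DEPLOY_STATES = frozenset({"secondary", "secondary-failed"})


def filter_subclouds(subclouds):
    """Audit path: drop subclouds whose deploy-status is invalid.

    `subclouds` is assumed to be a list of dicts as returned by a
    dcmanager subcloud query.
    """
    return [
        sc for sc in subclouds
        if sc.get("deploy-status") not in INVALID_DEPLOY_STATES
    ]


def should_handle_subcloud(subcloud):
    """Watch path: act only on online subclouds with a valid deploy-status."""
    if subcloud.get("availability-status") != "online":
        # Offline subclouds are ignored by the watch.
        return False
    return subcloud.get("deploy-status") not in INVALID_DEPLOY_STATES
```

Under these assumptions, a subcloud record with `deploy-status` set to `secondary` is skipped by both the audit query and the watch handler, regardless of whether it is online.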
Test Cases
PASS:
- Trigger audits on service restart. Verify that offline/secondary
subclouds are excluded.
- Ensure full daily audit is executed. Verify that all subclouds
belonging to this system controller are audited. Secondary subclouds
are not audited.
- Verify that DC_CertWatcher -> DCIntermediateCertRenew watch fires are
ignored for offline and/or invalid deploy state
Closes-Bug: 2060068
Change-Id: Iffe3d7c76db8d2f17aed0bfebc792af0f9d75ca2
Signed-off-by: Kyle MacLeod <email address hidden>