Subcloud is offline for system with k8s certificates at expiry date
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Carmen Rata |
Bug Description
Subcloud with valid certs (show certs output), valid routes (subcloud/
Severity
Minor
Steps to Reproduce
Check the subcloud status when the SystemController has been up for more than a year,
e.g. uptime_days => 382, which exceeds the 1-year validity period of the k8s certificates.
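The reproduction condition can be expressed as a small check. This is only a sketch: the function name and the 365-day constant are assumptions for illustration (the report says "1 year"), not something taken from the StarlingX code.

```python
from datetime import timedelta

# Assumption: a 365-day validity period, based on the "1 year" stated in this report.
CERT_VALIDITY = timedelta(days=365)


def uptime_exceeds_cert_validity(uptime_days, validity=CERT_VALIDITY):
    """Return True when the uptime is longer than the certificate validity period."""
    return timedelta(days=uptime_days) > validity


# 382 is the uptime_days value quoted in the steps above.
print(uptime_exceeds_cert_validity(382))
```

A system for which this check is true is a candidate for hitting the issue if the certificates were not rotated in the meantime.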
Expected Behavior
Subcloud is in "online" state. The dcmanager audit should be successful.
K8s certificates have been rotated.
Actual Behavior
dcmanager receives 401 and fails the periodic subcloud audit.
2022-11-04 08:02:51.942 433961 WARNING cgtsclient.
Reproducibility
Happened once.
System Configuration
Distributed cloud.
Logs
Status updates for the subclouds stopped after the following 401 error messages started to appear in audit.log:
1st - 401 error message:
2022-11-04 03:41:07.685 433961 INFO dcmanager.
2022-11-04 03:41:12.503 433961 WARNING cgtsclient.
2022-11-04 03:41:12.504 433961 ERROR dcmanager.
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict(
HTTP response body: {"kind"
sysinv.log
sysinv 2022-11-04 03:41:12.499 3494410 ERROR wsme.api [-] Server-side error: "(401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict(
HTTP response body: {"kind"
audit.log
2022-11-04 00:49:49.603 434242 INFO dcmanager.
dcmanager.log
2022-11-05 11:00:59.921 3497389 INFO dcmanager.
Ignoring subcloud sync_status update for subcloud,
availability:
keystone-all.log
2022-11-06 04:41:48.921 2809622 WARNING keystone.
Failed to validate token: TokenNotFound: Failed to validate token
2022-11-06 04:41:48.924 432835 WARNING keystonemiddlew
2022-11-06 04:41:49.486 2809622 WARNING keystone.
Failed to validate token: TokenNotFound: Failed to validate token
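To find when the 401 errors began, the logs can be scanned for the first "Reason: Unauthorized" line and the timestamped entry just before it. The sketch below assumes the audit.log layout shown above (timestamp at the start of the line); the helper name is hypothetical, and the sample lines are the truncated excerpts from this report, copied as-is.

```python
import re
from datetime import datetime

# Matches the leading timestamp of an audit.log style line, e.g. "2022-11-04 03:41:12.504 ".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")


def first_unauthorized(lines):
    """Return the timestamp of the entry preceding the first
    'Reason: Unauthorized' line, or None if no such line is found."""
    last_seen = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            last_seen = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        elif line.strip() == "Reason: Unauthorized":
            return last_seen
    return None


# Truncated lines exactly as they appear in the audit.log excerpt above.
sample = [
    "2022-11-04 03:41:07.685 433961 INFO dcmanager.",
    "2022-11-04 03:41:12.503 433961 WARNING cgtsclient.",
    "2022-11-04 03:41:12.504 433961 ERROR dcmanager.",
    "Reason: Unauthorized",
]
print(first_unauthorized(sample))
```

Lines with a prefix before the timestamp, such as the sysinv.log entry above, would need a different pattern.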
Alarms
Critical alarm: Subcloud is offline.
Workaround
Perform a host swact.
Changed in starlingx:
assignee: nobody → Carmen Rata (crata)
summary:
- Subcloud is offline for system with certificates at expiry date
+ Subcloud is offline for system with k8s certificates at expiry date
tags: added: stx.security
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.8.0
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/869782