DC subclouds go out of sync after changing the admin-ep CA certificates and lock-unlocking controllers

Bug #1999438 reported by Andy
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Andy
Milestone: (none)

Bug Description

Brief Description
-----------------
Subclouds go out of sync after the admin-ep root CA certificate is changed and the controllers are locked and unlocked.

Severity
--------
Major: System/Feature is usable but degraded

Steps to Reproduce
------------------
1. Create root_renew.yaml on the System Controller so that the root CA certificate is renewed after 300s (duration minus renewBefore is 5 minutes)

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: dc-adminep-root-ca-certificate
  namespace: dc-cert
spec:
  commonName: fd01:305::2
  duration: 43800h0m0s
  isCA: true
  issuerRef:
    kind: Issuer
    name: dc-selfsigning-issuer
  renewBefore: 43799h55m0s
  secretName: dc-adminep-root-ca-certificate
  subject:
    organizationalUnits:
    - StarlingX DC Root CA

2. Apply root_renew.yaml on the Central Cloud controller so that the root CA certificate is renewed after 300s
kubectl apply -f root_renew.yaml

3. After 300s, check whether the admin endpoint certificate has been renewed on the subcloud
 /etc/ssl/private/admin-ep-cert.pem
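
The 300s figure in the steps above follows from the two fields in root_renew.yaml: cert-manager starts renewal at expiry minus renewBefore, so the delay after issuance is duration minus renewBefore. A minimal Python sketch of that arithmetic is shown here; the helper names `parse_duration` and `renewal_delay` are hypothetical and only handle the h/m/s units used in this report.

```python
import re
from datetime import timedelta

# Seconds per unit for the h/m/s duration strings used in root_renew.yaml.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> timedelta:
    """Parse a duration such as '43800h0m0s' (hours, minutes, seconds only)."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything that is not made up entirely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


def renewal_delay(duration: str, renew_before: str) -> timedelta:
    """Time between issuance and the start of renewal."""
    return parse_duration(duration) - parse_duration(renew_before)


delay = renewal_delay("43800h0m0s", "43799h55m0s")
print(delay.total_seconds())  # 300.0
```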

Expected Behavior
------------------
The subcloud's admin endpoint certificate is renewed.
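
One way to confirm the renewal is to compare certificate fingerprints taken before and after the 300s window. The sketch below is illustrative only: it assumes the file holds one or more PEM certificate blocks (possibly next to a private key, which is ignored), and the function names `cert_fingerprints` and `was_renewed` are hypothetical.

```python
import base64
import hashlib
import re
from typing import List

# Matches the base64 body of each PEM certificate block.
_CERT_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.DOTALL
)


def cert_fingerprints(pem_text: str) -> List[str]:
    """Return the SHA-256 fingerprint of every certificate block in pem_text."""
    prints = []
    for body in _CERT_RE.findall(pem_text):
        # Strip line breaks and whitespace, then decode to DER bytes.
        der = base64.b64decode("".join(body.split()))
        prints.append(hashlib.sha256(der).hexdigest())
    return prints


def was_renewed(before_pem: str, after_pem: str) -> bool:
    """True if the certificate blocks differ between the two snapshots."""
    return cert_fingerprints(before_pem) != cert_fingerprints(after_pem)
```

In practice the two snapshots would be the contents of /etc/ssl/private/admin-ep-cert.pem read on the subcloud before applying root_renew.yaml and again after the renewal window.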

Actual Behavior
----------------
The subcloud's admin endpoint certificate is NOT renewed. In addition, after locking and unlocking the subcloud1 controller and then the Central Cloud's standby controller, the subcloud's dc-cert_sync_status is out-of-sync.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
DC with subclouds.

Branch/Pull Time/Commit
-----------------------
STX master latest.

Last Pass
---------
First seen on Debian DC.

Timestamp/Logs
--------------
On system controller:

cert-mon.log:
-------------
2022-12-12T15:19:55.095 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.watcher [-] RootCARenew check_filter[dc-adminep-root-ca-certificate]: root ca certificate has changed. md5sum 4aa8e785c610bd03d681290efb6b2f4d
2022-12-12T15:19:55.095 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.watcher [-] RootCARenew do_action: action MODIFIED (dc-adminep-root-ca-certificate)
hash: ca_crt: 4aa8e785c610bd03d681290efb6b2f4d tls_crt 4aa8e785c610bd03d681290efb6b2f4d tls_key 12b70e13056f6df5f8974f5385eacfba
created at 2022-12-09 11:10:09 last operation Apply last update at 2022-12-12 15:19:55
2022-12-12T15:19:55.113 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.watcher [-] Secrets to be recreated ['dc-adminep-certificate', 'subcloud1-adminep-ca-certificate', 'subcloud2-adminep-ca-certificate']
2022-12-12T15:19:55.113 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.watcher [-] Recreate dc-cert:dc-adminep-certificate
2022-12-12T15:19:55.128 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.watcher [-] Recreate dc-cert:subcloud1-adminep-ca-certificate
2022-12-12T15:19:55.131 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.watcher [-] Recreate dc-cert:subcloud2-adminep-ca-certificate
2022-12-12T15:19:55.140 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.utils [-] api_cmd http://[fd01:305::2]:8119/v1.0/subclouds/subcloud1
2022-12-12T15:19:55.264 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.utils [-] api_cmd http://[fd01:305::2]:8119/v1.0/subclouds/subcloud2
2022-12-12T15:19:55.293 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.watcher [-] AdminEndpointRenew do_action: action ADDED (dc-adminep-certificate)
hash: ca_crt: 4aa8e785c610bd03d681290efb6b2f4d tls_crt 0d1584b69f263ad37ee4792d5ba5db06 tls_key 6e30ce2318573f94e31e18ca6a848915
created at 2022-12-12 15:19:55 last operation Apply last update at 2022-12-12 15:19:55
2022-12-12T15:19:59.103 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.utils [-] Update admin endpoint certificate request succeeded
2022-12-12T15:19:59.107 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.utils [-] api_cmd http://[fd01:305::2]:8119/v1.0/subclouds/subcloud2
2022-12-12T15:19:59.121 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.watcher [-] DCIntermediateCertRenew do_action: action ADDED (subcloud2-adminep-ca-certificate)
hash: ca_crt: 4aa8e785c610bd03d681290efb6b2f4d tls_crt 89ef432f9a26c23030917eeebe44362a tls_key a106cd7342fffec4e1bf21b8882a0e28
created at 2022-12-12 15:19:55 last operation Apply last update at 2022-12-12 15:19:55
2022-12-12T15:19:59.121 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.watcher [-] update_certificate: subcloud subcloud2 action ADDED (subcloud2-adminep-ca-certificate)
hash: ca_crt: 4aa8e785c610bd03d681290efb6b2f4d tls_crt 89ef432f9a26c23030917eeebe44362a tls_key a106cd7342fffec4e1bf21b8882a0e28
created at 2022-12-12 15:19:55 last operation Apply last update at 2022-12-12 15:19:55
2022-12-12T15:19:59.294 controller-1 cert-mon: err 86072 ERROR sysinv.cert_mon.watcher [-] DC_CertWatcher: monitor action in namespace=dc-cert failed: action ADDED (subcloud2-adminep-ca-certificate)
hash: ca_crt: 4aa8e785c610bd03d681290efb6b2f4d tls_crt 89ef432f9a26c23030917eeebe44362a tls_key a106cd7342fffec4e1bf21b8882a0e28
created at 2022-12-12 15:19:55 last operation Apply last update at 2022-12-12 15:19:55, HTTP Error 400: Bad Request: urllib.error.HTTPError: HTTP Error 400: Bad Request
2022-12-12T15:19:59.294 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.certificate_mon_manager [-] Purging reattempt monitor task for new reattempt: cert-update: subcloud2-adminep-ca-certificate
2022-12-12T15:19:59.299 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.utils [-] api_cmd http://[fd01:305::2]:8119/v1.0/subclouds/subcloud1
2022-12-12T15:19:59.314 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.watcher [-] DCIntermediateCertRenew do_action: action ADDED (subcloud1-adminep-ca-certificate)
hash: ca_crt: 4aa8e785c610bd03d681290efb6b2f4d tls_crt 3fe3ac5bce153423a4b96635f001a221 tls_key a5d92c4c68208a4869b547a37ffa41e2
created at 2022-12-12 15:19:55 last operation Apply last update at 2022-12-12 15:19:55
2022-12-12T15:19:59.314 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.watcher [-] update_certificate: subcloud subcloud1 action ADDED (subcloud1-adminep-ca-certificate)
hash: ca_crt: 4aa8e785c610bd03d681290efb6b2f4d tls_crt 3fe3ac5bce153423a4b96635f001a221 tls_key a5d92c4c68208a4869b547a37ffa41e2
created at 2022-12-12 15:19:55 last operation Apply last update at 2022-12-12 15:19:55
2022-12-12T15:19:59.558 controller-1 cert-mon: err 86072 ERROR sysinv.cert_mon.watcher [-] DC_CertWatcher: monitor action in namespace=dc-cert failed: action ADDED (subcloud1-adminep-ca-certificate)
hash: ca_crt: 4aa8e785c610bd03d681290efb6b2f4d tls_crt 3fe3ac5bce153423a4b96635f001a221 tls_key a5d92c4c68208a4869b547a37ffa41e2
created at 2022-12-12 15:19:55 last operation Apply last update at 2022-12-12 15:19:55, HTTP Error 400: Bad Request: urllib.error.HTTPError: HTTP Error 400: Bad Request
2022-12-12T15:19:59.558 controller-1 cert-mon: info 86072 INFO sysinv.cert_mon.certificate_mon_manager [-] Purging reattempt monitor task for new reattempt: cert-update: subcloud1-adminep-ca-certificate
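
When triaging a log like the one above, it can help to pull out only the secrets whose update failed. The following Python sketch does that for the ERROR line format seen in this excerpt; the regular expression is derived from these sample lines only and the function name `failed_secrets` is hypothetical, so other cert-mon versions may need a different pattern.

```python
import re

# Pattern based on the DC_CertWatcher ERROR lines in this report's excerpt.
_FAILED_RE = re.compile(
    r"^(?P<ts>\S+) \S+ cert-mon: err \d+ ERROR \S+ \[-\] "
    r"DC_CertWatcher: monitor action in namespace=(?P<ns>\S+) failed: "
    r"action (?P<action>\w+) \((?P<secret>[^)]+)\)"
)


def failed_secrets(log_text):
    """Return (timestamp, namespace, secret) for each failed monitor action."""
    failures = []
    for line in log_text.splitlines():
        match = _FAILED_RE.match(line)
        if match:
            failures.append((match["ts"], match["ns"], match["secret"]))
    return failures
```

Applied to the excerpt above, this would report the two intermediate CA secrets, subcloud2-adminep-ca-certificate and subcloud1-adminep-ca-certificate, both in namespace dc-cert.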

Subcloud:

sysinv.log:
----------
sysinv 2022-12-07 22:15:07.330 95662 INFO sysinv.common.utils [-] Provided ca cert is invalid
-----BEGIN CERTIFICATE-----
MIIDNzCCAh+gAwIBAgIRAI9mKztYVce4aNDOLys8+UswDQYJKoZIhvcNAQELBQAw
NTEdMBsGA1UECxMUU3RhcmxpbmdYIERDIFJvb3QgQ0ExFDASBgNVBAMTC2ZkMDE6
MzA1OjoyMB4XDTIyMTIwNzIyMTUwMloXDTI3MTIwNjIyMTUwMlowNTEdMBsGA1UE
CxMUU3RhcmxpbmdYIERDIFJvb3QgQ0ExFDASBgNVBAMTC2ZkMDE6MzA1OjoyMIIB
IjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA1Ac+2gg6re/CD5gQTfZRSazv
/YgGeAUjyVNuFf+f4/+pIXe5pbMWwmrMZopbdhuu0qC87XTSQrnqf0ngHgepVwMp
jI/za1idQqIL7n7XX6rfwqdW9vCgGJ7RLrI/1AjGOT4O5/f52Y3e98TS47taXkU0
1k6B6lknS79P6UpwmRt7J1kiFPygg6oN+uFIWJQRvvJ6zg5BdZPN+zM0D5QpWUkZ
k9jt/rfSAumpKDS+akbTRNhEcEWcREewp8IGM/2q0Q/Daw4Ng72KeFhqBKZDNeaV
lS2dy11Myu6X084F7eKMgpQGKumGYPGf8VYzRY9G5ARxZcDz6rmlTDmZdlhwAwID
AQABo0IwQDAOBgNVHQ8BAf8EBAMCAqQwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4E
FgQUB9o31+rWYUcEbFc/NZv0LBCBGQAwDQYJKoZIhvcNAQELBQADggEBADFzIWIe
VkrJGq9LFPm6mF/vRJJs4R6fJOxmxx0qG7ryly2f7MHot1Vbwb31xivil1be6nmy
4c+T74wPOcLmmrYPHbhVPOrFse/8vsPyKKl8ne2CW8a/5zr4nmZXVreqmT9En9OZ
4mahxnX8FDjmjhTG3+Hq2V2jCv4le+Sy8t3CwP60dWc9yGerecdp9gnpG2uwI9qe
7zxPXG/kH2lH8rZ6TODEWzrE8u/vjbGuZk+ggEkEfEUwTV6yvh2Q/0vqIASC9mtM
88o68enjs02HSdGCzlOtrSdpcoEk96rL0YJoYDDCx7KQ9btkuYWB2gENI95O4nFa
EH/QneuqruXOKx0=
-----END CERTIFICATE-----
error stdin: verification failed

OU = StarlingX DC Root CA, CN = fd01:305::2
error 18 at 0 depth lookup: self signed certificate
sysinv 2022-12-07 22:15:07.330 95662 WARNING wsme.api [-] Client-side error: Provided CA cert is invalid: wsme.exc.ClientSideError: Provided CA cert is invalid

Test Activity
-------------
Developer Testing

Workaround
----------
Manually update the DC root CA certificate on the subcloud.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/867517

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/867517
Committed: https://opendev.org/starlingx/config/commit/922fc973a1483c6aed7f8341a7c9a247eb378082
Submitter: "Zuul (22348)"
Branch: master

commit 922fc973a1483c6aed7f8341a7c9a247eb378082
Author: Andy Ning <email address hidden>
Date: Mon Dec 12 16:26:34 2022 -0500

    Fix admin endpoint root CA verification failure

    In DC system, admin endpoint root CA certificate renewal on System
    Controller will trigger subcloud intermediate CA cert and admin
    endpoint cert renewal. During the renewals on subcloud, sysinv API
    will verify the new root CA cert. But the current verification
    algorithm is failing, because no certs in the subcloud can be used
    to verify the self-signed root CA cert.

    This change updated the algorithm to just verify by itself. Since
    the renewal is done over existing HTTPS, the verification is
    sufficient.

    Test Plan:
    PASS: DC admin endpoint root CA renewal is successful,
          dc-cert_sync_status is in in-sync state.
    PASS: Lock/unlock controllers of Central Cloud,
          dc-cert_sync_status is in in-sync state.
    PASS: Lock/unlock controllers of Subcloud,
          dc-cert_sync_status is in in-sync state.

    Closes-Bug: 1999438
    Signed-off-by: Andy Ning <email address hidden>
    Change-Id: Id5c316849cd90cbc2fa44265bcb6658341460132
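
The commit message above says the root CA is now verified "by itself". As a rough illustration of that idea, and not the code from the merged change, a self-signed certificate can be checked with `openssl verify` by passing the same file as both the trusted CA and the certificate under test. This avoids the "error 18 at 0 depth lookup: self signed certificate" result that occurs when the self-signed certificate is not in the trust list. The function names below are hypothetical, and the sketch assumes the `openssl` binary is on the PATH.

```python
import subprocess


def self_verify_command(cert_path: str) -> list:
    """Build an `openssl verify` call that trusts cert_path as its own CA."""
    return ["openssl", "verify", "-CAfile", cert_path, cert_path]


def is_self_consistent(cert_path: str) -> bool:
    """Run the check; True only if openssl exits cleanly and reports OK."""
    result = subprocess.run(
        self_verify_command(cert_path), capture_output=True, text=True
    )
    # Checking for "OK" as well guards against openssl builds whose exit
    # status does not reflect verification errors.
    return result.returncode == 0 and "OK" in result.stdout
```

A check like this only confirms that the certificate's signature matches its own public key; as the commit message notes, trust in the new root CA comes from it being delivered over the existing HTTPS channel.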

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Andy (andy.wrs)
importance: Undecided → Medium
tags: added: stx.8.0 stx.distcloud stx.security