Changing OAM IP does not update apiserver SANs

Bug #1878451 reported by David Sullivan
This bug affects 2 people
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Andy

Bug Description

Brief Description
-----------------
When the bootstrap manifest is applied, the system adds the OAM IP addresses to the apiserver certificate's SAN list, which is used for remote kubectl access. However, when an OAM IP address is changed, these values are not updated. Without the correct values in the apiserver certificate, remote access fails.

Severity
--------
Major

Steps to Reproduce
------------------
Bring up a StarlingX system
Change any of the OAM IP addresses

Expected Behavior
------------------
The new OAM IP addresses are present in the Kubernetes API server certificate SAN list,
e.g.:
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout
Certificate:
...
            X509v3 Subject Alternative Name:
                DNS:controller-0, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP Address:10.96.0.1, IP Address:192.168.205.1, IP Address:192.168.205.1, IP Address:192.168.205.1, IP Address:127.0.0.1, IP Address:128.224.x.x, IP Address:128.224.x.x, IP Address:128.224.x.x
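One way to verify the expected behavior is to check the SAN list for a specific address. The following is a minimal sketch and not part of the original report: san_has_ip is a hypothetical helper name, and it assumes the SAN entries are printed in the "IP Address:x.x.x.x" form shown above.

```shell
#!/bin/sh
# san_has_ip IP
# Reads `openssl x509 -text` output on stdin and succeeds only if IP
# appears as an exact "IP Address:" entry in the SAN list.
# (Hypothetical helper, shown for illustration.)
san_has_ip() {
    # Put each comma-separated entry on its own line, trim surrounding
    # whitespace, then require a whole-line, fixed-string match so that
    # 10.96.0.1 does not match 10.96.0.10.
    tr ',' '\n' \
        | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
        | grep -xF "IP Address:$1" > /dev/null
}

# Usage on a controller (certificate path taken from the report):
#   openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout \
#       | san_has_ip 192.168.49.80 && echo present || echo missing
```

Running the check once per OAM address after an OAM IP change should show whether the certificate picked up the new values.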

Actual Behavior
----------------
The certificate is unchanged. The old values persist in the certificate SAN list.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
All configurations

Branch/Pull Time/Commit
-----------------------
Any build that includes this commit: https://opendev.org/starlingx/ansible-playbooks/commit/208df05af590ab1cbdac16c94f65b29d4fac3e90

Last Pass
---------
NA

Timestamp/Logs
--------------
NA

Test Activity
-------------
Developer Testing

Workaround
----------
A workaround may be possible by manually updating the kubeadm conf and regenerating the apiserver cert on all controllers.

Revision history for this message
David Sullivan (dsullivanwr) wrote :

Note: this behavior was missed as part of this change/bug:
https://bugs.launchpad.net/starlingx/+bug/1863798

Ghada Khalil (gkhalil)
tags: added: stx.4.0 stx.config
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → David Sullivan (dsullivanwr)
Revision history for this message
Yatindra Shashi (yshashi) wrote :

I also hit this issue when changing the OAM IP from 172.28.235.202 to 192.168.49.80.

********
2020-05-27 16:06:04.220 90332 ERROR sysinv.conductor.kube_app KubeAppApplyFailure: Deployment of application platform-integ-apps (1.0-8) failed: failed to download one or more image(s).
2020-05-27 16:06:04.220 90332 ERROR sysinv.conductor.kube_app
sysinv 2020-05-27 16:06:04.370 90332 ERROR sysinv.conductor.kube_app [-] Image registry.local:9001/docker.io/starlingx/ceph-config-helper:v1.15.0 download failed from local registry: 500 Server Error: Internal Server Error ("Get https://registry.local:9001/v2/docker.io/starlingx/ceph-config-helper/manifests/v1.15.0: Get https://192.168.49.80:9002/token/?account=admin&scope=repository%3Adocker.io%2Fstarlingx%2Fceph-config-helper%3Apull&service=192.168.204.1%3A9001: x509: certificate is valid for 192.168.204.1, 172.28.235.202, not 192.168.49.80"): APIError: 500 Server Error: Internal Server Error ("Get https://registry.local:9001/v2/docker.io/starlingx/ceph-config-helper/manifests/v1.15.0: Get https://192.168.49.80:9002/token/?account=admin&scope=repository%3Adocker.io%2Fstarlingx%2Fceph-config-helper%3Apull&service=192.168.204.1%3A9001: x509: certificate is valid for 192.168.204.1, 172.28.235.202, not 192.168.49.80")
sysinv 2020-05-27 16:06:04.375 90332 ERROR sysinv.conductor.kube_app [-] Application apply aborted!.: KubeAppApplyFailure: Deployment of application platform-integ-apps (1.0-8) failed: failed to download one or more image(s).
sysinv 2020-05-27 16:06:04.376 90332 INFO sysinv.conductor.kube_app [-] Deregister the abort status of app platform-integ-apps
sysinv 2020-05-27 16:06:04.376 90332 ERROR sysinv.openstack.common.rpc.amqp [-] Exception during message handling: KubeAppApplyFailure: Deployment of application platform-integ-apps (1.0-8) failed: failed to download one or more image(s).

*******

Is there a fix for this?
How can I apply the workaround of manually updating the kubeadm conf and regenerating the apiserver cert on all controllers?

Revision history for this message
Yatindra Shashi (yshashi) wrote :

But this is what my apiserver certificate shows:
$ openssl x509 -in /etc/kubernetes/pki/apiserver.crt -text -noout
Certificate:
...

X509v3 extensions:
            X509v3 Key Usage: critical
                Digital Signature, Key Encipherment
            X509v3 Extended Key Usage:
                TLS Web Server Authentication
            X509v3 Subject Alternative Name:
                DNS:controller-0, DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster.local, IP Address:10.96.0.1, IP Address:192.168.206.1, IP Address:192.168.206.1, IP Address:127.0.0.1

Revision history for this message
David Sullivan (dsullivanwr) wrote :

This comment describes a possible solution
https://github.com/kubernetes/kubeadm/issues/1447#issuecomment-487045942
but I have not tested it.

The kubeadm conf would only need to be updated on the active controller. However, the cert would need to be regenerated on all controllers (if this is a duplex system).
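The manual steps implied by this workaround could look roughly like the sketch below. It is untested, like the workaround itself, and rests on several assumptions: the kubeadm config file passed in already lists the new OAM IPs under apiServer.certSANs, the backup directory is a made-up default, and the helper names run and regen_apiserver_cert are hypothetical. kubeadm reuses an existing apiserver.crt and apiserver.key if it finds them, which is why the old files are moved aside first. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# regen_apiserver_cert CONF [BACKUP_DIR]
# CONF is a kubeadm config file whose apiServer.certSANs already contains
# the new OAM IPs. BACKUP_DIR is a hypothetical default location.
regen_apiserver_cert() {
    conf="$1"
    backup="${2:-/root/apiserver-cert-backup}"
    pki=/etc/kubernetes/pki

    # Move the old certificate and key aside so kubeadm generates new ones.
    run mkdir -p "$backup"
    run mv "$pki/apiserver.crt" "$pki/apiserver.key" "$backup/"

    # Regenerate only the apiserver serving certificate from the config.
    run kubeadm init phase certs apiserver --config "$conf"
}

# Usage (run on each controller, review the printed commands first):
#   regen_apiserver_cert /path/to/kubeadm.yaml
#   DRY_RUN=0 regen_apiserver_cert /path/to/kubeadm.yaml
```

The kube-apiserver would also need to be restarted afterwards to serve the new certificate; how to do that on a StarlingX controller is not covered in this report.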

Revision history for this message
David Sullivan (dsullivanwr) wrote :

@yshashi
I think your issue is related to this https://bugs.launchpad.net/starlingx/+bug/1875891
or at least I believe it would be fixed by
https://review.opendev.org/#/c/725394/
While updating the apiserver SAN will help, the registry shouldn't be pointing to the OAM in the first place.
Can you retest with a recent load?

Revision history for this message
Yatindra Shashi (yshashi) wrote :

@david,

I replaced the registry cert and key with newly generated ones that also cover the OAM IP, and it works now.

I generated the cert and key through openssl as below, with guidance from Sun Austin.
openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:2048 -keyout /home/sysadmin/registry-cert.key -out /home/sysadmin/registry-cert.crt -config /home/sysadmin/regisry-cent-extfile.cnf

and replaced the files at the following locations:

/etc/docker/certs.d/registry.local\:9001/registry-cert.crt
/etc/ssl/private/registry-cert.key
/etc/ssl/private/registry-cert.crt
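The contents of the config file passed to openssl req above are not shown in this thread. As a guess at what such a file needs, the sketch below prints an OpenSSL req config whose SAN list contains registry.local plus each IP given on the command line. The function name write_registry_extfile, the section names, and the CN value are assumptions made for illustration, not the file the commenter actually used.

```shell
#!/bin/sh
# write_registry_extfile IP...
# Prints an OpenSSL req config on stdout. With `openssl req -x509 -config`,
# the x509_extensions section is applied to the self-signed certificate.
# (Hypothetical helper; section names and CN are illustrative.)
write_registry_extfile() {
    cat <<'EOF'
[req]
prompt = no
distinguished_name = dn
x509_extensions = v3_ext

[dn]
CN = registry.local

[v3_ext]
subjectAltName = @alt_names

[alt_names]
DNS.1 = registry.local
EOF
    # Append one numbered IP entry per argument.
    i=1
    for ip in "$@"; do
        echo "IP.$i = $ip"
        i=$((i + 1))
    done
}

# Usage, with the management and new OAM addresses from the log above:
#   write_registry_extfile 192.168.204.1 192.168.49.80 > extfile.cnf
```

Listing both the management address and the new OAM address would avoid the "certificate is valid for ... not 192.168.49.80" error quoted earlier, although the later comments point out that the registry should not be reached through the OAM address in the first place.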

I have one StarlingX 3.0 setup at my client lab that I cannot update to test the new fixes, so I have not yet tested the fix you shared earlier.
I will test it in my own lab in the future.

Ghada Khalil (gkhalil)
tags: added: stx.security
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Moving to stx.5.0 as the effort to fully address this issue is large and requires further investigation. This will be considered a limitation for stx.4.0

tags: added: stx.5.0
removed: stx.4.0
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: David Sullivan (dsullivanwr) → Andy (andy.wrs)
Andy (andy.wrs)
Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/751891

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/751892

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/751891
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=62a5358753c277886f50ede1e14e424239e66914
Submitter: Zuul
Branch: master

commit 62a5358753c277886f50ede1e14e424239e66914
Author: Andy Ning <email address hidden>
Date: Mon Sep 14 13:56:17 2020 -0400

    Update apiserver certificate's SANs when OAM IP change

    When the bootstrap manifest is applied the system adds any OAM IP
    addresses to the apiserver's certificate SAN list. This is used for
    remote kubectl access. However when the OAM IP address is changed,
    these IP values are not updated. Without the correct values in
    apiserver cert remote access will fail.

    This update introduces a kubernetes certsans runtime puppet manifest
    which will be applied during OAM IP change process to update apiserver's
    cert SANs list with the new IPs.

    Change-Id: Iedf35ddaedef5cae2e81941446fc6a8de39639f6
    Closes-Bug: 1878451
    Signed-off-by: Andy Ning <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/751892
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=5c5a6d4acbd0048c3e364d87926723b7881f9bf3
Submitter: Zuul
Branch: master

commit 5c5a6d4acbd0048c3e364d87926723b7881f9bf3
Author: Andy Ning <email address hidden>
Date: Mon Sep 14 14:06:57 2020 -0400

    Apply a runtime manifest to update apiserver certSANs

    When the bootstrap manifest is applied the system adds any OAM IP
    addresses to the apiserver's certificate SAN list. This is used for
    remote kubectl access. However when the OAM IP address is changed,
    these IP values are not updated. Without the correct values in
    apiserver cert remote access will fail.

    This change makes sysinv to apply the kubernetes certsans runtime
    puppet manifest during OAM IP change process to update apiserver's
    cert SANs list with the new IPs.

    Change-Id: I48eaf4bc3128c0c63591b77ceae69c7db0ea88ab
    Depends-On: https://review.opendev.org/#/c/751891/
    Closes-Bug: 1878451
    Signed-off-by: Andy Ning <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/762919
