unable to bring up second controller due to "kubeadm join" error

Bug #2017146 reported by Chris Friesen
This bug affects 2 people
Affects: StarlingX
Status: Fix Released
Importance: Critical
Assigned to: Chris Friesen
Milestone: (none)

Bug Description

When attempting to bring up the second controller, puppet shows the following:

// Puppet logs on controller-1
2023-04-18T05:22:22.484 Notice: 2023-04-18 05:22:22 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: [download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace

2023-04-18T05:22:22.487 Notice: 2023-04-18 05:22:22 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: error execution phase control-plane-prepare/download-certs: error downloading certs: the Secret does not include the required certificate or key - name: external-etcd.key, path: /etc/kubernetes/pki/apiserver-etcd-client.key

2023-04-18T05:22:22.489 Notice: 2023-04-18 05:22:22 +0000 /Stage[main]/Platform::Kubernetes::Master::Init/Exec[configure master node]/returns: To see the stack trace of this error execute with --v=5 or higher

2023-04-18T05:22:22.491 Error: 2023-04-18 05:22:22 +0000 'kubeadm join 192.168.206.1:6443 --token 38in8f.l11j00vog0y2gvhf --discovery-token-ca-cert-hash sha256:285cc582818645874bf04422b3cb7cf6a6116d4cd166308cf1193dd333639188 --control-plane --certificate-key c613b828ca8e903e045005966fa12b2fd5be76e1f7f5578cc987d232d87ca821 --apiserver-advertise-address 192.168.206.3 --cri-socket /var/run/containerd/containerd.sock' returned 1 instead of one of [0]

Reverting the change from https://review.opendev.org/c/starlingx/config/+/880240 seems to resolve the issue.
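
One way to confirm the symptom before attempting the join is to check which entries the "kubeadm-certs" Secret actually holds. The following is a minimal sketch, not part of the StarlingX code: the function name missing_etcd_entries is hypothetical, and the list of required entries is limited to the three external-etcd names mentioned in the fix's commit message (kubeadm may expect others as well). It assumes the Secret has been dumped as JSON, for example with "kubectl -n kube-system get secret kubeadm-certs -o json".

```python
import json
from typing import List

# Assumption: only the three external-etcd entries named in the fix's
# commit message are checked; kubeadm may require further entries.
REQUIRED_ETCD_ENTRIES = (
    "external-etcd-ca.crt",
    "external-etcd.crt",
    "external-etcd.key",
)


def missing_etcd_entries(secret_json: str) -> List[str]:
    """Return the required external-etcd entries absent from a Secret dump.

    secret_json is the JSON text of a Kubernetes Secret object. Only the
    key names under "data" are inspected; the values are never decoded.
    """
    secret = json.loads(secret_json)
    # "data" may be missing or null when the Secret carries no entries.
    data = secret.get("data") or {}
    return [name for name in REQUIRED_ETCD_ENTRIES if name not in data]
```

On a system hitting this bug, the returned list would be expected to include external-etcd.key, matching the download-certs error in the log above. An empty list means all three entries are present.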

Chris Friesen (cbf123)
Changed in starlingx:
assignee: nobody → Chris Friesen (cbf123)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/880897
Committed: https://opendev.org/starlingx/config/commit/ceb5852fcef1e643020f55b607dd6ad56a1c9ff2
Submitter: "Zuul (22348)"
Branch: master

commit ceb5852fcef1e643020f55b607dd6ad56a1c9ff2
Author: Chris Friesen <email address hidden>
Date: Wed Apr 19 18:45:01 2023 -0400

    fixup for kubeadm cert upload

    In commit 5c58f00c11 a change was made to use
    "kubeadm init phase upload-certs --upload-certs --certificate-key <key>"
    to upload the certs to a K8s Secret in order to work around an issue
    involving the YAML representation of IPv6 addresses.

    It turns out that when used in this way, kubeadm does not upload the
    external-etcd-ca.crt/external-etcd.crt/external-etcd.key entries to the
    K8s Secret. This breaks the install on multi-node labs.

    The fix is to revert this code back to the old way of doing it, but to
    call kubeadm_configmap_reformat() to reformat the ConfigMap if
    necessary prior to dumping it out. That way if it does contain IPv6
    addresses in the "wrong" YAML format, it will get corrected.

    Test Plan:
    PASSED: Install in AIO-DX virtualbox with IPv4.
    PASSED: Modified install in AIO-DX virtualbox with IPv6 address added
            to kubeadm-config ConfigMap before unlocking controller-1.

    Closes-Bug: 2017146
    Partial-Bug: 2016041
    Change-Id: I999a161e15a81a50085a1843cc80515b9af0f117
    Signed-off-by: Chris Friesen <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Critical
tags: added: stx.9.0 stx.containers