AIO-DX kube upgrade from 1.21 to 1.22 fails on second control plane

Bug #2018247 reported by Chris Friesen
This bug affects 1 person
Affects    Status        Importance  Assigned to    Milestone
StarlingX  Fix Released  Medium      Chris Friesen

Bug Description

An AIO-DX Kubernetes upgrade from 1.21 to 1.22, performed through the VIM, fails when it reaches the second control plane.

From sysinv.log:

2023-04-28 14:22:45.975 64199 ERROR sysinv.conductor.manager
sysinv 2023-04-28 14:22:45.980 64199 ERROR zerorpc.core [-] : sysinv.common.exception.SysinvException: Failed to generate bootstrap token
2023-04-28 14:22:45.980 64199 ERROR zerorpc.core Traceback (most recent call last):
2023-04-28 14:22:45.980 64199 ERROR zerorpc.core File "/usr/lib/python3/dist-packages/sysinv/puppet/kubernetes.py", line 288, in _get_kubernetes_join_cmd
2023-04-28 14:22:45.980 64199 ERROR zerorpc.core subprocess.check_call(cmd) # pylint: disable=not-callable
2023-04-28 14:22:45.980 64199 ERROR zerorpc.core File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
2023-04-28 14:22:45.980 64199 ERROR zerorpc.core raise CalledProcessError(retcode, cmd)
2023-04-28 14:22:45.980 64199 ERROR zerorpc.core subprocess.CalledProcessError: Command '['kubeadm', 'init', 'phase', 'upload-certs', '--upload-certs', '--config', '/tmp/tmpwm_5zfve.yaml']' returned non-zero exit status 1.

Running the same kubeadm command by hand reports this output:
sudo kubeadm init phase upload-certs --upload-certs --config /tmp/tmpwm_5zfve.yaml

W0428 14:30:54.462803 3402454 strict.go:47] unknown configuration schema.GroupVersionKind{Group:"kubeadm.k8s.io", Version:"v1beta3", Kind:"ClusterConfiguration"}
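
To confirm which configuration versions a generated kubeadm config file actually carries, a small helper along these lines may be useful. It is only a sketch: the function name list_group_version_kinds is hypothetical, and it uses a simple regular-expression scan of top-level apiVersion and kind keys rather than a full YAML parser.

```python
import re


def list_group_version_kinds(config_text):
    """Return (apiVersion, kind) pairs, one per YAML document in config_text.

    Hypothetical helper: scans only top-level 'apiVersion:' and 'kind:' keys.
    """
    pairs = []
    # Multi-document YAML files separate documents with a line of '---'.
    for doc in re.split(r"^---[ \t]*$", config_text, flags=re.MULTILINE):
        api = re.search(r"^apiVersion:[ \t]*(\S+)", doc, flags=re.MULTILINE)
        kind = re.search(r"^kind:[ \t]*(\S+)", doc, flags=re.MULTILINE)
        if api and kind:
            # Values may or may not be quoted in the file.
            pairs.append((api.group(1).strip("\"'"), kind.group(1).strip("\"'")))
    return pairs


if __name__ == "__main__":
    sample = (
        "apiVersion: kubeadm.k8s.io/v1beta3\n"
        "kind: ClusterConfiguration\n"
        "---\n"
        "apiVersion: kubeadm.k8s.io/v1beta3\n"
        "kind: InitConfiguration\n"
    )
    for api_version, kind in list_group_version_kinds(sample):
        print(api_version, kind)
```

A ClusterConfiguration entry reported as kubeadm.k8s.io/v1beta3 would match the warning above when the file is handed to a kubeadm binary that does not know that version.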

Chris Friesen (cbf123)
Changed in starlingx:
assignee: nobody → Chris Friesen (cbf123)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/config/+/881922

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/881922
Committed: https://opendev.org/starlingx/config/commit/1b93ed19411c77817602ac7571f470dca2ea8eea
Submitter: "Zuul (22348)"
Branch: master

commit 1b93ed19411c77817602ac7571f470dca2ea8eea
Author: Chris Friesen <email address hidden>
Date: Fri Apr 28 20:54:00 2023 -0600

    use new kubeadm version when uploading certs

    We've seen a case in the current dev branch where during a K8s upgrade
    from 1.21 to 1.22 something causes the ClusterConfiguration in the
    kubeadm-config configmap to be stored with the v1beta3 version. This
    version is not understood by the 1.21 version of kubeadm, so it causes
    problems when we try to upgrade the second controller node to K8s 1.22.

    To fix it, we need to ensure that when we upload certs during a K8s
    upgrade we use the version of kubeadm that we are upgrading to for
    the operation.

    It seems that this call is sensitive to the kubeadm version (while
    some others aren't) because we're passing in the complete kubeadm
    config file and so it needs to parse the ClusterConfiguration version.

    TEST PLAN:
    PASS: Upgrade AIO-DX lab from K8s 1.21 to 1.22
    PASS: Clean install of AIO-DX lab with change manually applied
          before initial run of ansible playbook.

    Closes-Bug: 2018247

    Change-Id: Ie6ecf3f33335bd2cc09d028c776f92d2d302b110
    Signed-off-by: Chris Friesen <email address hidden>
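
The approach described in the commit message can be illustrated roughly as follows. This is a minimal sketch, not the merged change: the function name build_upload_certs_cmd, the target_version parameter, and the versioned binary path template are all assumptions made for illustration. Only the kubeadm arguments are taken from the traceback in the bug description.

```python
# Assumed location of per-version kubeadm binaries; adjust for the real system.
KUBEADM_VERSIONED_PATH = "/usr/local/kubernetes/{version}/stage1/usr/bin/kubeadm"


def build_upload_certs_cmd(config_path, target_version=None):
    """Build the 'upload-certs' command, preferring the target kubeadm version.

    Hypothetical helper: when a Kubernetes upgrade is in progress, pass the
    version being upgraded to (for example "v1.22.5") so that the matching
    kubeadm parses the config file. Without it, fall back to the default
    kubeadm found on PATH.
    """
    kubeadm = "kubeadm"
    if target_version:
        # Accept versions written with or without a leading 'v'.
        kubeadm = KUBEADM_VERSIONED_PATH.format(version=target_version.lstrip("v"))
    return [kubeadm, "init", "phase", "upload-certs", "--upload-certs",
            "--config", config_path]


if __name__ == "__main__":
    print(build_upload_certs_cmd("/tmp/example.yaml"))
    print(build_upload_certs_cmd("/tmp/example.yaml", target_version="v1.22.5"))
```

The resulting list could then be passed to subprocess.check_call, as the existing code in the traceback does.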

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.containers stx.update