kube-upgrade-networking fails for k8s 1.18.1 -> 1.19.13

Bug #1942351 reported by Steven Webster
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Steven Webster

Bug Description

Brief Description
-----------------
In the ongoing effort to refresh the containerization components in StarlingX:

https://storyboard.openstack.org/#!/story/2008972

Recent commits add support for upgrading kubernetes from 1.18.1 -> 1.19.13 -> 1.20.9 -> 1.21.3.

Specifically, the networking container images will be upgraded along with the k8s 1.19.13 version.

See commit:

https://opendev.org/starlingx/ansible-playbooks/commit/a9a409cca71637dfc77374813887cbd5f5396473

The system kube-upgrade-networking step fails because of changes to the SR-IOV CNI and device plugin .yaml templates required by the refreshed images.

When running this step, ansible will fail with the following error while doing the rolling update to these images:

TASK [common/upgrade-k8s-networking : Update SRIOV Networking] *****************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "apply", "-f", "/etc/kubernetes/update_sriov-cni.yaml"], "delta": "0:00:00.212229", "end": "2021-08-31 10:27:43.270023", "msg": "non-zero return code", "rc": 1, "start": "2021-08-31 10:27:43.057794", "stderr": "The DaemonSet \"kube-sriov-cni-ds-amd64\" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"name\":\"sriov-cni\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable", "stderr_lines": ["The DaemonSet \"kube-sriov-cni-ds-amd64\" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"name\":\"sriov-cni\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable"], "stdout": "", "stdout_lines": []}

PLAY RECAP *********************************************************************
localhost : ok=64 changed=15 unreachable=0 failed=1

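This is because the newer templates use only a 'name' label in the match selector, while the older templates select on the 'tier' and 'app' labels and do not specify a 'name' label. The spec.selector of an already-deployed DaemonSet is immutable, so kubectl apply rejects the changed selector.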
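The mismatch can be detected before a manifest is applied. The following is a minimal sketch, assuming the manifests have already been loaded as plain dicts. The function names are hypothetical, and the label values are the device plugin ones listed under Workaround; only the selector portion of each manifest is shown.

```python
# Sketch: predict whether applying a manifest over a deployed DaemonSet
# would change spec.selector, which the API server rejects as immutable.

def selector_of(manifest):
    """Return the spec.selector.matchLabels mapping of a manifest dict."""
    return manifest.get("spec", {}).get("selector", {}).get("matchLabels", {})


def selector_changed(deployed, candidate):
    """True if `candidate` has a different selector than `deployed`."""
    return selector_of(deployed) != selector_of(candidate)


# Trimmed, illustrative manifests (assumption: real templates contain more).
deployed_k8s_1_18 = {
    "kind": "DaemonSet",
    "spec": {"selector": {"matchLabels": {"tier": "node", "app": "sriovdp"}}},
}
candidate_k8s_1_19 = {
    "kind": "DaemonSet",
    "spec": {"selector": {"matchLabels": {"name": "sriov-device-plugin"}}},
}

if selector_changed(deployed_k8s_1_18, candidate_k8s_1_19):
    print("selector differs: a rolling update via kubectl apply would be rejected")
```

A check like this only compares matchLabels; it does not look at matchExpressions, which are empty in the failure reported here.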

The new templates should be fixed to match the labels used by the older templates so that the rolling update can proceed. We may be able to add new templates for k8s 1.20.9 that align with the 'name' match selector (since this label will be applied via the k8s 1.19.13 templates), but the first priority should be to enable the k8s 1.18 -> k8s 1.19 upgrade path.

Severity
--------
Critical: System/Feature is not usable due to the defect

Steps to Reproduce
------------------
At the time of writing, the load used is a developer build from 2021-08-31

With the following cherry-picked WIP commits:

https://review.opendev.org/c/starlingx/config/+/805311
https://review.opendev.org/c/starlingx/integ/+/805448

system kube-upgrade-start v1.19.13
system kube-upgrade-download-images

*** Note: there is still a config-out-of-date bug here, so this step will eventually fail. Lock and unlock the host, then run it again:

system kube-upgrade-download-images
system kube-upgrade-networking

Expected Behavior
------------------
We should get to a state of 'upgraded-networking' (system kube-upgrade-show) after the system kube-upgrade-networking step.

Actual Behavior
----------------
The system kube-upgrade-networking step fails. Ansible reports that the DaemonSet "kube-sriov-cni-ds-amd64" is invalid because spec.selector is immutable (see Timestamp/Logs).

Reproducibility
---------------
100%

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
master 2021-08-31

With the following cherry-picked WIP commits:

https://review.opendev.org/c/starlingx/config/+/805311
https://review.opendev.org/c/starlingx/integ/+/805448

Last Pass
---------
N/A we are still working through the story

Timestamp/Logs
--------------
TASK [common/upgrade-k8s-networking : Update SRIOV Networking] *****************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["kubectl", "--kubeconfig=/etc/kubernetes/admin.conf", "apply", "-f", "/etc/kubernetes/update_sriov-cni.yaml"], "delta": "0:00:00.212229", "end": "2021-08-31 10:27:43.270023", "msg": "non-zero return code", "rc": 1, "start": "2021-08-31 10:27:43.057794", "stderr": "The DaemonSet \"kube-sriov-cni-ds-amd64\" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"name\":\"sriov-cni\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable", "stderr_lines": ["The DaemonSet \"kube-sriov-cni-ds-amd64\" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"name\":\"sriov-cni\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable"], "stdout": "", "stdout_lines": []}

PLAY RECAP *********************************************************************
localhost : ok=64 changed=15 unreachable=0 failed=1
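
The kubectl error is easier to read once it is pulled out of the Ansible result. The following is a minimal sketch, assuming the failure line has the 'fatal: [host]: FAILED! => {json}' shape seen above. The function name is hypothetical, and the sample line is a shortened version of the real one with most keys dropped.

```python
import json


def parse_ansible_failure(line):
    """Split an Ansible failure line into (host, result dict)."""
    head, sep, payload = line.partition("FAILED! => ")
    if not sep:
        raise ValueError("not an Ansible failure line")
    host = head[head.index("[") + 1 : head.index("]")]
    return host, json.loads(payload)


# Shortened sample (assumption: trimmed for illustration, not the full log).
sample = (
    'fatal: [localhost]: FAILED! => {"rc": 1, "stderr": '
    '"The DaemonSet \\"kube-sriov-cni-ds-amd64\\" is invalid: '
    'spec.selector: field is immutable"}'
)

host, result = parse_ansible_failure(sample)
print(host, result["rc"])
print(result["stderr"])
```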

Test Activity
-------------
Developer testing

Workaround
----------
Edit the template files listed below and change the match labels as follows:

- name: sriov-device-plugin
+ tier: node
+ app: sriovdp

/usr/share/ansible/stx-ansible/playbooks/roles/bootstrap/bringup-essential-services/templates/k8s-v1.19.13/sriov-cni.yaml.j2
/usr/share/ansible/stx-ansible/playbooks/roles/bootstrap/bringup-essential-services/templates/k8s-v1.19.13/sriov-plugin.yaml.j2
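
The label change can also be expressed programmatically. The following is a minimal sketch, assuming a rendered manifest loaded as a plain dict; it does not edit the .j2 templates themselves. The function name is hypothetical. It also rewrites the pod template labels, on the assumption that they need the same change, because an apps/v1 DaemonSet selector must match the labels of its pod template.

```python
import copy


def carry_forward_selector(manifest, old_labels):
    """Return a copy of `manifest` that selects on `old_labels` instead."""
    patched = copy.deepcopy(manifest)
    spec = patched["spec"]
    new_keys = list(spec["selector"].get("matchLabels", {}))
    template_labels = spec["template"]["metadata"].setdefault("labels", {})
    # Drop the labels introduced by the new selector, then restore the old ones.
    for key in new_keys:
        template_labels.pop(key, None)
    template_labels.update(old_labels)
    spec["selector"] = {"matchLabels": dict(old_labels)}
    return patched


# Trimmed, illustrative manifest (assumption: real templates contain more).
new_manifest = {
    "kind": "DaemonSet",
    "spec": {
        "selector": {"matchLabels": {"name": "sriov-device-plugin"}},
        "template": {"metadata": {"labels": {"name": "sriov-device-plugin"}}},
    },
}

patched = carry_forward_selector(new_manifest, {"tier": "node", "app": "sriovdp"})
print(patched["spec"]["selector"]["matchLabels"])
```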

Changed in starlingx:
assignee: nobody → Steven Webster (swebster-wr)
status: New → In Progress
importance: Undecided → Critical
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Ghada Khalil (gkhalil) wrote :

screening: stx.6.0 / medium - issue tied to in-progress stx.6.0 feature: https://storyboard.openstack.org/#!/story/2008972

Changed in starlingx:
importance: Critical → Medium
tags: added: stx.6.0 stx.containers
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/807135
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/6083b7be80d7d6f28ce7d9c9810deaba73351b41
Submitter: "Zuul (22348)"
Branch: master

commit 6083b7be80d7d6f28ce7d9c9810deaba73351b41
Author: Steven Webster <email address hidden>
Date: Wed Sep 1 14:58:14 2021 -0400

    Fix k8s 1.19 match selectors for SR-IOV images

    When performing a k8s upgrade from 1.18.1 to 1.19.13, the
    networking images will be upgraded as well. As part of this
    upgrade, the respective kubernetes spec templates which
    align with the new images will be used.

    An issue has been seen in which the rolling upgrade of the
    SR-IOV daemonsets fail because the latest templates specify
    a 'name' as the matchLabel selector. However, the older spec
    uses 'app' and 'tier' as the match selector.

    I believe that as of apps/v1, the selector label is still
    immutable for controllers such as daemonsets that have
    already been deployed.

    Some discussion on the topic can be found here:

    https://github.com/kubernetes/kubernetes/issues/50808

    For now, we'll just carry forward the 1.18 match selector.
    It's possible this can be fixed in later k8s releases.

    Story: 2008972
    Closes-Bug: 1942351

    Signed-off-by: Steven Webster <email address hidden>
    Change-Id: Id0ca32038dc2897879786a17f9794515457cd837

Changed in starlingx:
status: In Progress → Fix Released