On AIO-SX, during a multi-version K8s upgrade, class pre_pull_control_plane_images pre-pulls the wrong image versions

Bug #2044492 reported by Chris Friesen
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Chris Friesen

Bug Description

Brief Description

When doing a multi-version K8s upgrade on AIO-SX, the Puppet class platform::kubernetes::pre_pull_control_plane_images pre-pulls the images for the final K8s version rather than for the immediate next K8s version. Most of the time this isn't a problem, because the Ansible playbook has already populated the image cache. However, if the image filesystem fills up and kubelet starts garbage-collecting images, the cached images can be dropped, at which point we rely on this Puppet class to ensure that they are present when needed.

Severity

Minor

Steps to Reproduce

Do a multi-version K8s upgrade on AIO-SX.

Expected Behavior

In Puppet class platform::kubernetes::upgrade_first_control_plane we call class platform::kubernetes::pre_pull_control_plane_images, which should ensure that the control plane images for the next version of K8s are pre-pulled before they are needed.

Actual Behavior

platform::kubernetes::pre_pull_control_plane_images pre-pulls the images for the final version of K8s in the multi-version upgrade.
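The difference between the expected and actual behavior can be illustrated with a minimal Python sketch. This is not the Puppet code from stx-puppet; the function names and the version strings in the usage note are hypothetical and only show the intended selection rule: pre-pull for the lowest available version above the currently running one, not for the final target.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v1.25.3' or '1.25.3' into (1, 25, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def next_upgrade_target(current: str, final: str, available: list[str]) -> str:
    """Return the immediate next version to upgrade to (hypothetical helper).

    Picks the lowest entry in `available` that is above `current` and not
    above `final`. The buggy behavior described in this report corresponds
    to returning `final` unconditionally.
    """
    cur, fin = parse_version(current), parse_version(final)
    candidates = sorted(
        (v for v in available if cur < parse_version(v) <= fin),
        key=parse_version,
    )
    if not candidates:
        raise ValueError(f"no upgrade step between {current} and {final}")
    return candidates[0]
```

For example, with illustrative versions v1.24.4 (running), v1.25.3 and v1.26.1 (final), `next_upgrade_target("v1.24.4", "v1.26.1", [...])` selects v1.25.3, whereas the behavior reported here would pre-pull for v1.26.1.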

Reproducibility

Reproducible

System Configuration

AIO-SX

Load info (eg: 2022-03-10_20-00-07)

current dev branch

Last Pass

long-standing bug

Timestamp/Logs

N/A

Alarms

N/A

Test Activity

Developer Testing

Workaround

Manually pull the required images with crictl.
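As a rough sketch of the workaround, the following Python wraps `crictl pull` for a list of image references. The helper names are hypothetical, the image list has to be supplied by the operator (it is not given in this report), and any registry authentication that the node needs is assumed to be configured separately.

```python
import subprocess
from typing import Callable, Iterable


def crictl_pull_argv(image: str) -> list[str]:
    """Build the argument vector for pulling one image with crictl."""
    return ["crictl", "pull", image]


def pull_images(
    images: Iterable[str],
    run: Callable[..., object] = subprocess.run,
) -> list[str]:
    """Pull each image in turn and return the references that were pulled.

    `run` defaults to subprocess.run; with check=True a failed pull raises
    CalledProcessError and stops the loop. It is a parameter so the function
    can be exercised without a container runtime.
    """
    pulled = []
    for image in images:
        run(crictl_pull_argv(image), check=True)
        pulled.append(image)
    return pulled
```

The images to pass in are the control plane images for the next K8s version in the upgrade path, not the final one.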

Chris Friesen (cbf123)
Changed in starlingx:
assignee: nobody → Chris Friesen (cbf123)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/901778
Committed: https://opendev.org/starlingx/stx-puppet/commit/4636be19a498eca43f028c79cb2b652ad552d26f
Submitter: "Zuul (22348)"
Branch: master

commit 4636be19a498eca43f028c79cb2b652ad552d26f
Author: Chris Friesen <email address hidden>
Date: Thu Nov 23 15:04:29 2023 -0600

    disable image gc when doing k8s upgrade

    Static pods cannot use image pull secrets, so it's important that the
    control plane images are not garbage-collected while we're doing a
    Kubernetes upgrade otherwise the upgrade can fail.

    Accordingly we want to disable garbage-collecting the images, then
    pre-pull the new images, then do the actual K8s upgrade, then re-enable
    image garbage collection.

    Also included are a couple of fixes for places where we were using
    subtly incorrect versions when retrieving the image list as part of
    a multi-version upgrade.

    TEST-PLAN:
    PASS: Perform multi-version K8s upgrade on AIO-SX, ensure upgrade
          passes and image garbage collection is disabled during the
          upgrade and re-enabled when kubelet gets upgraded to the
          final version.

    PASS: Perform single-version K8s upgrade on AIO-SX, ensure upgrade
          passes and image garbage collection is disabled during the
          upgrade and re-enabled when kubelet gets upgraded.

    PASS: Perform single-version K8s upgrade on Standard lab, ensure
          upgrade passes and image garbage collection is disabled on
          each node during the upgrade and re-enabled when kubelet is
          upgraded.

    Closes-Bug: 2044492
    Partial-Bug: 2044493

    Change-Id: I358ae922e5c2c5c047806a1e6773b1d23a74cbd0
    Signed-off-by: Chris Friesen <email address hidden>
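
The ordering described in the commit message above (disable image garbage collection, pre-pull, upgrade, re-enable) can be sketched as a simple plan generator. This is a simplified illustration under assumptions, not the merged Puppet change: the function name and step wording are hypothetical, and the real fix spreads these steps across separate Puppet classes and upgrade stages.

```python
def plan_upgrade(path: list[str]) -> list[str]:
    """Return an ordered list of steps for a K8s upgrade (hypothetical helper).

    `path` lists the target versions in order, ending with the final version.
    Image garbage collection stays disabled for the whole sequence and is
    only re-enabled after kubelet reaches the final version.
    """
    if not path:
        raise ValueError("upgrade path must contain at least one version")
    steps = ["disable image garbage collection"]
    for version in path:
        # Pre-pull for this hop only, never for a later version in the path.
        steps.append(f"pre-pull control plane images for {version}")
        steps.append(f"upgrade control plane to {version}")
    steps.append(f"upgrade kubelet to {path[-1]}")
    steps.append("re-enable image garbage collection")
    return steps
```

A single-version upgrade is the same plan with a one-element path.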

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.containers stx.update