During upgrade, containerd config contains stale sandbox image

Bug #2044160 reported by Joshua Kraitberg
This bug affects 1 person
Affects    Status        Importance  Assigned to       Milestone
StarlingX  Fix Released  Medium      Joshua Kraitberg

Bug Description

Brief Description
-----------------
The SX upgrade fails while waiting for the control plane: '[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"' times out.

Severity
--------
Critical

Steps to Reproduce
------------------
Upgrade an AIO-SX system from stx5 to stx6, then from stx6 to stx8.

Expected Behavior
------------------
The upgrade to stx8 completes successfully.

Actual Behavior
----------------
The upgrade fails during Kubernetes bring-up on stx8.

Reproducibility
---------------
100%

System Configuration
--------------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
11-21-2023

Last Pass
---------
Never

Timestamp/Logs
--------------
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [kubelet-check] Initial timeout of 40s passed.

            Unfortunately, an error has occurred:
                    timed out waiting for the condition

            This error is likely caused by:
                    - The kubelet is not running
                    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

            If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                    - 'systemctl status kubelet'
                    - 'journalctl -xeu kubelet'

            Additionally, a control plane component may have crashed or exited when started by the container runtime.
            To troubleshoot, list all containers using your preferred container runtimes CLI.

            Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
                    - 'crictl --runtime-endpoint /var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
                    Once you have found the failing container, you can inspect its logs with:
                    - 'crictl --runtime-endpoint /var/run/containerd/containerd.sock logs CONTAINERID'
  stdout_lines: <omitted>
2023-11-20 02:50:32,950 p=4935 u=sysadmin n=ansible | PLAY RECAP ***********************************************************************************************************************************************************************************************************************************************

Test Activity
-------------
Regression Testing

Workaround
----------
Add the sandbox image named in the containerd config to additional_registry_images_list so that it is downloaded.
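
To apply the workaround, one first needs to know which sandbox image the containerd config currently names. A minimal sketch of that lookup follows. It assumes the config uses the usual `sandbox_image = "..."` line of the containerd CRI plugin. The function name and the sample image reference are hypothetical and are not taken from this report.

```python
import re

# Matches a config line of the form:  sandbox_image = "<image reference>"
# Assumption: the value is a double-quoted string on a single line.
SANDBOX_IMAGE_RE = re.compile(
    r'^\s*sandbox_image\s*=\s*"([^"]+)"', re.MULTILINE
)


def find_sandbox_image(config_text):
    """Return the sandbox image named in containerd config text, or None."""
    match = SANDBOX_IMAGE_RE.search(config_text)
    return match.group(1) if match else None


# Hypothetical config fragment, for illustration only.
SAMPLE_CONFIG = '''
[plugins.cri]
  sandbox_image = "registry.local:9001/k8s.gcr.io/pause:3.2"
'''

stale_image = find_sandbox_image(SAMPLE_CONFIG)
```

On a real system the text would be read from the containerd config file (commonly /etc/containerd/config.toml, which is an assumption here), and the returned image is the one to list in additional_registry_images_list.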

Changed in starlingx:
assignee: nobody → Joshua Kraitberg (jkraitbe-wr)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/901612
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/c7c4af733992d31347ada0962caefff251bbbd17
Submitter: "Zuul (22348)"
Branch: master

commit c7c4af733992d31347ada0962caefff251bbbd17
Author: Joshua Kraitberg <email address hidden>
Date: Tue Nov 21 12:09:13 2023 -0500

    Additional containerd migration during SX upgrades

    The containerd config contains a value for the sandbox image used
    by Kubernetes. This value needs to be updated during upgrades to ensure
    it is downloaded during Kubernetes bringup.

    TEST PLAN
    PASS: AIO-SX optimized upgrade, stx6 to stx8
    PASS: AIO-SX optimized upgrade, stx6 to stx8
      * After an stx5 to stx6 upgrade

    Closes-Bug: 2044160
    Signed-off-by: Joshua Kraitberg <email address hidden>
    Change-Id: Icc18269e663ee96368e7ace3a8cda331c7a080b3
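
The commit message above describes updating the sandbox image value in the containerd config during the upgrade. The merged change lives in ansible-playbooks and is not reproduced here. As an illustration of the idea only, the following sketch rewrites the `sandbox_image` line in config text. The function name and the image references in the usage lines are hypothetical.

```python
import re

# Captures everything up to the quoted value so that indentation and
# spacing around "=" are preserved when the value is replaced.
SANDBOX_LINE_RE = re.compile(
    r'^(\s*sandbox_image\s*=\s*)"[^"]*"', re.MULTILINE
)


def set_sandbox_image(config_text, new_image):
    """Return config_text with each sandbox_image value set to new_image."""
    # A function is used as the replacement so that characters in
    # new_image are never interpreted as regex escape sequences.
    return SANDBOX_LINE_RE.sub(
        lambda m: f'{m.group(1)}"{new_image}"', config_text
    )


# Hypothetical before/after values, for illustration only.
old_config = (
    '[plugins.cri]\n'
    '  sandbox_image = "registry.local:9001/k8s.gcr.io/pause:3.2"\n'
    '  stream_server_port = "0"\n'
)
new_config = set_sandbox_image(
    old_config, "registry.local:9001/registry.k8s.io/pause:3.9"
)
```

Only the `sandbox_image` line changes and all other settings pass through untouched. After such an edit, containerd would presumably need to be restarted for the new value to take effect.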

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.update