sysinv error handling in kubelet upgrade needs improving

Bug #1949515 reported by Chris Friesen
Affects    Status        Importance  Assigned to    Milestone
StarlingX  Fix Released  Low         Chris Friesen

Bug Description

In the sysinv code, kube_upgrade_kubelet() contains some while loops that track elapsed time by adding a fixed interval to a counter on every iteration. This essentially assumes that the code running in the loop is infinitely fast and cannot block, which isn't true: we can be delayed waiting for a response from the Kubernetes API, so the loop can run well past its intended timeout.
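A minimal sketch of the alternative, assuming nothing about the actual sysinv code: compute a deadline once from a monotonic clock and compare against it on each pass, so time spent blocked inside the loop body counts toward the timeout. The name wait_until and its parameters are hypothetical and only illustrate the idea.

```python
import time


def wait_until(predicate, timeout, interval,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it returns True or `timeout` seconds elapse.

    Hypothetical helper, not the sysinv implementation. Elapsed time is
    read from the clock instead of being accumulated per iteration, so a
    slow predicate (e.g. a blocked API call) still counts toward the
    timeout.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

The clock and sleep parameters exist only so the helper can be exercised with a fake clock; in normal use they can be left at their defaults.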

Also, if we take an exception (due to a timeout for example) while waiting for a response from the Kubernetes API then we bypass the code that sets the status to KUBE_HOST_UPGRADING_KUBELET_FAILED, which leaves us in an invalid state.
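One way to avoid skipping the failure path is sketched below, under stated assumptions: the version query is wrapped so that any exception is logged and the failed status is still recorded. The function query_versions_or_fail, the callables passed to it, and the string value given to the constant are all hypothetical placeholders, not the real sysinv API or constant value.

```python
import logging

LOG = logging.getLogger(__name__)

# Placeholder value for illustration; the real constant lives in sysinv.
KUBE_HOST_UPGRADING_KUBELET_FAILED = "kubelet-upgrade-failed-placeholder"


def query_versions_or_fail(query_versions, set_status):
    """Return the queried versions, or None after recording a failure.

    Hypothetical sketch: `query_versions` stands in for the Kubernetes
    API call and `set_status` for whatever persists the host upgrade
    status.
    """
    try:
        return query_versions()
    except Exception:
        # A timeout or API error must not bypass the status update,
        # otherwise the upgrade is stuck and cannot be retried.
        LOG.exception("Failed to query Kubernetes versions")
        set_status(KUBE_HOST_UPGRADING_KUBELET_FAILED)
        return None
```

A caller would treat a None result as "stop here"; because the failed status has been recorded, the kubelet upgrade can be retried.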

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/c/starlingx/config/+/816299
Committed: https://opendev.org/starlingx/config/commit/311c137bb872f3662a333b7f1f3d18a6db73c4f0
Submitter: "Zuul (22348)"
Branch: master

commit 311c137bb872f3662a333b7f1f3d18a6db73c4f0
Author: Chris Friesen <email address hidden>
Date: Mon Nov 1 23:34:38 2021 -0600

    improve kubelet upgrade error handling

    When upgrading kubelet there are a couple of while loops with
    expiry. Change them to actually check the time instead of assuming
    the loop is instantaneous.

    Also, if we take an exception while querying the K8s versions we
    need to handle the exception in order to ensure that we properly
    set the KUBE_HOST_UPGRADING_KUBELET_FAILED state so that the
    kubelet upgrade can be retried.

    Partial-bug: 1949515
    Change-Id: Ic7df945488386bbad8a73dfe574843113993176b
    Signed-off-by: Chris Friesen <email address hidden>

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Chris Friesen (cbf123)
importance: Undecided → Low
tags: added: stx.6.0
Chris Friesen (cbf123)
Changed in starlingx:
status: In Progress → Fix Released