controller-0 is degraded due to the failure of its isolcpu_plugin

Bug #2041686 reported by Gleb Aronsky
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Gleb Aronsky

Bug Description

Brief Description
-----------------

When we perform a multi-version Kubernetes upgrade using the orchestration method, the first control plane and kubelet upgrade successfully, but when the strategy continues to upgrade the next kubelet it fails due to a 200.006 alarm.

Severity
--------
Major

Steps to Reproduce
------------------
Upgrade the kubelet as part of an orchestrated Kubernetes upgrade (see the sketch below).
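
For reference, a minimal sketch of the orchestration flow used here, assuming the standard sw-manager kube-upgrade-strategy workflow; the exact creation options depend on the release and the target Kubernetes version:

[sysadmin@controller-0 ~(keystone_admin)]$ sw-manager kube-upgrade-strategy create   # creation options (apply types, target version) omitted; release-dependent
[sysadmin@controller-0 ~(keystone_admin)]$ sw-manager kube-upgrade-strategy apply    # upgrades the control plane first, then the kubelets host by host
[sysadmin@controller-0 ~(keystone_admin)]$ sw-manager kube-upgrade-strategy show     # monitor progress; see Timestamp/Logs below for the failed run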

Expected Behavior
------------------
The orchestrated Kubernetes upgrade completes on all hosts, and isolcpu_plugin keeps running after each kubelet upgrade, so no 200.006 alarm is raised.

Actual Behavior
----------------
isolcpu_plugin is not running after the kubelet upgrade; the resulting 200.006 alarm degrades controller-0 and blocks the rest of the orchestrated upgrade.

Reproducibility
---------------
100%

System Configuration
--------------------
All controllers and workers

Branch/Pull Time/Commit
-----------------------
Master

Last Pass
---------

Timestamp/Logs
--------------
[sysadmin@controller-0 ~(keystone_admin)]$ sw-manager kube-upgrade-strategy show
Strategy Kubernetes Upgrade Strategy:
  strategy-uuid: 88e0b598-6abe-480f-972e-56b59dde0f4f
  controller-apply-type: serial
  storage-apply-type: serial
  worker-apply-type: serial
  default-instance-action: stop-start
  alarm-restrictions: strict
  current-phase: abort
  current-phase-completion: 100%
  state: aborted
  apply-result: failed
  apply-reason: alarms ['200.006'] from platform are present
  abort-result: success
  abort-reason:
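
A quick way to confirm the blocking alarm and the failed process (fm and systemctl are standard tooling; the isolcpu_plugin unit name is assumed to match the service referenced in this report):

[sysadmin@controller-0 ~(keystone_admin)]$ fm alarm-list | grep 200.006              # 200.006 = process failure alarm blocking the strategy
[sysadmin@controller-0 ~(keystone_admin)]$ systemctl status isolcpu_plugin           # shows the plugin left masked/stopped after the kubelet upgrade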

Test Activity
-------------
Testing

Workaround
----------
Use systemd to unmask and start isolcpu_plugin. Use pmon-start to monitor isolcpu_plugin.
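
A minimal sketch of the workaround on the affected host, assuming the systemd unit and the pmon-monitored process are both named isolcpu_plugin:

[sysadmin@controller-0 ~(keystone_admin)]$ sudo systemctl unmask isolcpu_plugin      # clear the mask left over from the kubelet upgrade
[sysadmin@controller-0 ~(keystone_admin)]$ sudo systemctl start isolcpu_plugin       # bring the plugin back up
[sysadmin@controller-0 ~(keystone_admin)]$ sudo pmon-start isolcpu_plugin            # resume pmon monitoring so the 200.006 alarm clears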

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/stx-puppet/+/899494

Changed in starlingx:
status: New → In Progress
summary: - controller-0 is degraded due to the failure of its isolcpu_plugi
+ controller-0 is degraded due to the failure of its isolcpu_plugin
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/c/starlingx/stx-puppet/+/899494
Committed: https://opendev.org/starlingx/stx-puppet/commit/c361bb90981f517a0338c3d5810196749261834e
Submitter: "Zuul (22348)"
Branch: master

commit c361bb90981f517a0338c3d5810196749261834e
Author: Gleb Aronsky <email address hidden>
Date: Fri Oct 27 09:06:05 2023 -0700

    Fix unmask service conditional for isolcpu_plugin

    After a kubelet upgrade, when restarting the isolcpu_plugin,
    the Puppet code needs to check that the isolcpu_plugin has
    been explicitly masked and not disabled. A disabled setting
    would indicate that the isolcpu_plugin should not run on that
    particular node, unlike a masked isolcpu_plugin. This change
    ensures that we differentiate between a masked and a disabled
    isolcpu_plugin, and start it only in the masked case.

    This commit corrects an erroneous update to line 952
    in commit https://review.opendev.org/c/starlingx/stx-puppet/+/894545/9..10

    Test Plan:
    Pass: This code was tested as part of Bug #2036985
          but was not merged in its entirety.

    Closes-Bug: 2041686
    Change-Id: I5192ac515be3e3804ec77a14a6515ffe1da0d26a
    Signed-off-by: Gleb Aronsky <email address hidden>
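
For illustration only (the actual fix is the Puppet change in the review above): the distinction the fix relies on is visible from systemd itself, since a masked unit and a disabled unit report different states:

[sysadmin@controller-0 ~(keystone_admin)]$ systemctl is-enabled isolcpu_plugin
# 'masked'   -> the unit was deliberately masked (e.g. around the kubelet upgrade) and should be unmasked and started
# 'disabled' -> the unit is not meant to run on this node and should be left alone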

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Gleb Aronsky (gleb-aronsky)
tags: added: stx.9.0 stx.config stx.containers
description: updated