Unable to unlock host after disabling CPU hyperthreading

Bug #1955608 reported by Kaustubh Dhokte
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Kaustubh Dhokte
Milestone: (none)

Bug Description

Brief Description
-----------------
Host unlock fails after disabling CPU hyperthreading

Severity
-----------------
Critical.

Steps to Reproduce
-----------------
Install StarlingX on AIO-SX with CPU hyperthreading enabled

1. Host-lock
2. Manual Reboot
3. Disable Hyperthreading and reboot
4. Host-unlock

Expected Behavior
-----------------
The last step, "system host-unlock controller-0", should complete without an error message.

Note that the same procedure works when enabling hyperthreading instead of disabling it.

Actual Behavior
-----------------

[sysadmin@controller-0 ~(keystone_admin)]$ system host-unlock 1

Remote error: SysinvException Failed to generate bootstrap token

[u'Traceback (most recent call last):\n', u' File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/amqp.py", line 437, in _process_data\n **args)\n', u' File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/dispatcher.py", line 172, in dispatch\n result = getattr(proxyobj, method)(ctxt, **kwargs)\n', u' File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 1980, in configure_ihost\n self._configure_controller_host(context, host)\n', u' File "/usr/lib64/python2.7/site-packages/sysinv/conductor/manager.py", line 1539, in _configure_controller_host\n self._puppet.update_host_config(host)\n', u' File "/usr/lib64/python2.7/site-packages/sysinv/puppet/puppet.py", line 31, in _wrapper\n func(self, *args, **kwargs)\n', u' File "/usr/lib64/python2.7/site-packages/sysinv/puppet/puppet.py", line 148, in update_host_config\n config.update(puppet_plugin.obj.get_host_config(host))\n', u' File "/usr/lib64/python2.7/site-packages/sysinv/puppet/kubernetes.py", line 79, in get_host_config\n config.update(self._get_host_join_command(host))\n', u' File "/usr/lib64/python2.7/site-packages/sysinv/puppet/kubernetes.py", line 145, in _get_host_join_command\n join_cmd = self._get_kubernetes_join_cmd(host)\n', u' File "/usr/lib64/python2.7/site-packages/sysinv/puppet/kubernetes.py", line 206, in _get_kubernetes_join_cmd\n \'Failed to generate bootstrap token\')\n', u'SysinvException: Failed to generate bootstrap token\n'].

[sysadmin@controller-0 ~(keystone_admin)]$

Reproducibility
-----------------

100%

System Configuration
-----------------
AIO-SX

Branch/Pull Time/Commit
-----------------------
N/A

Last Pass
----------

N/A

Timestamp/Logs
------------------
N/A

Test Activity
---------------
N/A

Workaround
-----------------
N/A

Changed in starlingx:
assignee: nobody → Kaustubh Dhokte (kdhokte)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/822776

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0 stx.containers
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/827384

OpenStack Infra (hudson-openstack) wrote : Change abandoned on integ (master)

Change abandoned by "Kaustubh Dhokte <email address hidden>" on branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/822776
Reason: I somehow failed to rebase while resolving merge conflicts. A new review has been posted here. https://review.opendev.org/c/starlingx/integ/+/827384

OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/827384
Committed: https://opendev.org/starlingx/integ/commit/c9b781b7c022e619a8bed2c58e686c989c4ca9eb
Submitter: "Zuul (22348)"
Branch: master

commit c9b781b7c022e619a8bed2c58e686c989c4ca9eb
Author: kdhokte <email address hidden>
Date: Tue Feb 1 21:09:03 2022 -0500

    sanitize reserved cpus list before kubelet starts

    The script will run every time before the kubelet service is started.

    It reads the reserved-cpus list for the kubelet from the service
    environment file and sanitizes it on the basis of online CPUs.

    If none of the reserved cpus is online, it removes the
    --reserved-cpus flag from the environment file which allows
    the kubelet to choose CPUs itself.

    Sanitizing the reserved-cpus list every time before the kubelet starts
    ensures that the kubelet will not fail to start due to unavailability
    of one or more CPUs in the list.

    By enabling or disabling CPU hyperthreading, available CPUs change.
    This change will make sure changing CPU hyperthreading setting will
    not lead to kubelet start failure after the system boots up.

    Test Plan: (On AIO-SX)

    PASS:
    Initial Hyperthreading state: enabled
    Host-lock->Reboot->Disable CPU hyperthreading and reboot->Host-unlock
    Observe kubelet does not fail to start before host-unlock.
    All pods states are as expected. Host-unlock succeeds.

    PASS:
    Initial Hyperthreading state: disabled
    Host-lock->Reboot->Enable CPU hyperthreading and reboot->Host-unlock
    Observe kubelet does not fail to start before host-unlock.
    All pods states are as expected. Host-unlock succeeds.

    PASS:
    Manually restart the Kubelet service.
    Observe that the kubelet does not fail to start.
    All pods states are as expected.

    PASS:
    Host-lock->Host unlock (without any config change).
    Observe that the kubelet does not fail to start.
    All pods states are as expected.

    PASS:
    Packages built successfully on both Debian and CentOS.

    Closes-Bug: 1955608

    Change-Id: I699c5c36a56a50d4c48faa816edad69c17058079
    Signed-off-by: Kaustubh Dhokte <email address hidden>
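
As a rough illustration of the sanitizing step described in the commit message above, the following Python sketch intersects a reserved-CPU list with the set of online CPUs and drops the flag when nothing remains. It is not the script merged in the review. The function names, the sample argument string, and the assumption that the flag appears as --reserved-cpus=<cpulist> inside a single string of kubelet arguments are all illustrative. /sys/devices/system/cpu/online is the standard Linux sysfs file listing online CPUs.

```python
import re


def parse_cpu_list(text):
    """Expand a Linux cpulist string such as "0-1,8-9" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpu_list(cpus):
    """Collapse a set of ints back into a compact cpulist string."""
    ranges = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1][1] = cpu          # extend the current run
        else:
            ranges.append([cpu, cpu])    # start a new run
    return ",".join(
        str(a) if a == b else "%d-%d" % (a, b) for a, b in ranges
    )


def read_online_cpus(path="/sys/devices/system/cpu/online"):
    """Read the online CPU set from sysfs (Linux only)."""
    with open(path) as f:
        return parse_cpu_list(f.read())


# Assumption: the flag is written as --reserved-cpus=<cpulist>, unquoted.
RESERVED_RE = re.compile(r"\s*--reserved-cpus=(\S+)")


def sanitize_kubelet_args(args, online):
    """Keep only online CPUs in --reserved-cpus; drop the flag if none remain."""
    match = RESERVED_RE.search(args)
    if match is None:
        return args
    kept = parse_cpu_list(match.group(1)) & online
    if not kept:
        # No reserved CPU is online: remove the flag so the kubelet chooses.
        return (args[:match.start()] + args[match.end():]).strip()
    return args[:match.start(1)] + format_cpu_list(kept) + args[match.end(1):]
```

For example, with CPUs 0-7 online (hyperthreading disabled on a host that previously exposed 16 logical CPUs), a reserved list of 0-1,8-9 is reduced to 0-1. If only CPUs outside the reserved list are online, the flag is removed entirely.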

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/c/starlingx/integ/+/839866
Committed: https://opendev.org/starlingx/integ/commit/ee6eadab97e48956d43db6f2f79a02dc24eefb63
Submitter: "Zuul (22348)"
Branch: master

commit ee6eadab97e48956d43db6f2f79a02dc24eefb63
Author: Kaustubh Dhokte <email address hidden>
Date: Fri Apr 29 06:27:46 2022 +0000

    Debian: Correct "sanitize reserved cpus list before kubelet starts"

    This change makes a correction in kubeadm.conf for k8s 1.21.8 on
    Debian originally committed in
    https://review.opendev.org/c/starlingx/integ/+/827384

    /etc/sysconfig does not exist on Debian.
    Kubelet service environment variables file location is /etc/default/
    on StarlingX Debian.

    Test Plan:
    Package builds successfully

    Closes-Bug: 1955608

    Signed-off-by: Kaustubh Dhokte <email address hidden>
    Change-Id: Ic3f7f6a514088a3ccbd7f99c0433a8144e8d0ade
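
The path difference described in the commit message above (/etc/sysconfig on CentOS, /etc/default on StarlingX Debian) could be handled by a lookup along these lines. This is only a sketch: the helper name find_env_file and the file name "kubelet" used in the usage note are hypothetical and are not taken from the merged change, which corrected the path in kubeadm.conf directly.

```python
import os

# Directories named in the commit message; the Debian location is tried first.
CANDIDATE_DIRS = ("/etc/default", "/etc/sysconfig")


def find_env_file(name, dirs=CANDIDATE_DIRS, exists=os.path.isfile):
    """Return the first existing environment file called `name`, or None.

    `exists` is injectable so the lookup can be exercised without touching
    the real filesystem.
    """
    for directory in dirs:
        path = os.path.join(directory, name)
        if exists(path):
            return path
    return None
```

Calling find_env_file("kubelet") on a given host would return whichever of the two locations actually holds the file, or None if neither does.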
