Platform kubernetes cgroup (k8s-infra) reported value for cpuset.cpus is incorrect

Bug #1824563 reported by Wendy Mitchell
Affects     Status         Importance   Assigned to   Milestone
StarlingX   Fix Released   High         Jim Gauld

Bug Description

Brief Description
-----------------
Platform kubernetes cgroup reported value for cpuset.cpus (in the puppet log and in /sys/fs/cgroup/cpuset/k8s-infra/cpuset.cpus) is incorrect

Severity
--------
standard

Steps to Reproduce
------------------
1. Install and unlock a worker node.
2. Confirm that the platform CPU value is, e.g., 0,36, as follows:
 platform::kubernetes::params::k8s_cpuset: 0,36

<compute-1># sudo grep cpu /opt/platform/puppet/19.01/hieradata/*
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:nova::compute::shared_pcpu_map: '""'
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:nova::compute::vcpu_pin_set: '"3-35,39-71"'
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::compute::grub::params::cpu_options: nohz_full=1-35,37-71 isolcpus=1-35,37-71
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml: rcu_nocbs=1-35,37-71 kthread_cpus=0,36 irqaffinity=0,36
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::compute::grub::params::n_cpus: 72
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::compute::params::platform_cpu_list: '"0,36"'
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::compute::params::worker_cpu_list: '"0-71"'
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::compute::pmqos::hight_wakeup_cpus: '"0,3-36,39-71"'
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::compute::pmqos::low_wakeup_cpus: '"1-2,37-38"'
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::kubernetes::params::k8s_cpuset: 0,36 (this does not have quotes here)
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::params::platform_cpu_count: 2

3. Confirm that the puppet hieradata also reports the k8s_cpuset as 0,36 for this worker node
<compute-1>:~# grep -rs k8s /opt/platform/puppet/19.01/hieradata/ (looks for compute-1 in question for example)
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::kubernetes::params::k8s_cpuset: 0,36 (this does not have quotes)
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::kubernetes::params::k8s_nodeset: '0'
4. Confirm that platform-cpuaffinity.conf also has 0,36

compute-1:/sys/fs/cgroup/cpuset/k8s-infra# more /etc/systemd/system.conf.d/platform-cpuaffinity.conf l
::::::::::::::
/etc/systemd/system.conf.d/platform-cpuaffinity.conf
::::::::::::::
[Manager]
CPUAffinity="0,36"

5. Confirm the output from the puppet.log for the k8s-infra cpuset
compute-1:/sys/fs/cgroup/cpuset/k8s-infra# grep "Set k8s" /var/log/puppet/latest/puppet.log
2019-04-12T15:13:07.884 Notice: 2019-04-12 15:13:06 +0000 Scope(Class[Platform::Kubernetes::Cgroup]): Set k8s-infra nodeset: 0, cpuset: 30

6. Confirm the settings in the files cpuset.mems and cpuset.cpus (in the path compute-1:/sys/fs/cgroup/cpuset/k8s-infra#)
compute-1:/sys/fs/cgroup/cpuset/k8s-infra# cat cpuset.mems
0
compute-1:/sys/fs/cgroup/cpuset/k8s-infra# cat cpuset.cpus
30

Expected Behavior
------------------
In step 5, expected puppet.log to report cpuset: 0,36 (not 30)
In step 6, expected the cpuset.cpus file to contain 0,36 (not 30)
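
The comparison in steps 5 and 6 can be scripted. The following is a minimal sketch, not part of the original report; parse_cpulist and cpuset_matches are hypothetical helper names, and the text passed in is assumed to be the content of /sys/fs/cgroup/cpuset/k8s-infra/cpuset.cpus.

```python
def parse_cpulist(text):
    """Expand a kernel-style cpulist such as "0,36" or "1-3" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        # A single CPU ("36") is treated as the range 36-36.
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def cpuset_matches(actual_text, expected_text):
    """Return True when both cpulists describe the same set of CPUs."""
    return parse_cpulist(actual_text) == parse_cpulist(expected_text)


# Values taken from this report: expected 0,36 but cpuset.cpus held 30.
print(cpuset_matches("30\n", "0,36"))    # False
print(cpuset_matches("0,36\n", "0,36"))  # True
```

Comparing sets rather than raw strings means equivalent spellings such as "0-1" and "0,1" are treated as equal.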

Actual Behavior
----------------

See actual output in step 5 and 6

Reproducibility
---------------
yes

System Configuration
--------------------
2+3
(Optional Hyperthreaded, low-latency lab yow-cgcs-wildcat-92-98 )

Branch/Pull Time/Commit
-----------------------
BUILD_ID="20190412T013000Z"

Timestamp/Logs
--------------
see puppet.log

2019-04-12T15:13:07.880 Notice: 2019-04-12 15:13:06 +0000 Scope(Class[Platform::Kubernetes::Cgroup]): Create /sys/fs/cgroup/[cpuset, cpu, cpuacct, memory, systemd]/k8s-infra

2019-04-12T15:13:07.884 Notice: 2019-04-12 15:13:06 +0000 Scope(Class[Platform::Kubernetes::Cgroup]): Set k8s-infra nodeset: 0, cpuset: 30

Jim Gauld (jgauld)
Changed in starlingx:
assignee: nobody → Jim Gauld (jgauld)
Jim Gauld (jgauld) wrote :

This is a new issue related to my recent code: https://review.openstack.org/#/c/648511/

The cpulist values are correct in hieradata, but are mangled when used by puppet.
The solution is to wrap the values in quotes, as we do for various other parameters.

I made the simple fix manually on this lab to /usr/lib64/python2.7/site-packages/sysinv/puppet/kubernetes.py, restarted sysinv-conductor, and locked/unlocked the compute, and issue was resolved.
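
One plausible reading of the mangling, assuming puppet's YAML loader applies YAML 1.1-style integer rules that accept commas as digit separators: the unquoted 0,36 is read as octal 036, i.e. decimal 30, which matches the value seen in puppet.log. The sketch below is a simplified model of that behaviour and of the quoting fix; scan_scalar and quote_cpulist are hypothetical names, not sysinv or puppet code.

```python
import re

# Simplified, hypothetical model of a YAML 1.1-style scalar scanner that
# tolerates "," and "_" as digit separators. Not the actual parser code.
_OCTAL = re.compile(r"^[-+]?0[0-7_,]+$")
_DECIMAL = re.compile(r"^[-+]?(?:0|[1-9][0-9_,]*)$")


def scan_scalar(text):
    """Return an int when the text looks like an integer, else the text itself."""
    digits = text.replace(",", "").replace("_", "")
    if _OCTAL.match(text):
        return int(digits, 8)   # leading 0 means octal: "036" -> 30
    if _DECIMAL.match(text):
        return int(digits, 10)
    return text


def quote_cpulist(cpulist):
    """Wrap a cpulist in double quotes so it cannot be read as a number."""
    return '"%s"' % cpulist


print(scan_scalar("0,36"))                 # 30
print(scan_scalar("3-35,39-71"))           # 3-35,39-71
print(scan_scalar(quote_cpulist("0,36")))  # "0,36"
```

Under this model only cpulists made solely of digits and commas are affected, which would be consistent with the value being incorrect only for some CPU layouts.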

BEFORE, looking at compute-1 :
controller-0:~# grep -rs k8s_ /opt/platform/puppet/19.01/hieradata/
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::kubernetes::params::k8s_cpuset: 0,36
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::kubernetes::params::k8s_nodeset: '0'

AFTER manual fix, we correctly get the 0,36:

controller-0:~# grep -rs k8s_ /opt/platform/puppet/19.01/hieradata/
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::kubernetes::params::k8s_cpuset: '"0,36"'
/opt/platform/puppet/19.01/hieradata/192.168.204.96.yaml:platform::kubernetes::params::k8s_nodeset: '"0"'

compute-1:~# grep -rs "Set k8s" /var/log/puppet/latest/puppet.log
2019-04-12T17:25:01.064 Notice: 2019-04-12 17:24:59 +0000 Scope(Class[Platform::Kubernetes::Cgroup]): Set k8s-infra nodeset: "0", cpuset: "0,36"

compute-1:/sys/fs/cgroup/cpuset/k8s-infra# cat cpuset.cpus
0,36
compute-1:/sys/fs/cgroup/cpuset/k8s-infra# cat cpuset.mems
0

compute-1:# cat /etc/systemd/system.conf.d/platform-cpuaffinity.conf
[Manager]
CPUAffinity="0,36"

The affinity of the platform tasks and kubernetes tasks is on CPUs 0,36 as desired:
compute-1:~$ ps-sched.sh | grep -e COMM -e bird|cut -c1-120
   PID TID PPID S PO NICE RTPRIO PR AFFINITY P COMM COMMAND
 46516 46516 46269 S TS 0 - 20 0x1000000001 0 runsv runsv bird
 46517 46517 46269 S TS 0 - 20 0x1000000001 36 runsv runsv bird6
 46684 46684 46517 S TS 0 - 20 0x1000000001 36 bird6 bird6 -R -s /var/run/calico/bird
 46686 46686 46516 S TS 0 - 20 0x1000000001 36 bird bird -R -s /var/run/calico/bird.
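
The AFFINITY column above is a hexadecimal bitmask in which bit N stands for CPU N. As an illustrative sketch (cpulist_to_mask is a hypothetical helper, not part of ps-sched.sh), the cpulist 0,36 maps to the mask shown in the output:

```python
def cpulist_to_mask(cpulist):
    """Convert a cpulist such as "0,36" or "0-3" into an affinity bitmask."""
    mask = 0
    for part in cpulist.split(","):
        first, _, last = part.partition("-")
        for cpu in range(int(first), int(last or first) + 1):
            mask |= 1 << cpu  # set the bit for this CPU
    return mask


print(hex(cpulist_to_mask("0,36")))  # 0x1000000001
```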

Jim Gauld (jgauld)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-config (master)

Fix proposed to branch: master
Review: https://review.openstack.org/652719

Ghada Khalil (gkhalil) wrote :

Marking as release gating; high priority. This is related to a recent containerization feature.

Changed in starlingx:
importance: Undecided → High
tags: added: stx.2.0 stx.containers
tags: added: stx.retestneeded
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-config (master)

Reviewed: https://review.openstack.org/652719
Committed: https://git.openstack.org/cgit/openstack/stx-config/commit/?id=26b34909d038675fb86b526ce4da5ab935bbdf57
Submitter: Zuul
Branch: master

commit 26b34909d038675fb86b526ce4da5ab935bbdf57
Author: Jim Gauld <email address hidden>
Date: Mon Apr 15 12:12:39 2019 -0400

    Fix kubernetes k8s-infra cpuset.cpus mangled values

    The platform kubernetes cgroup configured value for cpuset.cpus
    (i.e., /sys/fs/cgroup/cpuset/k8s-infra/cpuset.cpus) is sometimes
    incorrect. The cpulist values are correct in hierdata, but are
    mangled when used by puppet. Solution is to wrap the values in
    quotes, like we do for various other parameters.

    This bug was introduced by this code:
     https://review.openstack.org/#/c/648511/

    Change-Id: Ic49090502242cc1f59dd09afad0db46cc9e399c2
    Closes-Bug: 1824563
    Signed-off-by: Jim Gauld <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Wendy Mitchell (wmitchellwr) wrote :

verified
BUILD_TYPE="Formal"
BUILD_ID="20190415T233001Z"

tags: removed: stx.retestneeded
tags: added: stx.config