AIODX: Platform tasks are floating on all cores
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Won't Fix | Low | Ghada Khalil |
Bug Description
Brief Description
-----------------
Platform tasks including those of docker are floating on all cores.
Severity
--------
Critical
Steps to Reproduce
------------------
Run the top command and press 1 to see the per-core detail.
Launch a large number of pods and observe the cpu occupancy of the platform cores vs the application cores.
The occupancy of the application cores spikes until the scaling is complete, while the occupancy of the platform cores increases only slightly.
Check the ps-sched.sh dump (ps-sched.sh | sort -k10 -n).
Check the cpuset of the docker cgroup (see the Logs section below).
Note: the 2 controller nodes had originally been assigned the openstack-compute-node label (see Test Activity below). A sketch of the affinity checks above follows.
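A minimal sketch of how these checks can be run with standard tools, assuming a cgroup-v1 cpuset hierarchy; <pid> is a placeholder for any of the platform process ids:

ps -eLo pid,tid,psr,comm --no-headers | sort -k3 -n   # last CPU each task ran on
taskset -pc <pid>                                     # allowed CPU list of a single task
grep Cpus_allowed_list /proc/<pid>/status             # same information from procfs
cat /sys/fs/cgroup/cpuset/docker/cpuset.cpus          # CPUs assigned to the docker cgroup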
Expected Behavior
------------------
Except for k8s-infra related tasks (a known issue, work in progress), all other platform-related tasks should run on the CPU cores reserved for platform use.
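As a reference for which cores are the platform ones, the per-core function assignment can be listed with the sysinv CLI (a sketch; the host name is assumed):

system host-cpu-list controller-0    # per-core assigned function (platform, vswitch, application)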
Actual Behavior
----------------
Many platform tasks, such as postgres, docker, mtcClient, lldpd, ceph-mgr, sm, etc., are running on non-platform cores.
Reproducibility
---------------
Reproducible with the load stated below.
System Configuration
--------------------
AIODX, IPv6
Branch/Pull Time/Commit
-----------------------
OS="centos"
SW_VERSION="19.09"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID=
JOB="STX_<email address hidden>"
BUILD_NUMBER="221"
BUILD_HOST=
BUILD_DATE=
Last Pass
---------
I am not sure if this test was conducted before on an AIODX (IPv6) system.
Timestamp/Logs
--------------
controller-0:/tmp# systemd-cgls cpuset
....
....
....
....
├─docker
│ ├─dc4daa4e77401
│ │ ├─2323134 uwsgi -b 32768 --die-on-term --http :8000 --http-timeout 3600 --enable-threads -L --lazy-apps --master --paste config:
│ │ ├─2323161 uwsgi -b 32768 --die-on-term --http :8000 --http-timeout 3600 --enable-threads -L --lazy-apps --master --paste config:
│ │ ├─2323162 uwsgi -b 32768 --die-on-term --http :8000 --http-timeout 3600 --enable-threads -L --lazy-apps --master --paste config:
│ │ ├─2323163 uwsgi -b 32768 --die-on-term --http :8000 --http-timeout 3600 --enable-threads -L --lazy-apps --master --paste config:
│ │ ├─2323164 uwsgi -b 32768 --die-on-term --http :8000 --http-timeout 3600 --enable-threads -L --lazy-apps --master --paste config:
│ │ └─2323165 uwsgi -b 32768 --die-on-term --http :8000 --http-timeout 3600 --enable-threads -L --lazy-apps --master --paste config:
│ ├─2586137481c4a
│ │ ├─4024261 /bin/sh -c /edgex/
│ │ ├─4024322 /bin/sh /edgex/
│ │ └─4024327 mongod --smallfiles --ipv6 --bind_ip_all
│ └─5207389c0e411
│ ├─3304583 /bin/sh -c rm -rf /consul/data/* && docker-
│ ├─3304648 /bin/dumb-init /bin/sh /usr/local/
│ ├─3304649 tee /edgex/
│ └─3304650 consul agent -data-dir=
controller-0:/tmp# cat /proc/2323134/cgroup
11:memory:
10:cpuset:
9:blkio:
8:net_prio,
7:devices:
6:perf_
5:cpuacct,
4:freezer:
3:pids:
2:hugetlb:
1:name=
controller-0:/tmp# cd /sys/fs/
controller-
controller-
0-35
Attached is the dump of ps-sched.sh on controller-1 (the primary controller); e.g., some postgres-related processes are running on core #7.
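For readers without the attachment, a rough approximation of the ps-sched.sh output (an assumption about the helper script; its exact columns may differ) and a spot check of the processes named above:

ps -eLo pid,tid,class,rtprio,ni,pri,psr,pcpu,comm | sort -k7 -n   # sort tasks by last-used processor
for p in $(pgrep -d ' ' 'postgres|dockerd|mtcClient|lldpd|ceph-mgr|sm'); do taskset -pc $p; done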
Test Activity
-------------
System Test
I was informed that the platform task affining job is tied to the openstack-compute-node label.
After removing the openstack-related labels on both controllers, the platform tasks (except for the k8s-infra related ones) seem to be affined correctly. However, some pods then failed to launch due to Insufficient Memory. It turned out that the Kubernetes allocatable memory is also tied to the openstack labels.
When a node has openstack labels, the cpu and memory reserved for platform use are "visible" to Kubernetes and thus allocatable to pods. Below is a comparison of the 2 controller nodes, one with the openstack labels assigned and one without; the label and allocatable checks used are sketched after the comparison.
Controller-0 (with openstack labels)
============
root 117578 1 4 13:05 ? 00:15:45 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --node-ip=face::3 --cpu-manager-policy=none
Capacity:
 cpu: 36
 ephemeral-storage: 10190100Ki
 hugepages-1Gi: 60Gi
 hugepages-2Mi: 0
 intel.com/pci_sriov_net_group0_data0: 64
 memory: 97528444Ki
 pods: 110
Allocatable:
 cpu: 36 <-- platform and vswitch cpus have not been deducted
 ephemeral-storage: 9391196145
 hugepages-1Gi: 60Gi
 hugepages-2Mi: 0
 intel.com/pci_sriov_net_group0_data0: 0
 memory: 34511484Ki <--- ~32G (platform mem has not been deducted)
 pods: 110
Controller-1 (without openstack labels)
============
root 117762 1 3 15:24 ? 00:06:10 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --node-ip=face::4 --cpu-manager-policy=static --system-reserved-cgroup=/system.slice --system-reserved=cpu=2,memory=16500Mi
Capacity:
 cpu: 36
 ephemeral-storage: 10190100Ki
 hugepages-1Gi: 70Gi
 hugepages-2Mi: 0
 intel.com/pci_sriov_net_group0_data0: 64
 memory: 97528444Ki
 pods: 110
Allocatable:
 cpu: 34 <--- vswitch cpus have not been deducted
 ephemeral-storage: 9391196145
 hugepages-1Gi: 70Gi
 hugepages-2Mi: 0
 intel.com/pci_sriov_net_group0_data0: 64
 memory: 7129724Ki <--- ~6G (platform mem has been deducted)
 pods: 110
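For completeness, a minimal sketch of the commands used to inspect the labels and compare the node resources; the openstack-compute-node label key follows the discussion above, and the host typically has to be locked before labels can be changed:

kubectl get node controller-0 --show-labels                                # confirm which openstack labels are applied
kubectl describe node controller-0 | grep -E -A 8 'Capacity|Allocatable'  # compare against controller-1
system host-label-list controller-0                                       # sysinv view of the assigned labels
system host-lock controller-0
system host-label-remove controller-0 openstack-compute-node
system host-unlock controller-0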