containers in monitor namespace unexpectedly affined to Platform cpu

Bug #1849359 reported by Wendy Mitchell
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: John Kung

Bug Description

Brief Description
-----------------
containers in monitor namespace unexpectedly affined to Platform cpu

Severity
--------
Standard

Steps to Reproduce
------------------
The system uses the static CPU manager policy (as recorded in cpu_manager_state) and has platform-integ-apps applied successfully.
$ sudo cat /var/lib/kubelet/cpu_manager_state | python -mjson.tool
    "policyName": "static"

1. Upload the application
$ system application-upload stx-monitor-1.0-1.tgz
2. Assign the required host labels, then apply the monitor application
e.g.
$ system host-label-assign controller-0 elastic-data=enabled elastic-controller=enabled elastic-client=enabled etc...
$ system application-apply stx-monitor

3. After the monitor application has been applied, check the affinity of the containers in the monitor namespace

$ kubectl get pods -o wide -n monitor
$ cat /var/log/daemon.log | grep <podname> | grep reserved
$ cat /var/log/daemon.log | grep <podname> | grep isolcpus
$ cat /var/log/daemon.log | grep "namespace: monitor, pod: mon-"

Expected Behavior
-----------------
Containers in the monitor namespace should not be affined to the platform CPUs.

Actual Behavior
---------------
Containers in the monitor namespace are unexpectedly affined to the CPUs that have been configured as platform CPUs (cpuset=0,2,4,6).

$ kubectl get pods -o wide -n monitor

See daemon.log:

[sysadmin@controller-1 ~(keystone_admin)]$ cat /var/log/daemon.log | grep "namespace: monitor, pod: mon-"
2019-10-18T21:26:21.389 controller-1 kubelet[100828]: info I1018 21:26:21.389643 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-nginx-ingress-controller-7sd86, container: nginx-ingress-controller, container id: 993df50516f2a0cdc5ce005ce9d095719aa1d92899fa6c8fec9d4bd9ffa16dfa); cpuset=0,2,4,6
2019-10-18T21:26:22.676 controller-1 kubelet[100828]: info I1018 21:26:22.675690 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-nginx-ingress-default-backend-7c78c4ffc5-j7j4p, container: nginx-ingress-default-backend, container id: 2033fb03167b2d9d6babe2fabf81d381078f1245009d24406dde9f82d8a7b4d2); cpuset=0,2,4,6
2019-10-18T21:26:41.722 controller-1 kubelet[100828]: info I1018 21:26:41.722110 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-kibana-57ddf577f4-xw8gc, container: kibana, container id: 16a5f9a14baa5804a8c43fb1514d144bcc1e3ebe564da3bf43b0b935f6312f4a); cpuset=0,2,4,6
2019-10-18T21:26:52.620 controller-1 kubelet[100828]: info I1018 21:26:52.620283 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-kibana-57ddf577f4-xw8gc, container: kibana, container id: 16a5f9a14baa5804a8c43fb1514d144bcc1e3ebe564da3bf43b0b935f6312f4a); cpuset=0,2,4,6
2019-10-18T21:27:16.060 controller-1 kubelet[100828]: info I1018 21:27:16.060451 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-elasticsearch-master-0, container: configure-sysctl, container id: 4c706d55e63647027818a72e2912afd718c0069fc3670b22387bcc87aacf3eb0); cpuset=0,2,4,6
2019-10-18T21:27:17.489 controller-1 kubelet[100828]: info I1018 21:27:17.489256 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-elasticsearch-master-0, container: elasticsearch, container id: f9197fa04c9fac9e439d59cd09ad39796ed74723370a3a4d980e691a2efa27d2); cpuset=0,2,4,6
2019-10-18T21:28:30.922 controller-1 kubelet[100828]: info I1018 21:28:30.922190 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-elasticsearch-data-1, container: configure-sysctl, container id: 0e403827cb3a356a76633e952d2489e0eed9deeb074df0fe8e940c4fa97a4f94); cpuset=0,2,4,6
2019-10-18T21:28:31.493 controller-1 kubelet[100828]: info I1018 21:28:31.493891 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-elasticsearch-data-1, container: elasticsearch, container id: 804113bee3a86e1e37bbc5792e422430afabd323ad044c1a1959b1ad301cc77a); cpuset=0,2,4,6
2019-10-18T21:30:02.549 controller-1 kubelet[100828]: info I1018 21:30:02.548977 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-elasticsearch-client-0, container: configure-sysctl, container id: 4ba4801517e40732d541747b23792c4cfe874b238af963a3dac4c933d18ae105); cpuset=0,2,4,6
2019-10-18T21:30:03.721 controller-1 kubelet[100828]: info I1018 21:30:03.720759 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-elasticsearch-client-0, container: elasticsearch, container id: d1dcd74ca590c310430393ca09aebef88218cc7f41839417321a17c3131fbd3c); cpuset=0,2,4,6
2019-10-18T21:31:06.196 controller-1 kubelet[100828]: info I1018 21:31:06.196423 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-logstash-0, container: logstash, container id: 47f64a7b219bfae12595f1c1e02cebfcd6bffbaf64ab5f4959675f2d35712246); cpuset=0,2,4,6
2019-10-18T21:33:37.914 controller-1 kubelet[100828]: info I1018 21:33:37.913711 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-filebeat-j8kxj, container: setup-script, container id: 865e635667f8f57b8f370b058d63916ebfa9cb82024acc3ae2082099b8feff32); cpuset=0,2,4,6
2019-10-18T21:33:39.085 controller-1 kubelet[100828]: info I1018 21:33:39.085608 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-filebeat-j8kxj, container: filebeat, container id: 81177302b7ded6532159bbaaa18b29f0f0ee697321f3f1fd0bd199094b44cd64); cpuset=0,2,4,6
2019-10-18T21:33:41.708 controller-1 kubelet[100828]: info I1018 21:33:41.708708 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-filebeat-j8kxj, container: mon-filebeat-prometheus-exporter, container id: b21b39a8e3e3cbd4fb7aeb1ee588c86ccc9b28418ac6af8776b54fc083e3ee0d); cpuset=0,2,4,6
2019-10-18T21:33:51.550 controller-1 kubelet[100828]: info I1018 21:33:51.550196 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-metricbeat-hrdk8, container: setup-script, container id: 6e9659d9a6944e17ecc7f8c12becc85189ac2083f4e2987f6f31ca5d43238f13); cpuset=0,2,4,6
2019-10-18T21:33:53.533 controller-1 kubelet[100828]: info I1018 21:33:53.533388 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-metricbeat-hrdk8, container: metricbeat, container id: 6c77a31aa9baa15f9733af463b028f0115f37953b566bfe3e11cea6f974358f9); cpuset=0,2,4,6
2019-10-18T21:33:57.407 controller-1 kubelet[100828]: info I1018 21:33:57.407932 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-metricbeat-79fd648597-k6w9l, container: metricbeat, container id: 1d08315841dda661de187ff3c363f261072031689b7b1ec9563bd86b0c5b95a3); cpuset=0,2,4,6
2019-10-18T21:34:09.020 controller-1 kubelet[100828]: info I1018 21:34:09.020248 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-metricbeat-79fd648597-k6w9l, container: metricbeat, container id: 1d08315841dda661de187ff3c363f261072031689b7b1ec9563bd86b0c5b95a3); cpuset=0,2,4,6
2019-10-18T21:34:11.334 controller-1 kubelet[100828]: info I1018 21:34:11.334724 100828 policy_static.go:253] [cpumanager] static policy: reserved: AddContainer (namespace: monitor, pod: mon-kube-state-metrics-c78655d4f-rkk8r, container: kube-state-metrics, container id: 6401f532112e2031aa9d7113a94a30080915e48a4d9c3d70a1519e2418a525d3); cpuset=0,2,4,6
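
The AddContainer lines above can also be summarized per pod instead of being read by eye. The following is a minimal sketch, assuming the kubelet log format shown in this report; the function name monitor_affinity and the regular expression are illustrative and are not part of StarlingX or kubelet.

```python
import re

# Matches the cpumanager "AddContainer" lines as they appear in this report.
# The pattern is an assumption based on those lines only.
ADD_CONTAINER = re.compile(
    r"\(namespace: (?P<namespace>[^,]+), pod: (?P<pod>[^,]+), "
    r"container: (?P<container>[^,]+), container id: (?P<cid>[0-9a-f]+)\); "
    r"cpuset=(?P<cpuset>[\d,\-]+)"
)

# One line copied from the daemon.log output in this report.
SAMPLE = (
    "2019-10-18T21:31:06.196 controller-1 kubelet[100828]: info I1018 "
    "21:31:06.196423 100828 policy_static.go:253] [cpumanager] static policy: "
    "reserved: AddContainer (namespace: monitor, pod: mon-logstash-0, "
    "container: logstash, container id: "
    "47f64a7b219bfae12595f1c1e02cebfcd6bffbaf64ab5f4959675f2d35712246); "
    "cpuset=0,2,4,6"
)


def monitor_affinity(lines, namespace="monitor"):
    """Return (pod, container, cpuset) tuples for AddContainer lines in a namespace."""
    found = []
    for line in lines:
        match = ADD_CONTAINER.search(line)
        if match and match.group("namespace") == namespace:
            found.append(
                (match.group("pod"), match.group("container"), match.group("cpuset"))
            )
    return found


if __name__ == "__main__":
    # In practice the lines would come from /var/log/daemon.log.
    for pod, container, cpuset in monitor_affinity([SAMPLE]):
        print(pod, container, cpuset)
```

Any tuple whose cpuset equals the platform cpuset (0,2,4,6 on this system) indicates a container affined to the platform CPUs.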

[sysadmin@controller-1 ~(keystone_admin)]$ sudo cat /var/lib/kubelet/cpu_manager_state | python -mjson.tool
{
    "checksum": 1851442141,
    "defaultCpuSet": "1,3,5,7,9,11,13,15-19",
    "entries": {
        "0c8e814a6e5341e11f9429ef2a05ee625b59f356739bb3d4822d5d748482e201": "0,2,4,6",
        "0d082f2c95983f7fde11c0f2e37a950d07361007d862ae4c3432c9c6eb436139": "0,2,4,6",
        "0d7ef27c52c4cf229936b01494257085e45deed00a5a1e40bd3d31c681f51ac8": "0,2,4,6",
        "0e403827cb3a356a76633e952d2489e0eed9deeb074df0fe8e940c4fa97a4f94": "0,2,4,6",
        "16a5f9a14baa5804a8c43fb1514d144bcc1e3ebe564da3bf43b0b935f6312f4a": "0,2,4,6",
        "1d08315841dda661de187ff3c363f261072031689b7b1ec9563bd86b0c5b95a3": "0,2,4,6",
        "2033fb03167b2d9d6babe2fabf81d381078f1245009d24406dde9f82d8a7b4d2": "0,2,4,6",
        "32aba49d21c789df84e821ad04a85a7ed6b2320bcb82666d59bc441f0b433254": "0,2,4,6",
        "3f553732714f2ca285686266f0cf21c4d3bcaacc71858476ea75526efef29ab6": "0,2,4,6",
        "47f64a7b219bfae12595f1c1e02cebfcd6bffbaf64ab5f4959675f2d35712246": "0,2,4,6",
        "4ba4801517e40732d541747b23792c4cfe874b238af963a3dac4c933d18ae105": "0,2,4,6",
        "4c706d55e63647027818a72e2912afd718c0069fc3670b22387bcc87aacf3eb0": "0,2,4,6",
        "57e23e04f029b5ce2e9d4f20216a73b9304821734a7ddc436ba79e4aa448c9d0": "0,2,4,6",
        "6401f532112e2031aa9d7113a94a30080915e48a4d9c3d70a1519e2418a525d3": "0,2,4,6",
        "6c77a31aa9baa15f9733af463b028f0115f37953b566bfe3e11cea6f974358f9": "0,2,4,6",
        "6e9659d9a6944e17ecc7f8c12becc85189ac2083f4e2987f6f31ca5d43238f13": "0,2,4,6",
        "762fca3947747f3258aef6e4425c7aae9c10100bcee94eac4e4c67a7cdbfde78": "0,2,4,6",
        "804113bee3a86e1e37bbc5792e422430afabd323ad044c1a1959b1ad301cc77a": "0,2,4,6",
        "81177302b7ded6532159bbaaa18b29f0f0ee697321f3f1fd0bd199094b44cd64": "0,2,4,6",
        "843a649f582b71ab079b475e4d83e9993abb62e9cddc7d06e12dd45e9c0125b7": "0,2,4,6",
        "865e635667f8f57b8f370b058d63916ebfa9cb82024acc3ae2082099b8feff32": "0,2,4,6",
        "911937f46bd0aa589bff4e8651d2bdbd60bf253fab14c94b44f5f3de66902b62": "0,2,4,6",
        "94bef667aa5342e27c53231d427342f5d1d583e8a7c90edcbd0045ff16e66e95": "0,2,4,6",
        "993df50516f2a0cdc5ce005ce9d095719aa1d92899fa6c8fec9d4bd9ffa16dfa": "0,2,4,6",
        "b21b39a8e3e3cbd4fb7aeb1ee588c86ccc9b28418ac6af8776b54fc083e3ee0d": "0,2,4,6",
        "c18eb4c78ca66e00962fd004e3e64aa36119ce382982a2472bb4498ee4aef03f": "0,2,4,6",
        "d1dcd74ca590c310430393ca09aebef88218cc7f41839417321a17c3131fbd3c": "0,2,4,6",
        "e6878c2f10824ca80e32df957ecb9f0041e2843fbe3606b0292436d0850f0f6f": "0,2,4,6",
        "e701318b8f09a6482dcc0930f278f3c5f0366d5d5d37e81371bdb2f683eb60ef": "0,2,4,6",
        "f9197fa04c9fac9e439d59cd09ad39796ed74723370a3a4d980e691a2efa27d2": "0,2,4,6"
    },
    "policyName": "static"
}
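
The same check can be made against the state file rather than the log. The sketch below assumes the cpu_manager_state layout shown above (an "entries" map from container id to a cpuset string); other kubelet versions may lay the file out differently. The helper names parse_cpuset and containers_overlapping are hypothetical, and the platform cpuset is passed in by the caller rather than discovered.

```python
import json


def parse_cpuset(text):
    """Expand a cpuset string such as "1,3,15-19" into a set of CPU numbers."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def containers_overlapping(state, platform_cpus):
    """Return sorted container ids whose assigned cpuset shares a CPU with platform_cpus."""
    platform = parse_cpuset(platform_cpus)
    return sorted(
        cid
        for cid, cpuset in state.get("entries", {}).items()
        if parse_cpuset(cpuset) & platform
    )


if __name__ == "__main__":
    # Reading the real file normally requires root privileges.
    with open("/var/lib/kubelet/cpu_manager_state") as handle:
        state = json.load(handle)
    # "0,2,4,6" is the cpuset reported as reserved on this system.
    for cid in containers_overlapping(state, "0,2,4,6"):
        print(cid)
```

The container ids printed can then be matched against the ids in the daemon.log AddContainer lines to find the owning pods.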

e.g.
[sysadmin@controller-1 ~(keystone_admin)]$ sudo cat /var/lib/kubelet/cpu_manager_state | python -mjson.tool | grep 6401f532112e2031aa9d7113a94a30080915e48a4d9c3d70a1519e2418a525d3
        "6401f532112e2031aa9d7113a94a30080915e48a4d9c3d70a1519e2418a525d3": "0,2,4,6",
[sysadmin@controller-1 ~(keystone_admin)]$ sudo cat /var/lib/kubelet/cpu_manager_state | python -mjson.tool | grep 1d08315841dda661de187ff3c363f261072031689b7b1ec9563bd86b0c5b95a3
        "1d08315841dda661de187ff3c363f261072031689b7b1ec9563bd86b0c5b95a3": "0,2,4,6",
[sysadmin@controller-1 ~(keystone_admin)]$ sudo cat /var/lib/kubelet/cpu_manager_state | python -mjson.tool | grep 1d08315841dda661de187ff3c363f261072031689b7b1ec9563bd86b0c5b95a3
        "1d08315841dda661de187ff3c363f261072031689b7b1ec9563bd86b0c5b95a3": "0,2,4,6",
[sysadmin@controller-1 ~(keystone_admin)]$ sudo cat /var/lib/kubelet/cpu_manager_state | python -mjson.tool | grep 6c77a31aa9baa15f9733af463b028f0115f37953b566bfe3e11cea6f974358f9
        "6c77a31aa9baa15f9733af463b028f0115f37953b566bfe3e11cea6f974358f9": "0,2,4,6",

Reproducibility
---------------
yes

System Configuration
--------------------
Duplex
R720 1-2

Branch/Pull Time/Commit
-----------------------
2019-10-17_20-00-00

Last Pass
---------

Timestamp/Logs
--------------
See above

Test Activity
-------------
New feature testing

Wendy Mitchell (wmitchellwr) wrote :
tags: added: stx.retestneeded
Ghada Khalil (gkhalil) wrote :

My understanding is that it is still under discussion whether stx-monitor should be part of the platform or run on application CPUs. Assigning to John Kung, the feature prime, to update once a decision is made.

Changed in starlingx:
assignee: nobody → John Kung (john-kung)
Ghada Khalil (gkhalil) wrote :

stx.3.0 / medium priority - As per John Kung, the decision is not to affine the stx-monitor to platform resources. This LP will be used to make that change.

tags: added: stx.3.0 stx.config
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
OpenStack Infra (hudson-openstack) wrote : Fix proposed to monitoring (master)

Fix proposed to branch: master
Review: https://review.opendev.org/691752

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to monitoring (master)

Reviewed: https://review.opendev.org/691752
Committed: https://git.openstack.org/cgit/starlingx/monitoring/commit/?id=859d9ba677fed42f46c18a4f5b236223e58899cd
Submitter: Zuul
Branch: master

commit 859d9ba677fed42f46c18a4f5b236223e58899cd
Author: John Kung <email address hidden>
Date: Mon Oct 28 16:48:11 2019 -0400

    Update kube-addons in collectd to be independent of platform

    Update collectd-extensions, kube-addons to be collected independently
    of PLATFORM_GROUPS.

    Change-Id: I018ede5d67cbd17b3a342e811cf6b51cbcd674b0
    Closes-bug: 1849359
    Signed-off-by: John Kung <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Wendy Mitchell (wmitchellwr) wrote :

Confirmed stx-monitor applied successfully on a 2+2 system and confirmed that pods in the monitor namespace are not running on platform CPUs.
2019-10-31_18-19-37

system application-list
...
| stx-monitor | 1.0-1 | monitor-armada-manifest | stx-monitor.yaml | applied | completed

$ cat /var/log/daemon.log | grep cpuset=0,36 | grep monitor

Wendy Mitchell (wmitchellwr) wrote :

Verified stx-monitor applied successfully on a 2-controller system.
Pods in the monitor namespace are not running on platform CPUs.
R720 1-2
2019-11-02_08-39-54

tags: removed: stx.retestneeded