Kata runtime does not honor static cpu policy

Bug #1864382 reported by Brent Rowsell
This bug affects 1 person
Affects     Status    Importance  Assigned to  Milestone
StarlingX   Triaged   Low         Unassigned

Bug Description

Brief Description
-----------------
I enabled the static CPU policy by setting kube-cpu-mgr-policy=static, which enables the feature on kubelet (--cpu-manager-policy=static).

The server has 2 CPUs reserved for the system, so any static CPU assignments should start at CPU 2.

I launch a pod with the following spec:

apiVersion: v1
kind: Pod
metadata:
  name: testpod1
spec:
  runtimeClassName: kata
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        cpu: 2
        memory: "1Gi"
      limits:
        cpu: 2
        memory: "1Gi"

I would expect the container to be pinned to CPUs 2-3; instead it is pinned to 0,1, which are the reserved CPUs.

I then launch another pod with the following spec:

apiVersion: v1
kind: Pod
metadata:
  name: testpod3
spec:
  runtimeClassName: kata
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        cpu: 3
        memory: "1Gi"
      limits:
        cpu: 3
        memory: "1Gi"

I would expect the container to be pinned to CPUs 4-6. Instead it is also pinned to 0,1.

Finally, I launch a best-effort pod (no resource requests or limits):

apiVersion: v1
kind: Pod
metadata:
  name: testpod4
spec:
  runtimeClassName: kata
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]

It ends up pinned to CPU 0. I would have expected it to float across all CPUs that are not statically reserved.

Kata-runtime is not honoring the k8s CPU policies.
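
The report does not say how the pinning was checked. As a minimal sketch, assuming a Linux host, one way to compare a process's allowed CPUs against the reserved set is to read Cpus_allowed_list from /proc/<pid>/status. The function names here are illustrative, and the PID you would pass (for example, that of the pod's hypervisor process) is an assumption, not something taken from this report.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0-1,4' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus(pid):
    """Return the CPUs a host process may run on, from /proc/<pid>/status."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("Cpus_allowed_list:"):
            return parse_cpu_list(line.split(":", 1)[1])
    raise RuntimeError(f"no Cpus_allowed_list entry for pid {pid}")


def overlaps_reserved(pinned, reserved):
    """Return the sorted CPUs that are both pinned and reserved."""
    return sorted(pinned & reserved)


# CPUs 0-1 are reserved for the system in this report.
RESERVED = parse_cpu_list("0-1")

# Observed pinning for testpod1 (0,1) versus the expected pinning (2-3).
print(overlaps_reserved(parse_cpu_list("0,1"), RESERVED))
print(overlaps_reserved(parse_cpu_list("2-3"), RESERVED))
```

With the static policy working as expected, the overlap for a guaranteed pod would be empty; for the pods above it is not.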

Severity
--------
Major: Kata is not usable for any high-performance workloads.

Steps to Reproduce
------------------
See above

Expected Behavior
------------------
See above

Actual Behavior
----------------
All three pods are pinned to the reserved CPUs (0,1 or 0), regardless of their CPU requests.

Reproducibility
---------------
100%

System Configuration
--------------------
All

Branch/Pull Time/Commit
-----------------------
BUILD_DATE="2020-02-22 04:15:31 -0500"

Last Pass
---------
Likely never

Timestamp/Logs
--------------
2020-02-23T19:26:30.000 controller-0 kata[2526851]: warning time="2020-02-23T19:26:30.090151969Z" level=warning msg="sandbox's cgroup won't be updated: cgroup path is empty" ID=cb49feea52226d01289fc5ddd1bb98e8b5e08b970f83fb10f691dbcdf2a437da sandbox=cb49feea52226d01289fc5ddd1bb98e8b5e08b970f83fb10f691dbcdf2a437da source=virtcontainers subsystem=sandbox
2020-02-23T19:26:40.000 controller-0 kata[1857644]: warning time="2020-02-23T19:26:40.604504439Z" level=warning msg="Cannot hotplug 1 CPUs, currently this SB has 2 CPUs and the maximum amount of CPUs is 2" ID=30d3bd8f6a70c057f9df7e667129375d6ca1b6d75bd9ac9b7bd75b1605a2118f source=virtcontainers subsystem=qemu
2020-02-23T19:26:40.000 controller-0 kata[1857644]: warning time="2020-02-23T19:26:40.604551194Z" level=warning msg="maximum number of vCPUs '2' has been reached" ID=30d3bd8f6a70c057f9df7e667129375d6ca1b6d75bd9ac9b7bd75b1605a2118f source=virtcontainers subsystem=qemu
2020-02-23T19:26:40.000 controller-0 kata[1857644]: warning time="2020-02-23T19:26:40.609893031Z" level=warning msg="sandbox's cgroup won't be updated: cgroup path is empty" ID=30d3bd8f6a70c057f9df7e667129375d6ca1b6d75bd9ac9b7bd75b1605a2118f sandbox=30d3bd8f6a70c057f9df7e667129375d6ca1b6d75bd9ac9b7bd75b1605a2118f source=virtcontainers subsystem=sandbox
2020-02-23T19:26:40.000 controller-0 kata[2526851]: warning time="2020-02-23T19:26:40.618683944Z" level=warning msg="Cannot hotplug 2 CPUs, currently this SB has 2 CPUs and the maximum amount of CPUs is 2" ID=cb49feea52226d01289fc5ddd1bb98e8b5e08b970f83fb10f691dbcdf2a437da source=virtcontainers subsystem=qemu
2020-02-23T19:26:40.000 controller-0 kata[2526851]: warning time="2020-02-23T19:26:40.618727787Z" level=warning msg="maximum number of vCPUs '2' has been reached" ID=cb49feea52226d01289fc5ddd1bb98e8b5e08b970f83fb10f691dbcdf2a437da source=virtcontainers subsystem=qemu
2020-02-23T19:26:40.000 controller-0 kata[2526851]: warning time="2020-02-23T19:26:40.622697131Z" level=warning msg="sandbox's cgroup won't be updated: cgroup path is empty" ID=cb49feea52226d01289fc5ddd1bb98e8b5e08b970f83fb10f691dbcdf2a437da sandbox=cb49feea52226d01289fc5ddd1bb98e8b5e08b970f83fb10f691dbcdf2a437da source=virtcontainers subsystem=sandbox
2020-02-23T19:26:50.000 controller-0 kata[1857644]: warning time="2020-02-23T19:26:50.966405683Z" level=warning msg="Cannot hotplug 1 CPUs, currently this SB has 2 CPUs and the maximum amount of CPUs is 2" ID=30d3bd8f6a70c057f9df7e667129375d6ca1b6d75bd9ac9b7bd75b1605a2118f source=virtcontainers subsystem=qemu
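
The warnings above suggest that each sandbox is capped at 2 vCPUs and that the hotplug requests are being refused. As a rough sketch for summarizing such lines, the following pulls the counts out of the "Cannot hotplug" message with a regular expression. The function name and the returned dictionary keys are hypothetical; the sample line is copied from the log above.

```python
import re

# Matches the hotplug warning text exactly as it appears in the log above.
HOTPLUG_RE = re.compile(
    r'msg="Cannot hotplug (\d+) CPUs, currently this SB has (\d+) CPUs '
    r'and the maximum amount of CPUs is (\d+)"'
)
ID_RE = re.compile(r"\bID=([0-9a-f]+)")


def parse_hotplug_warning(line):
    """Return the counts from a 'Cannot hotplug' warning, or None if absent."""
    match = HOTPLUG_RE.search(line)
    if match is None:
        return None
    requested, current, maximum = (int(g) for g in match.groups())
    id_match = ID_RE.search(line)
    return {
        # Shorten the sandbox ID to 12 characters for readability.
        "sandbox": id_match.group(1)[:12] if id_match else None,
        "requested": requested,
        "current": current,
        "maximum": maximum,
    }


SAMPLE = (
    '2020-02-23T19:26:40.000 controller-0 kata[2526851]: warning '
    'time="2020-02-23T19:26:40.618683944Z" level=warning '
    'msg="Cannot hotplug 2 CPUs, currently this SB has 2 CPUs and the '
    'maximum amount of CPUs is 2" '
    'ID=cb49feea52226d01289fc5ddd1bb98e8b5e08b970f83fb10f691dbcdf2a437da '
    'source=virtcontainers subsystem=qemu'
)

print(parse_hotplug_warning(SAMPLE))
```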

Test Activity
-------------
Developer testing

Workaround
----------
None

summary: - Kata runtime does nor honor static cpu policy
+ Kata runtime does not honor static cpu policy
Ghada Khalil (gkhalil) wrote :

stx.4.0 / high priority - serious limitation with kata containers.

Unsure if this is related to the upstream kata container project or specific to StarlingX

Changed in starlingx:
importance: Undecided → High
status: New → Triaged
tags: added: stx.4.0 stx.containers
Changed in starlingx:
assignee: nobody → Lin Shuicheng (shuicheng)
Ghada Khalil (gkhalil) wrote :

Assigning to the Kata Containers Feature Prime

Lin Shuicheng (shuicheng) wrote :

Kata doesn't support the static CPU manager policy yet.
Here is the issue opened in the Kata community:
https://github.com/kata-containers/runtime/issues/878
"support for static configuration of k8s cpu manager - container level cpu affinity #878"

Ghada Khalil (gkhalil) wrote :

Lowering the priority given this is an upstream kata issue

tags: removed: stx.4.0
Changed in starlingx:
importance: High → Low
Changed in starlingx:
assignee: Lin Shuicheng (shuicheng) → nobody