CPU resource 'limit' value without having integer on pod is not working properly

Bug #1997528 reported by Boovan Rajendran
This bug affects 1 person
Affects     Status         Importance   Assigned to        Milestone
StarlingX   Fix Released   Medium       Boovan Rajendran

Bug Description

Brief Description

For pods in the Guaranteed QoS class, a CPU 'limit' that is not a whole number of CPUs is not enforced.

A limit that is a whole number of CPUs (e.g. 2, 1000m, 2000m) works as expected.
A limit that is not a whole number of CPUs (e.g. 200m, 500m, 1500m) is not enforced.
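
The distinction between "integer" and "non-integer" limits can be sketched as below. This is only an illustration of the classification used in this report; the helper names `cpu_to_millicores` and `is_whole_cpu` are hypothetical and do not come from kubelet or StarlingX.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "2", "0.5" or "1500m" to millicores."""
    q = quantity.strip()
    if q.endswith("m"):
        # Already expressed in millicores, e.g. "1500m".
        return int(q[:-1])
    # Plain number of CPUs, e.g. "2" or "0.5".
    return round(float(q) * 1000)


def is_whole_cpu(quantity: str) -> bool:
    """Return True when the quantity is a whole number of CPUs."""
    return cpu_to_millicores(quantity) % 1000 == 0


if __name__ == "__main__":
    # The values listed in the description above.
    for q in ["2", "1000m", "2000m", "200m", "500m", "1500m"]:
        kind = "integer" if is_whole_cpu(q) else "non-integer"
        print(f"{q}: {kind}")
```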

Severity

Major: System/Feature is usable but degraded

Steps to Reproduce:

[sysadmin@controller-0 ~(keystone_admin)]$ system host-label-list controller-1

+--------------+---------------------+-------------+
| hostname     | label key           | label value |
+--------------+---------------------+-------------+
| controller-1 | kube-cpu-mgr-policy | static      |
+--------------+---------------------+-------------+

# Set the CPU limit to 200m and deploy 10 pods

[sysadmin@controller-0 ~(keystone_admin)]$ cat podtest3.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
spec:
  selector:
    matchLabels:
      run: my-nginx
  replicas: 10
  template:
    metadata:
      labels:
        run: my-nginx
      annotations:
        k8s.v1.cni.cncf.io/networks: '[
            { "name": "datanet1" }
        ]'
    spec:
      nodeSelector:
        kubernetes.io/hostname: controller-1
      containers:
      - name: mypod
        command: ["/bin/bash", "-c", "trap : TERM INT; sleep infinity & wait"]
        image: dougbtv/centos-network
        securityContext:
          privileged: true
          allowPrivilegeEscalation: true
        resources:
          limits:
            cpu: 200m
            memory: 100Mi
            intel.com/pci_sriov_net_datanet_1: 1
          requests:
            cpu: 200m
            memory: 100Mi
            intel.com/pci_sriov_net_datanet_1: 1

# Each pod ran "yes > /dev/null &" several times in a row to increase the CPU load.

# Usage exceeded the 200m limit

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl top pods
NAME                                           CPU(cores)   MEMORY(bytes)
my-nginx-54f7f9f796-6bx6p                      3050m        1Mi
my-nginx-54f7f9f796-cwkfp                      2877m        1Mi
my-nginx-54f7f9f796-j75pb                      3046m        1Mi
my-nginx-54f7f9f796-nmx5n                      3027m        1Mi
my-nginx-54f7f9f796-qpnhg                      3014m        1Mi
my-nginx-54f7f9f796-qxfsb                      3275m        1Mi
my-nginx-54f7f9f796-rl5cm                      3349m        1Mi
my-nginx-54f7f9f796-t7df5                      3359m        1Mi
my-nginx-54f7f9f796-vr6dx                      3223m        2Mi
my-nginx-54f7f9f796-xfbxh                      3942m        2Mi

# CPU limit = 1500m, 3 pods deployed -> ISSUE

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl top pods
NAME                                           CPU(cores)   MEMORY(bytes)
my-nginx-5c849665d4-5kjvw                      11022m       1Mi
my-nginx-5c849665d4-7fhrm                      10678m       1Mi
my-nginx-5c849665d4-q92sv                      5953m        2Mi

# CPU limit = 2000m, 3 pods deployed -> NORMAL

[sysadmin@controller-0 ~(keystone_admin)]$ kubectl top pods
NAME                                           CPU(cores)   MEMORY(bytes)
my-nginx-55f48d47d6-cz65z                      2000m        1Mi
my-nginx-55f48d47d6-zgctv                      2000m        1Mi
my-nginx-55f48d47d6-zsxrc                      2000m        1Mi

Expected Behavior

CPU usage should be capped at the configured 'limit'.
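
One way to check this expectation against `kubectl top pods` output is sketched below. It is a minimal illustration rather than part of the reported reproduction: the function name `pods_over_limit` is hypothetical, and the sample text is copied from the 1500m case above.

```python
def pods_over_limit(top_output: str, limit_millicores: int) -> list[tuple[str, int]]:
    """Return (pod name, CPU usage in millicores) for pods above the limit."""
    over = []
    for line in top_output.splitlines():
        fields = line.split()
        # Skip blank lines and the NAME / CPU(cores) / MEMORY(bytes) header.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, cpu = fields[0], fields[1]
        if not cpu.endswith("m"):
            # Only the millicore form shown in this report is handled.
            continue
        used = int(cpu[:-1])
        if used > limit_millicores:
            over.append((name, used))
    return over


# Sample copied from the "CPU limit = 1500m" run in this report.
SAMPLE = """\
NAME                                           CPU(cores)   MEMORY(bytes)
my-nginx-5c849665d4-5kjvw                      11022m       1Mi
my-nginx-5c849665d4-7fhrm                      10678m       1Mi
my-nginx-5c849665d4-q92sv                      5953m        2Mi
"""

if __name__ == "__main__":
    for name, used in pods_over_limit(SAMPLE, 1500):
        print(f"{name} uses {used}m, above the 1500m limit")
```

With the expected behavior in place, the function would return an empty list for this deployment.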

Changed in starlingx:
assignee: nobody → Boovan Rajendran (brajendr)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/865313
Committed: https://opendev.org/starlingx/integ/commit/e24f687606d25424bfd09dd53c7195e6d180a069
Submitter: "Zuul (22348)"
Branch: master

commit e24f687606d25424bfd09dd53c7195e6d180a069
Author: Boovan Rajendran <email address hidden>
Date: Tue Nov 22 12:10:48 2022 -0500

    kubelet CFS quota throttling for non integer cpulimit

    This patch is used to set cgroups by writing -1 to cgroup
    cpu.cfs_quota_us when the cpulimit has integer value.

    Test Plan:
    Verified the pods that in the "Guaranteed" QoS class, on hosts that
    have "kube-cpu-mgr-policy=static" have cpu.cfs_quota_us set to -1 for
    integer cpu value.

    Closes-Bug: #1997528

    Signed-off-by: Boovan Rajendran <email address hidden>
    Change-Id: I06a5ea791b9392483414323db1f2ae0962a466ce

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/866204

Ghada Khalil (gkhalil) wrote :

Re-opening. The original review ( https://review.opendev.org/c/starlingx/integ/+/865313 ) was reverted ( https://review.opendev.org/c/starlingx/integ/+/866092 ) as it caused a build failure. The change is being reworked in a follow-up review ( https://review.opendev.org/c/starlingx/integ/+/866204 )

Changed in starlingx:
status: Fix Released → In Progress
tags: added: stx.8.0 stx.containers
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/866204
Committed: https://opendev.org/starlingx/integ/commit/dd75f3ba3d41180a2db9cb9099f6ce02f34820c0
Submitter: "Zuul (22348)"
Branch: master

commit dd75f3ba3d41180a2db9cb9099f6ce02f34820c0
Author: Boovan Rajendran <email address hidden>
Date: Wed Nov 30 08:35:13 2022 -0500

    kubelet CFS quota throttling for non integer cpulimit

    A previous change set the cgroup cpu.cfs_quota_us value to -1 for
    containers in pods in the Guaranteed QoS class.

    We can only do this if we're allocating the entire CPU. For non-
    integer CPU allocations we need to set the cpu.cfs_quota_us value
    to enforce the CPU limit configured on the container.

    Test Plan:
    Verified the pods that in the "Guaranteed" QoS class, on hosts that
    have "kube-cpu-mgr-policy=static" have cpu.cfs_quota_us set to -1 for
    integer cpu value.

    Closes-Bug: 1997528

    Signed-off-by: Boovan Rajendran <email address hidden>
    Change-Id: I33662e67706cee4cb0ce005bb09ce3b5fc717239
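
The rule described in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: the function name `expected_cfs_quota_us` and its parameters are hypothetical, the default CFS period of 100000 µs is assumed, and the usual quota formula (limit in millicores × period / 1000) is assumed for the non-integer case.

```python
CFS_PERIOD_US = 100_000  # assumption: default CFS period of 100 ms


def expected_cfs_quota_us(
    limit_millicores: int,
    guaranteed: bool = True,
    static_policy: bool = True,
    period_us: int = CFS_PERIOD_US,
) -> int:
    """Return the cpu.cfs_quota_us value expected for a container CPU limit."""
    if limit_millicores <= 0:
        raise ValueError("limit_millicores must be positive")
    whole_cpus = limit_millicores % 1000 == 0
    if guaranteed and static_policy and whole_cpus:
        # Entire CPUs are allocated to the container, so no quota is needed.
        return -1
    # Fractional allocation: the quota must enforce the configured limit.
    return limit_millicores * period_us // 1000


if __name__ == "__main__":
    # The limits used in the reproduction steps of this bug.
    for limit in (200, 1500, 2000):
        print(f"{limit}m -> cpu.cfs_quota_us = {expected_cfs_quota_us(limit)}")
```

Under these assumptions, the 200m and 1500m limits from the reproduction map to quotas of 20000 and 150000, while the 2000m limit maps to -1.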

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium