[k8s] Pod Limits, Requests and QOS setup
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | Wishlist | Unassigned |
Bug Description
There are multiple open issues requesting APIs to give control over limits and requests:
* https:/
* https:/
* https:/
This bug aims to highlight the consequences of having these values set, and how to set them.
Juju does allow configuring limits and requests via constraints; however, there are two main problems: (1) it will build them with the same values; and (2) it will not set the initContainer, which means it is not possible to configure Pods with QOSClass=Guaranteed.
As described in LP#1919976, comment #8, it is possible to set these parameters post deployment, for example:
$ kubectl patch sts postgresql-k8s -n test -p '{"spec":{"template":{"spec":{"containers":[{"name":CONTAINER_NAME,"resources":{"limits":{"cpu":CPU_OF_CHOICE,"memory":MEM_OF_CHOICE},"requests":{"cpu":CPU_OF_CHOICE,"memory":MEM_OF_CHOICE}}}]}}}}'
That will trigger a RESTART in the workload.
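The JSON payload carried by that patch follows a fixed shape. As a minimal sketch, the snippet below builds it programmatically; the container name "postgresql" and the resource values are illustrative assumptions, not values mandated by the charm:

```python
import json

def build_resources_patch(container_name, cpu, memory):
    """Build a StatefulSet patch body that sets limits == requests
    (for both CPU and memory) on a single container."""
    resources = {
        "limits": {"cpu": cpu, "memory": memory},
        "requests": {"cpu": cpu, "memory": memory},
    }
    return {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {"name": container_name, "resources": resources}
                    ]
                }
            }
        }
    }

# The serialized result is what would be passed to:
#   kubectl patch sts postgresql-k8s -n test -p '<JSON>'
patch = build_resources_patch("postgresql", "500m", "512Mi")
print(json.dumps(patch))
```

Keeping limits equal to requests matters here: it is the precondition for the Guaranteed QoS class discussed below.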
-------
I've built a quick test environment with microk8s 1.27.5 in classic confinement. Here is what I have observed.
As-is, when we deploy a workload, each Pod of the statefulset comes up without limits or requests. Each Pod will have QOSClass set to:
$ kubectl get po -n test postgresql-k8s-0 -o=yaml | grep -i qosclass
qosClass: BestEffort
Checking the specifics of that workload (postgresql): https:/
Patching one of the containers to have limits and requests (with the same size, and for both CPU and memory):
$ kubectl patch sts postgresql-k8s -n test -p '{"spec":{"template":{"spec":{"containers":[{"name":CONTAINER_NAME,"resources":{"limits":{"cpu":CPU_OF_CHOICE,"memory":MEM_OF_CHOICE},"requests":{"cpu":CPU_OF_CHOICE,"memory":MEM_OF_CHOICE}}}]}}}}'
statefulset.apps/postgresql-k8s patched
And results in QOSClass=Burstable: https:/
It also results in pods with a better OOM score and an enforced maximum memory.
If all containers are patched, including the init-containers, then the pod moves to QOSClass=Guaranteed.
To discover all containers' names, use:
$ kubectl get sts -o=json -n test postgresql-k8s | jq '.spec.template.spec.containers[].name'
$ kubectl get sts -o=json -n test postgresql-k8s | jq '.spec.template.spec.initContainers[].name'
Patching all the containers AND init-containers so that limits and requests have the same values for both memory and CPU results in QOSClass=Guaranteed:
https:/
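The QoS decision itself is mechanical. As a simplified sketch of the rules Kubernetes applies (every container, init-containers included, must have limits equal to requests for both CPU and memory to reach Guaranteed; any resource set at all rules out BestEffort):

```python
def qos_class(containers):
    """Simplified version of Kubernetes' QoS classification.
    `containers` is a list of dicts with optional "requests"/"limits"
    maps keyed by "cpu"/"memory"; pass ALL containers of the pod,
    init-containers included."""
    all_guaranteed = True
    any_set = False
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        if requests or limits:
            any_set = True
        for res in ("cpu", "memory"):
            # Guaranteed needs a limit for both resources, and the
            # request (defaulted to the limit when absent) must match.
            if res not in limits or requests.get(res, limits[res]) != limits[res]:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

This mirrors the observations above: patching only some containers yields Burstable, while patching every container (init-containers too) with limits == requests yields Guaranteed.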
-------
As discussed in LP#2023782, limits and requests will create cgroups to enforce these limitations. Each container receives its own cgroup; for example, the postgresql container and its processes are the only ones isolated in one cgroup:
https:/
Whereas the charm container has a different cgroup: https:/
-------
Conclusions:
1) It is possible to set QoS values post-deployment, via charms.
2) Moving from BestEffort to Guaranteed drastically reduces the oom_score_adj. According to the documentation [1], the lower the oom_score_adj is, the lower the chance that the OOM killer will kill the process.
3) Guaranteed also gives the best chance of avoiding eviction by Kubernetes [2].
4) However, if the workload crosses its memory limit, it has a very high chance of being OOM-killed [4].
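For context on point 2, the kubelet derives oom_score_adj from the QoS class. A simplified sketch of that policy (constants taken from the Kubernetes source, pkg/kubelet/qos/policy.go; the clamping is simplified here):

```python
def oom_score_adj(qos_class, memory_request=0, memory_capacity=1):
    """Sketch of the kubelet's oom_score_adj policy: lower scores make
    the OOM killer less likely to pick the pod's processes."""
    if qos_class == "Guaranteed":
        return -997   # nearly immune to the OOM killer
    if qos_class == "BestEffort":
        return 1000   # first candidates to be killed
    # Burstable: scale with the fraction of node memory requested,
    # clamped to stay above Guaranteed and below BestEffort.
    adj = 1000 - (1000 * memory_request) // memory_capacity
    return max(3, min(adj, 999))
```

So a Burstable pod requesting half of the node's memory lands around 500, while Guaranteed pods sit at -997 regardless of usage, which is what makes the class so attractive for memory-sensitive workloads.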
Therefore, for workloads such as databases, it is very desirable to set them as Guaranteed.
I recommend we also set the Juju controller as Guaranteed, if we can guarantee its memory consumption. That reduces the chance of it being OOM-killed.
As a side note, Pods must be Guaranteed to enable other setups, such as a static CPU policy (i.e. pinning workloads to cores) [3]. That can be interesting for other workloads.
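For reference, static CPU pinning is enabled on the kubelet side, not per pod; only Guaranteed pods with integer CPU requests then get exclusive cores. A minimal KubeletConfiguration fragment might look like the following (the reserved-CPU value is an illustrative assumption; the static policy requires a non-zero CPU reservation for system daemons):

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Switch from the default "none" policy to exclusive core assignment.
cpuManagerPolicy: static
# The static policy needs CPU reserved for the system, so the shared
# pool for non-pinned workloads is never empty (illustrative value).
systemReserved:
  cpu: "500m"
```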
From the Juju team, we need either to provide APIs to configure the values above OR to allow charms to edit these values, with defined moments when Juju must override them (e.g. at upgrades).
-------
[1] https:/
Acceptable values range from -1000 (OOM_SCORE_ADJ_MIN) through +1000 (OOM_SCORE_ADJ_MAX). This allows userspace to polarize the preference for oom killing either by always preferring a certain task or completely disabling it. The lowest possible value, -1000, is equivalent to disabling oom killing entirely for that task since it will always report a badness score of 0.
[2] https:/
[3] https:/
[4] https:/
Memory usage hard limit. This is the main mechanism to limit memory usage of a cgroup. If a cgroup's memory usage reaches this limit and can't be reduced, the OOM killer is invoked in the cgroup. Under certain circumstances, the usage may go over the limit temporarily.
[5] Reference from the Kubernetes source code, where it decides to keep the Guaranteed status as long as Limits and Requests have the same value:
https:/
description: updated
Changed in juju:
status: New → Triaged
importance: Undecided → Wishlist
tags: added: canonical-data-platform-eng
Regarding: https://bugs.launchpad.net/juju/+bug/2035102
* https:/
It is also possible to set terminationGracePeriodSeconds post-deployment of the charm with:
$ kubectl patch sts postgresql-k8s -n test -p '{"spec":{"template":{"spec":{"terminationGracePeriodSeconds": VALUE_OF_CHOICE}}}}'
That will trigger a restart of the statefulset.