StarlingX

Kata runtime does not support hugepages

Bug #1864383 reported by Brent Rowsell on 2020-02-23

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	StarlingX	Triaged	Low	Unassigned

Bug Description

Brief Description
-----------------
The Kata runtime does not support hugepages assigned via k8s.
I launched a pod with the following spec:

apiVersion: v1
kind: Pod
metadata:
  name: testpod2
spec:
  runtimeClassName: kata
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    volumeMounts:
    - mountPath: /hugepages
      name: hugepage
    resources:
      requests:
        cpu: 2
        memory: "1Gi"
        hugepages-1Gi: 1Gi
      limits:
        cpu: 2
        memory: "1Gi"
        hugepages-1Gi: 1Gi
  volumes:
    - name: hugepage
      emptyDir:
        medium: HugePages

I would expect to see a hugepages mount in the container as follows:

nodev on /hugepages type hugetlbfs (rw,relatime,pagesize=1Gi)

The mount is not present with kata; it works fine with runc.
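
One way to check for the expected mount from inside the container is to scan /proc/mounts for hugetlbfs entries. The following is only a minimal sketch in Python, assuming the usual /proc/mounts field order (device, mount point, filesystem type, options); the function name hugetlbfs_mounts is illustrative and not part of any tool mentioned in this report.

```python
def hugetlbfs_mounts(mounts_text):
    """Return (mount_point, options) pairs for every hugetlbfs entry."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[2] == "hugetlbfs":
            found.append((fields[1], fields[3].split(",")))
    return found


# Sample line modelled on the expected mount quoted in this report.
SAMPLE = "nodev /hugepages hugetlbfs rw,relatime,pagesize=1Gi 0 0\n"

for mount_point, options in hugetlbfs_mounts(SAMPLE):
    print(mount_point, ",".join(options))
```

In a running container the text would come from open("/proc/mounts").read(); an empty result corresponds to the behaviour reported here for kata.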

Severity
--------
Major: performance feature not available with the kata runtime

Steps to Reproduce
------------------
See above

Expected Behavior
------------------
See above

Actual Behavior
----------------
See above

Reproducibility
---------------
100%

System Configuration
--------------------
All

Branch/Pull Time/Commit
-----------------------
BUILD_DATE="2020-02-22 04:15:31 -0500"

Last Pass
---------
Likely never

Timestamp/Logs
--------------
See above

Test Activity
-------------
Developer Testing

Workaround
----------
None

Tags:

Ghada Khalil (gkhalil) wrote on 2020-02-24:

stx.4.0 / high priority - serious limitation with kata containers.

This will likely require follow-up with the upstream kata container project.

tags:	added: stx.4.0 stx.containers
Changed in starlingx:
status:	New → Triaged
importance:	Undecided → High
assignee:	nobody → Lin Shuicheng (shuicheng)

Ghada Khalil (gkhalil) wrote on 2020-02-24:

Assigning to the Kata Containers Feature Prime

Lin Shuicheng (shuicheng) wrote on 2020-02-26:

There is currently an issue with huge page support in kata,
and it is already tracked in the kata community.
Here are the open issues related to huge pages with the kata runtime:
https://github.com/kata-containers/runtime/issues/2353
https://github.com/kata-containers/runtime/issues/2172
https://github.com/kata-containers/runtime/issues/1548

Steven Webster (swebster-wr) wrote on 2020-03-31:

Just as a data point, here's what I had to do to get kata+hugepages somewhat working in a StarlingX lab:

Option 1. Custom-built kata-containers kernel and rootfs, with empty hugepages
directories mirroring those on the host system

Kernel config:

CONFIG_CGROUP_HUGETLB=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y
CONFIG_TRANSPARENT_HUGE_PAGECACHE=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

Empty directories created in the rootfs:

/mnt/huge-1048576kB
/mnt/huge-2048kB

Modify the containerd config.toml by adding the pod_annotations line (marked with +):

[plugins.cri.containerd.runtimes.kata]
runtime_type = "io.containerd.kata.v2"
+ pod_annotations = ["io.katacontainers.*"]

The user could then specify the hugepage configuration in the pod annotation:

annotations:
    io.katacontainers.config.hypervisor.kernel_params: "default_hugepagesz=1G hugepagesz=1G hugepages=2"

Note that with the above annotation, hugepages=2 does not seem to actually set the number of hugepages.
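
One way to confirm how many hugepages the guest actually ended up with is to read the HugePages_* counters from /proc/meminfo inside the kata VM. A minimal Python sketch of that check follows; the function name hugepage_counters is hypothetical and the sample text is made up for illustration, not captured from the lab.

```python
def hugepage_counters(meminfo_text):
    """Extract the HugePages_* counters and Hugepagesize (in kB) from meminfo text."""
    counters = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and (name.startswith("HugePages_") or name == "Hugepagesize"):
            # The value is the first token after the colon; any unit suffix is dropped.
            counters[name] = int(rest.split()[0])
    return counters


# Illustrative input only; real data would come from open("/proc/meminfo").read().
SAMPLE = (
    "MemTotal:        2040000 kB\n"
    "HugePages_Total:       0\n"
    "HugePages_Free:        0\n"
    "Hugepagesize:    1048576 kB\n"
)

print(hugepage_counters(SAMPLE))
```

A HugePages_Total of 0 despite hugepages=2 on the kernel command line would match the behaviour described above.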

Since Kata does not support the k8s sysctl facility
(https://kubernetes.io/docs/tasks/administer-cluster/sysctl-cluster/),
it's possible to write a systemd service/target which takes the value of hugepages from the kernel parameters and sets the sysctl on VM startup.
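
The logic such a service would run at boot can be sketched roughly as below, in Python. Several things here are assumptions rather than details from the lab setup: that the sysctl in question is vm.nr_hugepages (written through /proc/sys/vm/nr_hugepages), that the last hugepages= token on the command line is the one to use, and the function names themselves, which are hypothetical.

```python
def hugepages_from_cmdline(cmdline):
    """Return the last hugepages=N value on a kernel command line, or None."""
    count = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Only the exact key "hugepages" counts; hugepagesz= and
        # default_hugepagesz= are different parameters and are skipped.
        if key == "hugepages" and sep and value.isdigit():
            count = int(value)
    return count


def apply_hugepages(count, sysctl_path="/proc/sys/vm/nr_hugepages"):
    """Write the page count to the sysctl file (needs root inside the VM)."""
    with open(sysctl_path, "w") as f:
        f.write(f"{count}\n")


# Parsing the kernel parameters from the annotation shown earlier.
print(hugepages_from_cmdline("default_hugepagesz=1G hugepagesz=1G hugepages=2"))
```

A oneshot unit could read /proc/cmdline, pass its contents to hugepages_from_cmdline, and call apply_hugepages when a value is found.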

Finally, in the container itself, it's up to the end user application to create a hugetlbfs mount for hugepages:

mount -t hugetlbfs -o pagesize=1G none /mnt/huge-1048576kB

mount -t hugetlbfs -o pagesize=2M none /mnt/huge-2048kB

Option 2. In the kata-containers configuration.toml, set enable_hugepages = true

This results in all memory in the kata VM being backed by hugepages, based on
the memory resource requests/limits.

As with Option 1, the container application itself would then need to create
a hugepage directory and mount hugetlbfs on it.
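
For either option, the directory creation and mount that the application has to do could be wrapped along these lines. This is a sketch only: the helper names are hypothetical, the /mnt/huge-<size>kB naming simply follows the directories listed above, and the mount itself needs root in the container.

```python
import os
import subprocess


def pagesize_option(page_kb):
    """Format a page size in kB the way the mount commands above write it (2M, 1G)."""
    if page_kb % (1024 * 1024) == 0:
        return f"{page_kb // (1024 * 1024)}G"
    if page_kb % 1024 == 0:
        return f"{page_kb // 1024}M"
    return f"{page_kb}K"


def hugetlbfs_mount_argv(page_kb, base="/mnt"):
    """Build the mount command for one page size, targeting base/huge-<size>kB."""
    target = os.path.join(base, f"huge-{page_kb}kB")
    option = f"pagesize={pagesize_option(page_kb)}"
    return ["mount", "-t", "hugetlbfs", "-o", option, "none", target]


def mount_hugetlbfs(page_kb, base="/mnt"):
    """Create the mount point if needed, then mount hugetlbfs on it."""
    argv = hugetlbfs_mount_argv(page_kb, base)
    os.makedirs(argv[-1], exist_ok=True)
    subprocess.run(argv, check=True)


# Print the commands for the two page sizes used in this comment.
for size_kb in (1048576, 2048):
    print(" ".join(hugetlbfs_mount_argv(size_kb)))
```

Only hugetlbfs_mount_argv is exercised when the block runs; mount_hugetlbfs would be called by the application once it is running with sufficient privileges.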

Frank Miller (sensfan22) wrote on 2020-07-06:

This support is not feasible in the stx.4.0 timeline. Moving this to stx.5.0.

tags:	added: stx.5.0
	removed: stx.4.0

Lin Shuicheng (shuicheng) on 2020-12-05

Changed in starlingx:
assignee:	Lin Shuicheng (shuicheng) → nobody

Ghada Khalil (gkhalil) wrote on 2021-03-10:

Lowering the priority as nobody seems to be working on this. We will not hold up stx.5.0 for this issue.

tags:	removed: stx.5.0
Changed in starlingx:
importance:	High → Low
