Kata container runtime does not include support for SR-IOV devices

Bug #1867927 reported by Steven Webster
This bug affects 1 person
Affects    Status   Importance  Assigned to  Milestone
StarlingX  Triaged  Low         Unassigned

Bug Description

Brief Description
-----------------
The kata runtime shipped with StarlingX does not fully support SR-IOV network devices assigned to a pod/container. The devices themselves can be assigned, but the kata kernel currently has no driver support for them.

Severity
--------
Major performance feature not available with kata runtime

Steps to Reproduce
------------------
- Ensure that pci-sriov classed interface(s) are assigned to a worker node
  - (system host-if-modify <worker> <interface> -c pci-sriov -n sriov1 -N <number of VFs>)
- Ensure the sriovdp label is applied
  - (system host-label-assign <worker> sriovdp=enabled)
- Ensure the SR-IOV interface is assigned to a data network
  - (system interface-datanetwork-assign <worker> <interface> <datanetwork>)

- Launch a pod with SR-IOV devices and observe that the devices can be seen with lspci
- Observe that the device cannot be bound to any driver and is not usable
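
The last observation (device visible in lspci but not bound to any driver) can also be checked from sysfs inside the container. This is a minimal Python sketch, assuming the standard Linux sysfs layout in which /sys/bus/pci/devices/<address>/driver is a symlink to the bound driver. The function name and the sysfs_root parameter are illustrative only and are not part of StarlingX or Kata.

```python
import os
from typing import Optional


def bound_driver(pci_address: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the driver bound to a PCI device, or None if unbound.

    Assumes the usual sysfs layout: a bound device has a 'driver' symlink
    whose target's last path component is the driver name.
    """
    driver_link = os.path.join(
        sysfs_root, "bus", "pci", "devices", pci_address, "driver"
    )
    if not os.path.islink(driver_link):
        # No 'driver' symlink: the device exists but nothing has claimed it.
        return None
    return os.path.basename(os.readlink(driver_link))


if __name__ == "__main__":
    devices_dir = "/sys/bus/pci/devices"
    if os.path.isdir(devices_dir):
        for address in sorted(os.listdir(devices_dir)):
            print(address, bound_driver(address) or "<no driver>")
```

With the kata kernel described in this report, the SR-IOV VF would be expected to show up as having no driver.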

Sample network attachment definition spec:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov1
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/pci_sriov_net_group0_data0
spec:
  config: '{
      "cniVersion": "0.3.0",
      "type": "sriov"
    }'

Sample pod spec:

apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov1
spec:
  runtimeClassName: kata
  containers:
  - name: appcntr1
    image: centos/tools
    imagePullPolicy: IfNotPresent
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 300000; done;" ]
    resources:
      requests:
        cpu: 2
        memory: "1Gi"
        intel.com/pci_sriov_net_group0_data0: '1'
      limits:
        cpu: 2
        memory: "1Gi"
        intel.com/pci_sriov_net_group0_data0: '1'

Expected Behavior
------------------
See above

Actual Behavior
----------------
See above

Reproducibility
---------------
100%

System Configuration
--------------------
All

Branch/Pull Time/Commit
-----------------------
BUILD_DATE="2020-03-09 04:13:40 -0400"

Last Pass
---------
Likely never

Timestamp/Logs
--------------
See above

Test Activity
-------------
Developer Testing

Workaround
----------
A custom kata kernel and rootfs must be built to include the appropriate driver support

Ref: https://github.com/kata-containers/kata-containers

Steven Webster (swebster-wr) wrote :

I have done some investigative work to determine what would need to be done to improve our support for
SR-IOV and Kata containers:

1. It should be documented that for Kata containers, the SR-IOV device must be bound to the VFIO driver
   in the host. Currently Kata can only pass through an SR-IOV device using a vfio driver.

   For example:

   system host-if-modify <worker> <sriov_interface> --vf-driver=vfio

2. We would need a method to bind the driver appropriately in the Kata VM itself so that it shows up in the
   container as a kernel network device (netdevice), or is bound again to vfio in the VM.

   I think one way to allow this would be to include the standard network drivers and vfio as kernel modules,
   and let the user decide which driver is used via the kata kernel_modules pod annotation:

   io.katacontainers.config.agent.kernel_modules: "vfio; vfio-pci"

   To support this, the following modifications would be needed:

   2.1 Specify the kata containers pod annotation prefix in the containerd config.toml file:

   /etc/containerd/config.toml

   In the 'plugins.cri.containerd.runtimes.kata' section:

   pod_annotations = ["io.katacontainers.*"]

   2.2 Specify appropriate kernel_params in the kata-containers configuration.toml:

   /usr/share/defaults/kata-containers/configuration.toml

   The kernel params need to enable the iommu and list the vendor:device IDs of the supported network devices.

   For example:

   kernel_params = "iommu=pt intel_iommu=on vfio-pci.ids=8086:154c"

   Setting the vfio-pci.ids in this way means if "vfio; vfio-pci" is in the kernel_modules annotation,
   the devices will be bound to vfio automatically in the Kata VM.
       Note: This means it would be tricky or impossible to have a mix of vfio devices and netdevices in the container.

   Alternatively, it might be better to just document that the user can specify the kernel_params as
   a pod annotation, similar to kernel_modules.
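
To make the relationship between the kernel_params string and the kernel_modules annotation in item 2 concrete, here is a small Python sketch that generates both values. It assumes that vfio-pci.ids takes a comma-separated list of vendor:device pairs and that kernel_modules entries are separated by "; " as in the example above. The helper names are hypothetical, and the second device ID in the usage note is only an example value.

```python
import re
from typing import Dict, Iterable

# vendor:device in hex, e.g. 8086:154c
_PCI_ID_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{4}$")


def build_kernel_params(
    device_ids: Iterable[str],
    base: Iterable[str] = ("iommu=pt", "intel_iommu=on"),
) -> str:
    """Build a kernel_params string that pre-binds the given IDs to vfio-pci."""
    ids = list(device_ids)
    for dev_id in ids:
        if not _PCI_ID_RE.match(dev_id):
            raise ValueError(f"not a vendor:device ID: {dev_id!r}")
    params = list(base)
    if ids:
        params.append("vfio-pci.ids=" + ",".join(ids))
    return " ".join(params)


def kernel_modules_annotation(modules: Iterable[str]) -> Dict[str, str]:
    """Build the Kata kernel_modules pod annotation for the given modules."""
    return {
        "io.katacontainers.config.agent.kernel_modules": "; ".join(modules)
    }


if __name__ == "__main__":
    print(build_kernel_params(["8086:154c"]))
    print(kernel_modules_annotation(["vfio", "vfio-pci"]))
```

Calling build_kernel_params(["8086:154c", "8086:10ed"]) would list both IDs after vfio-pci.ids=, separated by a comma. Because the binding is driven by the ID list, every matching device in the VM goes to vfio, which is the mixed-use limitation noted above.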

3. A custom Kata kernel will need to be built for StarlingX which includes the supported network/vfio
   drivers as kernel modules.

   Ref:

   https://github.com/kata-containers/osbuilder
   https://github.com/kata-containers/packaging/tree/master/kernel
   https://github.com/kata-containers/documentation/blob/master/use-cases/using-SRIOV-and-kata.md

   For example:

   CONFIG_IGB=m
   CONFIG_IGBVF=m
   CONFIG_IXGB=m
   CONFIG_IXGBE=m
   CONFIG_IXGBEVF=m
   CONFIG_I40E=m
   CONFIG_I40EVF=m
   +<Mellanox Drivers>

   CONFIG_VFIO_IOMMU_TYPE1=m
   CONFIG_VFIO_VIRQFD=m
   CONFIG_VFIO=m
   CONFIG_VFIO_NOIOMMU=y
   CONFIG_VFIO_PCI=m
   CONFIG_VFIO_PCI_MMAP=y
   CONFIG_VFIO_PCI_INTX=y
   CONFIG_VFIO_PCI_IGD=y
   CONFIG_VFIO_MDEV=m
   CONFIG_VFIO_MDEV_DEVICE=m
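
A custom kernel build can be sanity-checked against the option list above before packaging. The following Python sketch parses the text of a kernel .config and reports which required options are neither built in (=y) nor built as modules (=m). The REQUIRED_OPTIONS subset and the function names are illustrative assumptions; a real check would list every driver StarlingX intends to support.

```python
from typing import Dict, Iterable, List

# Illustrative subset of the options listed above; extend as needed.
REQUIRED_OPTIONS = (
    "CONFIG_VFIO",
    "CONFIG_VFIO_PCI",
    "CONFIG_VFIO_IOMMU_TYPE1",
    "CONFIG_IGBVF",
    "CONFIG_IXGBEVF",
    "CONFIG_I40EVF",
)


def parse_kconfig(text: str) -> Dict[str, str]:
    """Parse kernel .config text into {option: value}, skipping comments."""
    options: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Lines such as '# CONFIG_FOO is not set' are comments and are skipped.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            options[name] = value
    return options


def missing_options(
    config_text: str, required: Iterable[str] = REQUIRED_OPTIONS
) -> List[str]:
    """Return required options that are not enabled as 'y' or 'm', sorted."""
    options = parse_kconfig(config_text)
    return sorted(
        name for name in required if options.get(name) not in ("y", "m")
    )


if __name__ == "__main__":
    sample = "CONFIG_VFIO=m\nCONFIG_VFIO_PCI=m\n# CONFIG_I40EVF is not set\n"
    print(missing_options(sample))
```

An empty result means every listed option is enabled; whether the modules are actually present in the guest rootfs is a separate check (see item 4).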

4. A custom Kata rootfs will need to be built for StarlingX which contains the kernel modules

  Ref:

  https://github.com/kata-containers/osbuilder

  My rootfs build example looks something like this:

  In the kernel build, I had to modify the build-kernel.sh script to add the following to the build_kernel()
  function:

  make -j $(nproc) ARCH="${arch_target}"
  +make ARCH="${arch_target}" mod...


Ghada Khalil (gkhalil) wrote :

stx.4.0 / high priority - serious limitation with kata containers.

Addressing this should be within the control of the StarlingX project by building a kata run-time/kernel that includes the needed drivers.

tags: added: stx.4.0 stx.containers
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Lin Shuicheng (shuicheng)
Ghada Khalil (gkhalil) wrote :

Assigning to the Kata Containers Feature Prime

Ghada Khalil (gkhalil)
tags: added: stx.networking
Frank Miller (sensfan22) wrote :

This support is not feasible in the stx.4.0 timeline. Moving this to stx.5.0.

tags: added: stx.5.0
removed: stx.4.0
Changed in starlingx:
assignee: Lin Shuicheng (shuicheng) → nobody
Ghada Khalil (gkhalil) wrote :

Lowering the priority as nobody seems to be working on this. We will not hold up stx.5.0 for this issue.

tags: removed: stx.5.0
Changed in starlingx:
importance: High → Low