IRQ affinities of newly set up network interfaces do not align to irqaffinity=

Bug #1958417 reported by M. Vefa Bicakci
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: M. Vefa Bicakci

Bug Description

Brief Description
-----------------

When a user sets up new network interfaces, the IRQ affinities of the new interfaces do not adhere to the global IRQ affinity setting specified via the irqaffinity= kernel command line option. This is most visible with virtual function (VF) interfaces, but it applies to all interfaces whose device drivers use a specific kernel API function (irq_set_affinity_hint).

According to user reports, this issue can result in watchdog-triggered system reboots due to over-utilization of the platform CPUs, because interrupts are serviced on platform CPUs at least part of the time.

Severity
--------

Major: the system may reboot under heavy traffic.

Steps to Reproduce
------------------
I do not have the exact steps to reproduce the watchdog-triggered reboots, but the following commands, provided by a colleague, should demonstrate the underlying issue. They are intended for a dual-CPU system with 14 cores per CPU (hyper-threading disabled). The intent is to force the irqaffinity= argument to be set to 26-27, and then to inspect the IRQ affinities of the VF interfaces, which are expected not to adhere to that setting.

Update: This procedure was verified as of Friday, 2022-01-21.

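```
NODE=controller-0

# Lock the node so that its configuration can be changed.
system host-lock ${NODE}

# Enable the SR-IOV device plugin, configure data0 as an SR-IOV interface
# with 7 VFs (netdevice driver), and create a VF interface on top of it
# with 4 VFs bound to vfio.
system host-label-assign ${NODE} sriovdp=enabled
system host-if-modify -m 1500 -n sriov1 -c pci-sriov -N 7 --vf-driver=netdevice ${NODE} data0
system host-if-add -c pci-sriov ${NODE} sriov2 vf sriov1 -N 4 --vf-driver=vfio

# Create two VLAN data networks and attach the interfaces to them.
system datanetwork-add sriovnet1 vlan
system datanetwork-add sriovnet2 vlan

system interface-datanetwork-assign ${NODE} sriov1 sriovnet1
system interface-datanetwork-assign ${NODE} sriov2 sriovnet2

# Use the static Kubernetes CPU manager and restricted topology manager policies.
system host-label-assign ${NODE} kube-cpu-mgr-policy=static
system host-label-assign ${NODE} kube-topology-mgr-policy=restricted

# Mark 12 cores on each processor (p0 and p1) as application-isolated.
system host-cpu-modify -f application-isolated -p0 12 ${NODE}
system host-cpu-modify -f application-isolated -p1 12 ${NODE}

# Unlock the node; this applies the configuration and reboots it.
system host-unlock ${NODE}
```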

Wait for the system to reboot, then bring up the VF interfaces with "ip link set IFACE up", where IFACE is the name of a VF interface.

Then inspect the IRQ thread affinities with 'ps-sched.sh | grep "irq/"', and the IRQ affinities via /proc/irq/IRQ_NUM/smp_affinity_list and /proc/irq/IRQ_NUM/effective_affinity_list.
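The /proc inspection step can be scripted. The following is a minimal Bash sketch, not a StarlingX tool: the function names (expand_cpulist, cpulist_subset, check_irq_affinities) are hypothetical, and it assumes CPU lists use only plain numbers and N-M ranges. It reads irqaffinity= from /proc/cmdline and prints every IRQ whose smp_affinity_list is not a subset of that CPU list, together with its effective_affinity_list.

```bash
#!/usr/bin/env bash
# Sketch: list IRQs whose smp_affinity_list is not contained in the CPU list
# given by irqaffinity= on the kernel command line. Read-only.
# Assumption: CPU lists use only "N" and "N-M" ranges (no ":stride" syntax).

# Expand a CPU list such as "0-3,26-27" into "0 1 2 3 26 27".
expand_cpulist() {
    local part lo hi cpu
    local -a parts=() out=()
    IFS=, read -ra parts <<< "$1"
    for part in "${parts[@]}"; do
        if [[ $part == *-* ]]; then
            lo=${part%-*}
            hi=${part#*-}
            for ((cpu = lo; cpu <= hi; cpu++)); do
                out+=("$cpu")
            done
        elif [[ -n $part ]]; then
            out+=("$part")
        fi
    done
    echo "${out[*]}"
}

# Succeed if every CPU in list $1 also appears in list $2.
cpulist_subset() {
    local cpu allowed
    allowed=" $(expand_cpulist "$2") "
    for cpu in $(expand_cpulist "$1"); do
        [[ $allowed == *" $cpu "* ]] || return 1
    done
    return 0
}

# Report every IRQ whose affinity strays outside irqaffinity=.
check_irq_affinities() {
    local want irqdir irq have eff
    want=$(tr ' ' '\n' < /proc/cmdline | sed -n 's/^irqaffinity=//p' | tail -n 1)
    if [[ -z $want ]]; then
        echo "irqaffinity= is not set on the kernel command line" >&2
        return 1
    fi
    for irqdir in /proc/irq/[0-9]*; do
        irq=${irqdir##*/}
        have=$(cat "$irqdir/smp_affinity_list" 2>/dev/null) || continue
        eff=$(cat "$irqdir/effective_affinity_list" 2>/dev/null || echo n/a)
        if ! cpulist_subset "$have" "$want"; then
            printf 'IRQ %s: smp_affinity_list=%s effective=%s (irqaffinity=%s)\n' \
                "$irq" "$have" "$eff" "$want"
        fi
    done
}
```

After sourcing the file, running check_irq_affinities once the VF interfaces are up would be expected to list the iavf IRQs with CPUs outside 26-27 on an affected system. The output may also include IRQs that are legitimately kept on other CPUs, so it is a starting point for inspection rather than a pass/fail test.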

The same exercise can be carried out with non-VF interfaces by modifying the /etc/rc.d/init.d/affine-platform.sh and /usr/bin/affine-interrupts.sh initialization scripts so that they do not manipulate IRQ affinities, and then rebooting the system.

Expected Behavior
------------------
Network interfaces set up after the system has fully initialized should have IRQ affinities that adhere to the irqaffinity= command line argument's value.

Actual Behavior
----------------
IRQ affinities are set up according to the device driver's preference.

In the example above, the iavf device driver affines its interrupts to CPUs 0, 1, 2 and 3, instead of CPUs 26 and 27 as indicated by the irqaffinity= kernel command line argument. (As a side note, CPUs 2 and 3 were isolated CPUs in the example above.)

Reproducibility
---------------
Reproducible.

System Configuration
--------------------
This issue is reproducible with systems running a version of StarlingX with the v5.10 kernel.

For example, it was reproduced on the compute-0 node of a system with two controller nodes and three compute nodes.

One of my colleagues reproduced this issue with an All-in-One (aio) simplex system as well.

Branch/Pull Time/Commit
-----------------------
Recent StarlingX releases with the v5.10 kernel should reproduce this issue.

Last Pass
---------
StarlingX releases with the v3.10-based kernel do not reproduce this issue.

Timestamp/Logs
--------------

None available at this point.

Test Activity
-------------

Performance testing.

Workaround
----------

Users can set IRQ affinities manually by writing to /proc/irq/IRQ_NUM/smp_affinity_list, but this is not a scalable solution, because it has to be repeated for every IRQ of every newly set up interface.
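The manual workaround can be sketched as follows. This is a hypothetical Bash helper, not a StarlingX tool; it assumes the interface is backed by a PCI device that exposes its MSI/MSI-X IRQ numbers under /sys/class/net/IFACE/device/msi_irqs/, and that it runs as root. The SYSFS_ROOT and PROC_ROOT variables are illustrative overrides that exist only to allow testing against a scratch directory.

```bash
#!/usr/bin/env bash
# Sketch of the manual workaround: write a CPU list to the
# smp_affinity_list file of every MSI/MSI-X IRQ of one network interface.
# Assumptions: the interface is backed by a PCI device exposing
# /sys/class/net/IFACE/device/msi_irqs/, and the caller is root.
# SYSFS_ROOT and PROC_ROOT are hypothetical overrides for testing only.

reaffine_iface() {
    local iface=$1 cpus=$2
    local sys=${SYSFS_ROOT:-/sys} proc=${PROC_ROOT:-/proc}
    local irqdir=$sys/class/net/$iface/device/msi_irqs
    local entry irq failed=0

    if [[ ! -d $irqdir ]]; then
        echo "no MSI IRQ directory for $iface: $irqdir" >&2
        return 1
    fi

    for entry in "$irqdir"/*; do
        [[ -e $entry ]] || continue   # empty directory: the glob did not match
        irq=${entry##*/}              # entries are named after IRQ numbers
        # Some IRQs reject affinity writes; report them and keep going.
        if ! echo "$cpus" > "$proc/irq/$irq/smp_affinity_list"; then
            echo "failed to set affinity of IRQ $irq" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For instance, reaffine_iface IFACE 26-27 would move the IRQs of one VF interface onto CPUs 26 and 27. The setting does not survive a reboot and has to be repeated for each newly set up interface, which is why it is only a stopgap.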

Changed in starlingx:
assignee: nobody → M. Vefa Bicakci (vbicakci)
OpenStack Infra (hudson-openstack) wrote: Fix proposed to kernel (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/kernel/+/825343

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote:

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/kernel/+/825344

Ghada Khalil (gkhalil)
tags: added: stx.7.0 stx.distro.other
Changed in starlingx:
importance: Undecided → Medium
description: updated
summary: - IRQ affinities of newly set up VF interfaces do not align to irqaffinity=
+ IRQ affinities of newly set up network interfaces do not align to irqaffinity=
description: updated
OpenStack Infra (hudson-openstack) wrote: Fix merged to kernel (master)

Reviewed: https://review.opendev.org/c/starlingx/kernel/+/825343
Committed: https://opendev.org/starlingx/kernel/commit/19ca0df55a7c905dc062008862b7b76b577a2354
Submitter: "Zuul (22348)"
Branch: master

commit 19ca0df55a7c905dc062008862b7b76b577a2354
Author: M. Vefa Bicakci <email address hidden>
Date: Thu Jan 13 13:18:03 2022 -0500

    kernel: Backport IRQ affinity patches

    This commit backports IRQ affinity commits from the mainline kernel tree
    to the StarlingX kernel. (Links to the patch series can be found at the
    end of this commit message.) The intent is to be able to let certain
    device drivers (such as i40e, iavf, ice, ixgbe and mlx5) use the global
    IRQ affinity setting specified by 'irqaffinity=' on the kernel command
    line.

    In summary, a number of device drivers use the irq_set_affinity_hint
    function to provide a hint to the userspace about the ideal affinity to
    use with an IRQ. However, this function also sets the IRQ affinity,
    which makes the IRQ use the hinted affinity rather than the global IRQ
    affinity provided by the irqaffinity= argument. This is problematic for
    StarlingX, as interrupts are, in general, expected to be affined
    according to irqaffinity=, which is adjusted by Puppet.

    The patch series deprecates the kernel function irq_set_affinity_hint,
    and provides two replacements: (1) irq_set_affinity_and_hint and (2)
    irq_update_affinity_hint. Replacement function (1) sets both the
    affinity hint and the actual affinity, whereas (2) only sets the
    affinity hint. The original function -- irq_set_affinity_hint -- remains
    as a wrapper around (1), likely for backwards compatibility.

    The remaining patches in the series modify a number of device drivers to
    use the replacement functions. Of these patches, only the patch for the
    ixgbe driver is kept, for two reasons:
    - This commit aims to fix the IRQ affinities for network device drivers
      only.
    - StarlingX has out-of-tree modules for most of the remaining network
      device drivers, and these will be modified with a separate commit.

    Note that the first patch in this commit is included as a dependency for
    the others: genirq-Export-affinity-setter-for-modules.patch

    Finally, we should note that older versions of StarlingX that used
    CentOS 7's v3.10-based kernel did not have this issue, because of a
    CentOS 7-specific patch in that kernel that allowed the irqaffinity=
    kernel command line argument to take precedence over the device
    driver-provided IRQ affinity hints.

    Testing:
    - An ISO image was successfully built using a monolithic build
      procedure.

    - The ISO image was installed and bootstrapped successfully with an
      All-in-One simplex system (physical server) in low-latency mode. This
      server has a management/OAM Ethernet controller managed by the ixgbe
      driver, whose operation was observed to not have been negatively
      affected.

    - On another physical All-in-One simplex system which had one
      non-management/non-OAM Ethernet controller handled by the ixgbe
      driv...


OpenStack Infra (hudson-openstack) wrote:

Reviewed: https://review.opendev.org/c/starlingx/kernel/+/825344
Committed: https://opendev.org/starlingx/kernel/commit/7ded00431675bbab05fe254b90efd08eb335f101
Submitter: "Zuul (22348)"
Branch: master

commit 7ded00431675bbab05fe254b90efd08eb335f101
Author: M. Vefa Bicakci <email address hidden>
Date: Fri Jan 14 17:10:31 2022 -0500

    kernel-modules: IRQ affinity hint fix-ups

    This commit modifies a number of out-of-tree kernel modules to ensure
    that the irqaffinity= kernel command line option is honored by the
    interrupts set up and serviced by the device drivers. For further
    information about the rationale for the changes, please see the
    following change: Ibf47fd301a460638f3bb4c49865adc3b2429e06d

    Here is a summary of the changes made by this commit:

    - i40e: Replicate mainline commit d34c54d1739c ("i40e: Use
      irq_update_affinity_hint()").

    - iavf: Replicate mainline commit 0f9744f4ed53 ("iavf: Use
      irq_update_affinity_hint()").

    - ice: The device driver is made to use the irq_update_affinity_hint
      function instead of the irq_set_affinity_hint function. Please note
      that this driver was not modified in the mainline kernel, and hence
      this modification is a StarlingX-specific change.

    - mlx5: Diverge from mainline commit 7451e9ea8e20 ("net/mlx5: Use
      irq_set_affinity_and_hint()") by using irq_update_affinity_hint
      instead of irq_set_affinity_and_hint, so that StarlingX users can rely
      on the irqaffinity= kernel command line argument to set the affinities
      of the interrupts serviced by mlx5.

      Please note that, due to the way the Mellanox module build works,
      there is a need to have a patch that adds a patch for the build system
      to apply; otherwise, there are patch application failures at build
      time.

    The reasons for not modifying the remaining modules are as follows:

    - igb_uio: This driver does not use the deprecated API function.

    - intel-opae-fpga: This driver does not use the deprecated API function.

    - qat17: A patch to make use of the irq_update_affinity_hint function
      was prepared, but after some consideration, it was decided to not
      publish the patch due to concerns about unintended side-effects.

    Testing:
    - An ISO image was built successfully with this patch, via a monolithic
      build.

    - The built ISO image was successfully installed onto and bootstrapped
      on an All-in-One simplex (physical) server with network interfaces
      handled by the i40e and ice (as well as ixgbe) device drivers. The
      low-latency profile was used during the tests.

      The following test steps were carried out with and without this patch
      on the aforementioned All-in-One simplex server:
      - After bootstrap, /etc/rc.d/init.d/affine-platform.sh and
        /usr/bin/affine-interrupts.sh were modified to not manipulate IRQ
        affinities to be able to clearly observe the effect of the changes
        in this patch.
      - The system configuration was changed so that all CPUs other than
        platform CPUs and two ...


Changed in starlingx:
status: In Progress → Fix Released