NVMe interrupts not affined to platform cores on AIO-DX

Bug #1843942 reported by Brent Rowsell
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      Jim Gauld

Bug Description

Brief Description
-----------------
Disk interrupts should be affined to the platform cores. On AIO-DX I am seeing interrupts affined to the application cores. Platform cores in this case are 0 and 1.

 irq/279-nvme1q2 5686
 irq/307-nvme2q2 5699
 irq/335-nvme0q2 6178

taskset -apc 5686
pid 5686's current affinity list: 27
controller-0:~$ taskset -apc 5699
pid 5699's current affinity list: 27
controller-0:~$ taskset -apc 6178
pid 6178's current affinity list: 27

They can be manually re-affined:

sudo taskset -pc 0 5686
Password:
pid 5686's current affinity list: 27
pid 5686's new affinity list: 0
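
The manual steps above can be scripted. What follows is a minimal sketch, not the merged fix. It assumes the platform cores are 0-1 as in this report, that `ps`, `awk` and `taskset` are available, and that the NVMe IRQ threads have names like `irq/279-nvme1q2`. The names `nvme_irq_pids`, `reaffine_nvme_irqs` and `PLATFORM_CPUS` are hypothetical and chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: read "PID COMMAND" lines on stdin and print the PIDs
# whose command name looks like an NVMe IRQ thread (irq/<number>-nvme...).
nvme_irq_pids() {
    awk '$2 ~ /^irq\/[0-9]+-nvme/ { print $1 }'
}

# Assumption: platform cores are 0-1, as stated in this report.
PLATFORM_CPUS="${PLATFORM_CPUS:-0-1}"

# Re-affine every matching IRQ thread to the platform cores.
# Changing the affinity of kernel threads requires root.
reaffine_nvme_irqs() {
    local pid
    for pid in $(ps -e -o pid=,comm= | nvme_irq_pids); do
        taskset -pc "${PLATFORM_CPUS}" "${pid}"
    done
}
```

Sourcing the file only defines the functions; running `reaffine_nvme_irqs` as root performs the change. The affinity set this way does not persist across a reboot or a node lock/unlock.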

Severity
--------
Major, will cause application jitter

Steps to Reproduce
------------------
Unlock node

Expected Behavior
------------------
Interrupts are affined to platform cores

Actual Behavior
----------------
Interrupts are affined to application cores

Reproducibility
---------------
TBD

System Configuration
--------------------
AIO-DX

Branch/Pull Time/Commit
-----------------------
BUILD_DATE="2019-09-11 14:52:32 -0400"

Last Pass
---------
Not sure

Timestamp/Logs
--------------

Test Activity
-------------
Developer Testing

Ghada Khalil (gkhalil) wrote :

Marking stx.3.0 / medium priority - issue impacts system performance

tags: added: stx.3.0 stx.config
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Jim Gauld (jgauld)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to utilities (master)

Fix proposed to branch: master
Review: https://review.opendev.org/685976

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to utilities (master)

Reviewed: https://review.opendev.org/685976
Committed: https://git.openstack.org/cgit/starlingx/utilities/commit/?id=45a149497ee8d05b571328214c2c819c63ea9ea9
Submitter: Zuul
Branch: master

commit 45a149497ee8d05b571328214c2c819c63ea9ea9
Author: Jim Gauld <email address hidden>
Date: Tue Oct 1 09:39:57 2019 -0400

    Affine NVMe disk interrupts to platform cores

    This affines NVMe disk interrupts to platform cores for low-latency
    worker configurations to reduce application jitter. Tasks with names
    matching irq/<pid>-nvme* get re-affined.

    This also corrects writing of related affinity settings for:
    - /proc/irq/default_smp_affinity
    - /sys/devices/virtual/workqueue/cpumask
    - /sys/bus/workqueue/devices/writeback/cpumask

    The platform coremasks require being delimited with commas every 8 hex
    characters. HT-enabled hosts tend to have very large affinity masks,
    e.g., host with 88 CPUs, the coremask for the cpulist 0-1,44-45 is:
    00000000,00000000,00000000,00000000,00003000,00000003

    Just writing 300000000003 will produce the following error:
    echo: write error: Value too large for defined data type

    Change-Id: Ibad5beffc791b796e472ff19950cf6a3e973be86
    Closes-bug: 1843942
    Signed-off-by: Jim Gauld <email address hidden>
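
The comma-delimited coremask format described in the commit message can be illustrated with a short sketch. This is not the code from the commit. It assumes the usual kernel convention that bit N of the mask stands for CPU N and that each comma-separated group is one 32-bit word, most significant word first. The function name `cpulist_to_mask` and its second argument (the number of CPUs, which determines how many groups are printed) are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert a cpulist such as "0-1,44-45" into a hex
# coremask with a comma every 8 hex characters (one 32-bit word per group).
#   $1: cpulist (comma-separated CPUs or ranges)
#   $2: number of CPUs, used only to decide how many words to print
cpulist_to_mask() {
    local cpulist="$1" ncpus="$2"
    local nwords=$(( (ncpus + 31) / 32 ))
    local -a words
    local i idx range lo hi cpu out=""

    # Start with an all-zero mask.
    for (( i = 0; i < nwords; i++ )); do
        words[i]=0
    done

    # Split the cpulist on commas and set one bit per listed CPU.
    local IFS=','
    for range in ${cpulist}; do
        lo="${range%-*}"   # "44-45" -> 44, "5" -> 5
        hi="${range#*-}"   # "44-45" -> 45, "5" -> 5
        for (( cpu = lo; cpu <= hi; cpu++ )); do
            idx=$(( cpu / 32 ))
            words[idx]=$(( words[idx] | (1 << (cpu % 32)) ))
        done
    done

    # Print the words from most to least significant, 8 hex digits each.
    for (( i = nwords - 1; i >= 0; i-- )); do
        out+="$(printf '%08x' "${words[i]}")"
        if (( i > 0 )); then
            out+=","
        fi
    done
    echo "${out}"
}
```

A mask produced this way could then be written as root to one of the files listed in the commit message, for example `echo "$(cpulist_to_mask 0-1,44-45 88)" > /proc/irq/default_smp_affinity`. The kernel may report more leading zero groups than this sketch prints; leading zero groups do not change the value.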

Changed in starlingx:
status: In Progress → Fix Released