AIO: Some platform processes affined to the wrong cores
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Jim Gauld |
Bug Description
Brief Description
-----------------
On an AIO low-latency system, a few tasks were observed with floating affinity masks.
This does not align with the process engineering that isolates the platform from applications.
The output of ps-sched.sh shows the following:
- On AIO systems with > 64 CPUs, drbd_w_* tasks are affined to specific cores, including isolcpus and application cores
- kswapd<x> processes float across entire NUMA nodes
Root cause is known.
Severity
--------
Major: For low-latency systems, these tasks can wake up and cause degraded performance.
Steps to Reproduce
------------------
Install an AIO system; no further configuration is needed.
Inspect the output of ps-sched.sh (also included in 'collect').
Expected Behavior
------------------
All platform tasks should have the platform affinity mask.
Actual Behavior
----------------
When the DRBD tasks do not get the correct affinity, the kernel log shows entries like this:
2020-10-
In the ps-sched.sh output, the drbd_w_* tasks do not have the expected mask 0x300000003:
controller-0:~$ ps-sched.sh | grep drbd
84375 84375 2 S TS -20 - 0 0x300000003 32 drbd-reissue [drbd-reissue]
90678 90678 2 S TS -20 - 0 0x300000003 33 drbd8_submit [drbd8_submit]
90682 90682 2 S TS -20 - 0 0x300000003 33 drbd7_submit [drbd7_submit]
90687 90687 2 S TS -20 - 0 0x300000003 1 drbd5_submit [drbd5_submit]
90692 90692 2 S TS -20 - 0 0x300000003 1 drbd0_submit [drbd0_submit]
90697 90697 2 S TS -20 - 0 0x300000003 1 drbd2_submit [drbd2_submit]
90702 90702 2 S TS -20 - 0 0x300000003 0 drbd1_submit [drbd1_submit]
90709 90709 2 S TS 0 - 20 0x1 0 drbd_w_drbd-doc [drbd_w_drbd-doc]
90715 90715 2 S TS 0 - 20 0x2 1 drbd_w_drbd-etc [drbd_w_drbd-etc]
90725 90725 2 R TS 0 - 20 0x4 2 drbd_w_drbd-ext [drbd_w_drbd-ext]
90731 90731 2 R TS 0 - 20 0x8 3 drbd_w_drbd-pgs [drbd_w_drbd-pgs]
90737 90737 2 R TS 0 - 20 0x10 4 drbd_w_drbd-pla [drbd_w_drbd-pla]
90746 90746 2 S TS 0 - 20 0x20 5 drbd_w_drbd-rab [drbd_w_drbd-rab]
90749 90749 2 S TS 0 - 20 0x300000003 33 drbd_r_drbd-doc [drbd_r_drbd-doc]
90751 90751 2 S TS 0 - 20 0x300000003 33 drbd_r_drbd-etc [drbd_r_drbd-etc]
90754 90754 2 S TS 0 - 20 0x300000003 1 drbd_r_drbd-ext [drbd_r_drbd-ext]
90756 90756 2 S TS 0 - 20 0x300000003 33 drbd_r_drbd-pgs [drbd_r_drbd-pgs]
90758 90758 2 S TS 0 - 20 0x300000003 33 drbd_r_drbd-pla [drbd_r_drbd-pla]
90760 90760 2 S TS 0 - 20 0x300000003 1 drbd_r_drbd-rab [drbd_r_drbd-rab]
93637 93637 2 S TS 0 - 20 0x300000003 32 jbd2/drbd8-8 [jbd2/drbd8-8]
93954 93954 2 S TS 0 - 20 0x300000003 1 jbd2/drbd5-8 [jbd2/drbd5-8]
93961 93961 2 S TS 0 - 20 0x300000003 32 jbd2/drbd7-8 [jbd2/drbd7-8]
93984 93984 2 S TS 0 - 20 0x300000003 33 jbd2/drbd2-8 [jbd2/drbd2-8]
94214 94214 2 S TS 0 - 20 0x300000003 1 jbd2/drbd1-8 [jbd2/drbd1-8]
94245 94245 2 S TS 0 - 20 0x300000003 1 jbd2/drbd0-8 [jbd2/drbd0-8]
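The check done by eye on the listing above can be sketched in Python. This is a minimal illustration, not part of ps-sched.sh: the function name `find_misaffined` is hypothetical, and the column layout (task ID in the 2nd field, affinity mask in the 9th, task name in the 11th) is an assumption read off the sample rows. The expected mask 0x300000003 is the one shown in the listing.

```python
def find_misaffined(lines, platform_mask):
    """Return (tid, name, mask) for rows whose affinity mask differs
    from platform_mask.

    Assumed layout (inferred from the sample rows, not from ps-sched.sh
    itself): field 2 = task ID, field 9 = hex affinity mask,
    field 11 = task name.
    """
    flagged = []
    for line in lines:
        fields = line.split()
        if len(fields) < 11:
            continue  # too short to be a task row
        try:
            tid = int(fields[1])
            mask = int(fields[8], 16)  # base 16 accepts the "0x" prefix
        except ValueError:
            continue  # header or otherwise unparsable row
        if mask != platform_mask:
            flagged.append((tid, fields[10], mask))
    return flagged


# Three rows copied from the listing above.
SAMPLE = [
    "90702 90702 2 S TS -20 - 0 0x300000003 0 drbd1_submit [drbd1_submit]",
    "90725 90725 2 R TS 0 - 20 0x4 2 drbd_w_drbd-ext [drbd_w_drbd-ext]",
    "90749 90749 2 S TS 0 - 20 0x300000003 33 drbd_r_drbd-doc [drbd_r_drbd-doc]",
]

if __name__ == "__main__":
    for tid, name, mask in find_misaffined(SAMPLE, 0x300000003):
        print(f"{tid} {name} {mask:#x}")
```

The comparison is a strict inequality, so a task pinned to a single platform core (for example 0x1) is also reported, matching the expectation that every platform task carries the full platform mask.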
On a different lab, the kswapd* tasks have per-NUMA-node affinity masks:
controller-0:~$ ps-sched.sh |grep kswapd
468 468 2 S TS 0 - 20 0x3fffff 1 kswapd0 [kswapd0]
469 469 2 S TS 0 - 20 0xfffffc00000 22 kswapd1 [kswapd1]
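To read these masks as CPU lists, each set bit N in the hexadecimal mask corresponds to CPU N. The following Python sketch decodes the masks quoted in this report; the helper names `cpus_from_mask` and `format_ranges` are illustrative only.

```python
def cpus_from_mask(mask):
    """Return the sorted list of CPU numbers whose bits are set in mask."""
    cpus = []
    cpu = 0
    while mask:
        if mask & 1:
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus


def format_ranges(cpus):
    """Collapse a sorted CPU list into a compact string such as '0-1,32-33'."""
    ranges = []
    start = prev = None
    for cpu in cpus:
        if start is None:
            start = prev = cpu
        elif cpu == prev + 1:
            prev = cpu
        else:
            ranges.append((start, prev))
            start = prev = cpu
    if start is not None:
        ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


if __name__ == "__main__":
    for mask in (0x3fffff, 0xfffffc00000, 0x300000003):
        print(f"{mask:#x} -> {format_ranges(cpus_from_mask(mask))}")
```

Decoded this way, kswapd0 spans CPUs 0-21 and kswapd1 spans CPUs 22-43, while the 0x300000003 mask seen on the DRBD tasks covers only CPUs 0-1 and 32-33.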
Reproducibility
---------------
100 percent reproducible
System Configuration
-------
DRBD: AIO low-latency with >= 64 cpus
kswapd: AIO low-latency
Branch/Pull Time/Commit
-------
- current load
Last Pass
---------
- day one issue
Timestamp/Logs
--------------
- na
Test Activity
-------------
Evaluation.
Workaround
----------
Manually use taskset to change the affinity of the affected tasks to match the platform cores. This does not survive a reboot.
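A minimal sketch of scripting the taskset workaround in Python follows. The helper names `taskset_argv` and `reaffine` are hypothetical, and the mask and task IDs in the usage lines are taken from the sample output in this report; they will differ on other systems. By default the sketch only builds the commands; actually running them requires root.

```python
import subprocess


def taskset_argv(tid, mask):
    """Build the argv for 'taskset -p <mask> <tid>', which sets the
    affinity mask of one existing task."""
    return ["taskset", "-p", f"{mask:#x}", str(tid)]


def reaffine(tids, mask, dry_run=True):
    """Return the taskset commands for the given task IDs.

    With dry_run=False the commands are also executed (needs root).
    """
    commands = [taskset_argv(tid, mask) for tid in tids]
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Task IDs and mask copied from the sample output; assumptions elsewhere.
    for argv in reaffine([90725, 90731], 0x300000003):
        print(" ".join(argv))
```

Because the change is applied to running tasks only, it has to be repeated after every reboot, as noted above.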
Changed in starlingx:
assignee: nobody → Jim Gauld (jgauld)
stx.5.0 / medium priority - issue results in unpredictable performance, but doesn't cause functional issues.
If someone hits a serious issue on a previous release, a backport can be considered at that time.