v5.10 kernel workqueue rescuer threads have unexpected CPU affinities

Bug #1948639 reported by Jiping Ma
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  High        Jiping Ma

Bug Description

Brief Description
-----------------
There are numerous kernel threads that have unexpected CPU affinities.

Quoting from Gerry's e-mail, which I trimmed a bit for length:

I did setup some application-isolated cpus in wc68 (with static policy) and noticed
that there’s a lot of kernel threads that are floating across all cpus based
on ps-sched.sh dump.

[sysadmin@controller-0 ~(keystone_admin)]$ grep 0xfffffffff ps-sched.dump.static_enabled_isocpu_2_18
   PID TID PPID S PO NICE RTPRIO PR AFFINITY P COMM COMMAND
     3 3 2 I TS -20 - 0 0xfffffffff 0 rcu_gp [rcu_gp]
     4 4 2 I TS -20 - 0 0xfffffffff 0 rcu_par_gp [rcu_par_gp]
     8 8 2 I TS -20 - 0 0xfffffffff 0 mm_percpu_wq [mm_percpu_wq]
   280 280 2 I TS -20 - 0 0xfffffffff 0 netns [netns]
   288 288 2 I TS -20 - 0 0xfffffffff 0 writeback [writeback]
   301 301 2 I TS -20 - 0 0xfffffffff 0 cryptd [cryptd]
   344 344 2 I TS -20 - 0 0xfffffffff 0 kintegrityd [kintegrityd]
   345 345 2 I TS -20 - 0 0xfffffffff 0 kblockd [kblockd]
   346 346 2 I TS -20 - 0 0xfffffffff 0 blkcg_punt_bio [blkcg_punt_bio]
 ...

controller-0:/home/sysadmin# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-5.10.30-200.4.tis.rt.el7.x86_64 root=UUID=d4bb07c6-7f4f-4ed5-a416-2ba16fe7e340 ro \
  module_blacklist=integrity,ima tboot=false crashkernel=512M console=ttyS0,115200n8 iommu=pt usbcore.autosuspend=-1 \
  selinux=0 enforcing=0 nmi_watchdog=0 softlockup_panic=0 softdog.soft_panic=1 intel_iommu=on biosdevname=0 \
  user_namespace.enable=1 skew_tick=1 nopti nospectre_v2 nospectre_v1 hugepagesz=2M hugepages=0 \
  default_hugepagesz=2M irqaffinity=3-17,19-35 isolcpus=2,18 rcu_nocbs=2-35 nohz_full=2-35 kthread_cpus=0-1 \
  audit=1 audit_backlog_limit=8192

In summary, the kernel workqueue rescuer threads have their CPU affinities set to "all possible CPUs" (mask 0xfffffffff, i.e. CPUs 0-35) instead of the platform CPUs given by kthread_cpus=0-1, which may negatively impact cyclictest measurements.
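
To make the AFFINITY column in the dump above easier to read, here is a minimal Python sketch that expands a hexadecimal affinity mask into the CPU numbers it covers. The function name mask_to_cpus is hypothetical and not part of ps-sched.sh or any StarlingX tool; the only assumption is that bit N of the mask corresponds to CPU N.

```python
def mask_to_cpus(mask_hex: str) -> list[int]:
    """Return the CPU numbers whose bits are set in a hex affinity mask.

    Assumes bit N of the mask stands for CPU N, as in the AFFINITY
    column of the dump quoted in this report.
    """
    mask = int(mask_hex, 16)  # accepts an optional "0x" prefix
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


if __name__ == "__main__":
    cpus = mask_to_cpus("0xfffffffff")
    # The mask from the dump has 36 bits set, so it spans CPUs 0-35,
    # which includes the isolated CPUs 2 and 18 from the command line.
    print(len(cpus), cpus[0], cpus[-1])
    print(2 in cpus and 18 in cpus)
```

With this reading, a thread restricted to the platform CPUs from kthread_cpus=0-1 would show a mask of 0x3.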

Severity
--------
Major

Steps to Reproduce
-------------------

Expected Behavior
-----------------

Actual Behavior
---------------

Reproducibility
----------------
reproducible (happened 2/2)

System Configuration
--------------------

Branch/Pull Time/Commit
-----------------------
recent stx master load after the 5.10 kernel merged

Last Pass
----------

Timestamp/Logs
--------------

Test Activity

Workaround
----------
Unknown

Jiping Ma (jma11)
Changed in starlingx:
assignee: nobody → Jiping Ma (jma11)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kernel (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/kernel/+/815226

Changed in starlingx:
status: New → In Progress
Ghada Khalil (gkhalil) wrote :

screening: stx.6.0 / high - issue related to the 5.10 kernel upversion

tags: added: stx.6.0 stx.distro.other
Changed in starlingx:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to kernel (master)

Reviewed: https://review.opendev.org/c/starlingx/kernel/+/815226
Committed: https://opendev.org/starlingx/kernel/commit/cfe452afa565d00ddda817f8f1d0d8bb28da7773
Submitter: "Zuul (22348)"
Branch: master

commit cfe452afa565d00ddda817f8f1d0d8bb28da7773
Author: Jiping Ma <email address hidden>
Date: Mon Oct 25 03:25:01 2021 -0400

    workqueue: Affine rescuer threads and unbound wqs

    This commit ensures that workqueue rescuer threads are affined to the
    platform CPUs specified by the "kthread_cpus" kernel argument. Prior
    to this commit, rescuer threads could be bound to any CPU. Rescuer
    threads are described in "kernel/workqueue.c" as follows:

    "Regular work processing on a pool may block trying to create a new
    worker which uses GFP_KERNEL allocation which has slight chance of
    developing into deadlock if some works currently on the same queue
    need to be processed to satisfy the GFP_KERNEL allocation. This is
    the problem rescuer solves.

    When such condition is possible, the pool summons rescuers of all
    workqueues which have works queued on the pool and let them process
    those works so that forward progress can be guaranteed."

    This commit also affines unbound workqueues to the platform CPUs instead
    of the housekeeping CPUs, because the latter can be a superset of the
    former.

    Verification:
    Compared the affinity of workqueue thread between before and after the
    fix, the affinity was 0xff before this commit that mean the thread
    could be bound to cpu0-7, the affinity was 0x3 after the fix with
    "kthread_cpus=0, 1", which only could be bound to cpu0, 1. Also checked
    unbound workqueue such as "writeback" whose affinity also was 0x3 with
    "kthread_cpus=0, 1". We did not find the commit break anything else.

    Closes-Bug: #1948639

    Signed-off-by: M. Vefa Bicakci <email address hidden>
    Signed-off-by: Jiping Ma <email address hidden>
    Change-Id: I8afd56c8d0d0526d523accf3ea45ee02635b1602
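
The verification described in the commit message (comparing each thread's affinity against the kthread_cpus setting) can be approximated with a short Python sketch. It is not the procedure the authors used. The helper names cpulist_to_mask and threads_outside are hypothetical, and the column positions (AFFINITY as the 9th field, COMM as the 11th) are assumed from the header line of the dump quoted in this report; other ps-sched.sh output may be laid out differently.

```python
def cpulist_to_mask(cpulist: str) -> int:
    """Convert a CPU list such as "0-1" or "3-17,19-35" into a bitmask."""
    mask = 0
    for part in cpulist.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        lo = int(first)
        hi = int(last) if last else lo
        for cpu in range(lo, hi + 1):
            mask |= 1 << cpu
    return mask


def threads_outside(dump_text: str, allowed_cpulist: str) -> list[tuple[int, str, str]]:
    """List (PID, COMM, AFFINITY) for threads allowed on CPUs outside the list.

    Assumed columns: PID TID PPID S PO NICE RTPRIO PR AFFINITY P COMM COMMAND.
    """
    allowed = cpulist_to_mask(allowed_cpulist)
    stray = []
    for line in dump_text.splitlines():
        fields = line.split()
        # Skip the header, the "..." marker, and any short or malformed line.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        affinity = int(fields[8], 16)
        if affinity & ~allowed:  # some bit is set outside the allowed CPUs
            stray.append((int(fields[0]), fields[10], fields[8]))
    return stray


if __name__ == "__main__":
    sample = """\
   PID TID PPID S PO NICE RTPRIO PR AFFINITY P COMM COMMAND
     3 3 2 I TS -20 - 0 0xfffffffff 0 rcu_gp [rcu_gp]
   288 288 2 I TS -20 - 0 0x3 0 writeback [writeback]
"""
    for pid, comm, affinity in threads_outside(sample, "0-1"):
        print(pid, comm, affinity)
```

In the sample, the rcu_gp line reuses the pre-fix mask from the report, while the writeback line uses the post-fix mask of 0x3 mentioned in the commit message, so only rcu_gp is flagged.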

Changed in starlingx:
status: In Progress → Fix Released