On boot excessive number of kworker threads are running

Bug #1649905 reported by Colin Ian King on 2016-12-14
This bug affects 3 people
Affects          Status                    Importance  Assigned to     Milestone
linux (Ubuntu)   Status tracked in Zesty
  Yakkety                                  Undecided   Colin Ian King
  Zesty                                    Medium      Colin Ian King

Bug Description

[SRU REQUEST, Yakkety]

Ubuntu Yakkety 4.8 kernels have an excessive number of kworker threads running. This is especially noticeable on boot: one can easily have more than 1000 kworker threads on a 4-CPU box.

Bisected this down to:

commit 81ae6d03952c1bfb96e1a716809bd65e7cd14360
Author: Vladimir Davydov <email address hidden>
Date: Thu May 19 17:10:34 2016 -0700

    mm/slub.c: replace kick_all_cpus_sync() with synchronize_sched() in kmem_cache_shrink()

[FIX]

The synchronize_sched() calls seem to create all of these excess kworker threads. This is fixed by upstream commit:

commit 89e364db71fb5e7fc8d93228152abfa67daf35fa
Author: Vladimir Davydov <email address hidden>
Date: Mon Dec 12 16:41:32 2016 -0800

    slub: move synchronize_sched out of slab_mutex on shrink

    synchronize_sched() is a heavy operation and calling it per each cache
    owned by a memory cgroup being destroyed may take quite some time. What
    is worse, it's currently called under the slab_mutex, stalling all works
    doing cache creation/destruction.

    Actually, there isn't much point in calling synchronize_sched() for each
    cache - it's enough to call it just once - after setting cpu_partial for
    all caches and before shrinking them. This way, we can also move it out
    of the slab_mutex, which we have to hold for iterating over the slab
    cache list.

[TEST CASE]

Without the fix, boot a Yakkety system and count the number of kworker threads:

ps -ef | grep kworker | wc -l
1034

With the fix, boot and count the number of kworker threads again; the count will be dramatically lower:

ps -ef | grep kworker | wc -l
32
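
A slightly stricter variant of the count above, offered only as a sketch: `ps -ef | grep kworker` can also match the grep process itself, so the figure may be off by one. The helper name `count_kworkers` is made up for illustration and is not part of the original test case. It assumes a procps-style `ps` that accepts `-e -o comm=`.

```shell
# Count kworker threads from a list of command names on stdin (one per line).
# grep -c exits non-zero when nothing matches, so "|| true" keeps the
# function's exit status at 0 while still printing the count (0).
count_kworkers() {
    grep -c '^kworker' || true
}

# ps -e selects every process, including kernel threads; "-o comm=" prints
# only the command name, with no header line.
ps -e -o comm= | count_kworkers
```

Matching on the start of the command name avoids counting unrelated processes whose arguments merely contain the string "kworker".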

Since this touches the slub allocator and cgroups too, I have regression tested this against the kernel-team autotest regression tests to sanity check this fix. All seems OK.

Note: this only affects kernels from 4.7-rc1 through to 4.8
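
As a rough aid, the version range in the note can be checked against the running kernel. This is a minimal sketch: the function name `in_affected_range` is hypothetical, and it looks only at the major.minor series reported by `uname -r`. It cannot tell whether a distribution kernel in that series already carries the fix (for example, a patched 4.8 kernel still reports 4.8).

```shell
# Succeeds (exit status 0) when the release string is in the 4.7 or 4.8
# series, the upstream range named in the note above.
in_affected_range() {
    release=$1
    major=${release%%.*}        # text before the first dot, e.g. "4"
    rest=${release#*.}          # text after the first dot, e.g. "8.0-34-generic"
    minor=${rest%%[!0-9]*}      # leading digits of the remainder, e.g. "8"
    [ "$major" = 4 ] && { [ "$minor" = 7 ] || [ "$minor" = 8 ]; }
}

if in_affected_range "$(uname -r)"; then
    echo "kernel series is in the 4.7 to 4.8 range"
else
    echo "kernel series is outside the 4.7 to 4.8 range"
fi
```

A result inside the range is only a hint to run the kworker count from the test case; the count itself is what shows whether the problem is present.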

Changed in linux (Ubuntu):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Tim Gardner (timg-tpi) on 2016-12-14
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Colin Ian King (colin-king)
status: New → In Progress
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Released
Doug Smythies (dsmythies) wrote :

I didn't see the third commit mentioned which I thought was also required:

mm: memcontrol: use special workqueue for creating per-memcg caches

Anyway, I'll test also, once kernel 4.10-rc1 is ready.

The upstream bug report is this one:
https://bugzilla.kernel.org/show_bug.cgi?id=172991

Luis Henriques (henrix) on 2016-12-19
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-yakkety
Doug Smythies (dsmythies) wrote :

I tested kernel 4.10-rc1, and the issue is solved with it.

I'm not sure how to test the yakkety proposed kernel, because the instructions seem to be for a desktop computer, and this issue shows up much more clearly on my server computers.

Colin Ian King (colin-king) wrote :

I've tested 4.8.0-34 #36 from yakkety -proposed and the issue is solved. Marking this as verified for yakkety.

tags: added: verification-done-yakkety
removed: verification-needed-yakkety
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-34.36

---------------
linux (4.8.0-34.36) yakkety; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1651800

  * Miscellaneous Ubuntu changes
    - SAUCE: Do not build the xr-usb-serial driver for s390

linux (4.8.0-33.35) yakkety; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1651721

  [ Luis Henriques ]

  * crypto : tolerate new crypto hardware for z Systems (LP: #1644557)
    - s390/zcrypt: Introduce CEX6 toleration

  * Several new Asus laptops are missing touchpad support (LP: #1650895)
    - HID: asus: Add i2c touchpad support

  * Acer, Inc ID 5986:055a is useless after 14.04.2 installed. (LP: #1433906)
    - uvcvideo: uvc_scan_fallback() for webcams with broken chain

  * cdc_ether fills kernel log (LP: #1626371)
    - cdc_ether: Fix handling connection notification

  * Kernel Fixes to get TCMU File Backed Optical to work (LP: #1646204)
    - SAUCE: target/user: Fix use-after-free of tcmu_cmds if they are expired

  * CVE-2016-9756
    - KVM: x86: drop error recovery in em_jmp_far and em_ret_far

  * On boot excessive number of kworker threads are running (LP: #1649905)
    - slub: move synchronize_sched out of slab_mutex on shrink

  * Ethernet not work after upgrade from kernel 3.19 to 4.4 [10ec:8168]
    (LP: #1648279)
    - ACPI / blacklist: Make Dell Latitude 3350 ethernet work

  * Ubuntu 16.10 netboot install fails with "Oops: Exception in kernel mode,
    sig: 5 [#1] " (lpfc) (LP: #1648873)
    - scsi: lpfc: fix oops/BUG in lpfc_sli_ringtxcmpl_put()

  * CVE-2016-9793
    - net: avoid signed overflows for SO_{SND|RCV}BUFFORCE

  * [Hyper-V] Kernel panic not functional on 32bit Ubuntu 14.10, 15.04, and
    15.10 (LP: #1400319)
    - Drivers: hv: avoid vfree() on crash

  * d-i is missing usb support for platforms that use the xhci-platform driver
    (LP: #1625222)
    - d-i initrd needs additional usb modules to support the merlin platform

  * overlayfs no longer supports nested overlayfs mounts, but there is a fix
    upstream (LP: #1647007)
    - ovl: fix d_real() for stacked fs

  * Yakkety: arm64: CONFIG_ARM64_ERRATUM_845719 isn't enabled (LP: #1647793)
    - [Config] CONFIG_ARM64_ERRATUM_845719=y

  * Ubuntu16.10 - EEH on BELL3 adapter fails to recover (serial/tty)
    (LP: #1646857)
    - serial: 8250_pci: Detach low-level driver during PCI error recovery

  * Driver for Exar USB UART (LP: #1645591)
    - SAUCE: xr-usb-serial: Driver for Exar USB serial ports
    - SAUCE: xr-usb-serial: interface for switching modes
    - SAUCE: cdc-acm: Exclude Exar USB serial ports

  * [Bug] (Purley) x86/hpet: Reduce HPET counter read contention (LP: #1645928)
    - x86/hpet: Reduce HPET counter read contention

  * Need Alps upstream their new touchpad driver (LP: #1571530)
    - Input: ALPS - add touchstick support for SS5 hardware
    - Input: ALPS - handle 0-pressure 1F events
    - Input: ALPS - allow touchsticks to report pressure
    - Input: ALPS - set DualPoint flag for 74 03 28 devices

  * CONFIG_NR_CPUS=256 is too low (LP: #1579205)
    - [Config] Increase the NR_CPUS to 512 for amd64 to support systems with a...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Peter Sabaini (peter-sabaini) wrote :

Is this being verified for Xenial as well?

Colin Ian King (colin-king) wrote :

@Peter, is this a bug in Xenial?

Michael Barnett (mbarnett) wrote :

@Colin, we were seeing it in a Xenial deployment. We have tested in our staging environment and have not seen the issue since updating to the new kernel, so it is looking good so far.

Doug Smythies (dsmythies) wrote :

I would not expect to see this issue in 16.04, and don't on my test 16.04 server, with the default, up to date, kernel. (Linux s15 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux).
