On boot excessive number of kworker threads are running

Bug #1649905 reported by Colin Ian King
This bug affects 3 people
Affects          Status         Importance  Assigned to     Milestone
linux (Ubuntu)   Fix Released   Medium      Colin Ian King
Yakkety          Fix Released   Undecided   Colin Ian King
Zesty            Fix Released   Medium      Colin Ian King

Bug Description

[SRU REQUEST, Yakkety]

Ubuntu Yakkety 4.8 kernels run an excessive number of kworker threads. This is especially noticeable on boot, where one can easily have more than 1000 kworker threads on a 4-CPU box.

Bisected this down to:

commit 81ae6d03952c1bfb96e1a716809bd65e7cd14360
Author: Vladimir Davydov <email address hidden>
Date: Thu May 19 17:10:34 2016 -0700

    mm/slub.c: replace kick_all_cpus_sync() with synchronize_sched() in kmem_cache_shrink()

[FIX]

The synchronize_sched calls seem to create all these excessive kworker threads. This is fixed with upstream commit:

commit 89e364db71fb5e7fc8d93228152abfa67daf35fa
Author: Vladimir Davydov <email address hidden>
Date: Mon Dec 12 16:41:32 2016 -0800

    slub: move synchronize_sched out of slab_mutex on shrink

    synchronize_sched() is a heavy operation and calling it per each cache
    owned by a memory cgroup being destroyed may take quite some time. What
    is worse, it's currently called under the slab_mutex, stalling all works
    doing cache creation/destruction.

    Actually, there isn't much point in calling synchronize_sched() for each
    cache - it's enough to call it just once - after setting cpu_partial for
    all caches and before shrinking them. This way, we can also move it out
    of the slab_mutex, which we have to hold for iterating over the slab
    cache list.

[TEST CASE]

Without the fix, boot a Yakkety system and count the number of kworker threads:

ps -ef | grep kworker | wc -l
1034

With the fix, boot and count the number of kworker threads again; it will be dramatically lower:

ps -ef | grep kworker | wc -l
32
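
The pipeline above matches "kworker" anywhere on the line, so it may also count the grep process itself. A minimal sketch of a slightly stricter count follows; the function name count_kworkers is hypothetical and not part of the original report, and it assumes a procps-style ps that accepts -o comm=.

```shell
# Hypothetical helper (not from the original report): count the lines on
# stdin that name a kworker thread, i.e. whose command name starts with
# "kworker". grep -c exits non-zero when the count is 0, so "|| true"
# keeps the function's exit status clean while still printing "0".
count_kworkers() {
    grep -c '^kworker' || true
}
```

Usage, assuming procps ps: `ps -e -o comm= | count_kworkers`. Because only the command name is printed, the grep process does not match its own pattern.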

Since this touches the slub allocator and cgroups too, I have regression tested this against the kernel-team autotest regression tests to sanity check this fix. All seems OK.

Note: this only affects kernels from 4.7-rc1 through to 4.8
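
As a rough illustration of that range, the sketch below checks whether a kernel release string falls in the 4.7 or 4.8 series. The function name in_affected_range is hypothetical, and the check looks only at the version prefix: it cannot tell whether a distribution kernel already carries the fix.

```shell
# Hypothetical helper (not from the original report): succeed if the
# release string's major.minor version is 4.7 or 4.8.
# This is only a prefix check on the version string; a 4.8 kernel that
# has the fix backported still matches.
in_affected_range() {
    case "$1" in
        4.7|4.7[.-]*|4.8|4.8[.-]*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: `in_affected_range "$(uname -r)" && echo "kernel series is in the affected range"`.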

Changed in linux (Ubuntu):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Yakkety):
assignee: nobody → Colin Ian King (colin-king)
status: New → In Progress
Changed in linux (Ubuntu Zesty):
status: In Progress → Fix Released
Doug Smythies (dsmythies) wrote :

I didn't see the third commit mentioned which I thought was also required:

mm: memcontrol: use special workqueue for creating per-memcg caches

Anyway, I'll test also, once kernel 4.10-rc1 is ready.

The upstream bug report is this one:
https://bugzilla.kernel.org/show_bug.cgi?id=172991

Luis Henriques (henrix)
Changed in linux (Ubuntu Yakkety):
status: In Progress → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-yakkety' to 'verification-done-yakkety'. If the problem still exists, change the tag 'verification-needed-yakkety' to 'verification-failed-yakkety'.

If verification is not done within 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-yakkety
Doug Smythies (dsmythies) wrote :

I tested kernel 4.10-rc1, and the issue is solved with it.

I'm not sure how to test the yakkety proposed kernel, because the instructions seem to be for a desktop computer, and this issue shows up much more clearly on my server computers.

Colin Ian King (colin-king) wrote :

I've tested 4.8.0-34 #36 from yakkety -proposed and the issue is solved. Marking this as verified for yakkety.

tags: added: verification-done-yakkety
removed: verification-needed-yakkety
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.8.0-34.36

---------------
linux (4.8.0-34.36) yakkety; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1651800

  * Miscellaneous Ubuntu changes
    - SAUCE: Do not build the xr-usb-serial driver for s390

linux (4.8.0-33.35) yakkety; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1651721

  [ Luis Henriques ]

  * crypto : tolerate new crypto hardware for z Systems (LP: #1644557)
    - s390/zcrypt: Introduce CEX6 toleration

  * Several new Asus laptops are missing touchpad support (LP: #1650895)
    - HID: asus: Add i2c touchpad support

  * Acer, Inc ID 5986:055a is useless after 14.04.2 installed. (LP: #1433906)
    - uvcvideo: uvc_scan_fallback() for webcams with broken chain

  * cdc_ether fills kernel log (LP: #1626371)
    - cdc_ether: Fix handling connection notification

  * Kernel Fixes to get TCMU File Backed Optical to work (LP: #1646204)
    - SAUCE: target/user: Fix use-after-free of tcmu_cmds if they are expired

  * CVE-2016-9756
    - KVM: x86: drop error recovery in em_jmp_far and em_ret_far

  * On boot excessive number of kworker threads are running (LP: #1649905)
    - slub: move synchronize_sched out of slab_mutex on shrink

  * Ethernet not work after upgrade from kernel 3.19 to 4.4 [10ec:8168]
    (LP: #1648279)
    - ACPI / blacklist: Make Dell Latitude 3350 ethernet work

  * Ubuntu 16.10 netboot install fails with "Oops: Exception in kernel mode,
    sig: 5 [#1] " (lpfc) (LP: #1648873)
    - scsi: lpfc: fix oops/BUG in lpfc_sli_ringtxcmpl_put()

  * CVE-2016-9793
    - net: avoid signed overflows for SO_{SND|RCV}BUFFORCE

  * [Hyper-V] Kernel panic not functional on 32bit Ubuntu 14.10, 15.04, and
    15.10 (LP: #1400319)
    - Drivers: hv: avoid vfree() on crash

  * d-i is missing usb support for platforms that use the xhci-platform driver
    (LP: #1625222)
    - d-i initrd needs additional usb modules to support the merlin platform

  * overlayfs no longer supports nested overlayfs mounts, but there is a fix
    upstream (LP: #1647007)
    - ovl: fix d_real() for stacked fs

  * Yakkety: arm64: CONFIG_ARM64_ERRATUM_845719 isn't enabled (LP: #1647793)
    - [Config] CONFIG_ARM64_ERRATUM_845719=y

  * Ubuntu16.10 - EEH on BELL3 adapter fails to recover (serial/tty)
    (LP: #1646857)
    - serial: 8250_pci: Detach low-level driver during PCI error recovery

  * Driver for Exar USB UART (LP: #1645591)
    - SAUCE: xr-usb-serial: Driver for Exar USB serial ports
    - SAUCE: xr-usb-serial: interface for switching modes
    - SAUCE: cdc-acm: Exclude Exar USB serial ports

  * [Bug] (Purley) x86/hpet: Reduce HPET counter read contention (LP: #1645928)
    - x86/hpet: Reduce HPET counter read contention

  * Need Alps upstream their new touchpad driver (LP: #1571530)
    - Input: ALPS - add touchstick support for SS5 hardware
    - Input: ALPS - handle 0-pressure 1F events
    - Input: ALPS - allow touchsticks to report pressure
    - Input: ALPS - set DualPoint flag for 74 03 28 devices

  * CONFIG_NR_CPUS=256 is too low (LP: #1579205)
    - [Config] Increase the NR_CPUS to 512 for amd64 to support systems with a...

Changed in linux (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Peter Sabaini (peter-sabaini) wrote :

Is this being verified for Xenial as well?

Colin Ian King (colin-king) wrote :

@Peter, is this a bug in Xenial?

Michael Barnett (mbarnett) wrote :

@Colin, we were seeing it in a Xenial deployment. We have tested in our staging environment and have not seen the issue since updating to the new kernel, so it is looking good so far.

Doug Smythies (dsmythies) wrote :

I would not expect to see this issue in 16.04, and don't on my test 16.04 server, with the default, up to date, kernel. (Linux s15 4.4.0-59-generic #80-Ubuntu SMP Fri Jan 6 17:47:47 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux).

Eric Hidle (eric-hidle) wrote :

I am seeing this in Xenial running:

Linux beethoven 4.4.0-81-lowlatency #104-Ubuntu SMP PREEMPT Wed Jun 14 09:42:54 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

I have probably 50 kworker processes, including one that is pegging a core to 100%. Proc is AMD 8320.

Not sure this is helpful, but I thought I would offer.

Doug Smythies (dsmythies) wrote :

@Eric (although you do not seem to have subscribed to this bug report). Your issue is different than what this bug report is/was covering.
