kworker process starts at 85% of cpu and stays there

Bug #1799235 reported by Peter Brandon on 2018-10-22
This bug affects 2 people
Affects          Importance   Assigned to
Dell Sputnik     Undecided    Unassigned
linux (Ubuntu)   Medium       Unassigned

Bug Description

When Ubuntu starts up (via /sbin/init), a kworker process starts as well and never drops below 85% CPU:

pb@pb-tower:~/Downloads$ ps -elf
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S root 1 0 0 80 0 - 56471 - 18:30 ? 00:00:01 /sbin/init splash
1 S root 2 0 0 80 0 - 0 - 18:30 ? 00:00:00 [kthreadd]

1 R root 85 2 63 80 0 - 0 - 18:30 ? 00:03:03 [kworker/0:1+kac]

After a few hours, grep . -r /sys/firmware/acpi/interrupts/ shows tens of millions of interrupts.
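The grep one-liner dumps every counter file at once. Below is a minimal Python sketch of the same check. It assumes two things: each per-GPE file is named "gpe" plus hex digits (as gpe6F is), and the first whitespace-separated field in each file is the interrupt count. The helper names read_gpe_counts, top_gpes and gpe_rates are hypothetical, not part of any existing tool. Taking two snapshots a few seconds apart turns lifetime totals into a rate, which makes a storming GPE stand out.

```python
import os
import re

# Per-GPE counter files are named "gpe" plus hex digits (e.g. gpe6F);
# summary files such as gpe_all, sci and sci_not do not match.
GPE_FILE = re.compile(r"gpe[0-9A-Fa-f]+")


def read_gpe_counts(root="/sys/firmware/acpi/interrupts"):
    """Return {file name: interrupt count} for every per-GPE file under root."""
    counts = {}
    for name in os.listdir(root):
        if not GPE_FILE.fullmatch(name):
            continue
        with open(os.path.join(root, name)) as f:
            fields = f.read().split()
        # Assumed layout: first field is the count, the rest are status flags.
        if fields and fields[0].isdigit():
            counts[name] = int(fields[0])
    return counts


def top_gpes(counts, n=5):
    """Return the n (name, count) pairs with the highest counts."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


def gpe_rates(before, after, seconds):
    """Interrupts per second for each GPE between two snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return {name: (after[name] - before.get(name, 0)) / seconds for name in after}
```

To use it, follow these steps:
1. Call read_gpe_counts() once.
2. Wait a few seconds with time.sleep().
3. Call read_gpe_counts() again.
4. Pass both snapshots and the elapsed time to gpe_rates().

A GPE whose rate dwarfs all the others is the one to look at.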

The following command line stops the kworker process:
echo "disable" > /sys/firmware/acpi/interrupts/gpe6F

However, after running this command I haven't been able to shut the system down properly (the power never goes off unless I switch it off manually).
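The echo command writes a control word into the GPE's sysfs file, so it has to run as root. Note that `sudo echo "disable" > ...` does not work, because the redirection is performed by the unprivileged shell. Below is a hedged Python equivalent. It assumes only the disable, enable and clear control words (some kernels may accept others); set_gpe is a hypothetical helper name.

```python
import os
import re

# Control words this sketch allows; the kernel may accept others.
GPE_ACTIONS = {"disable", "enable", "clear"}
GPE_FILE = re.compile(r"gpe[0-9A-Fa-f]+")


def set_gpe(gpe, action, root="/sys/firmware/acpi/interrupts"):
    """Write a control word to one GPE file, like echo "disable" > .../gpe6F.

    Writing to the real sysfs file requires root privileges.
    """
    if not GPE_FILE.fullmatch(gpe):
        raise ValueError(f"not a GPE file name: {gpe!r}")
    if action not in GPE_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    path = os.path.join(root, gpe)
    # Mode "w" mirrors the shell's > redirection. echo appends a newline,
    # and the kernel may reject the word without it, so write one too.
    with open(path, "w") as f:
        f.write(action + "\n")
    return path
```

For example, set_gpe("gpe6F", "disable") run as root corresponds to the command above, and set_gpe("gpe6F", "enable") reverses it.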

I believe the bug is related to the one discussed here: https://superuser.com/questions/1117992/acpi-exception-ae-not-found-while-evaluating-gpe-method-floods-syslog

The dmesg attached to this report may be somewhat confusing because it includes an unsigned module. With the module removed, dmesg shows the following:

[ 0.097480] ACPI: Dynamic OEM Table Load:
[ 0.097480] ACPI: SSDT 0xFFFFA07F57173000 000317 (v02 PmRef ApHwp 00003000 INTL 20160527)
[ 0.097480] ACPI: Executed 1 blocks of module-level executable AML code
[ 0.100180] ACPI: Dynamic OEM Table Load:
[ 0.100186] ACPI: SSDT 0xFFFFA07F577EF400 00030A (v02 PmRef ApCst 00003000 INTL 20160527)
[ 0.100473] ACPI: Executed 1 blocks of module-level executable AML code
[ 0.102187] ACPI: Interpreter enabled
[ 0.102222] ACPI: (supports S0 S3 S4 S5)
[ 0.102223] ACPI: Using IOAPIC for interrupt routing
[ 0.102275] HEST: Table parsing has been initialized.
[ 0.102277] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[ 0.103580] ACPI: GPE 0x20 active on init
[ 0.104031] ACPI Error: No handler for Region [WST1] ( (ptrval)) [GenericSerialBus] (20170831/evregion-166)
[ 0.104035] ACPI Error: Region GenericSerialBus (ID=9) has no handler (20170831/exfldio-299)
[ 0.104040]
               Initialized Local Variables for Method [PAS1]:
[ 0.104041] Local0: (ptrval) <Obj> Integer 0000000000000001
[ 0.104045] No Arguments are initialized for method [PAS1]
[ 0.104046] ACPI Error: Method parse/execution failed \_SB.PCI0.I2C0.PAS1, AE_NOT_EXIST (20170831/psparse-550)
[ 0.104052] ACPI Error: Method parse/execution failed \_GPE._L20, AE_NOT_EXIST (20170831/psparse-550)
[ 0.104058] ACPI Exception: AE_NOT_EXIST, while evaluating GPE method [_L20] (20170831/evgpe-646)
[ 0.104005] ACPI: GPE 0x6F active on init
[ 0.104005] ACPI: Enabled 10 GPEs in block 00 to 7F
[ 0.212126] ACPI: Power Resource [USBC] (on)

[ 0.444270] pci 0000:06:00.0: PME# supported from D0 D1 D2 D3hot
[ 0.444270] pci 0000:06:00.1: [1033:00e0] type 00 class 0x0c0320
[ 0.444270] pci 0000:06:00.1: reg 0x10: [mem 0xa2000000-0xa20000ff]
[ 0.444278] pci 0000:06:00.1: supports D1 D2
[ 0.444278] pci 0000:06:00.1: PME# supported from D0 D1 D2 D3hot
[ 0.444278] pci 0000:05:00.0: PCI bridge to [bus 06]
[ 0.444278] pci 0000:05:00.0: bridge window [mem 0xa2000000-0xa20fffff]
[ 0.456183] ACPI: PCI Interrupt Link [LNKA] (IRQs) *0
[ 0.456183] ACPI: PCI Interrupt Link [LNKB] (IRQs) *1
[ 0.456183] ACPI: PCI Interrupt Link [LNKC] (IRQs) *0
[ 0.456183] ACPI: PCI Interrupt Link [LNKD] (IRQs) *1
[ 0.456183] ACPI: PCI Interrupt Link [LNKE] (IRQs) *1
[ 0.456183] ACPI: PCI Interrupt Link [LNKF] (IRQs) *1
[ 0.456183] ACPI: PCI Interrupt Link [LNKG] (IRQs) *1
[ 0.456183] ACPI: PCI Interrupt Link [LNKH] (IRQs) *1
[ 0.464184] SCSI subsystem initialized
[ 0.464186] libata version 3.00 loaded.
[ 0.464186] pci 0000:00:02.0: vgaarb: setting as boot VGA device

[ 1.225495] Scanning for low memory corruption every 60 seconds
[ 1.226265] Initialise system trusted keyrings
[ 1.226271] Key type blacklist registered
[ 1.226399] workingset: timestamp_bits=36 max_order=23 bucket_order=0
[ 1.227127] zbud: loaded
[ 1.227482] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[ 1.227769] fuse init (API version 7.26)
[ 1.233972] Key type asymmetric registered
[ 1.233973] Asymmetric key parser 'x509' registered
[ 1.233991] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 246)
[ 1.234105] io scheduler noop registered
[ 1.234105] io scheduler deadline registered
[ 1.234131] io scheduler cfq registered (default)
[ 1.238792] pcieport 0000:00:1b.0: AER enabled with IRQ 122
[ 1.238819] pcieport 0000:00:1b.4: AER enabled with IRQ 123
[ 1.238845] pcieport 0000:00:1d.2: AER enabled with IRQ 124
[ 1.238863] dpc 0000:00:1b.0:pcie010: DPC error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
[ 1.238874] dpc 0000:00:1b.4:pcie010: DPC error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
[ 1.238886] dpc 0000:00:1d.2:pcie010: DPC error containment capabilities: Int Msg #0, RPExt+ PoisonedTLP+ SwTrigger+ RP PIO Log 4, DL_ActiveErr+
[ 1.238925] efifb: probing for efifb

[ 1.378346] ACPI: Thermal Zone [TZ00] (28 C)
[ 1.378390] ERST: Error Record Serialization Table (ERST) support is initialized.
[ 1.378391] pstore: using zlib compression
[ 1.378393] pstore: Registered erst as persistent store backend
[ 1.378456] GHES: APEI firmware first mode is enabled by APEI bit.
[ 1.378659] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[ 1.400977] 00:01: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 1.405952] Linux agpgart interface v0.103
[ 1.433815] tpm_tis MSFT0101:00: 2.0 TPM (device-id 0xFC, rev-id 1)
[ 1.444950] tpm tpm0: A TPM error (2314) occurred continue selftest
[ 1.798436] loop: module loaded

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.15.0-36-generic 4.15.0-36.39
ProcVersionSignature: Ubuntu 4.15.0-36.39-generic 4.15.18
Uname: Linux 4.15.0-36-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.4
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: pm 2353 F.... pulseaudio
 /dev/snd/controlC1: pm 2353 F.... pulseaudio
Date: Mon Oct 22 09:21:42 2018
HibernationDevice: RESUME=UUID=5636f256-e713-4bd0-a4f0-535a151ca3b1
InstallationDate: Installed on 2018-09-19 (32 days ago)
InstallationMedia: Ubuntu 18.04.1 LTS "Bionic Beaver" - Release amd64 (20180725)
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 003 Device 003: ID 8087:0025 Intel Corp.
 Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. Precision 3630 Tower
ProcEnviron:
 LANGUAGE=
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.15.0-36-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-36-generic N/A
 linux-backports-modules-4.15.0-36-generic N/A
 linux-firmware 1.173.1
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 05/30/2018
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.0.1
dmi.board.name: 0NNNCT
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 3
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.0.1:bd05/30/2018:svnDellInc.:pnPrecision3630Tower:pvr:rvnDellInc.:rn0NNNCT:rvrA00:cvnDellInc.:ct3:cvr:
dmi.product.family: Precision
dmi.product.name: Precision 3630 Tower
dmi.sys.vendor: Dell Inc.

Peter Brandon (slowtrain55) wrote :

Did this issue start happening after an update/upgrade? Was there a
prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v4.19 kernel[0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19-rc8

Changed in linux (Ubuntu):
status: New → Incomplete
Peter Brandon (slowtrain55) wrote :

The issue began after a fresh system install on a new computer. Admittedly, the first system I installed was Qubes, with Fedora (24 I believe). When I decided Qubes wasn't adequate for my needs, I wiped the system (reformatted using gparted) and installed the latest Ubuntu (as of a month ago). The kworker process was out of control in both Fedora and Ubuntu.

I have tried kernels 4.16, 4.17, and 4.18 (the latest builds as of about a week ago). 4.16 is no different from my current 4.15. 4.17 and 4.18 show a 5% decrease in the kworker process's CPU usage. I haven't tried 4.19-rc8 yet because I thought a release candidate would be too rough for testing, but I'll try it on next boot. I'm not hopeful that the problem is fixed. And even if it is fixed in 4.19, that may not do me much good: I need VirtualBox, and I doubt it would run on 4.19. Plus, I imagine mainline testing kernels have more security vulnerabilities and compatibility issues. It will probably take months to years for 18.04 to use 4.19. Just a thought....

Best wishes,
Peter

Peter Brandon (slowtrain55) wrote :

I've now tested 4.19-rc8. The kworker process stays at about 65% CPU continuously. I'm not quite sure how to mark this as Confirmed, but I will try.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Peter Brandon (slowtrain55) wrote :

I should add that I issued the following last night:

echo "enable" > /sys/firmware/acpi/interrupts/gpe6F

I then tried to shut down the system. After 8 minutes it still had not shut down, so I powered it off manually. Either the interrupts from this bug are preventing the shutdown, or this is a separate problem. On occasion I have been able to complete a shutdown without holding the power key, but this was always with the interrupts enabled.

Kai-Heng Feng (kaihengfeng) wrote :

There's a BIOS update for this system; it might solve the issue.

Changed in linux (Ubuntu):
importance: Undecided → Medium
Peter Brandon (slowtrain55) wrote :

I've updated the BIOS for my system. It did not solve the issue.

Kai-Heng Feng (kaihengfeng) wrote :

Can you try v4.13? There's a new commit introduced in v4.14:

commit ecc1165b8b743fd1503b9c799ae3a9933b89877b
Author: Rafael J. Wysocki <email address hidden>
Date: Thu Aug 10 00:30:09 2017 +0200

    ACPICA: Dispatch active GPEs at init time

    In some cases GPEs are already active when they are enabled by
    acpi_ev_initialize_gpe_block() and whatever happens next may depend
    on the result of handling the events signaled by them, so the
    events should not be discarded (which is what happens currently) and
    they should be handled as soon as reasonably possible.

    For this reason, modify acpi_ev_initialize_gpe_block() to
    dispatch GPEs with the status flag set in-band right after
    enabling them.

    Signed-off-by: Rafael J. Wysocki <email address hidden>
    Tested-by: Mika Westerberg <email address hidden>

Let's see if this commit introduced the regression.

Peter Brandon (slowtrain55) wrote :

Thanks Kai-Heng.

Unfortunately, my system wouldn't boot with v4.13. I installed the following kernel-related files:

linux-headers-4.13.16-041316_4.13.16-041316.201711240901_all.deb
linux-headers-4.13.16-041316-generic_4.13.16-041316.201711240901_amd64.deb
linux-image-4.13.16-041316-generic_4.13.16-041316.201711240901_amd64.deb

Kai-Heng Feng (kaihengfeng) wrote :

On your system, this appears in the kernel messages:
[ 0.104005] ACPI: GPE 0x6F active on init

I just tested a Precision 3630 Tower in our lab: there is no such message, and I don't see the issue.

/sys/firmware/acpi/interrupts/gpe6F stays at 0 all the time.
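Kai-Heng's comparison hinges on whether a "GPE 0x.. active on init" line appears at boot. The sketch below pulls those GPE numbers out of dmesg text and maps them to the sysfs counter file names. It assumes the kernel names those files "gpe" followed by at least two upper-case hex digits, consistent with gpe6F above. gpes_active_on_init is a hypothetical helper.

```python
import re

# Matches dmesg lines such as "[ 0.104005] ACPI: GPE 0x6F active on init".
ACTIVE_ON_INIT = re.compile(r"ACPI: GPE 0x([0-9A-Fa-f]+) active on init")


def gpes_active_on_init(dmesg_text):
    """Return sysfs counter names (e.g. 'gpe6F') for GPEs active at init."""
    names = []
    for match in ACTIVE_ON_INIT.finditer(dmesg_text):
        number = int(match.group(1), 16)
        # Assumed naming: "gpe" + at least two upper-case hex digits.
        names.append(f"gpe{number:02X}")
    return names
```

Fed the dmesg excerpt from the bug description, it returns gpe20 and gpe6F, the two GPEs whose counters are worth watching.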

Can you try "restore default settings" in BIOS?

Peter Brandon (slowtrain55) wrote :

Hmm, I think the main BIOS change I made is turning off safe mode (at least in part because of that unsigned module). I'll see what I can do the next time I restart the system.

Peter Brandon (slowtrain55) wrote :

Hi again. I tried both the restored default BIOS settings and the factory default settings. The kworker process is still running at over 85%. In both cases I can stop it with echo "disable" > /sys/firmware/acpi/interrupts/gpe6F.

One odd thing I noticed: with the default settings, there are two devices I can boot from, ubuntu and UEFI. If I try to boot from UEFI, I get an error message saying it can't start (it can't find the boot order?) and that it is changing the settings. When I look in the BIOS afterwards, the change I see is that ubuntu is now first in the boot order. It boots fine after that.

I also tried running the system with Secure Boot enabled, after removing VirtualBox. This also had no effect on the kworker process. When I reinstalled VirtualBox, it offered to create a MOK key and enroll it, which I did. The system now starts with Secure Boot enabled and VirtualBox installed (yay, progress!). The kworker process still fires up at 85%+.

Given that the out-of-control kworker process seems to be spawned by kthreadd (PID 2, the second process after init, which is PID 1), I'm guessing that software on my system isn't what causes the issue. Maybe it's the specific hardware I have. Do you have an lshw?

Peter

Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc
You-Sheng Yang (vicamo) on 2019-10-06
tags: added: ubuntu-certified
You-Sheng Yang (vicamo) wrote :

This also occurs on a Cannon Lake Intel SDP with a Xeon E-2176G running the v5.4-rc1 kernel.

Daniel (brokencog) wrote :

PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
20684 root 20 0 0 0 0 I 59.9 0.0 2:52.66 kworker/0:0-kacpi_notify

I'm also seeing this on a Dell G7 laptop (Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz) with kernel 4.19.80.

I have also tried kernel 5.2.0, with the same result.

Kai-Heng Feng (kaihengfeng) wrote :

Daniel, can you please attach output of `grep . /sys/class/dmi/id/*`?
