kworker high CPU usage: issue with xhci_hub_control from xhci_pci kernel module

Bug #1990876 reported by Albert
This bug affects 6 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am using Ubuntu 22.04. Since last night, after rebooting my Framework laptop (which had not been rebooted for several weeks), the fans spin up hard and `top` shows a kworker process at nearly 100% CPU utilization, which makes the laptop unusable.

I have an early Framework laptop with an 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz.

I removed all USB-C expansion cards except for one USB-C pass-through card, to no avail.

With perf I have been able to determine that xhci_hub_control, and perhaps xhci_bus_suspend, are where the CPU time goes. In fact, the laptop can no longer suspend, although suspend worked flawlessly until the reboot.

$ sudo apt install linux-tools-common linux-tools-$(uname -r)
$ sudo -i
# echo -1 > /proc/sys/kernel/perf_event_paranoid
# perf record
(press Ctrl-C after a few seconds to stop recording)
# perf report

Samples: 125K of event 'cycles', Event count (approx.): 142991820424
Overhead  Command          Shared Object      Symbol
  54.11%  kworker/5:2+usb  [kernel.kallsyms]  [k] xhci_hub_control
   7.94%  kworker/5:2+usb  [kernel.kallsyms]  [k] xhci_bus_suspend
   0.98%  kworker/5:2+usb  [kernel.kallsyms]  [k] kfree
   0.86%  kworker/5:2+usb  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
...
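
The perf steps above can also be run non-interactively. The following is a minimal sketch and not part of the original report: `hot_symbols` is a hypothetical helper name and the 5% threshold is arbitrary. It assumes that `perf report --stdio` prints rows in the same column order as the table above (overhead first, symbol last), which holds for kernel symbols recorded without call graphs.

```shell
#!/bin/sh
# hot_symbols [THRESHOLD]: read `perf report --stdio`-style rows on stdin and
# print "overhead symbol" for every row at or above THRESHOLD percent.
# Hypothetical helper; assumes field 1 is "NN.NN%" and the last field is the symbol.
hot_symbols() {
    threshold="${1:-5}"
    awk -v t="$threshold" '
        $1 ~ /^[0-9.]+%$/ {
            pct = $1
            sub(/%$/, "", pct)          # strip the percent sign before comparing
            if (pct + 0 >= t + 0) print $1, $NF
        }'
}
```

Possible usage, as root: `perf record -a -- sleep 10` followed by `perf report --stdio | hot_symbols 5`. Comment lines and rows below the threshold are dropped.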

What can be done to address this issue with a kworker?

I suspect the new kernel is at fault. I tried booting the earlier kernels 5.15.0-47-generic and 5.15.0-46-generic, but with the same result.

What has changed? What can be done about this kworker and xhci_hub_control?

A temporary solution:

$ sudo modprobe -r xhci_pci

… after which the kworker process disappears from `top` (CPU usage drops to near zero) and the fans wind down. Unfortunately the ethernet expansion bay doesn’t work anymore, but at least wifi does. Keyboard and touchpad also work. Suspend with deep also works well; I just tested it.

I assume other expansion bays will stop working too, but the laptop is at least charging: the USB-c pass-through works.
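
Removing the whole xhci_pci module disables every USB controller at once. A narrower variant, sketched below and untested on this hardware, is to detach a single controller through the generic PCI driver unbind interface in sysfs. Assumptions: the xHCI PCI driver registers itself as xhci_hcd under /sys/bus/pci/drivers, and `XHCI_DRV`, `list_xhci_controllers` and `unbind_xhci_controller` are hypothetical names chosen for this example. Whether unbinding one controller is enough to stop the kworker is not known from this report.

```shell
#!/bin/sh
# Assumption: devices bound to the driver show up as entries named by PCI
# address (for example 0000:00:14.0) inside this directory.
XHCI_DRV="${XHCI_DRV:-/sys/bus/pci/drivers/xhci_hcd}"

# Print the PCI address of every controller currently bound to the driver.
list_xhci_controllers() {
    for entry in "$XHCI_DRV"/*; do
        name=${entry##*/}
        case "$name" in
            # keep only entries that look like a PCI address (domain:bus:dev.fn)
            [0-9a-f][0-9a-f][0-9a-f][0-9a-f]:*) echo "$name" ;;
        esac
    done
}

# Detach one controller from the driver (needs root on a real system).
unbind_xhci_controller() {
    printf '%s' "$1" > "$XHCI_DRV/unbind"
}
```

One would run `list_xhci_controllers` first, then try `unbind_xhci_controller` with one address at a time and watch `top`. Writing the same address to the driver's `bind` file should reattach the controller.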

See also:
https://community.frame.work/t/kworker-stuck-at-near-100-cpu-usage-with-ubuntu-22-04/23053/2

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: linux-image-5.15.0-48-generic 5.15.0-48.54
ProcVersionSignature: Ubuntu 5.15.0-48.54-generic 5.15.53
Uname: Linux 5.15.0-48-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.20.11-0ubuntu82.1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: albert 4870 F.... pulseaudio
CasperMD5CheckResult: unknown
CurrentDesktop: ubuntu:GNOME
Date: Mon Sep 26 18:14:26 2022
InstallationDate: Installed on 2022-03-11 (199 days ago)
InstallationMedia: Ubuntu 20.04.4 LTS "Focal Fossa" - Release amd64 (20220223)
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Lsusb-t:

Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
MachineType: Framework Laptop
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/BOOT/ubuntu_ezm1f9@/vmlinuz-5.15.0-48-generic root=ZFS=rpool/ROOT/ubuntu_ezm1f9 ro quiet splash nvme.noacpi=1 mem_sleep_default=deep vt.handoff=1
RelatedPackageVersions:
 linux-restricted-modules-5.15.0-48-generic N/A
 linux-backports-modules-5.15.0-48-generic N/A
 linux-firmware 20220329.git681281e4-0ubuntu3.5
RfKill:
 1: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: Upgraded to jammy on 2022-04-28 (150 days ago)
dmi.bios.date: 07/19/2022
dmi.bios.release: 3.16
dmi.bios.vendor: INSYDE Corp.
dmi.bios.version: 03.10
dmi.board.asset.tag: *
dmi.board.name: FRANBMCP0B
dmi.board.vendor: Framework
dmi.board.version: AB
dmi.chassis.asset.tag: FRANBMCPAB1484011X
dmi.chassis.type: 10
dmi.chassis.vendor: Framework
dmi.chassis.version: AB
dmi.modalias: dmi:bvnINSYDECorp.:bvr03.10:bd07/19/2022:br3.16:svnFramework:pnLaptop:pvrAB:rvnFramework:rnFRANBMCP0B:rvrAB:cvnFramework:ct10:cvrAB:skuFRANBMCP0B:
dmi.product.family: FRANBMCP
dmi.product.name: Laptop
dmi.product.sku: FRANBMCP0B
dmi.product.version: AB
dmi.sys.vendor: Framework

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

Kai-Heng Feng (kaihengfeng) wrote :

Can you please try some older kernel before 5.15.0-46-generic?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete

Albert (sapristi) wrote :

Thanks Kai-Heng Feng, I have just tried to use 5.15.0-43-generic and the issue isn't there: no need to modprobe -r xhci_pci, as there isn't any kworker pegging a CPU.

So indeed a git bisect or similar approach between kernels -43 and -46 could reveal something. I can't find binaries for kernels in between using "sudo apt install" in Ubuntu 22.04.
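
A manual bisect over packaged kernels needs the list of builds that lie between the known-good and known-bad versions. A small sketch, assuming GNU `sort -V` is available; `candidates_between` is a hypothetical helper name, and the package names in the usage line are simply the -43 and -46 kernels mentioned above.

```shell
#!/bin/sh
# candidates_between GOOD BAD: read kernel package names on stdin (one per
# line) and print, in version order, those strictly between GOOD and BAD.
candidates_between() {
    good="$1"
    bad="$2"
    sort -V -u | awk -v g="$good" -v b="$bad" '
        $0 == b { inside = 0 }   # stop before the known-bad version
        inside  { print }
        $0 == g { inside = 1 }   # start after the known-good version
    '
}
```

Possible usage: `apt-cache pkgnames linux-image-5.15.0- | grep -- '-generic$' | candidates_between linux-image-5.15.0-43-generic linux-image-5.15.0-46-generic`. Empty output would mean the configured archives offer no intermediate builds.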

Kernel 5.15.0-43 has other problems though: for example, the touchpad doesn't work at all.

Albert (sapristi) wrote :

I have tried `sudo apt install linux-oem-22.04a` to install linux-image-5.17.0-1018-oem and the same problem occurs: `top` lists two, not one, kworker processes running hot continuously.

Albert (sapristi) wrote :

And with linux-image-5.10.0-50-generic (released today for Ubuntu 22.04a), same problem.

Albert (sapristi) wrote :

Same problem with linux-image-5.15.0.52.

Albert (sapristi) wrote :

The problem seems to have, finally, gone with linux-image-5.15.0.53:

5.15.0-53-generic #59-Ubuntu SMP Mon Oct 17 18:53:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

This bug report can now be closed.

Albert (sapristi) wrote :

Spoke too early: today, for some reason, this bug returned. Perhaps because I plugged in a mouse, or the ethernet expansion card.

Albert (sapristi) wrote :

Today this bug returned, perhaps because I plugged in a USB-A mouse, or the ethernet USB-C expansion card.

As usual, this made the kworker at 90% CPU go away:

$ sudo modprobe -r xhci_pci

... but now the USB ports don't work, of course.

Albert (sapristi) wrote :

Error still here, with: 5.15.0-56-generic #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Albert (sapristi) wrote :

Today I tried out an experiment:

$ sudo modprobe xhci_pci

(wasn't loaded before)

... and the kworker wasn't active. Hurray! But: then I opened "Cheese" to test out the webcam. The webcam works, Cheese was using ~50% CPU just to show me my own image (doesn't sound right), and when I closed Cheese, the kworker was at ~93% CPU usage. So then:

$ sudo modprobe -r xhci_pci

... made the kworker go away. *Sigh*

Albert (sapristi) wrote :

Same issue if, instead of using the camera, I plug in the USB-C ethernet adapter: kworker at over 90% usage. Same "solution".

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired

Albert (sapristi) wrote :

Expired? The bug is real. Experiencing it right now.

Kai-Heng Feng (kaihengfeng) wrote :

Please boot mainline kernel [0] with kernel parameter "usbcore.dyndbg xhci_pci.dyndbg log_buf_len=16M" and attach dmesg here.

[0] https://kernel.ubuntu.com/~kernel-ppa/mainline/v6.2/amd64/
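
Before attaching dmesg it may be worth confirming that the requested parameters actually reached the running kernel. A minimal sketch; `missing_params` is a hypothetical helper name, and the parameter list in the usage line is copied from the request above.

```shell
#!/bin/sh
# missing_params CMDLINE PARAM...: print each PARAM that does not appear as a
# space-separated word in CMDLINE. No output means all of them are present.
missing_params() {
    cmdline=" $1 "
    shift
    for p in "$@"; do
        case "$cmdline" in
            *" $p "*) ;;          # parameter found as a whole word
            *) echo "$p" ;;       # parameter absent from the command line
        esac
    done
}
```

Possible usage after booting the mainline kernel: `missing_params "$(cat /proc/cmdline)" usbcore.dyndbg xhci_pci.dyndbg log_buf_len=16M`, and if nothing is printed, `sudo dmesg > dmesg.txt` to capture the log for attaching.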

Changed in linux (Ubuntu):
status: Expired → Confirmed

Sune Woeller (sune-woeller) wrote :

I am affected by this every second day. Ubuntu 22.04 LTS. Considering reverting back to 20.04 :( This is so critical.

Chris Bertin (chris-bertin) wrote :

This issue is only present for me when a micro-SD card is plugged in, whether it is mounted or not. The system is an LG Gram running a fully up-to-date 22.04.3 with the 5.15.0-83-generic kernel as of today. Completely reproducible.

Chris Bertin (chris-bertin) wrote :

I should have added that this bug also causes the system to overheat badly even in sleep mode. It seems it causes the system to wake up. I got in the habit of powering it off when traveling because I found it almost too hot to touch a couple of times when I took it out of my backpack.

Chris Bertin (chris-bertin) wrote :

The situation has gotten much worse with the 6.2.0.33 kernel! There are now kworker threads spinning out of control even without an SD drive in the slot. Inserting an SD drive just adds new kworker threads! Testing anyone? How can things deteriorate so badly?

Chris Bertin (chris-bertin) wrote :

So, I have 2 systems with kubuntu 22.04.3, a desktop and a laptop. The desktop has a 5.15 kernel, even though I always install all available updates, but the laptop has a 6.2 kernel. I'd be happy to "downgrade" that kernel to see what would happen with the rogue runaway kthreads. Is that possible? Has anyone reported these issues on 5.15 kernels?

Bruno Golosio (golosio) wrote :

Hello! Any news about this issue? I have the same problem on a freshly installed kubuntu 22.04.3 LTS on an ASUS laptop with AMD Ryzen 9 7845HX and RTX 4060 GPU. Removing the xhci_pci module works, but unfortunately in my case the keyboard then becomes completely inactive. Is there a way to blacklist this module without disabling the keyboard?

arminp (armin-poschmann) wrote :

I would like to know if anyone has found a solution for LG Gram systems.

I am currently using the hardware and OS listed below. The kernel is from the mainline archive.

Operating System: Linux Mint 21.3
          Kernel: Linux 5.19.17-051917-generic
    Architecture: x86-64
 Hardware Vendor: LG Electronics
  Hardware Model: 17Z90Q-G.AA78G

This kernel works if you start the system without AC plugged in, then suspend, plug in AC, and wake up.
The kernel boot parameter is acpi_mask_gpe=0x6E.
GPE 0x6E can be unmasked immediately after boot.

All newer kernels I tested have the problem: kworkers drive the load to 3 or more, rendering the system unusable when on AC.
Without AC plugged in they all work fine, with far better performance and energy saving than 5.19.

The kernel function hogging the CPU is always something like os_map_remove. I can't check the exact name right now while writing.

Did anyone find a kernel version > 5.19 that also works on AC on these systems?
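
The acpi_mask_gpe parameter mentioned above is meant for GPEs that fire continuously. A rough sketch for spotting which GPE has fired most, assuming the standard sysfs counters under /sys/firmware/acpi/interrupts, where each gpeXX file starts with an interrupt count; `hottest_gpe` and `GPE_DIR` are hypothetical names used only for this example.

```shell
#!/bin/sh
GPE_DIR="${GPE_DIR:-/sys/firmware/acpi/interrupts}"

# Print "count name" for the GPE with the highest interrupt count so far.
hottest_gpe() {
    for f in "$GPE_DIR"/gpe[0-9A-F]*; do
        [ -r "$f" ] || continue
        count=""
        # the first whitespace-separated field of each file is the counter
        read -r count _rest < "$f" || true
        [ -n "$count" ] || continue
        echo "$count ${f##*/}"
    done | sort -n | tail -n 1
}
```

Running `hottest_gpe` twice a few seconds apart shows whether the count keeps climbing. On kernels that support it, writing `mask` or `unmask` to the matching gpeXX file as root changes the mask at runtime, which is presumably how 0x6E gets unmasked after boot in the setup described above.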

Yuki ONISHI (yonishi) wrote :

In my case,
# echo 'on' > '/sys/bus/usb/devices/usb3/power/control'
helps remove the high CPU usage of kworker and ksoftirqd.

My PC (Ubuntu 22.04, 6.5.0-1020-oem) has a broken USB port on usb3 and dmesg reports "usb usb3-port7: over-current condition".
Also, powertop is enabled on my PC and the status of "Autosuspend for USB device xHCI Host Controller [usb3]" was "Good".
So, I changed the status to "Bad", which is equivalent to the command above, solving the issue.
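
To find out which root hub is affected on another machine, the same setting can be inspected for every hub. A minimal sketch under the assumption that root hubs appear as usb1, usb2, ... in /sys/bus/usb/devices; `USB_DEVICES`, `show_root_hub_power` and `keep_root_hub_on` are hypothetical names.

```shell
#!/bin/sh
USB_DEVICES="${USB_DEVICES:-/sys/bus/usb/devices}"

# Print "hub value" for each root hub; the value is "auto" when runtime
# autosuspend is allowed and "on" when the hub is kept powered.
show_root_hub_power() {
    for ctl in "$USB_DEVICES"/usb[0-9]*/power/control; do
        [ -r "$ctl" ] || continue
        hub=${ctl%/power/control}
        echo "${hub##*/} $(cat "$ctl")"
    done
}

# Disable runtime autosuspend for one root hub (needs root on a real system),
# equivalent to the echo command above.
keep_root_hub_on() {
    echo on > "$USB_DEVICES/$1/power/control"
}
```

For example, `show_root_hub_power` followed by `keep_root_hub_on usb3`. The value is reset at the next boot, so the command has to be reapplied, or made persistent by some other means, if it helps.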
