kidle_inject constantly running

Bug #1800446 reported by William Ivanski
This bug affects 6 people
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| linux (Ubuntu) | Expired | Medium | Unassigned | |
| thermald (Ubuntu) | Expired | Undecided | Unassigned | |

Bug Description

Computer boots up fine, but as soon as any process causes a single spike in CPU, `kidle_inject` starts and it just won't stop.

Even after an hour of idling, with low temperatures and the fan off, it still makes the computer unresponsive.

I disabled the `kidle_inject` process by creating the file `/etc/modprobe.d/blacklist-power.conf` with the following content:

```
blacklist intel_powerclamp
blacklist intel_rapl
```

Then I restarted the computer. Since then it has stayed responsive, and I use it for work all the time.
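
One way to confirm that the blacklist took effect after the reboot is to check whether the two modules still appear in the kernel's module list. The following is a minimal sketch and not part of the original report: `module_loaded` is a hypothetical helper, and its optional second argument exists only so the check can be pointed at a saved copy of `/proc/modules`. Module names may differ on other kernel versions.

```shell
# Report whether a kernel module is currently loaded.
# The second argument is a hypothetical override used for testing;
# by default the kernel's own list in /proc/modules is read.
module_loaded() {
    name="$1"
    modules_file="${2:-/proc/modules}"
    # Each line of /proc/modules starts with the module name followed by a space.
    grep -q "^${name} " "$modules_file" 2>/dev/null
}

# Check the two modules named in the blacklist file above.
for mod in intel_powerclamp intel_rapl; do
    if module_loaded "$mod"; then
        echo "$mod: loaded"
    else
        echo "$mod: not loaded"
    fi
done
```

If both modules are reported as not loaded, no `kidle_inject` threads should be present either.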

ProblemType: Bug
DistroRelease: Ubuntu 18.10
Package: linux-image-4.18.0-10-generic 4.18.0-10.11
ProcVersionSignature: Ubuntu 4.18.0-10.11-generic 4.18.12
Uname: Linux 4.18.0-10-generic x86_64
ApportVersion: 2.20.10-0ubuntu13
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: williamivanski 5974 F.... pulseaudio
 /dev/snd/controlC1: williamivanski 5974 F.... pulseaudio
Date: Mon Oct 29 08:05:09 2018
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=1c6d4273-ae09-412d-9488-78ab7ba2fc4b
InstallationDate: Installed on 2015-09-23 (1131 days ago)
InstallationMedia: Kubuntu 15.04 "Vivid Vervet" - Release amd64 (20150422)
MachineType: LG Electronics 13Z940-G.BK71P1
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.18.0-10-generic root=UUID=42b637c4-2da3-4c1a-a62d-b4beb93b8243 ro quiet splash vt.handoff=1
PulseList:
 Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
 No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-4.18.0-10-generic N/A
 linux-backports-modules-4.18.0-10-generic N/A
 linux-firmware 1.175
SourcePackage: linux
UpgradeStatus: Upgraded to cosmic on 2018-10-23 (5 days ago)
dmi.bios.date: 09/04/2014
dmi.bios.vendor: Phoenix Technologies Ltd.
dmi.bios.version: 13Z940FF
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: White Tip Mountain
dmi.board.vendor: LG Electronics
dmi.board.version: FAB3
dmi.chassis.asset.tag: Asset Tag
dmi.chassis.type: 9
dmi.chassis.vendor: LG Electronics
dmi.chassis.version: 0.1
dmi.modalias: dmi:bvnPhoenixTechnologiesLtd.:bvr13Z940FF:bd09/04/2014:svnLGElectronics:pn13Z940-G.BK71P1:pvr0.1:rvnLGElectronics:rnWhiteTipMountain:rvrFAB3:cvnLGElectronics:ct9:cvr0.1:
dmi.product.family: Shark Bay ULT
dmi.product.name: 13Z940-G.BK71P1
dmi.product.sku: System SKUNumber
dmi.product.version: 0.1
dmi.sys.vendor: LG Electronics

William Ivanski (william-ivanski) wrote :
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Cristian Aravena Romero (caravena) wrote :

Did this issue start happening after an update/upgrade? Was there a
prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer
to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest
v4.19 kernel[0].

If this bug is fixed in the mainline kernel, please add the following
tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag:
'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as
"Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.19

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
William Ivanski (william-ivanski) wrote :

Hi @caravena ,

Thanks for your response.

The issue started after I upgraded from 18.04 to 18.10 last week. I didn't note the kernel version before the upgrade (when the issue was not happening), but as I always keep the system up to date, it should have been the latest kernel version for 18.04.

William Ivanski (william-ivanski) wrote :

I created a new Ubuntu 18.10 VM and tried to reproduce this, but couldn't. No `kidle_inject` processes were started at all. I'm not sure if they should start inside a VM, though.

Justin Warkentin (cosmonrd) wrote :

I have a Dell Precision 5520. I just updated my BIOS and upgraded Ubuntu from 18.04 to 18.10, and ran into hard CPU throttling. I uninstalled thermald and everything is fine. I'm not sure whether it would be better to blacklist kidle_inject as suggested here instead, though.

Kai-Heng Feng (kaihengfeng) wrote :

Justin,

Would it be possible for you to boot the 4.15 kernel? If this also happens on Linux v4.15, it's quite likely a regression in thermald.

Changed in linux (Ubuntu):
importance: Undecided → Medium
Justin Warkentin (cosmonrd) wrote :

I reinstalled thermald and tried with the 4.15 kernel and still got intense throttling. I noticed that with either kernel the dmesg output still dumps this:

[ 47.365544] CPU1: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 47.365568] CPU5: Core temperature above threshold, cpu clock throttled (total events = 1)
[ 47.365570] CPU5: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 47.365573] CPU6: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 47.365574] CPU4: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 47.365575] CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 47.365576] CPU7: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 47.365577] CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 47.365577] CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
[ 47.365578] CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)

But without thermald I don't get the throttling.
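
To compare runs with and without thermald, it can help to tally these messages per CPU. This is a minimal sketch that assumes the log lines keep the format shown above; `count_throttle_events` is a hypothetical helper name, not an existing tool.

```shell
# Count "clock throttled" kernel messages per CPU and sensor type
# (Core or Package) in log text read from standard input.
count_throttle_events() {
    awk '/temperature above threshold, cpu clock throttled/ {
        for (i = 1; i <= NF; i++) {
            # The CPU label looks like "CPU5:"; the next field is Core or Package.
            if ($i ~ /^CPU[0-9]+:$/) {
                cpu = $i
                sub(/:$/, "", cpu)
                count[cpu " " $(i + 1)]++
            }
        }
    }
    END {
        for (key in count) print key, count[key]
    }' | sort
}

# Usage: dmesg | count_throttle_events
```

Each output line has the CPU label, the sensor type, and the number of matching messages.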

William Ivanski (william-ivanski) wrote :

Here is what I did:

- Removed file /etc/modprobe.d/blacklist-power.conf to re-enable kidle_inject
- Disabled thermald
- Restarted the computer on kernel 4.18

I confirm I don't get the throttling.

In Ubuntu 18.04 the thermald version is 1.7.0-5ubuntu1, while in Ubuntu 18.10 it is 1.7.0-8.

William Ivanski (william-ivanski) wrote :

I checked that there is no difference in configuration files between Ubuntu 18.04 and Ubuntu 18.10.

@caravena, the kernel is clearly not at fault, so the issue should happen in 4.19 too. Should we reassign this bug to thermald instead?

kiraff (kiraff) wrote :

I have the same issue on an Acer VN7-592G with Ubuntu 18.10. I temporarily fixed it by editing the thermald config in /etc/thermald/thermal-cpu-cdev-order.xml to deactivate most cooling methods (guesswork, as the documentation for it is almost nonexistent):

<CoolingDeviceOrder>
 <!-- Specify Cooling device order -->
 <CoolingDevice>rapl_controller</CoolingDevice>
 <!--<CoolingDevice>intel_pstate</CoolingDevice> -->
 <!--<CoolingDevice>intel_powerclamp</CoolingDevice> -->
 <!--<CoolingDevice>cpufreq</CoolingDevice>-->
 <!--<CoolingDevice>Processor</CoolingDevice> -->
</CoolingDeviceOrder>

On my setup only "rapl_controller" seems to work. "cpufreq" has no effect (this might have something to do with my TLP config).

"intel_pstate", "intel_powerclamp" and "Processor" never stop throttling once the temperature limit is reached.
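
When experimenting with this file, it is easy to lose track of which entries are still active. The sketch below prints only the cooling devices that are not commented out. It assumes, as in the config above, that every XML comment opens and closes on a single line; `enabled_cooling_devices` is a hypothetical helper name.

```shell
# Print the cooling devices that are not commented out in a thermald
# cooling-device-order file. Multi-line XML comments are not handled.
enabled_cooling_devices() {
    config_file="${1:-/etc/thermald/thermal-cpu-cdev-order.xml}"
    # Drop every line containing a comment opener, then extract the
    # text between the <CoolingDevice> tags on the remaining lines.
    grep -v '<!--' "$config_file" |
        sed -n 's/.*<CoolingDevice>\([^<]*\)<\/CoolingDevice>.*/\1/p'
}

# Usage: enabled_cooling_devices /etc/thermald/thermal-cpu-cdev-order.xml
```

For the config shown above, the only device printed would be rapl_controller.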

If needed, throttling can be reset without rebooting by:
"intel_pstate" --> sudo tlp start (you probably don't need tlp but I don't know the underlying command that makes it work)
"intel_powerclamp" --> sudo rmmod intel_powerclamp (https://askubuntu.com/questions/457252/intel-powerclamp-start-stop-forced-idle-injection)
"Processor" --> no idea, can't even find documentation on what it actually does
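
Before resetting anything, it may be useful to see which cooling devices are currently engaged. The kernel exposes each one under /sys/class/thermal/cooling_device*, with `type`, `cur_state` and `max_state` files. The sketch below lists them; `list_cooling_devices` is a hypothetical helper, and its optional argument is only there so it can be pointed at a different directory.

```shell
# List thermal cooling devices with their current and maximum state.
# The optional argument overrides the sysfs directory (used for testing).
list_cooling_devices() {
    base="${1:-/sys/class/thermal}"
    for dev in "$base"/cooling_device*; do
        # Skip the unexpanded pattern and entries without a readable type.
        [ -r "$dev/type" ] || continue
        printf '%s type=%s cur_state=%s max_state=%s\n' \
            "${dev##*/}" \
            "$(cat "$dev/type")" \
            "$(cat "$dev/cur_state" 2>/dev/null)" \
            "$(cat "$dev/max_state" 2>/dev/null)"
    done
}

list_cooling_devices
```

A device of type intel_powerclamp with a non-zero cur_state suggests idle injection is still requested. The kernel documentation for intel_powerclamp describes writing 0 to cur_state as a way to stop injection, although thermald may set it again while it is running.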

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in thermald (Ubuntu):
status: New → Confirmed
Kieran Bingham (kieranbingham) wrote :

Also affected me after upgrading from 18.04 to 18.10.

The real pain point is that it locks my CPU frequency at 800MHz, and doesn't 'unlock' it even after removing the intel_powerclamp module.
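
A frequency lock like this can be checked through the cpufreq files in sysfs, which report values in kHz. The following is a minimal sketch that prints the current frequency and the current upper limit for each CPU; `cpu_freq_report` is a hypothetical helper, and its optional argument only overrides the sysfs directory.

```shell
# Print the current frequency and the scaling limit (in MHz) for each CPU.
# The optional argument overrides the sysfs directory (used for testing).
cpu_freq_report() {
    base="${1:-/sys/devices/system/cpu}"
    for dir in "$base"/cpu[0-9]*/cpufreq; do
        # Skip CPUs that do not expose both cpufreq files.
        [ -r "$dir/scaling_cur_freq" ] || continue
        [ -r "$dir/scaling_max_freq" ] || continue
        cpu="${dir%/cpufreq}"
        cur_khz=$(cat "$dir/scaling_cur_freq")
        max_khz=$(cat "$dir/scaling_max_freq")
        echo "${cpu##*/}: current $((cur_khz / 1000)) MHz, limit $((max_khz / 1000)) MHz"
    done
}

cpu_freq_report
```

If the limit itself reads 800 MHz, something has lowered the allowed maximum rather than the CPU simply idling at its lowest frequency.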

Srinivas Pandruvada (srinivas-pandruvada) wrote :

I suggest disabling the thermald service first:
#systemctl disable thermald

Then reboot and, in a terminal window, run:
#thermald --loglevel=info --no-daemon

Then do your regular work.

When you experience a slowdown, copy the thermald output and attach it here.

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Do we still have this issue?

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Also, can you attach acpi.out, produced by the following command:

#acpidump > acpi.out

Brad Figg (brad-figg)
tags: added: cscc
pleabargain (dennisgdaniels) wrote :

I am experiencing this bug in 20.04

inxi -Fx
System: Host: ubuntu20.04 Kernel: 5.4.0-62-generic x86_64 bits: 64 compiler: gcc v: 9.3.0 Desktop: Gnome 3.36.4
           Distro: Ubuntu 20.04.1 LTS (Focal Fossa)
Machine: Type: Desktop Mobo: ASUSTeK model: P8Z77-V LX v: Rev X.0x serial: <superuser/root required>
           BIOS: American Megatrends v: 2501 date: 07/21/2014
CPU: Topology: Quad Core model: Intel Core i5-2500 bits: 64 type: MCP arch: Sandy Bridge rev: 7 L2 cache: 6144 KiB
           flags: avx lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 27198
           Speed: 2299 MHz min/max: 1600/6300 MHz Core speeds (MHz): 1: 2012 2: 1729 3: 1703 4: 1696
Graphics: Device-1: Intel 2nd Generation Core Processor Family Integrated Graphics vendor: ASUSTeK driver: i915 v: kernel
           bus ID: 00:02.0
           Display: x11 server: X.Org 1.20.9 driver: intel unloaded: fbdev,modesetting,vesa
           resolution: 1366x768~60Hz, 1280x1024~60Hz
           OpenGL: renderer: Mesa DRI Intel HD Graphics 2000 (SNB GT1) v: 3.3 Mesa 20.2.6 direct render: Yes
Audio: Device-1: Intel 7 Series/C216 Family High Definition Audio vendor: ASUSTeK P8Z77-V LX driver: snd_hda_intel
           v: kernel bus ID: 00:1b.0
           Device-2: Logitech HD Pro Webcam C920 type: USB driver: snd-usb-audio,uvcvideo bus ID: 6-4:3
           Device-3: Logitech HD Pro Webcam C920 type: USB driver: snd-usb-audio,uvcvideo
           Sound Server: ALSA v: k5.4.0-62-generic
Network: Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: ASUSTeK P8P67 and other motherboards
           driver: r8169 v: kernel port: e000 bus ID: 03:00.0
           IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 10:bf:48:bc:31:13
Drives: Local Storage: total: 12.98 TiB used: 9.04 TiB (69.6%)
           ID-1: /dev/sda vendor: Samsung model: SSD 850 EVO 250GB size: 232.89 GiB
           ID-2: /dev/sdb vendor: Western Digital model: WD20EARX-00PASB0 size: 1.82 TiB
           ID-3: /dev/sdc vendor: Seagate model: ST8000AS0002-1NA17Z size: 7.28 TiB
           ID-4: /dev/sdd vendor: Western Digital model: WD40EZRZ-00GXCB0 size: 3.64 TiB
           ID-5: /dev/sde type: USB model: General size: 15.00 GiB
Partition: ID-1: / size: 197.91 GiB used: 86.53 GiB (43.7%) fs: ext4 dev: /dev/sda1
           ID-2: swap-1 size: 31.69 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/sda5
Sensors: System Temperatures: cpu: 71.0 C mobo: 27.8 C
           Fan Speeds (RPM): N/A
Info: Processes: 345 Uptime: 2h 07m Memory: 15.33 GiB used: 3.02 GiB (19.7%) Init: systemd runlevel: 5 Compilers:
           gcc: 9.3.0 Shell: bash v: 5.0.17 inxi: 3.0.38

Juho Kunsola (juboxi) wrote :

This affects me on Ubuntu Studio 20.10 on a ThinkPad X201 with an Intel(R) Core(TM) i5-3320M.

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Please attach logs as suggested in comment #14 and comment #16.

Colin Ian King (colin-king) wrote :

As asked in comment #19, please supply the logs as requested. This bug will be closed in 6 weeks if this has not occurred.

Changed in thermald (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for thermald (Ubuntu) because there has been no activity for 60 days.]

Changed in thermald (Ubuntu):
status: Incomplete → Expired