thermald_1.9.1-1ubuntu0.4_amd64 breaks system

Bug #1930422 reported by Daniel James Brinton
This bug affects 1 person
Affects            Status        Importance  Assigned to  Milestone
thermald (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

CPU runs at 100C with the fan at 3000rpm under heavy use; the previous version was OK.
re: https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/1913186
lshw -short is:
H/W path           Device           Class          Description
==================================================================
                                    system         XPS 8940 (09C5)
/0                                  bus            0KV3RP
/0/0                                memory         64KiB BIOS
/0/9                                memory         16GiB System Memory
/0/9/0                              memory         8GiB DIMM DDR4 Synchronous 32
/0/9/1                              memory         8GiB DIMM DDR4 Synchronous 32
/0/9/2                              memory         Project-Id-Version: lshwRepor
/0/9/3                              memory         Project-Id-Version: lshwRepor
/0/35                               memory         512KiB L1 cache
/0/36                               memory         2MiB L2 cache
/0/37                               memory         16MiB L3 cache
/0/38                               processor      Intel(R) Core(TM) i7-10700 CP
/0/100                              bridge         Intel Corporation
/0/100/1                            bridge         Xeon E3-1200 v5/E3-1500 v5/6t
/0/100/1/0                          display        TU106 [GeForce RTX 2060 Rev.
/0/100/1/0.1                        multimedia     TU106 High Definition Audio C
/0/100/1/0.2                        bus            TU106 USB 3.1 Host Controller
/0/100/1/0.2/0     usb3             bus            xHCI Host Controller
/0/100/1/0.2/1     usb4             bus            xHCI Host Controller
/0/100/1/0.3                        bus            TU106 USB Type-C UCSI Control
/0/100/2                            display        Intel Corporation
/0/100/4                            generic        Xeon E3-1200 v5/E3-1500 v5/6t
/0/100/8                            generic        Xeon E3-1200 v5/v6 / E3-1500
/0/100/12                           generic        Comet Lake PCH Thermal Contro
/0/100/14                           bus            Comet Lake USB 3.1 xHCI Host
/0/100/14/0        usb1             bus            xHCI Host Controller
/0/100/14/0/4      enx0c5b8f279a64  communication  HUAWEI_MOBILE
/0/100/14/0/5                       input          Dell MS116 USB Optical Mouse
/0/100/14/0/6                       input          Dell KB216 Wired Keyboard
/0/100/14/0/b                       generic        USB2.0-CRW
/0/100/14/0/c                       generic        UB91C
/0/100/14/0/e                       communication  Bluetooth wireless interface
/0/100/14/1        usb2             bus            xHCI Host Controller
/0/100/14.2                         memory         RAM memory
/0/100/14.3        wlo1             network        Wi-Fi 6 AX201
/0/100/15                           bus            Comet Lake PCH Serial IO I2C
/0/100/16                           communication  Comet Lake HECI Controller
/0/100/17                           storage        Intel Corporation
/0/100/1b                           bridge         Comet Lake PCI Express Root P
/0/100/1b/0                         storage        Samsung Electronics Co Ltd
/0/100/1b/0/0      /dev/nvme0       storage        PM991 NVMe Samsung 512GB
/0/100/1b/0/0/1    /dev/nvme0n1     disk           512GB NVMe namespace
/0/100/1b/0/0/1/1                   volume         149MiB Windows FAT volume
/0/100/1b/0/0/1/2  /dev/nvme0n1p2   volume         127MiB reserved partition
/0/100/1b/0/0/1/3  /dev/nvme0n1p3   volume         214GiB Windows NTFS volume
/0/100/1b/0/0/1/4  /dev/nvme0n1p4   volume         989MiB Windows NTFS volume
/0/100/1b/0/0/1/5  /dev/nvme0n1p5   volume         16GiB Windows NTFS volume
/0/100/1b/0/0/1/6  /dev/nvme0n1p6   volume         1417MiB Windows NTFS volume
/0/100/1b/0/0/1/7  /dev/nvme0n1p7   volume         243GiB EXT4 volume
/0/100/1c                           bridge         Intel Corporation
/0/100/1c/0        enp3s0           network        Realtek Semiconductor Co., Lt
/0/100/1f                           bridge         Intel Corporation
/0/100/1f.3                         multimedia     Comet Lake PCH cAVS
/0/100/1f.4                         bus            Comet Lake PCH SMBus Controll
/0/100/1f.5                         bus            Comet Lake PCH SPI Controller
/0/1                                system         PnP device PNP0c02
/0/2                                system         PnP device PNP0c02
/0/3                                system         PnP device PNP0c02
/0/4                                system         PnP device PNP0b00
/0/5                                generic        PnP device INT3f0d
/0/6                                system         PnP device PNP0c02
/0/7                                system         PnP device PNP0c02
/0/8                                system         PnP device PNP0c02
/0/a                                system         PnP device PNP0c02
/0/b               scsi0            storage
/0/b/0.0.0         /dev/sda         disk           1TB WDC WD10EZEX-75W
/0/b/0.0.0/1       /dev/sda1        volume         127MiB reserved partition
/0/b/0.0.0/2       /dev/sda2        volume         931GiB Windows NTFS volume
/0/c               scsi3            storage
/0/c/0.0.0         /dev/cdrom       disk           DVD+-RW DU-8A5LH
/0/d               scsi6            storage
/0/d/0.0.0         /dev/sdb         disk           TF CARD Storage
/0/d/0.0.0/0       /dev/sdb         disk
/1                 wlx00c0ca97cd65  network        Wireless interface

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: thermald 1.9.1-1ubuntu0.4
ProcVersionSignature: Ubuntu 5.8.0-53.60~20.04.1-generic 5.8.18
Uname: Linux 5.8.0-53-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu27.18
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Tue Jun 1 14:29:19 2021
InstallationDate: Installed on 2021-05-11 (21 days ago)
InstallationMedia: Ubuntu 20.04.2.0 LTS "Focal Fossa" - Release amd64 (20210209.1)
ProcEnviron:
 LANGUAGE=en_GB:en
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
SourcePackage: thermald
UpgradeStatus: No upgrade log present (probably fresh install)

Daniel James Brinton (boobarble) wrote :

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Edit /usr/lib/systemd/system/thermald.service to add option --loglevel=info. Basically
/usr/sbin/thermald --systemd --dbus-enable --adaptive
changes to
/usr/sbin/thermald --systemd --dbus-enable --adaptive --loglevel=info

Then reboot and when you see the condition attach the output of
journalctl -rb /usr/sbin/thermald
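
One way to make this change without touching the packaged unit file (which a package upgrade may overwrite) is a systemd drop-in. A minimal sketch, assuming the ExecStart line is exactly the one shown above; the function name write_thermald_dropin and the file name override.conf are illustrative choices:

```shell
# Sketch: write a systemd drop-in that re-declares thermald's ExecStart with
# an extra --loglevel option, instead of editing the packaged unit file.
# Assumption: the packaged ExecStart matches the command quoted in this thread.
write_thermald_dropin() {
    # $1: log level (info or debug); $2: drop-in directory (optional)
    level=${1:-info}
    dropin_dir=${2:-/etc/systemd/system/thermald.service.d}
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/override.conf" <<EOF
[Service]
# An empty ExecStart= clears the packaged command before the new one is set.
ExecStart=
ExecStart=/usr/sbin/thermald --systemd --dbus-enable --adaptive --loglevel=$level
EOF
}

# Usage (as root), followed by a reload and restart (or a reboot):
#   write_thermald_dropin info
#   systemctl daemon-reload
#   systemctl restart thermald
```

Removing the override.conf file and reloading systemd restores the packaged command line.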

Daniel James Brinton (boobarble) wrote : Re: [Bug 1930422] Re: thermald_1.9.1-1ubuntu0.4_amd64 breaks system

cheers for getting back. please see attached. it wasn't under stress?

On 02/06/2021 14:38, Srinivas Pandruvada wrote:
> Edit /usr/lib/systemd/system/thermald.service to add option --loglevel=info. Basically
> /usr/sbin/thermald --systemd --dbus-enable --adaptive
> changes to
> /usr/sbin/thermald --systemd --dbus-enable --adaptive --loglevel=info
>
> Then reboot and when you see the condition attach the output of
> journalctl -rb /usr/sbin/thermald
>

Daniel James Brinton (boobarble) wrote :

when stressed

On 02/06/2021 14:38, Srinivas Pandruvada wrote:
> Edit /usr/lib/systemd/system/thermald.service to add option --loglevel=info. Basically
> /usr/sbin/thermald --systemd --dbus-enable --adaptive
> changes to
> /usr/sbin/thermald --systemd --dbus-enable --adaptive --loglevel=info
>
> Then reboot and when you see the condition attach the output of
> journalctl -rb /usr/sbin/thermald
>

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Both logs look the same. I don't see any throttling.
This is a backported version of thermald in Ubuntu.
Can you run with the upstream version of thermald?
https://github.com/intel/thermal_daemon

Junjie Jin (chen5317) wrote :

I have encountered a similar issue since the 1.9.1-1ubuntu0.4 update. On the previous version, 1.9.1-1build1, if I run a CPU stress test, the fan keeps speeding up until it reaches full speed (~6000rpm) as the temperature rises. On 1.9.1-1ubuntu0.4, the fan sometimes stays at 3600rpm or remains at zero. Once the CPU goes over ~90C, it is throttled down to the minimum frequency of 400MHz. I didn't know the problem was thermald until I reinstalled a fresh Ubuntu 20.04.2, which worked perfectly, and then basically did a binary search to find which package update caused the issue. To verify that it really was thermald 1.9.1-1ubuntu0.4, I reinstalled Ubuntu 20.04.2 again, confirmed the fan worked as expected, and then updated thermald alone, at which point the issue came back. When I then downgraded thermald to the stock version that comes with Ubuntu 20.04.2, the issue went away.

lshw --short:
                     system         Latitude 7400 (08E1)
/0                   bus            07WDVW
/0/0                 memory         64KiB BIOS
/0/20                memory         32GiB System Memory
/0/20/0              memory         16GiB SODIMM DDR4 Synchronous 2667 MHz (0.4 ns)
/0/20/1              memory         16GiB SODIMM DDR4 Synchronous 2667 MHz (0.4 ns)
/0/29                memory         256KiB L1 cache
/0/2a                memory         1MiB L2 cache
/0/2b                memory         8MiB L3 cache
/0/2c                processor      Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz
/0/100               bridge         Coffee Lake HOST and DRAM Controller
/0/100/2             display        UHD Graphics 620 (Whiskey Lake)
/0/100/4             generic        Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
/0/100/8             generic        Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
/0/100/12            generic        Cannon Point-LP Thermal Controller
/0/100/14            bus            Cannon Point-LP USB 3.1 xHCI Controller
/0/100/14/0    usb1  bus            xHCI Host Controller
/0/100/14/0/6        multimedia     Integrated_Webcam_HD
/0/100/14/0/8        generic        58200
/0/100/14/0/a        communication  Bluetooth wireless interface
/0/100/14/1    usb2  bus            xHCI Host Controller
/0/100/14.2          memory         RAM memory
/0/100/14.3    wlo1  network        Cannon Point-LP CNVi [Wireless-AC]
/0/100/15            bus            Cannon Point-LP Serial IO I2C Controller #0
/0/100/15.1          bus            Cannon Point-LP Serial IO I2C Controller #1
/0/100/15.3          bus            Intel Corporation
/0/100/16            communication  Cannon Point-LP MEI Controller #1
/0/100/16.3          communication  Cannon Point-LP Keyboard and Text (KT) Redirection
/0/100/19 ...


Daniel James Brinton (boobarble) wrote :

yeah ok, soz i was half asleep when i replied.

On 03/06/2021 16:01, Srinivas Pandruvada wrote:
> Both logs look the same. I don't see any throttling.
> This is a backported version of thermald in Ubuntu.
> Can you run with the upstream version of thermald?
> https://github.com/intel/thermal_daemon
>

Daniel James Brinton (boobarble) wrote :

exactly what happened to me, kept reinstalling 20.04.2 then decided to
update apps in batches leaving thermald to last as it seemed the most likely
candidate. then i discovered Timeshift (old image restorer), which saves a lot
of time!

On 06/06/2021 01:20, Junjie Jin wrote:
> I have encountered a similar issue since the 1.9.1-1ubuntu0.4 update. On
> the previous version, 1.9.1-1build1, if I run a CPU stress test, the fan
> keeps speeding up until it reaches full speed (~6000rpm) as the
> temperature rises. On 1.9.1-1ubuntu0.4, the fan sometimes stays at
> 3600rpm or remains at zero. Once the CPU goes over ~90C, it is throttled
> down to the minimum frequency of 400MHz. I didn't know the problem was
> thermald until I reinstalled a fresh Ubuntu 20.04.2, which worked
> perfectly, and then basically did a binary search to find which package
> update caused the issue. To verify that it really was thermald
> 1.9.1-1ubuntu0.4, I reinstalled Ubuntu 20.04.2 again, confirmed the fan
> worked as expected, and then updated thermald alone, at which point the
> issue came back. When I then downgraded thermald to the stock version
> that comes with Ubuntu 20.04.2, the issue went away.
>

Daniel James Brinton (boobarble) wrote :

that seems to have fixed it. temp steady at ~90c fan 2200rpm. ta. please
see attached.

On 03/06/2021 16:01, Srinivas Pandruvada wrote:
> Both logs look the same. I don't see any throttling.
> This is a backported version of thermald in Ubuntu.
> Can you run with the upstream version of thermald?
> https://github.com/intel/thermal_daemon
>

Daniel James Brinton (boobarble) wrote :

spoke too soon. if i run cpus and gpu hard temp goes 94-100c fan
2800-3000. makes no difference if i fresh boot or restart thermald while
under stress; please see attached. hope this helps.
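
Readings like these can be logged while the load is running by polling the kernel's thermal zones in sysfs. A minimal sketch; the function name show_zone_temps is illustrative, and which zones exist (and what their type strings are) depends on the machine:

```shell
# Sketch: print each thermal zone's type and temperature in degrees Celsius.
# The sysfs temp attribute is reported in millidegrees Celsius.
show_zone_temps() {
    # $1: sysfs thermal class directory (optional, mainly for testing)
    base=${1:-/sys/class/thermal}
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue
        # Some zones return an error on read; skip those.
        milli=$(cat "$zone/temp" 2>/dev/null) || continue
        [ -n "$milli" ] || continue
        printf '%s %s %d.%d C\n' "${zone##*/}" "$(cat "$zone/type" 2>/dev/null)" \
            $((milli / 1000)) $((milli % 1000 / 100))
    done
}

# Usage: sample every two seconds while the stress test runs.
#   while sleep 2; do date; show_zone_temps; done | tee temps.log
```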

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Again I don't see any throttling.
With the version from
https://github.com/intel/thermal_daemon

can you do the following:
# systemctl disable thermald
# reboot

Then, from a command line:
# thermald --no-daemon --loglevel=debug --adaptive

Attach the output.

Alternatively, you can just add --loglevel=debug instead of --loglevel=info in the thermald service file and collect the log from the journal.
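
The foreground run can be wrapped so that the output is also saved to a file for attaching. A minimal sketch, assuming thermald prints its log to the terminal when started with --no-daemon; the function name run_thermald_debug, the THERMALD_BIN variable and the default log file name are illustrative:

```shell
# Sketch: run thermald in the foreground with debug logging and keep a copy
# of everything it prints. THERMALD_BIN lets a locally built binary be used.
run_thermald_debug() {
    # $1: log file (optional)
    log_file=${1:-thermald-debug.log}
    "${THERMALD_BIN:-thermald}" --no-daemon --loglevel=debug --adaptive 2>&1 |
        tee "$log_file"
}

# Usage (as root, after `systemctl disable thermald` and a reboot):
#   run_thermald_debug /tmp/thermald-debug.log
# Reproduce the load, stop with Ctrl-C, then attach the log file.
```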

Daniel James Brinton (boobarble) wrote :

ok will do, just for reference this is a screenshot running thermald
from ubuntu-20.04.2.0-desktop-amd64

doing prime number search with cpus and gpu running at 84c 1802rpm. will
send another when i install https://github.com/intel/thermal_daemon again.

On 06/06/2021 19:14, Srinivas Pandruvada wrote:
> Again I don't see any throttling.
> With the version from
> https://github.com/intel/thermal_daemon
>
> can you do the following:
> # systemctl disable thermald
> # reboot
>
> Then, from a command line:
> # thermald --no-daemon --loglevel=debug --adaptive
>
> Attach the output.
>
> Alternatively, you can just add --loglevel=debug instead of --loglevel=info
> in the thermald service file and collect the log from the journal.
>

Daniel James Brinton (boobarble) wrote :

try this.

Srinivas Pandruvada (srinivas-pandruvada) wrote :

I see that this system doesn't have all the expected tables; it has one default table with just one entry. So it needs a special implementation. I will implement that and send a branch to test.

But keep in mind that the limit is set at 71C, so I expect there will also be complaints that there is too much throttling.

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Please try this version
https://github.com/intel/thermal_daemon/tree/ubuntu-bug-1930422

Check out the branch ubuntu-bug-1930422.
Then repeat the steps from comment #11.

Daniel James Brinton (boobarble) wrote :

wow it's running like a fridge! cheers.

On 07/06/2021 03:23, Srinivas Pandruvada wrote:
> Please try this version
> https://github.com/intel/thermal_daemon/tree/ubuntu-bug-1930422
>
> Check out the branch ubuntu-bug-1930422.
> Then repeat the steps from comment #11.
>

Daniel James Brinton (boobarble) wrote :

longer run. thanks for your help. without people like you we'd all be
using Windows!

On 07/06/2021 03:23, Srinivas Pandruvada wrote:
> Please try this version
> https://github.com/intel/thermal_daemon/tree/ubuntu-bug-1930422
>
> Check out the branch ubuntu-bug-1930422.
> Then repeat the steps from comment #11.
>

Srinivas Pandruvada (srinivas-pandruvada) wrote :

Thanks for the comment.
I would like to know something more about this system:
- Is this a desktop?
- Do you see any entry where /sys/class/thermal/cooling_device*/type = fan?
If you do, can you control the fan speed via /sys/class/thermal/cooling_device*/cur_state?

I will clean up the change and upload one final version before I merge and release.
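
The cooling-device check can be scripted. A rough sketch that lists every cooling device with its type, cur_state and max_state; the function name list_cooling_devices and the optional directory argument are illustrative, not part of any tool:

```shell
# Sketch: list thermal cooling devices with their type and state range.
# Reads the sysfs attributes type, cur_state and max_state of each device.
list_cooling_devices() {
    # $1: sysfs thermal class directory (optional, mainly for testing)
    base=${1:-/sys/class/thermal}
    for dev in "$base"/cooling_device*; do
        [ -r "$dev/type" ] || continue
        printf '%s type=%s cur_state=%s max_state=%s\n' \
            "${dev##*/}" "$(cat "$dev/type")" \
            "$(cat "$dev/cur_state" 2>/dev/null)" \
            "$(cat "$dev/max_state" 2>/dev/null)"
    done
}

# Usage: show only the fans.
#   list_cooling_devices | grep -i 'type=fan'
```

To try a state, write a value between 0 and max_state to cur_state as root, for example echo 1 > /sys/class/thermal/cooling_device0/cur_state. Whether the fan actually responds depends on the platform firmware.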

Daniel James Brinton (boobarble) wrote :

yes desktop/tower pc with ubuntu 20.04 lts desktop

cooling_device0-4 (out of 21) type = fan

all five cur_state were set at 0 changed them to various values between
0-1 made no difference to fans

On 07/06/2021 20:36, Srinivas Pandruvada wrote:
> Thanks for the comment.
> I would like to know something more about this system:
> - Is this a desktop?
> - Do you see any entry where /sys/class/thermal/cooling_device*/type = fan?
> If you do, can you control the fan speed via /sys/class/thermal/cooling_device*/cur_state?
>
> I will clean up the change and upload one final version before I merge
> and release.
>

Srinivas Pandruvada (srinivas-pandruvada) wrote :

I wish fan control worked.
I have released v2.4.6 with the changes. It is in the master branch with the tag v2.4.6.
There are three commits on top of v2.4.5 to address this issue.

Please try.

Daniel James Brinton (boobarble) wrote :

first off don't take my word that fan control doesn't work i'm slightly
out of my depth! if fan control doesn't work is 84c the actual temp or
an ideal temp? also have attached log debug output for latest master. ta

On 08/06/2021 02:08, Srinivas Pandruvada wrote:
> I wish fan control worked.
> I have released v2.4.6 with the changes. It is in the master branch with the tag v2.4.6.
> There are three commits on top of v2.4.5 to address this issue.
>
> Please try.
>

Colin Ian King (colin-king) wrote :

I've uploaded 2.4.6 to Debian and it will sync into the Ubuntu Impish development release in the next 24-36 hours. Once that's done I'll backport this fix to older releases of thermald. Thanks for fixing this.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package thermald - 2.4.6-1

---------------
thermald (2.4.6-1) unstable; urgency=medium

  * sync with latest upstream release 2.4.6:
   - fix overheating Latitude 7400 (LP: #1930422)
   - Use Adaptive PPCC limits for RAPL MMIO
   - Increase power limit for disabled RAPL-MMIO
   - Special case for default PSVT
  * 2.4.5 fixes:
   - Set a very high RAPL MSR PL1 with --adaptive

 -- Colin King <email address hidden> Wed, 9 Jun 2021 16:58:23 +0100

Changed in thermald (Ubuntu):
status: New → Fix Released

Daniel James Brinton (boobarble) wrote :

ok thanks.

On 09/06/2021 17:16, Colin Ian King wrote:
> I've uploaded 2.4.6 to Debian and it will sync into the Ubuntu Impish
> development release in the next 24-36 hours. Once that's done I'll backport
> this fix to older releases of thermald. Thanks for fixing this.
>
