System hangs when thermald is running

Bug #1366541 reported by Timothy G. Rundle on 2014-09-07
This bug affects 1 person
Affects: thermald (Ubuntu)
Importance: High
Assigned to: Colin Ian King

Bug Description

While testing utopic, my system has been hanging. I have no direct proof that it is thermald, except that I noticed the hangs occurred when the CPU temperature rose above 45C. After doing some research I found that enabling thermald was new in utopic, so I decided to stop it from launching, and the hangs stopped. I also noticed that the thermald log was the last log written to before each hang, even though it doesn't appear to contain any unusual details.

As per the wiki, I launched thermald manually with debug logging enabled (see attached).

I have had issues in the past with the kernel causing system hangs when it tried to take advantage of certain BIOS features (my BIOS is fully patched).

ProblemType: Bug
DistroRelease: Ubuntu 14.10
Package: thermald 1.3-3
ProcVersionSignature: Ubuntu 3.16.0-13.19-generic 3.16.1
Uname: Linux 3.16.0-13-generic x86_64
ApportVersion: 2.14.7-0ubuntu2
Architecture: amd64
CurrentDesktop: Unity
Date: Sun Sep 7 09:29:49 2014
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: thermald
UpgradeStatus: Upgraded to utopic on 2014-06-03 (96 days ago)
---
ApportVersion: 2.14.7-0ubuntu2
Architecture: amd64
CurrentDesktop: Unity
DistroRelease: Ubuntu 14.10
Package: thermald 1.3-3
PackageArchitecture: amd64
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 3.16.0-14.20-generic 3.16.2
Tags: utopic
Uname: Linux 3.16.0-14-generic x86_64
UpgradeStatus: Upgraded to utopic on 2014-06-03 (97 days ago)
UserGroups: adm admin cdrom dialout dip fax floppy fuse lp lpadmin plugdev sudo tape users video
_MarkForUpload: True

Colin Ian King (colin-king) wrote :

@Srinivas, any ideas?

Changed in thermald (Ubuntu):
importance: Undecided → High
Colin Ian King (colin-king) wrote :

Timothy, can you elaborate on what kind of issues you had, re: "I have had issue in the pass witht he Kernel causing system hangs when it tried to take adavantage of certaion BIOS features (My BIOS is fully patched)."

So we can get some better idea of the hardware, can you run: apport-collect 1366541

Changed in thermald (Ubuntu):
status: New → Triaged

apport information

tags: added: apport-collected
description: updated
Timothy G. Rundle (tgrundle) wrote :

The previous issue was with watchdog / sp5100_tco (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1116835)

tags: added: kernel-da-key

There is no coretemp sysfs entry or any config path, so thermald should exit gracefully. There is one sensor, /sys/class/hwmon/hwmon0/name -> atk0110, but I'm not sure it has any relationship with the CPUs on AMD platforms.
If it does, it can be added to thermal-conf.xml manually with a corrected passive temperature; 45C is too low for a CPU.
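For readers following along: the check described above can be approximated by reading the `name` file of each /sys/class/hwmon/hwmonN directory and matching it against known CPU temperature drivers. This is a minimal Python sketch assuming the standard hwmon sysfs layout; the helper names (`hwmon_names`, `find_cpu_sensor`) and the driver list are hypothetical, and it is not thermald's actual code (thermald is written in C++).

```python
from pathlib import Path

# Hypothetical list of drivers that report CPU temperatures:
# coretemp is the Intel driver, k10temp covers AMD family 10h and later.
CPU_TEMP_DRIVERS = {"coretemp", "k10temp"}


def hwmon_names(root="/sys/class/hwmon"):
    """Map each hwmonN directory name to the driver name in its 'name' file."""
    names = {}
    base = Path(root)
    if not base.is_dir():
        return names
    for entry in sorted(base.iterdir()):
        name_file = entry / "name"
        if name_file.is_file():
            names[entry.name] = name_file.read_text().strip()
    return names


def find_cpu_sensor(root="/sys/class/hwmon"):
    """Return the hwmon directory of a known CPU temperature driver, or None.

    The driver name is checked instead of assuming a fixed hwmonN index,
    so a board sensor such as atk0110 is not mistaken for the CPU.
    """
    for hw, name in hwmon_names(root).items():
        if name in CPU_TEMP_DRIVERS:
            return str(Path(root) / hw)
    return None


if __name__ == "__main__":
    sensor = find_cpu_sensor()
    if sensor is None:
        print("no coretemp/k10temp sensor found; nothing to monitor")
    else:
        print("CPU temperature sensor:", sensor)
```

On a system like the reporter's, where only atk0110 appears, `find_cpu_sensor()` would return None, which matches the "exit gracefully" behaviour discussed in this thread.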

Attached a patch, can you try this?

The attachment "0001-Fix-assumption-about-hwmon0.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch

Also added a change to not rely on memset 0 for pollfds. So try with the two attached patches.

Timothy G. Rundle (tgrundle) wrote :

I applied the patches, but thermald did not start. I am not sure if that is the intended behavior or not, so I attached the log.

Timothy G. Rundle (tgrundle) wrote :

I am not sure if this is helpful or not, but here is a screenshot of what indicator-sensors reports.

Thermald could find neither coretemp nor any sensors configured in thermal-conf.xml, so it should exit.
We need to find the path for the CPU temperature on AMD and add it to thermal-conf.xml.

Colin Ian King (colin-king) wrote :

I've built a new version of thermald containing the fixes from Srinivas for testing:

http://kernel.ubuntu.com/~cking/lp-1367131/

Install the appropriate package; it will restart thermald with the new version. Let us know if this helps.

Timothy G. Rundle (tgrundle) wrote :

@Colin, your build produced the same result as expected. Thermald exited.

@Srinivas, is there a command I can run to get this information? I assumed everything was uploaded when I ran the ubuntu-bug and apport-collect commands.

On Tue, 2014-09-09 at 21:41 +0000, Timothy G. Rundle wrote:
> @Colin, your build produced the same result as expected. Thermald
> exited.
>
> @Srinivas, is there a command I can run to get this information? I
> assumed everything was uploaded when i ran the ubuntu-bug and apport-
> collect commands.
>
I don't have an AMD system; maybe Colin has one. We just need to find
the potential path for the temperature sensor and configure it in thermal-conf.xml.
I see a k10temp driver in the Linux kernel, but I couldn't find it
on your system.

Thanks,
Srinivas
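One way to look for "the potential path for temp" is to list every tempN_input under /sys/class/hwmon together with its driver name, optional label and current value. This is a hedged Python sketch assuming the hwmon sysfs ABI, where tempN_input is in millidegrees Celsius; the helper names are hypothetical, and the fallback to the hwmonN/device/ subdirectory is an assumption to cover older kernels that placed attributes there.

```python
import re
from pathlib import Path


def _read(path):
    """Return stripped file contents, or None if missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def list_temp_inputs(root="/sys/class/hwmon"):
    """Return (driver, input_path, label, degrees_C) for every tempN_input."""
    results = []
    base = Path(root)
    if not base.is_dir():
        return results
    for hw in sorted(base.iterdir()):
        if not hw.is_dir():
            continue
        # Driver name is in hwmonN/name, or hwmonN/device/name on some kernels.
        driver = _read(hw / "name") or _read(hw / "device" / "name") or "?"
        # Prefer attributes directly under hwmonN; otherwise try hwmonN/device.
        sensor_dir = hw if any(hw.glob("temp*_input")) else hw / "device"
        if not sensor_dir.is_dir():
            continue
        for inp in sorted(sensor_dir.glob("temp*_input")):
            m = re.fullmatch(r"temp(\d+)_input", inp.name)
            raw = _read(inp)
            if m is None or raw is None:
                continue
            try:
                millideg = int(raw)  # hwmon ABI: millidegrees Celsius
            except ValueError:
                continue
            label = _read(sensor_dir / f"temp{m.group(1)}_label") or ""
            results.append((driver, str(inp), label, millideg / 1000.0))
    return results


if __name__ == "__main__":
    for driver, path, label, temp in list_temp_inputs():
        print(f"{driver:10} {temp:6.1f} C  {label:20} {path}")
```

Comparing the printed values with what indicator-sensors shows should help pick which path, if any, tracks the CPU before configuring it in thermal-conf.xml.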

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package thermald - 1.3-4

---------------
thermald (1.3-4) unstable; urgency=medium

  * Fix assumption about hwmon0 (LP: #1366541)
  * Use correct nfds to not rely on memset 0 to pollfds

 -- Colin King <email address hidden> Tue, 9 Sep 2014 16:32:00 +0100

Changed in thermald (Ubuntu):
status: Triaged → Fix Released
Timothy G. Rundle (tgrundle) wrote :

I am confused as to why this was closed. Is thermald an Intel-only thing? Is my hardware essentially blacklisted?

Colin Ian King (colin-king) wrote :

OK, I was probably over-zealous on this; the initial bug of thermald causing a system hang was addressed, but support for your AMD H/W has not been, so forgive me for closing it prematurely.

Changed in thermald (Ubuntu):
status: Fix Released → In Progress
assignee: nobody → Colin Ian King (colin-king)
Colin Ian King (colin-king) wrote :

@Srinivas, I don't have any modern AMD kit. What kind of sensor information do you require on the AMD platform?

On Wed, 2014-09-10 at 17:33 +0000, Colin Ian King wrote:
> @Srinivas, I don't have any modern AMD kit. What kind of sensor
> information do you require on the AMD platform?
>
Since the sensors program is able to find them, there are sensors. I don't know
whether this program also prints the location of those sensors in sysfs.

Thanks,
Srinivas

Colin Ian King (colin-king) wrote :

I guess running strace on sensors and capturing the output will show us which paths it is reading.

Any chance of doing that, Timothy, and attaching the output to the bug report?
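As a rough illustration of that approach: strace can record the sensors program's file opens, for example with `strace -o sensors.strace -e trace=open,openat sensors` on x86_64, and the sysfs paths can then be pulled out of the log. The Python helper below is a hypothetical sketch that assumes strace's default line format (`openat(AT_FDCWD, "/path", ...) = fd`); it keeps only successful opens under /sys/ and drops duplicates while preserving order.

```python
import re

# Matches open("path", ...) = N and openat(dirfd, "path", ...) = N,
# including lines prefixed by "[pid N]" when strace -f is used.
OPEN_RE = re.compile(
    r'open(?:at)?\((?:[^,]+, )?"(?P<path>[^"]+)",.*\)\s+=\s+(?P<ret>-?\d+)'
)


def sysfs_paths_from_strace(lines, prefix="/sys/"):
    """Return unique paths under prefix that were opened successfully."""
    seen = set()
    paths = []
    for line in lines:
        m = OPEN_RE.search(line)
        if m is None or int(m.group("ret")) < 0:
            continue  # not an open call, or the open failed (e.g. ENOENT)
        path = m.group("path")
        if path.startswith(prefix) and path not in seen:
            seen.add(path)
            paths.append(path)
    return paths


if __name__ == "__main__":
    import sys
    with open(sys.argv[1] if len(sys.argv) > 1 else "sensors.strace") as f:
        for p in sysfs_paths_from_strace(f):
            print(p)
```

The resulting list should include the hwmon attributes that sensors reads, which are the candidates for a thermal-conf.xml entry.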

Timothy G. Rundle (tgrundle) wrote :

I will give it a try when I get home.

If you want to close this issue as fixed and have me open a new one
requesting support for my hardware I can do that as well.


Colin Ian King (colin-king) wrote :

Timothy, thanks, that sounds like a good idea. I'll close this bug as fixed right away.

Changed in thermald (Ubuntu):
status: In Progress → Fix Released
Timothy G. Rundle (tgrundle) wrote :

Ticket 1367946 created to track why thermald fails to start

Hello Timothy, or anyone else affected,

Accepted thermald into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/thermald/1.7.0-5ubuntu4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in thermald (Ubuntu Bionic):
status: New → Fix Committed
tags: added: verification-needed verification-needed-bionic
Colin Ian King (colin-king) wrote :

The bionic SRU test message occurred because I accidentally uploaded the package with the entire old history. This bug has already been fixed and the verification for bionic can be ignored.

Changed in thermald (Ubuntu Bionic):
status: Fix Committed → Fix Released
tags: removed: verification-needed verification-needed-bionic
no longer affects: thermald (Ubuntu Bionic)