error messages about /dev/acpi_thermal_rel and /var/run/thermald/thermal-conf.xml

Bug #1568123 reported by Laurent Bonnaud on 2016-04-08
This bug affects 6 people
Affects            Status        Importance  Assigned to
thermald (Ubuntu)  Fix Released  Medium      Colin Ian King
Xenial             Fix Released  Medium      Colin Ian King

Bug Description

[SRU Justification][XENIAL]
Commit f99f2b59fbbca04a13cad3f7d2dbc985bc7ee0cd caused a regression
in which the path to the thermal-conf.xml file was incorrectly changed.

[Fix]
This issue is fixed with the following upstream commit:

From 4a890d7e173678644882e6b863f3650e30d33052 Mon Sep 17 00:00:00 2001
From: Srinivas Pandruvada <email address hidden>
Date: Thu, 18 Feb 2016 11:33:50 -0800
Subject: [PATCH] Regression for default config file

f99f2b59fbbca04a13cad3f7d2dbc985bc7ee0cd caused regression
where for Android the default path is changed to TDRUNDIR.
This will cause issue to upgrade thermald.

[Testcase]
Without the fix, the systemd log can be observed to complain about the config file not being parsed:

error: could not parse file /var/run/thermald/thermal-conf.xml

With the fix the path is correct and the error is not logged.
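
The check in the test case can be scripted. A minimal sketch, assuming the journal text is piped in on stdin; the function name has_conf_parse_error is made up for illustration and is not part of thermald:

```shell
# Succeeds (exit status 0) if the log text on stdin contains thermald's
# "could not parse file ... thermal-conf.xml" error, fails otherwise.
has_conf_parse_error() {
    grep -q 'error: could not parse file .*thermal-conf\.xml'
}

# Example use on a live system (not run here):
#   journalctl -b | has_conf_parse_error && echo "config path bug present"
```

Reading from stdin keeps the helper usable on saved log files as well as on live journalctl output.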

[Regression Potential]
Minimal. The fixes are minimal upstream changes to the config file path. I have been testing this fix for the past 2 hours with no observed regressions.

-------------------------------------------------------

Hi,

here are the error messages (with some context):

$ journalctl | grep therm
Apr 08 20:23:49 xeelee kernel: thermal LNXTHERM:00: registered as thermal_zone0
Apr 08 20:23:49 xeelee kernel: thermal LNXTHERM:01: registered as thermal_zone1
Apr 08 20:24:43 xeelee sensors[977]: SYSTIN: +34.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = thermistor
Apr 08 20:24:43 xeelee sensors[977]: CPUTIN: +56.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
Apr 08 20:24:43 xeelee sensors[977]: AUXTIN: +40.5°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
Apr 08 20:24:43 xeelee thermald[950]: 13 CPUID levels; family:model:stepping 0x6:3a:9 (6:58:9)
Apr 08 20:24:43 xeelee thermald[950]: Polling mode is enabled: 4
Apr 08 20:24:43 xeelee thermald[950]: failed to open /dev/acpi_thermal_rel
Apr 08 20:24:43 xeelee thermald[950]: failed to open /dev/acpi_thermal_rel
Apr 08 20:24:43 xeelee thermald[950]: TRT/ART read failed
Apr 08 20:24:43 xeelee thermald[950]: I/O warning : failed to load external entity "/var/run/thermald/thermal-conf.xml"
Apr 08 20:24:43 xeelee thermald[950]: error: could not parse file /var/run/thermald/thermal-conf.xml
Apr 08 20:24:43 xeelee thermald[950]: failed to open /dev/acpi_thermal_rel
Apr 08 20:24:43 xeelee thermald[950]: failed to open /dev/acpi_thermal_rel
Apr 08 20:24:43 xeelee thermald[950]: TRT/ART read failed
Apr 08 20:24:43 xeelee thermald[950]: I/O warning : failed to load external entity "/var/run/thermald/thermal-conf.xml"
Apr 08 20:24:43 xeelee thermald[950]: error: could not parse file /var/run/thermald/thermal-conf.xml
Apr 08 20:24:43 xeelee thermald[950]: sysfs write failed trip_point_0_temp
Apr 08 20:24:43 xeelee thermald[950]: sysfs write failed trip_point_0_temp
Apr 08 20:24:43 xeelee thermald[950]: failed to open /dev/acpi_thermal_rel
Apr 08 20:24:43 xeelee thermald[950]: failed to open /dev/acpi_thermal_rel
Apr 08 20:24:43 xeelee thermald[950]: TRT/ART read failed
Apr 08 20:24:43 xeelee thermald[950]: I/O warning : failed to load external entity "/var/run/thermald/thermal-conf.xml"
Apr 08 20:24:43 xeelee thermald[950]: error: could not parse file /var/run/thermald/thermal-conf.xml

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: thermald 1.5-2
Uname: Linux 4.5.0-040500-lowlatency x86_64
ApportVersion: 2.20.1-0ubuntu1
Architecture: amd64
CurrentDesktop: KDE
Date: Fri Apr 8 22:01:54 2016
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: thermald
UpgradeStatus: Upgraded to xenial on 2016-03-31 (8 days ago)

Colin Ian King (colin-king) wrote :

Can you attach a copy of /etc/thermald/thermal-conf.xml to the bug report?

Changed in thermald (Ubuntu):
status: New → Incomplete
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)

This is not a bug as not all platforms have these tables. We need to downgrade the severity of these log messages.

Here it is.

heynnema (heynnema) wrote :

I have the same problem.

The BIG bug is that thermald doesn't find thermal-conf.xml in /etc/thermald like it should... it looks for it in /var/run/thermald/thermal-conf.xml... which of course doesn't exist. So any changes that you make in /etc/thermald/thermal-conf.xml aren't recognized.

The problem with the /dev/acpi_thermal_rel error is that the device doesn't exist.
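
For anyone wanting to confirm the same thing on their own machine, here is a small sketch that reports which of the paths mentioned in this report exist. The helper name report_paths is hypothetical:

```shell
# Print, for each path given as an argument, whether it exists.
report_paths() {
    for p in "$@"; do
        if [ -e "$p" ]; then
            echo "present: $p"
        else
            echo "missing: $p"
        fi
    done
}

# Example use with the paths from this report (not run here):
#   report_paths /etc/thermald/thermal-conf.xml \
#                /var/run/thermald/thermal-conf.xml \
#                /dev/acpi_thermal_rel
```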

Further logs and info available on request.

Please advise,

Al

Changed in thermald (Ubuntu):
status: Incomplete → In Progress

Two issues in this bug:
- The error messages about the thermal relationship tables: these are fine. Not all systems have those tables.
- The other issue (the config file path) was a real regression, and it has been fixed upstream. I don't know whether Colin has picked up that fix yet. If the problem persists even with it, let me know ASAP.

commit 4a890d7e173678644882e6b863f3650e30d33052
Author: Srinivas Pandruvada <email address hidden>
Date: Thu Feb 18 11:33:50 2016 -0800

    Regression for default config file

    f99f2b59fbbca04a13cad3f7d2dbc985bc7ee0cd caused regression
    where for Android the default path is changed to TDRUNDIR.
    This will cause issue to upgrade thermald.

    Thanks to Bruno Pagani "ArchangeGabriel" to identify and root
    causing this.

diff --git a/src/thd_parse.cpp b/src/thd_parse.cpp
index 7e8e84c..f73d43e 100644
--- a/src/thd_parse.cpp
+++ b/src/thd_parse.cpp
@@ -60,8 +60,13 @@ char *cthd_parse::char_trim(char *str) {

 cthd_parse::cthd_parse() :
                matched_thermal_info_index(-1), doc(NULL), root_element(NULL) {
+       std::string name_conf = TDCONFDIR;
        std::string name_run = TDRUNDIR;
+#ifdef ANDROID
        filename = name_run + "/" + "thermal-conf.xml";
+#else
+       filename = name_conf + "/" + "thermal-conf.xml";
+#endif
        filename_auto = name_run + "/" + "thermal-conf.xml.auto";
 }
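
The effect of the patch can be paraphrased outside C++. The sketch below is shell, not the thermald source, and it assumes TDCONFDIR is /etc/thermald and TDRUNDIR is /var/run/thermald. Those values are inferred from the paths quoted in this report; in the real build they are compile-time settings. The function name default_conf_path is made up for illustration:

```shell
TDCONFDIR=/etc/thermald        # assumed value of the build-time macro
TDRUNDIR=/var/run/thermald     # assumed value of the build-time macro

# Print the default config file path for a build flavour.
# An argument of "android" selects the run directory; anything else
# selects the configuration directory, mirroring the #ifdef in the patch.
default_conf_path() {
    if [ "${1:-}" = "android" ]; then
        echo "$TDRUNDIR/thermal-conf.xml"
    else
        echo "$TDCONFDIR/thermal-conf.xml"
    fi
}
```

Before the patch, every build behaved like the "android" branch, which is why the daemon looked under /var/run/thermald.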

heynnema (heynnema) wrote :

Thank you for reviewing this problem, and coming up with such a quick fix!

What's the fastest/best way for me to get the new code? I don't know how long it'll take to get into the normal repositories. If it's short, I can wait... but if it'll take a while...

Also, I've written my own thermal-conf.xml. Would it be too much to ask you to review it and see if it makes sense?

Thanks, Al

Colin Ian King (colin-king) wrote :

I'll try and get that packaged up and in a test PPA ASAP

Colin Ian King (colin-king) wrote :

I've applied the fix (commit 4a890d7e173678644882e6b863f3650e30d33052) and uploaded the package to a test PPA: ppa:colin-king/thermald-sru-1568123

This is currently building and will be available to test shortly. Can you test that this fixes the thermald config path issue for you:

sudo add-apt-repository ppa:colin-king/thermald-sru-1568123
sudo apt-get update
sudo apt-get upgrade

Let me know if this fixes the config file issue for you.

description: updated
Changed in thermald (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
heynnema (heynnema) wrote :

Your patch has fixed the thermal-conf.xml file problem. Thanks!

If you could spend just a few minutes and look at my thermal-conf.xml file, I'd really appreciate it. The default didn't seem to work.

For these /sys entries:

/sys/class/thermal$ ll

lrwxrwxrwx 1 root root 0 Apr 29 07:55 cooling_device0 -> ../../devices/virtual/thermal/cooling_device0/
lrwxrwxrwx 1 root root 0 Apr 29 07:55 cooling_device1 -> ../../devices/virtual/thermal/cooling_device1/
lrwxrwxrwx 1 root root 0 Apr 29 07:55 cooling_device2 -> ../../devices/virtual/thermal/cooling_device2/
lrwxrwxrwx 1 root root 0 Apr 29 07:55 cooling_device3 -> ../../devices/virtual/thermal/cooling_device3/
lrwxrwxrwx 1 root root 0 Apr 29 07:55 cooling_device4 -> ../../devices/virtual/thermal/cooling_device4/
lrwxrwxrwx 1 root root 0 Apr 29 07:55 thermal_zone0 -> ../../devices/virtual/thermal/thermal_zone0/

cooling devices 0-3 are type processor
cooling device 4 is type intel_powerclamp
thermal zone 0 is type x86_pkg_temp
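
The type of each entry can be read from its sysfs "type" file rather than looked up by hand. A minimal sketch, assuming the usual /sys/class/thermal layout shown above; the function name list_thermal_types is hypothetical:

```shell
# Print "<name>: <type>" for every cooling device and thermal zone
# under the given directory (default: /sys/class/thermal).
list_thermal_types() {
    base=${1:-/sys/class/thermal}
    for d in "$base"/cooling_device* "$base"/thermal_zone*; do
        [ -r "$d/type" ] || continue   # skip globs that matched nothing
        printf '%s: %s\n' "${d##*/}" "$(cat "$d/type")"
    done
}

# Example use (not run here):
#   list_thermal_types
```

The optional directory argument is only there so the function can be tried against a scratch directory.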

Here's my new thermal-conf.xml file:

$ more thermal-conf.xml

<?xml version="1.0"?>
<ThermalConfiguration>
<Platform>
        <Name>Toshiba Laptop</Name>
        <ProductName>*</ProductName>
        <Preference>QUIET</Preference>
        <ThermalSensors>
                <ThermalSensor>
                        <Type>x86_pkg_temp</Type>
                        <Path>/sys/class/thermal/thermal_zone0/</Path>
                        <AsyncCapable>1</AsyncCapable>
                </ThermalSensor>
        </ThermalSensors>
        <ThermalZones>
                <ThermalZone>
                        <Type>cpu package</Type>
                        <TripPoints>
                                <TripPoint>
                                        <SensorType>x86_pkg_temp</SensorType>
                                        <Temperature>59000</Temperature>
                                        <type>passive</type>
                                        <ControlType>PARALLEL</ControlType>
                                        <CoolingDevice>
                                                <index>0</index>
                                                <type>Processor</type>
                                                <influence> 10 </influence>
                                                <SamplingPeriod> 5 </SamplingPeriod>
                                        </CoolingDevice>
                                        <CoolingDevice>
                                                <index>1</index>
                                                <type>Processor</type>
                                                <influence> 10 </influence>
                                                <SamplingPeriod> 5 </SamplingPeriod>
                                        </CoolingDevice>
                                        <CoolingDevice>
                                                <index>2</index>
                                                <type>Processor</type>
                                                <influence> 10 </influence>
                                                <SamplingPeriod> 5 </SamplingPeriod>
                                      ...


Launchpad Janitor (janitor) wrote :

This bug was fixed in the package thermald - 1.5-3

---------------
thermald (1.5-3) unstable; urgency=medium

  * Update Standards-Version to 3.9.8
  * Fix incorrect path to thermald config file (LP: #1568123)
    - upstream commit 4a890d7e173678644882e6b863f3650e30d33052

 -- Colin King <email address hidden> Fri, 29 Apr 2016 07:07:57 +0000

Changed in thermald (Ubuntu):
status: In Progress → Fix Released

@heynnema,

I think you just wanted to lower your temperature threshold. If that's what you want, then you can just send

dbus-send --system --dest=org.freedesktop.thermald /org/freedesktop/thermald org.freedesktop.thermald.SetUserPassiveTemperature string:cpu uint32:59000

This setting is saved, so you only have to do it once.
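
The value 59000 is assumed here to be in millidegrees Celsius, the unit the kernel's sysfs thermal files use; that unit is not stated in this thread. Under that assumption, a small sketch for converting such values to Fahrenheit, which is the scale used elsewhere in this report (the helper name mc_to_f is made up):

```shell
# Convert millidegrees Celsius to degrees Fahrenheit, one decimal place.
# LC_ALL=C keeps the decimal separator a dot regardless of locale.
mc_to_f() {
    LC_ALL=C awk -v mc="$1" 'BEGIN { printf "%.1f\n", mc / 1000 * 9 / 5 + 32 }'
}

# Example: mc_to_f 59000 prints 138.2 (59 degrees Celsius).
```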

heynnema (heynnema) wrote :

The default thermal-conf.xml file doesn't allow thermald to manage the temperature on my laptop. So I wrote my own thermal-conf.xml file, shown above in #10. I'm concerned that there's too much unnecessary stuff in there, or that I might have added cooling devices incorrectly. However, it does work. The temp on my laptop now stays between 115F and 141F.

Could you please take a look at it, and tell me what you think?

Cheers, Al

Hello Laurent, or anyone else affected,

Accepted thermald into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/thermald/1.5-2ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in thermald (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed

With the updated thermald version:

Package: thermald
Version: 1.5-2ubuntu1

I still get error messages but the error messages about the config file are gone:

mai 03 09:00:17 vougeot thermald[23949]: 13 CPUID levels; family:model:stepping 0x6:2a:7 (6:42:7)
mai 03 09:00:17 vougeot thermald[23949]: Polling mode is enabled: 4
mai 03 09:00:17 vougeot thermald[23949]: failed to open /dev/acpi_thermal_rel
mai 03 09:00:17 vougeot thermald[23949]: failed to open /dev/acpi_thermal_rel
mai 03 09:00:17 vougeot thermald[23949]: TRT/ART read failed
mai 03 09:00:17 vougeot thermald[23949]: sysfs write failed enabled
mai 03 09:00:17 vougeot thermald[23949]: sysfs write failed trip_point_0_temp

Thanks for the SRU!

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package thermald - 1.5-2ubuntu1

---------------
thermald (1.5-2ubuntu1) xenial; urgency=medium

  * Fix incorrect path to thermald config file (LP: #1568123)
    - upstream commit 4a890d7e173678644882e6b863f3650e30d33052

 -- Colin King <email address hidden> Fri, 29 Apr 2016 07:07:57 +0000

Changed in thermald (Ubuntu Xenial):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for thermald has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

SunBear (sunbear-c22) wrote :

Hi. I am still getting the same failure messages despite using the latest thermald version. Can you tell me how to solve these failures?

May 18 10:08:11 Eliot thermald[1012]: failed to open /dev/acpi_thermal_rel
May 18 10:08:11 Eliot thermald[1012]: failed to open /dev/acpi_thermal_rel
May 18 10:08:11 Eliot thermald[1012]: TRT/ART read failed
May 18 10:08:12 Eliot thermald[1012]: sysfs read failed constraint_0_max_power_uw
May 18 10:08:12 Eliot thermald[1012]: sysfs write failed trip_point_0_temp
May 18 10:08:12 Eliot thermald[1012]: sysfs write failed trip_point_0_temp

$: dpkg -l thermald
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name      Version       Architecture  Description
+++-=========-=============-=============-==========================================
ii  thermald  1.5-2ubuntu1  amd64         Thermal monitoring and controlling daemon

OS is Ubuntu 16.04 LTS

Jean-Pierre van Riel (jpvr) wrote :

@SunBear

I'm also having issues with these apparent errors:

thermald[1309]: 13 CPUID levels; family:model:stepping 0x6:3c:3 (6:60:3)
thermald[1309]: Polling mode is enabled: 4
thermald[1309]: sysfs write failed enabled
thermald[1309]: sysfs read failed constraint_0_max_power_uw
thermald[1309]: sysfs write failed trip_point_0_temp

Not sure, might be related to https://github.com/01org/thermal_daemon/issues/82

"no kernel driver to handle PCH sensor in Haswell"

And that involved a kernel patch?

On Thu, 2016-11-10 at 10:10 +0000, Jean-Pierre van Riel wrote:
> @SunBear
>
> I'm also having issues with these apparent errors:
>
> thermald[1309]: 13 CPUID levels; family:model:stepping 0x6:3c:3
> (6:60:3)
> thermald[1309]: Polling mode is enabled: 4
> thermald[1309]: sysfs write failed enable

Possibly you don't have CONFIG_THERMAL_WRITABLE_TRIPS=y in your kernel.
But this is not required, this is for the case when polling is
disabled.
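
Whether that option is set can be checked against the kernel's config file. A minimal sketch; the helper name has_writable_trips is hypothetical, and the /boot/config-* location in the usage comment is an assumption that holds for stock Ubuntu kernels but may not for custom builds:

```shell
# Succeeds (exit status 0) if the given kernel config file enables
# writable thermal trip points.
has_writable_trips() {
    grep -q '^CONFIG_THERMAL_WRITABLE_TRIPS=y$' "$1"
}

# Example use (not run here):
#   has_writable_trips "/boot/config-$(uname -r)" && echo "writable trips enabled"
```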

> thermald[1309]: sysfs read failed constraint_0_max_power_uw
In some systems this is not provided, then we need to calculate power
dynamically.

> thermald[1309]: sysfs write failed trip_point_0_temp
>
Thanks,
Srinivas

> Not sure, might be related to
> https://github.com/01org/thermal_daemon/issues/82
>
> "no kernel driver to handle PCH sensor in Haswell"
>
> And that involved a kernel patch?
>
> ** Bug watch added: github.com/01org/thermal_daemon/issues #82
>    https://github.com/01org/thermal_daemon/issues/82
>
