thermald spamming syslog with 'sysfs write failed constraint_0_power_limit_uw'

Bug #1656528 reported by Graham Mitchell
20
This bug affects 3 people
Affects Status Importance Assigned to Milestone
thermald (Ubuntu)
Fix Released
Medium
Colin Ian King
Xenial
Fix Released
Undecided
Colin Ian King
Yakkety
Fix Released
Undecided
Colin Ian King
Zesty
Fix Released
Medium
Colin Ian King

Bug Description

[SRU REQUEST, Xenial, Yakkety]

Thermald is spamming the log with error messages on sysfs write failure messages. Demote these to debug level errors so that they aren't logged unless running in debug mode.

[FIX]

See comment #24, with the fix the messages are not logged when in normal non-debug running mode.

[REGRESSION POTENIAL]

Minimal, this just reduces the logging spam by turning messages into debug only for when running in debug mode.

---------------------------------------

This seems somewhat related to https://bugs.launchpad.net/ubuntu/+source/thermald/+bug/1543046

For some reason thermald is writing the message "sysfs write failed constraint_0_power_limit_uw" to the syslog every few seconds. However, there's nothing in the kernel log.

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.10
DISTRIB_CODENAME=yakkety
DISTRIB_DESCRIPTION="Ubuntu 16.10"

$ uname -a
Linux graham-desktop 4.8.0-34-generic #36-Ubuntu SMP Wed Dec 21 17:24:18 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ dpkg -l | grep thermald
ii thermald 1.5.3-4 amd64 Thermal monitoring and controlling daemon

$ grep "sysfs write failed constraint_0_power_limit_uw" /var/log/syslog | wc -l
14160

$ grep thermald /var/log/kern.log | wc -l
0

I'm an experienced Linux user but not an experienced bug submitter, so please let me know if I have failed to provide any necessary info. Am happy to attach additional logs.

summary: - thermald spamming syslog with sysfs write failed
- constraint_0_power_limit_uw
+ thermald spamming syslog with 'sysfs write failed
+ constraint_0_power_limit_uw'
Changed in thermald (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
Revision history for this message
Colin Ian King (colin-king) wrote :

@Graham,

I'm interested in seeing all the kernel log messages as that may give a hint on why this is happening. Can you attach it to the bug report?

I suspect you may also be getting the kernel reporting "powercap intel-rapl:0: package locked by BIOS, monitoring only"

Colin

Revision history for this message
Colin Ian King (colin-king) wrote :

@Srinivas, any ideas? It seems that the rapl interface is not writable, which is a surprise to me.

Revision history for this message
Colin Ian King (colin-king) wrote :

@Graham, just one more thing, which processor do you have?

grep "model name" /proc/cpuinfo

Revision history for this message
Graham Mitchell (graham-grahammitchell) wrote :

$ grep "model name" /proc/cpuinfo
model name : Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz (repeated 8 times)

And there's really nothing in the kernel log, I swear. I've attached it just so you can see for yourself.

Revision history for this message
Colin Ian King (colin-king) wrote :

Hi, can you run:

journalctl -S 2017-01-01 | grep thermald > thermald.log

and attach the log to the bug. Thanks!

Revision history for this message
Graham Mitchell (graham-grahammitchell) wrote :

Attached. I'll save you some suspense, though; it's literally the same line over and over 38,000 times.

$ cut -b 16- thermald.log | sort | uniq -c
  38147 graham-desktop thermald[1086]: sysfs write failed constraint_0_power_limit_uw

Revision history for this message
Colin Ian King (colin-king) wrote :

Thanks, the frequency of the spamming is a useful data point. Finally, I'd like to get some more context around today's spamming messages, so can you run:

journalctl -S 2017-01-16 > thermald-2.log

and attach that too. Thanks again.

Revision history for this message
Colin Ian King (colin-king) wrote :

I'd also like to see how the write failure is occurring and get some context around that, so can you also trace thermald as follows:

sudo strace -p $(pidof thermald) >& strace.log

let thus run for a minute or so, and then hit Control C and attach the strace.log file to the bug report too.

Revision history for this message
Graham Mitchell (graham-grahammitchell) wrote :

Yikes. Well, it looks like I need to disable SSH password authentication but here it is.

Revision history for this message
Graham Mitchell (graham-grahammitchell) wrote :

I attached the strace log.

Is there a chance it's related to this?

http://unix.stackexchange.com/questions/285529/internet-stops-working-failed-to-add-run-systemd-ask-password-to-directory-wa

https://bugzilla.redhat.com/show_bug.cgi?id=894483

After I updated /etc/ssh/sshd_config I got an interesting error:
$ sudo service ssh restart
Failed to add /run/systemd/ask-password to directory watch: No space left on device

...which leads me to wonder if the Crashplan back-up service is the most likely culprit. "It uses many inotify watches and, eventually, eats them all up."

Revision history for this message
Graham Mitchell (graham-grahammitchell) wrote :
Revision history for this message
Colin Ian King (colin-king) wrote :

I'm more included to wonder why the write to the rapl sysfs file is not working, it should do. So, that strace didn't quite capture the failure, so do you mind re-running it once more, this time for ~10 minutes of time.

sudo strace -p $(pidof thermald) >& strace-2.log

Thanks.

Revision history for this message
Graham Mitchell (graham-grahammitchell) wrote :

I've got good and bad news. I configured Crashplan to no longer "Watch file system in real-time" and rebooted (to get rid of any stale open inotify watches).

And so far there hasn't been a single 'sysfs write failed constraint_0_power_limit_uw' message written to the syslog since rebooting (19 minutes and counting).

So the bad news is I can't provide good diagnostic information at the moment unless it starts back up again.

Revision history for this message
Graham Mitchell (graham-grahammitchell) wrote :

For the record, Crashplan is backing up a LOT of files for me: 3,324,286 files, so "watch file system in real-time" really could have been a LOT of inotify watches.

25 minutes of uptime and still no thermald messages to the syslog.

Revision history for this message
Colin Ian King (colin-king) wrote :

Thanks for those observations. It may be that we're just low on free file descriptors or some other resource because of all the inotify watches and hence thermald can't open the file for writing. It's hard to tell. Anyhow, thermald should not be spamming the log, so we need to fix thermald not to spam the log so aggressively.

Revision history for this message
Graham Mitchell (graham-grahammitchell) wrote :

Thanks for all your attention to my little issue. Sorry for rebooting in the middle of a diagnostic session; I guess that's bad form.

Revision history for this message
Graham Mitchell (graham-grahammitchell) wrote :

I'll reopen the bug if they return at some point and try to get you that strace.

Revision history for this message
Colin Ian King (colin-king) wrote :

Thank for you help in assisting me to debug this. I'm very happy to pick this up again if you see the issue once more.

Revision history for this message
Graham Mitchell (graham-grahammitchell) wrote :

Well, it took a week but the messages started back up again. I did a 10-minute strace, so hopefully the log has the info you're looking for.

Changed in thermald (Ubuntu):
status: New → In Progress
Revision history for this message
Colin Ian King (colin-king) wrote :

Hi Graham,

I've created a new package with a fix to this issue. Can you install it and give it a soak test to see if this stops the log spamming. Use.

sudo add-apt-repository ppa:colin-king/thermald-lp1656528
sudo apt-get update && sudo apt-get upgrade

I suggest loading the machine for a good few minutes to see if you can trip the RAPL warnings. One can use a tool such as stress-ng to do this, using:

sudo apt-get install stress-ng
stress-ng --cpu 0 -t 300s

(stresses all the CPUs for 300 seconds).

Let me know if this fixes the issue or not.

Revision history for this message
Graham Mitchell (graham-grahammitchell) wrote :

The VERY first time I ran the stress test there was a single message:

Feb 1 18:04:11 graham-desktop stress-ng: invoked with 'stress-ng --cpu 0 -t 300s' by user 1000
Feb 1 18:04:11 graham-desktop stress-ng: system: 'graham-desktop' Linux 4.8.0-34-generic #36-Ubuntu SMP Wed Dec 21 17:24:18 UTC 2016 x86_64
Feb 1 18:04:11 graham-desktop stress-ng: memory (MB): total 7891.60, free 198.99, shared 402.29, buffer 100.01, swap 8192.00, free swap 5817.87
Feb 1 18:04:17 graham-desktop thermald[9128]: sysfs write failed constraint_0_power_limit_uw

I did it twice more, about ten minutes apart and tried to use the machine while testing and the message didn't appear again.

I have to reboot my computer again because of another update, but I'll keep an eye on the syslog and keep you posted.

Revision history for this message
Colin Ian King (colin-king) wrote :

@Graham,

The result looks good then - we should get one write failure being logged and then that interface is disabled to stop subsequent failed writes and hence you should only see the one message.

Thanks for testing.

Revision history for this message
Srinivas Pandruvada (srinivas-pandruvada) wrote :

@Graham,

I see you attached kernel log, but doesn't have boot time log. Can you just boot the system and

# dmesg | grep "locked by BIOS"

Revision history for this message
Colin Ian King (colin-king) wrote :

FYI, I've updated the thermald package in the PPA to a version 1.5.3-5~2, this contains a proposed change as suggested by Srinivas (see below). I'd be interested to see how this works for you.

..to get the updated thermald please run:

sudo apt-get update && sudo apt-get upgrade

diff --git a/src/thd_sys_fs.cpp b/src/thd_sys_fs.cpp
index 044c8f0..055fc28 100644
--- a/src/thd_sys_fs.cpp
+++ b/src/thd_sys_fs.cpp
@@ -30,13 +30,13 @@ int csys_fs::write(const std::string &path, const
std::string &buf) {
        std::string p = base_path + path;
        int fd = ::open(p.c_str(), O_WRONLY);
        if (fd < 0) {
- thd_log_warn("sysfs write failed %s\n", p.c_str());
+ thd_log_debug("sysfs write failed %s\n", p.c_str());
                return -errno;
        }
        int ret = ::write(fd, buf.c_str(), buf.size());
        if (ret < 0) {
                ret = -errno;
- thd_log_warn("sysfs write failed %s\n", p.c_str());
+ thd_log_debug("sysfs write failed %s\n", p.c_str());
        }
        close(fd);

@@ -48,17 +48,17 @@ long long data) {
        std::string p = base_path + path;
        int fd = ::open(p.c_str(), O_WRONLY);
        if (fd < 0) {
- thd_log_warn("sysfs write failed %s\n", p.c_str());
+ thd_log_debug("sysfs write failed %s\n", p.c_str());
                return -errno;
        }
        if (::lseek(fd, position, SEEK_CUR) == -1) {
- thd_log_warn("sysfs write failed %s\n", p.c_str());
+ thd_log_debug("sysfs write failed %s\n", p.c_str());
                close(fd);
                return -errno;
        }
        int ret = ::write(fd, &data, sizeof(data));
        if (ret < 0)
- thd_log_warn("sysfs write failed %s\n", p.c_str());
+ thd_log_debug("sysfs write failed %s\n", p.c_str());
        close(fd);

        return ret;

Revision history for this message
Graham Mitchell (graham-grahammitchell) wrote :

Updated to the latest version of thermald and rebooted.

$ dmesg | grep "locked by BIOS"

(no output)

Revision history for this message
Colin Ian King (colin-king) wrote :

Thanks Graham, that's as expected.

Changed in thermald (Ubuntu):
status: In Progress → Fix Released
Changed in thermald (Ubuntu):
status: Fix Released → In Progress
Changed in thermald (Ubuntu Yakkety):
status: New → In Progress
Changed in thermald (Ubuntu Xenial):
status: New → In Progress
description: updated
Changed in thermald (Ubuntu Zesty):
status: In Progress → Fix Committed
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package thermald - 1.5.4-4

---------------
thermald (1.5.4-4) unstable; urgency=medium

  * fix spamming of logs when writes file (LP: #1656528)
    - patch from Srinivas Pandruvada at Intel, not yet mainline

 -- Colin King <email address hidden> Thu, 23 Feb 2017 09:26:23 +0000

Changed in thermald (Ubuntu Zesty):
status: Fix Committed → Fix Released
Revision history for this message
Colin Ian King (colin-king) wrote :

Hello Graham, or anyone else affected,

Accepted thermald into yakkety-proposed and xenial-proposed. The package is now available in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in thermald (Ubuntu Yakkety):
assignee: nobody → Colin Ian King (colin-king)
Changed in thermald (Ubuntu Xenial):
assignee: nobody → Colin Ian King (colin-king)
tags: added: verification-needed-xenial verification-needed-yakkety
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

The SRU change itself looks good, but what I don't like and miss is any DEP-3 headers for the included patch - or at least the Description field of that. One reason is: formalities. The other: the change is fairly small and simple in overall, but someone might start wondering why suddenly a warning message got demoted to a debug, which is not really clear without a description.
Could you please add that and re-upload with the same version number? I would reject the old ones then.

Thank you!

Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Graham, or anyone else affected,

Accepted thermald into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/thermald/1.5.3-4ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in thermald (Ubuntu Yakkety):
status: In Progress → Fix Committed
tags: added: verification-needed
Revision history for this message
Colin Ian King (colin-king) wrote :

Hi Brian, I've been running thermald and overheating a laptop with stress-ng for ~30+ minutes and can't see the log spamming, so this is tested and shown to be fixed. Marking as verified for Yakkety.

tags: added: verification-done-yakkety
removed: verification-needed-yakkety
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Graham, or anyone else affected,

Accepted thermald into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/thermald/1.5-2ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed.Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in thermald (Ubuntu Xenial):
status: In Progress → Fix Committed
Revision history for this message
Colin Ian King (colin-king) wrote :

Hi Brian, I've been running thermald on Xenial with the -proposed thermald and overheating a laptop with stress-ng for ~30+ minutes and can't see the log spamming, so this is tested and shown to be fixed. Marking as verified for Xenial too.

tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package thermald - 1.5.3-4ubuntu1

---------------
thermald (1.5.3-4ubuntu1) yakkety; urgency=medium

  * fix spamming of logs when writes file (LP: #1656528)
    - patch from Srinivas Pandruvada at Intel, not yet mainline

 -- Colin King <email address hidden> Thu, 23 Feb 2017 09:26:23 +0000

Changed in thermald (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for thermald has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package thermald - 1.5-2ubuntu3

---------------
thermald (1.5-2ubuntu3) xenial; urgency=medium

  * fix spamming of logs when writes file (LP: #1656528)
    - patch from Srinivas Pandruvada at Intel, not yet mainline

 -- Colin King <email address hidden> Wed, 15 Mar 2017 17:32:33 +0000

Changed in thermald (Ubuntu Xenial):
status: Fix Committed → Fix Released
Revision history for this message
Andy Whitcroft (apw) wrote : Please test proposed package

Hello Graham, or anyone else affected,

Accepted thermald into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/thermald/1.7.0-5ubuntu4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in thermald (Ubuntu Bionic):
status: New → Fix Committed
tags: added: verification-needed verification-needed-bionic
removed: verification-done
Revision history for this message
Colin Ian King (colin-king) wrote :

The bionic SRU test message occurred because I accidentally uploaded the package with the entire old history. This bug has already been fixed and the verification for bionic can be ignored.

no longer affects: thermald (Ubuntu Bionic)
tags: added: verification-done
removed: verification-needed verification-needed-bionic
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.