Thinkfan will occasionally send garbage to fan control (setfan_ibm: Error writing to /proc/acpi/ibm/fan)

Bug #1494546 reported by Shuhao
This bug affects 3 people
Affects            Status     Importance  Assigned to  Milestone
thinkfan (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

Thinkfan, for some reason, will occasionally send garbage output to /proc/acpi/ibm/fan, which causes the write to fail with "Invalid argument" and thinkfan to crash.

Thinkfan sometimes works and sometimes does not. I do not know the exact conditions needed to reproduce this.

This is due to a really strange bug in the software. I'm not sure I have quite figured it out, but here's an analysis anyway (patch for this at the end). Note that I only used this in thinkpad fan mode, so I don't know about the behaviour with just pwm.

In the file system.c, the setfan_ibm function is defined as:

void setfan_ibm() {
 int ibm_fan, l = strlen(cur_lvl);

 if (unlikely((ibm_fan = open(IBM_FAN, O_RDWR, O_TRUNC)) < 0)) {
  prefix = "\n";
  report(LOG_ERR, LOG_ERR, IBM_FAN ": %s\n", strerror(errno));
  errcnt |= ERR_FAN_SET;
 }
 else {
  if (unlikely(write(ibm_fan, cur_lvl, l) < l)) {
   prefix = "\n";
   report(LOG_ERR, LOG_ERR, MSG_ERR_FANCTRL);
   errcnt |= ERR_FAN_SET;
  }
  close(ibm_fan);
 }
}

Note that it is trying to write `cur_lvl` to `ibm_fan`, which is /proc/acpi/ibm/fan.

If you look around, you'll realize a couple of things:

1. setfan_ibm() and its siblings are never called directly. They are called via config->setfan(), which is itself called in only a few places: thinkfan.c:159, thinkfan.c:42, and system.c:216. There is a macro defined at around thinkfan.c:40 known as `set_fan` that also sets cur_lvl.
2. There are only three locations where `cur_lvl` is set: in the `set_fan` macro as discussed, once in main before everything boots, and once when we read the configuration.

With some printf debugging, you can see that once the program starts, it first reads the config, which gives cur_lvl a value of "127". This is not in the format "level <i>", so sending it to /proc/acpi/ibm/fan will fail. The code does set the correct limits (config.c:222), but `cur_lvl` is never changed.
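To make the format problem concrete, here is a minimal sketch of a check that tells a well-formed level command apart from strings like "127" or "V". The helper name looks_like_level_cmd is hypothetical and not part of thinkfan. The accepted arguments (0 to 7, "auto", "disengaged", "full-speed") are my understanding of what the thinkpad_acpi fan file takes; the analysis above only states "level <i>".

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of thinkfan. Returns true if `cmd` looks
 * like "level <arg>", where <arg> is a single digit 0-7 or one of the named
 * levels below (assumed set). One trailing newline is tolerated. */
bool looks_like_level_cmd(const char *cmd)
{
    static const char prefix[] = "level ";
    static const char *const named[] = { "auto", "disengaged", "full-speed" };
    const size_t plen = sizeof(prefix) - 1;
    size_t len, i;

    if (cmd == NULL || strncmp(cmd, prefix, plen) != 0)
        return false;

    cmd += plen;
    len = strcspn(cmd, "\n");            /* length of the argument itself */
    if (cmd[len] == '\n' && cmd[len + 1] != '\0')
        return false;                    /* text after the newline: reject */

    if (len == 1 && cmd[0] >= '0' && cmd[0] <= '7')
        return true;

    for (i = 0; i < sizeof(named) / sizeof(named[0]); i++)
        if (strlen(named[i]) == len && strncmp(cmd, named[i], len) == 0)
            return true;

    return false;
}
```

A check like this could be run on cur_lvl just before the write() in setfan_ibm() while debugging; it would flag both "127" and "V".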

Next, `setfan()` is called directly at thinkfan.c:159, which I believe usually only happens during first boot. This means `cur_lvl` is never set to a correct value. For some reason, adding a printf with cur_lvl's value here results in the string "V". This is what I'm not quite sure about.

In any case, switching line 159 to the `set_fan` macro works.
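The difference between the direct call and the macro can be sketched with a toy model. Everything here is a simplified stand-in with hypothetical names (cur_lvl_buf, last_written, fake_setfan, fake_set_fan); the real thinkfan code differs in detail, and the initial "127" is just the stale value observed in the analysis above.

```c
#include <stdio.h>

/* Simplified stand-ins for illustration only. */
char cur_lvl_buf[32] = "127";   /* stale value left over from reading the config */
char last_written[32];          /* stands in for /proc/acpi/ibm/fan */

/* Plays the role of config->setfan(): writes whatever the buffer holds. */
void fake_setfan(void)
{
    snprintf(last_written, sizeof last_written, "%s", cur_lvl_buf);
}

/* Plays the role of the set_fan macro: refresh the buffer, then write. */
void fake_set_fan(int level)
{
    snprintf(cur_lvl_buf, sizeof cur_lvl_buf, "level %d", level);
    fake_setfan();
}
```

Calling fake_setfan() first leaves "127" in last_written, which is the failing case. Calling fake_set_fan(0) leaves "level 0", which is the effect of switching line 159 to the macro.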

Tags: patch
Shuhao (shuhao) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "0001-Fixed-thinkfan-writing-out-garbage-to-fan-control.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Victor Mataré (matare) wrote :

The patch looks good to me, I committed it to the master branch on github: https://github.com/vmatare/thinkfan .
A plausible explanation for this is that thinkfan may remain idle for 120 seconds or more if temperatures stay below the threshold for level 1. After 120 seconds the watchdog timeout kicks in, but cur_lvl has never been initialized because no fan level has been set yet. This means the bug should be triggered consistently 120 seconds after thinkfan has been started, provided it has been sitting idle the whole time.
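The timing argument can be modelled roughly as follows. This is a hypothetical simulation, not thinkfan's real main loop: it assumes the loop sleeps a fixed number of seconds per iteration and that, while the level never changes, only the watchdog timeout causes a write. The function name first_write_when_idle and its parameters are made up for the sketch.

```c
#define WATCHDOG_TIMEOUT 120   /* seconds, the figure given in the comment above */

/* Hypothetical model. Returns the simulated time (in seconds) of the first
 * fan write when the fan level never changes, or -1 if no write happens
 * within max_seconds. */
int first_write_when_idle(int sleep_seconds, int max_seconds)
{
    int elapsed = 0;       /* total simulated time */
    int since_write = 0;   /* time since the fan file was last written */

    if (sleep_seconds <= 0)
        return -1;

    while (elapsed < max_seconds) {
        elapsed += sleep_seconds;
        since_write += sleep_seconds;
        /* No level change, so only the watchdog can trigger a write. */
        if (since_write >= WATCHDOG_TIMEOUT)
            return elapsed;
    }
    return -1;
}
```

Under these assumptions the first write, and therefore the first chance to send an uninitialized cur_lvl, lands on the first loop iteration at or after the 120 second mark.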

Shuhao (shuhao) wrote :

Thanks! I'm not sure who controls the packaging for this but is it possible to make a release so that at least 15.10 and above will pick this up?

Shuhao (shuhao) wrote :

If anyone encounters this, I created a package based on the latest 0.9 branch here: https://launchpad.net/~shuhao/+archive/ubuntu/fixed

Sudhir Khanger (sudhirkhanger) wrote :

I regularly encounter this bug on startup on Fedora 23 even after applying the patch.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in thinkfan (Ubuntu):
status: New → Confirmed
Hai NGUYEN VAN (psaxl) wrote :

I have solved this problem by installing the newest rewritten version of Thinkfan [1]. It should be included in the official repositories for Ubuntu Xenial.

[1] https://github.com/vmatare/thinkfan

Seth Johnson (sethj) wrote :

I experienced this bug today on 16.04. I think I solved it by specifying tp_fan manually:

tp_fan /proc/acpi/ibm/fan

Then it started working. Maybe a coincidence though. It did work before.
