Fan stops after resume from suspend leading to overheating; requires reboot to fix [HP Probook 4710s and many others]

Bug #1290110 reported by Jernej Jakob
This bug affects 9 people
Affects          Status        Importance  Assigned to  Milestone
Ubuntu           Expired       High
linux (Ubuntu)   Fix Released  Medium      Dhinak G

Bug Description

After resume from suspend, the fan (there is only one) first speeds up to roughly normal speed and then, within a couple of seconds, slows down to a complete stop. It doesn't spin up again even if the temperatures shoot way up (monitored with indicator-sensors or lm-sensors, both of which use the acpitz data).
There are 2 ways to get the fan spinning again:
- reboot the system
- echo 1 > /sys/class/thermal/cooling_deviceN/cur_state (N was 4 or 12; I don't remember which)
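The second workaround can be scripted without knowing the device number in advance. The following is a minimal Python sketch, not a fix: it assumes the standard /sys/class/thermal layout, selects every cooling device whose type reads Fan, and writes 1 to its cur_state, which is what the echo above does for a single device. The helper names are hypothetical, it must run as root, and forcing every ACPI fan device on may run the fan harder than needed.

```python
from pathlib import Path

THERMAL = "/sys/class/thermal"  # standard sysfs location of the thermal class


def fan_cooling_devices(base=THERMAL):
    """Return cooling_deviceN directories whose 'type' reads 'Fan', in numeric order."""
    devices = []
    for dev in Path(base).glob("cooling_device*"):
        try:
            if (dev / "type").read_text().strip() == "Fan":
                devices.append(dev)
        except OSError:
            continue  # unreadable entry: skip it
    # Sort by the numeric suffix so cooling_device10 comes after cooling_device9.
    return sorted(devices, key=lambda d: int(d.name[len("cooling_device"):]))


def force_fans_on(base=THERMAL, state=1):
    """Write `state` to cur_state of every Fan cooling device (requires root).

    Same effect as `echo 1 > .../cooling_deviceN/cur_state` for each fan device.
    Returns the names of the devices written to.
    """
    written = []
    for dev in fan_cooling_devices(base):
        (dev / "cur_state").write_text(f"{state}\n")
        written.append(dev.name)
    return written
```

Calling force_fans_on() as root after resume returns the names of the devices it wrote to; if a single device is known to restore the fan, writing to that one alone is the gentler option.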

This is happening on a clean install of Trusty from a daily build that also has the updates-proposed source enabled. Previously the laptop ran 13.10, and suspend and resume worked fine.
The only other change is that I also updated the BIOS while reinstalling for Trusty, which could cause this issue, but I'm leaning more towards an ACPI issue (the latest BIOS is from 2010, and someone would probably have noticed a problem with the fan by now).

Attached is the output of "cd /sys/class/thermal && grep . */* 2>/dev/null" after first boot and after resume from suspend; as can be seen, after resume all fan states are 0, whereas after boot some of them are 1.
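The same before/after comparison can be done programmatically. A minimal Python sketch, assuming the one-level layout that grep . */* walks (entry directories, each holding attribute files); snapshot() and diff() are illustrative names, not an existing tool:

```python
from pathlib import Path


def snapshot(base="/sys/class/thermal"):
    """Collect {"entry/attribute": value} for every readable, non-empty file one
    level below `base`; roughly what `cd /sys/class/thermal && grep . */*` prints."""
    values = {}
    for entry in sorted(Path(base).iterdir()):
        if not entry.is_dir():
            continue
        for attr in sorted(entry.iterdir()):
            if not attr.is_file():
                continue  # skip subdirectories such as power/ and the device/ link
            try:
                text = attr.read_text().strip()
            except (OSError, UnicodeDecodeError):
                continue  # write-only or unreadable attribute
            if text:
                values[f"{entry.name}/{attr.name}"] = text
    return values


def diff(before, after):
    """Return {key: (before_value, after_value)} for every key whose value differs."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}
```

Taking one snapshot after boot and one after resume, then printing diff(before, after), would list only the attributes that changed, such as a Fan device's cur_state going from 1 to 0.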

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-16-generic 3.13.0-16.36 [modified: boot/vmlinuz-3.13.0-16-generic]
ProcVersionSignature: Ubuntu 3.13.0-16.36-generic 3.13.5
Uname: Linux 3.13.0-16-generic x86_64
ApportVersion: 2.13.3-0ubuntu1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: jernej 2582 F.... pulseaudio
 /dev/snd/controlC0: jernej 2582 F.... pulseaudio
CurrentDesktop: Unity
CurrentDmesg:
 [ 78.199783] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
 [ 104.824460] audit_printk_skb: 147 callbacks suppressed
 [ 104.824464] type=1400 audit(1394393526.590:61): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/lib/cups/backend/cups-pdf" pid=1986 comm="apparmor_parser"
 [ 104.824472] type=1400 audit(1394393526.590:62): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=1986 comm="apparmor_parser"
 [ 104.824884] type=1400 audit(1394393526.590:63): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="/usr/sbin/cupsd" pid=1986 comm="apparmor_parser"
Date: Sun Mar 9 20:46:48 2014
HibernationDevice: RESUME=UUID=79b35f90-100a-4582-9b65-a84fbedf5bf7
InstallationDate: Installed on 2014-03-08 (0 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Alpha amd64 (20140308)
MachineType: Hewlett-Packard HP ProBook 4710s
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.13.0-16-generic root=UUID=9561bd56-ffa3-4f61-91cf-019c359ed8e8 ro quiet splash radeon.dpm=1 vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-16-generic N/A
 linux-backports-modules-3.13.0-16-generic N/A
 linux-firmware 1.126
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 12/02/2010
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: 68PZI Ver. F.18
dmi.board.name: 3074
dmi.board.vendor: Hewlett-Packard
dmi.board.version: KBC Version 24.0F
dmi.chassis.asset.tag: CNU9465LRD
dmi.chassis.type: 10
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvr68PZIVer.F.18:bd12/02/2010:svnHewlett-Packard:pnHPProBook4710s:pvrF.18:rvnHewlett-Packard:rn3074:rvrKBCVersion24.0F:cvnHewlett-Packard:ct10:cvr:
dmi.product.name: HP ProBook 4710s
dmi.product.version: F.18
dmi.sys.vendor: Hewlett-Packard

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

My fans have been acting strangely since the 3.13 upgrade.

Behaviour on 3.12:

Fans running pretty much all the time at 30%, temperatures 30-40 C.

Behaviour on 3.13:

Fans are idle until temperatures rise to 84 C (this is hot!), then ramp up to 75% (high noise) for a few seconds until temperatures drop to 72 C. Then they idle again.

This seems pretty dangerous, because the threshold of 84 degrees is just too high. I'd be fine with 60.

Laptop: macbook air 2013
OS: Archlinux

Revision history for this message
In , linux (linux-linux-kernel-bugs) wrote :

Seen with other systems as well.

Additional information: https://bugs.archlinux.org/task/39005

Revision history for this message
In , linux (linux-linux-kernel-bugs) wrote :

e-mail exchange on the subject.

On 2014-03-08 16:59, Guenter Roeck wrote:
> On 03/08/2014 03:08 AM, Jean Delvare wrote:
>> On Fri, 7 Mar 2014 14:52:30 -0800, Guenter Roeck wrote:
>>> On Fri, Mar 07, 2014 at 11:04:29PM +0100, Manuel Krause wrote:
>>>> Hi, and thanks for the quick response!
>>>> No special, fancy "fan control policy"; 'fancontrol' isn't up or
>>>> running. Vanilla kernels 3.11.* and 3.12.* had been working here
>>>> without any extra work.
>>>> --
>>>> # sensors
>>>> acpitz-virtual-0
>>>> Adapter: Virtual device
>>>> temp1: +71.0°C (crit = +256.0°C)
>>>> temp2: +69.0°C (crit = +110.0°C)
>>>> temp3: +52.0°C (crit = +105.0°C)
>>>> temp4: +25.0°C (crit = +110.0°C)
>>>> temp5: +58.0°C (crit = +110.0°C)
>>>>
>>>> coretemp-isa-0000
>>>> Adapter: ISA adapter
>>>> Core 0: +62.0°C (high = +105.0°C, crit = +105.0°C)
>>>> Core 1: +60.0°C (high = +105.0°C, crit = +105.0°C)
>>>> --
>>>> My notebook (HP/Compaq 6730b) does not have a separate fan sensor.
>>>> This is with 3.12.13 with my normal workload.
>>>>
>>>> Please trust the values I mentioned above, 94°C vs. 74°C, as I
>>>> don't want to boot 3.13.6 anymore, to avoid harming the
>>>> notebook's casing.
>>>
>>> Understood. Unfortunately, we'll need to get information
>>> from the new kernel to be able to track down the problem.
>>
>> Indeed. Not only the run-time temperatures, but also the high
>> and crit limits.
>>
>>>> But I'd be glad to test any patch that might improve this.
>>>
>>> So far I have no idea what is going on. I don't see anything in the
>>> drivers providing the above data that would explain the behavior,
>>> but I might be missing something.
>>
>> Looks like a regression in the ACPI subsystem or in power management,
>> not hwmon. Hwmon is merely reporting the temperatures; it's not
>> responsible for the actual temperatures.
>>
>
> I would agree. I don't think we have enough information to be sure,
> though. There might be some unintended interaction or interference.
>
> gpu is a good hint ... for example, look at commit b9ed919f1c8
> (drm/nouveau/drm/pm: remove everything except the hwmon interfaces
> to THERM). nouveau does export pwm and fan control information,
> so any change in that code may have unintended side effects.
> Similarly, I don't know how ec39f64bba (drm/radeon/dpm: Convert to
> use devm_hwmon_register_with_groups) could have the observed impact,
> as it is purely passive, but I prefer to be rather safe than sorry.
>
> This problem has now been submitted into bugzilla as
> https://bugzilla.kernel.org/show_bug.cgi?id=71711.
>
> Guenter
>

Sorry for being late; I had to search for and gather a lot of information for you...
I hope you don't mind me putting it all into one reply to all of you, CCing you.

My GFX is a GM45 Intel (mobile), shared memory, running the opensource Mesa drivers/extensions.
kernel-module: i915

According to the output of 'cpupower': I have
CPUidle driver: acpi_idle
CPUidle governor: menu

CPUfreq:
  driver: acpi-cpufreq
  available cpufreq governors: ondemand, performance
-
And "ondemand" is running.
--

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +41.0°C (crit = ...


Revision history for this message
Jernej Jakob (jjakob) wrote :
Revision history for this message
Jernej Jakob (jjakob) wrote :
summary: Fan stops after resume from suspend leading to overheating; requires
- reboot to fix
+ reboot to fix [HP Probook 4710s]
Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

# Based on the email shown in Comment 2, Rafael J. Wysocki asked me on 2014-03-09 18:58:
> This almost certainly is an ACPI regression, but I'm not sure whether
> thermal management or CPU power management is broken on your system.
>
> Can you compare the contents of /sys/class/thermal/ from working and
> not working kernels, please?
>
> Rafael
>
# which I answered as follows (I hope it comes through complete here):

Hi again,
unfortunately you didn't specify how deeply I should dig into /sys/class/thermal. So you get the lines from # BOF # to # EOF # below. I hope they're readable without more comments.

The most remarkable changes, in my eyes, had happened within "thermal_zone1".

Best regards,
Manuel Krause

# BOF #
The following are all from /sys/class/thermal/, whose entries are links to -> ../../devices/virtual/thermal/

I've listed the cooling_devices and thermal_zones in separate sections for each bad/good kernel, for emailing purposes only. You can merge them into a spreadsheet for your own evaluation. I've left out some subdirectories and values that _really_ didn't seem to need attention.

I've also collected the "sensors" output for each readout, having reproduced nearly the same workload each time, as represented by the "Fan speed" (thermal_zone4==FDTZ).

And I've done my very best to not produce typos or c&p errors.

 3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir              /type      /cur_state  /max_state
cooling_device0  Processor  0           10
cooling_device1  Processor  0           10
cooling_device2  Fan        0           1
cooling_device3  Fan        1           1
cooling_device4  Fan        0           1
cooling_device5  Fan        0           1
cooling_device6  Fan        0           1
cooling_device7  LCD        0           24

 3.12.13 -- 20140310 -- 00:26 -- good
==============================
dir              /type      /cur_state  /max_state
cooling_device0  Processor  0           10
cooling_device1  Processor  0           10
cooling_device2  Fan        0           1
cooling_device3  Fan        1           1
cooling_device4  Fan        1           1
cooling_device5  Fan        1           1
cooling_device6  Fan        1           1
cooling_device7  LCD        0           24

 3.13.5 -- 20140309 -- 20:52 -- bad
=============================
dir            /passive  /temp  |-   /cdev?_trip_point  /trip_point_?_temp  /trip_point_?_type
thermal_zone0  0         68000  ?=0  n.a.               256000              critical
thermal_zone1  n.a.      70000  |-
                                ?=0  6                  110000              critical
                                ?=1  5                  107000              passive
                                ?=2  4                  90000               active
                                ?=3  3                  75000               active
                                ?=4  2                  55000               active
                                ?=5  1                  45000               active
                                ?=6  1                  30000               active
thermal_zone2  n.a.      54000  |-
  ...


Revision history for this message
Joseph Salisbury (jsalisbury) wrote : Re: Fan stops after resume from suspend leading to overheating; requires reboot to fix [HP Probook 4710s]

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.14 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14-rc6-trusty/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

# also posted to linux-kernel && linux-pm
# my findings from tonight:

Hi, and thank you for your attention ^^

At the bottom of this email you'll find the current values for the new 3.12.14 kernel at two different levels of usage and ambient temperature.
As you'll see, in kernel 3.12.14 the /cdev?_trip_point enumeration has changed to match 3.13.x, and so has one /trip_point_?_temp value. But 3.12.14 works as well as 3.12.13. (So the first thing that caught my eye didn't lead anywhere useful.)
I'm not capable of finding or understanding the related code, but please let me present an idea of what MAY be going on:

In 3.12.13+, on my system, the effective cooling fan speed seems to be an accumulation, maybe bitwise, of cooling_device[2-6]/cur_state, each of which gets activated (=1) by a different temperature value or level; each cooling_device[2-6]/cur_state stays at 1 as long as its reference temperature does not drop below that level. On my system this reference temperature is most likely temp2 == thermal_zone1/temp [CPUZ].

In 3.13.x only one of cooling_device[2-6]/cur_state seems to get set to 1, while the others are left at, or rewritten to, 0. The fan speed algorithm then accumulates only a single 1, without seeing the [_LEVEL_] number of cooling_device[2-6]... or re-requesting the related trigger temperature.
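The accumulation hypothesis can be written down as a tiny model. This is a sketch of the guess above, not of the kernel's actual ACPI fan code: it assumes one Fan cooling device per active trip point of thermal_zone1, each switched on while the zone is at or above that trip (which resembles how ACPI active trip points, _ACx with their _ALx device lists, are meant to work). Fed the 3.12.14 readings from the bottom of this email (zone at 73000, active trips at 90000/75000/55000/45000/30000 millidegrees), it predicts three devices on, consistent with cooling_device4-6 = 1 there; on the bad 3.13.5 kernel, at 70000 with the same active trips, it would also predict three, yet only cooling_device3 was at 1. The function names are hypothetical.

```python
def cumulative_fan_states(temp_mc, active_trips_mc):
    """Hypothesised (3.12-like) behaviour: one Fan cooling device per active
    trip point, each on (1) while the zone temperature is at or above its trip.
    Temperatures are in millidegrees Celsius, as in sysfs."""
    return [1 if temp_mc >= trip else 0 for trip in active_trips_mc]


def fan_level(states):
    """Effective fan level under the accumulation hypothesis: devices switched on."""
    return sum(states)


if __name__ == "__main__":
    trips = [90000, 75000, 55000, 45000, 30000]  # thermal_zone1 active trips (3.12.14)
    states = cumulative_fan_states(73000, trips)
    print(states, "->", fan_level(states))  # [0, 0, 1, 1, 1] -> 3
```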

I hope this leads you developers nearer to a conclusion on how to fix it,
best regards, Manuel Krause

_____________________________
3.12.14 -- 20140311 -- 19:07 -- changed, not broken -- normal use
=============================
/sys/class/thermal/* which
are links to -> ../../devices/virtual/thermal/*

dir              /type      /cur_state  /max_state  Maybe trigger /PWM
...
cooling_device2  Fan        0           1           not yet observed
cooling_device3  Fan        0           1           FDTZ==58°C
cooling_device4  Fan        1           1           FDTZ==45°C
cooling_device5  Fan        1           1           FDTZ==34°C
cooling_device6  Fan        1           1           FDTZ==25°C
...

dir            /passive  /temp  |-   /cdev?_trip_point  /trip_point_?_temp  /trip_point_?_type
...
thermal_zone1  n.a.      73000  |-   (CPUZ)
                                ?=0  6                  110000              critical
                                ?=1  5                  107000              passive
                                ?=2  4                  90000               active
                                ?=3  3                  75000               active
                                ?=4  2                  55000               active
                                ?=5  1                  45000               active
                                ?=6  1                  30000               active
...
thermal_zone4  n.a.      45000  ?=0  n.a.               110000              critical  (FDTZ)
...

# sensors
acpitz-virtual-0
Adapter: Virtual device
temp1: +46.0°C (crit = +256.0°C)
temp2: +73.0°C (crit = +110.0°C)
temp3: +57.0°C (crit = +105.0°C)
temp4: +26.3°C (crit = +110.0°C)
temp5: ...


Revision history for this message
Jernej Jakob (jjakob) wrote :

Hi,

I've tested the latest mainline kernel (v3.16-rc6-trusty) and the bug is still present. Also, the fan speed no longer seems to change with temperature after OS startup.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
status: Confirmed → Triaged
Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

[SNIP]

It's been a long time without a reply from you... Have I overlooked an unwritten convention? Or were my charts that unusable for your analysis/work?

Two days ago, I tried the 3.14.0-rc7-vanilla. And the problem persists. "Strange / dangerous fan policy..."

Since kernel 3.13.6 I've managed to 'fix' the potential overheating problem by manually issuing:
"echo 1 > /sys/class/thermal/cooling_device3/cur_state" *)
_before_ obviously critical temperatures occur. Note: this particular setting may only work for my system! ...and it keeps working on 3.14-rc.
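Issuing that command by hand before temperatures get critical could be automated with a small polling loop. A minimal Python sketch follows; it assumes thermal_zone1 (CPUZ on this machine) as the sensor and cooling_device3 as the fan device, as above, while the 65000 threshold and the five-second interval are arbitrary placeholders. It rewrites the value whenever it is not 1, in case something resets it, and, like the manual command, it never switches the fan back off.

```python
import time
from pathlib import Path

# Assumptions for this sketch: thermal_zone1 is the CPU zone (CPUZ) and
# cooling_device3 the fan device, as on the HP 6730b above.
ZONE_TEMP = "/sys/class/thermal/thermal_zone1/temp"
FAN_STATE = "/sys/class/thermal/cooling_device3/cur_state"
THRESHOLD_MC = 65000  # millidegrees Celsius (hypothetical value)


def check_once(zone_temp=ZONE_TEMP, fan_state=FAN_STATE, threshold_mc=THRESHOLD_MC):
    """Write 1 to the fan's cur_state if the zone is at or above the threshold
    and the fan is not already at 1. Returns True if it wrote."""
    temp_mc = int(Path(zone_temp).read_text())
    if temp_mc >= threshold_mc and Path(fan_state).read_text().strip() != "1":
        Path(fan_state).write_text("1\n")
        return True
    return False


def watch(interval_s=5.0):
    """Poll forever; must run as root to write to sysfs."""
    while True:
        check_once()
        time.sleep(interval_s)
```

Calling watch() as root, for example from a boot-time service, would keep re-applying the workaround without manual intervention.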

In the following I'd like to present a modified view of my /sys/class/thermal, produced by a script I've written (for my system) that shows the results in the layout of linux/Documentation/thermal/sysfs-api.txt, point 3:
{I've uploaded the files to pastebin, so as not to swamp you and the lists with so many lines of logs.}

For the last good kernel -- 3.12.14 -- in-use:
 http://pastebin.com/HL1PNcda
For my first bad kernel revision 3.13 -- at critical temp:
 http://pastebin.com/98hgf1a9
For the last bad kernel -- 3.14.0-rc7 -- at critical temp:
 http://pastebin.com/MuTwTnjD
For the last bad kernel -- 3.14.0-rc7 -- after issuing the
 *) command:
 http://pastebin.com/2peda54z

Please, have a look at them! And maybe, give me hints on how I can help you to further debug this issue, as my manual method works but it's annoying.

And, PLEASE CC: ME, as I'm not on the lists. Or lead this Email-thread to someone in charge.

Thank you for your work && best regards,
Manuel Krause

Revision history for this message
Jernej Jakob (jjakob) wrote :

Thank you. I will report the bug upstream.

I have done a bit of investigation into which cooling devices correspond to which fan speeds. IMO the cooling device to trip point mappings are wrong (they do not change the fan speed linearly in accordance with the temperature/trip points).
And I've also found that thermal zone 5 temperature corresponds exactly to the processor fan speed. This should definitely be reported as fan speed, not temperature.

Should I also detail my findings here, or post them to the kernel mailing list?

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

3.12.15 works very well
3.13.7 fails
3.14.0-rc8 fails

I've tried the tmon tool now, too. Nice eye candy, and good for monitoring!

I've tried reverting all "thermal"-related patches between 3.12.14 and 3.13.7 from 3.13.7, but they don't seem to matter (nor does applying them the other way round to 3.12.15).

So "thermal" is out?

For the failing kernels: no reached (active) trip point triggers even ONE fan action!

ACPI would be the next thing to investigate.

THX for this audience,
Manuel Krause

Revision history for this message
In , the.ant (the.ant-linux-kernel-bugs) wrote :

I'm not sure if this is related to this bug, but since kernel 3.13 my fan speed is far too high and noisy as soon as the system boots up... I'm using Fedora 20. With kernel 3.12.x everything was fine and the fan speed was at an acceptable level...

[ant@fedorant ~]$ sensors
nouveau-pci-0100
Adapter: PCI adapter
fan1: 6693 RPM
temp1: +69.0°C (high = +95.0°C, hyst = +3.0°C)
                       (crit = +105.0°C, hyst = +5.0°C)
                       (emerg = +135.0°C, hyst = +5.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0: +52.0°C (high = +83.0°C, crit = +99.0°C)
Core 1: +50.0°C (high = +83.0°C, crit = +99.0°C)
Core 2: +52.0°C (high = +83.0°C, crit = +99.0°C)
Core 3: +50.0°C (high = +83.0°C, crit = +99.0°C)

it8720-isa-0a10
Adapter: ISA adapter
in0: +0.86 V (min = +0.00 V, max = +4.08 V) ALARM
in1: +3.04 V (min = +0.00 V, max = +4.08 V) ALARM
in2: +3.33 V (min = +0.00 V, max = +4.08 V) ALARM
+5V: +3.04 V (min = +0.00 V, max = +4.08 V) ALARM
in4: +2.94 V (min = +0.00 V, max = +4.08 V) ALARM
in5: +2.16 V (min = +0.00 V, max = +4.08 V) ALARM
in6: +2.16 V (min = +0.00 V, max = +4.08 V) ALARM
5VSB: +2.96 V (min = +0.00 V, max = +4.08 V) ALARM
Vbat: +2.99 V
fan1: 838 RPM (min = 0 RPM)
fan2: 949 RPM (min = 0 RPM)
temp1: +127.0°C (low = -1.0°C, high = +127.0°C) ALARM sensor = thermal diode
temp2: +22.0°C (low = -1.0°C, high = +127.0°C) ALARM sensor = thermistor
temp3: -47.0°C (low = -1.0°C, high = +127.0°C) sensor = Intel PECI
cpu0_vid: +0.000 V
intrusion0: ALARM

Any ideas?

Revision history for this message
In , linux (linux-linux-kernel-bugs) wrote :

On 04/02/2014 01:39 AM, <email address hidden> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>
> --- Comment #7 from Roman Spirgi <email address hidden> ---
> I'm not sure if this is related to this bug, but since kernel 3.13 my fan
> speed is far too high and noisy as soon as the system boots up... I'm using
> Fedora 20. With kernel 3.12.x everything was fine and the fan speed was
> at an acceptable level...
>
> [ant@fedorant ~]$ sensors
> nouveau-pci-0100
> Adapter: PCI adapter
> fan1: 6693 RPM
> temp1: +69.0°C (high = +95.0°C, hyst = +3.0°C)
> (crit = +105.0°C, hyst = +5.0°C)
> (emerg = +135.0°C, hyst = +5.0°C)
>
Looks like Nouveau fan control does not work. No idea what may be causing this ...
well, possibly. There are two suspicious commits between 3.12 and 3.13.
Maybe the "remove everything" commit has undesirable side effects.

eec9901 drm/nouveau/hwmon: fix compilation without CONFIG_HWMON
b9ed919 drm/nouveau/drm/pm: remove everything except the hwmon interfaces to THERM

I would suggest to open a separate bug against the Nouveau component.

[ Side note: The displayed values for hyst are wrong. Those should be absolute
   temperatures, not temperature differences. But that is yet another bug. ]
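The hwmon sysfs ABI specifies that a tempN_*_hyst attribute is an absolute temperature, not a delta from its limit. Assuming the hyst values nouveau displays are deltas below each limit, as the side note implies, the absolute values it should have reported are a simple subtraction; a sketch with a hypothetical function name:

```python
def hyst_to_absolute(limit_c, hyst_delta_c):
    """Absolute hysteresis temperature for a limit, given a delta below it.

    Assumes the displayed hyst is a difference (as nouveau showed it), while the
    hwmon sysfs ABI expects an absolute temperature.
    """
    return limit_c - hyst_delta_c


if __name__ == "__main__":
    # (limit, displayed hyst) pairs from the nouveau-pci-0100 output above
    for name, limit, hyst in [("high", 95.0, 3.0), ("crit", 105.0, 5.0), ("emerg", 135.0, 5.0)]:
        print(f"{name}: {limit} C, hyst should read {hyst_to_absolute(limit, hyst)} C")
```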

> coretemp-isa-0000
> Adapter: ISA adapter
> Core 0: +52.0°C (high = +83.0°C, crit = +99.0°C)
> Core 1: +50.0°C (high = +83.0°C, crit = +99.0°C)
> Core 2: +52.0°C (high = +83.0°C, crit = +99.0°C)
> Core 3: +50.0°C (high = +83.0°C, crit = +99.0°C)
>
> it8720-isa-0a10
> Adapter: ISA adapter
> in0: +0.86 V (min = +0.00 V, max = +4.08 V) ALARM
> in1: +3.04 V (min = +0.00 V, max = +4.08 V) ALARM
> in2: +3.33 V (min = +0.00 V, max = +4.08 V) ALARM
> +5V: +3.04 V (min = +0.00 V, max = +4.08 V) ALARM
> in4: +2.94 V (min = +0.00 V, max = +4.08 V) ALARM
> in5: +2.16 V (min = +0.00 V, max = +4.08 V) ALARM
> in6: +2.16 V (min = +0.00 V, max = +4.08 V) ALARM
> 5VSB: +2.96 V (min = +0.00 V, max = +4.08 V) ALARM
> Vbat: +2.99 V
> fan1: 838 RPM (min = 0 RPM)
> fan2: 949 RPM (min = 0 RPM)
> temp1: +127.0°C (low = -1.0°C, high = +127.0°C) ALARM sensor = thermal diode
> temp2: +22.0°C (low = -1.0°C, high = +127.0°C) ALARM sensor = thermistor
> temp3: -47.0°C (low = -1.0°C, high = +127.0°C) sensor = Intel PECI
> cpu0_vid: +0.000 V
> intrusion0: ALARM
>

Something in your system configuration is wrong. Usually this comes from the BIOS,
so you might want to check if there is a BIOS upgrade available. It looks like
the system believes that your CPU is freezing and therefore runs the CPU fan at
minimum speed. That may be ok with the current load, but might be a problem
if the CPUs get busy and run hot. That is not related to the nouveau problem,
though.

Guenter

Revision history for this message
In , daniele.rogora (daniele.rogora-linux-kernel-bugs) wrote :

I can confirm the originally reported bug. I reproduced it on an HP 625 laptop (AMD Athlon processor with AMD HD 4200 graphics).

I tested the Ubuntu 3.12, 3.13 and 3.14 kernels, and the problem appeared in 3.13.

Best regards,
Daniele

Revision history for this message
In , jdelvare (jdelvare-linux-kernel-bugs) wrote :

(In reply to Guenter Roeck from comment #8)
> Something in your system configuration is wrong. Usually this comes from the
> BIOS, so you might want to check if there is a BIOS upgrade available. It
> looks like the system believes that your CPU is freezing and therefore runs
> the CPU fan at minimum speed.

As I recall, the IT87xx chips need an offset programmed by the
BIOS in order to return "sane" temperature values from PECI sources.
Without the offset, the driver returns the thermal margin as a negative
value (-47°C here would mean the CPU runs 47 pseudo-°C below its
critical temperature). This matches the values returned by coretemp
(99 - 47 = 52), and would explain the low fan speeds.

The original poster could try setting temp3_offset to 99 (in the right chip section of sensors.conf, followed by "sensors -s" as root) and see if it makes the system behave differently.
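Jean's arithmetic as a small sketch: assuming the un-offset PECI channel reports the margin below the critical temperature as a negative number, adding the critical temperature (99 °C, from the coretemp output) gives the absolute value. The function name is hypothetical; on a real system the correction belongs in sensors.conf via temp3_offset, as suggested above.

```python
def peci_margin_to_absolute(reported_c, tjmax_c=99.0):
    """Turn an un-offset IT87 PECI reading (a negative margin below the
    critical temperature) into an absolute temperature: margin + Tjmax."""
    return reported_c + tjmax_c


if __name__ == "__main__":
    # temp3 read -47.0 C; coretemp showed crit = +99.0 C and Core 0 at +52.0 C
    print(peci_margin_to_absolute(-47.0))  # 52.0
```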

Revision history for this message
In , the.ant (the.ant-linux-kernel-bugs) wrote :

Jean, indeed:
...
temp3: +46.0°C (low = -1.0°C, high = +127.0°C) sensor = Intel PECI
...
But it's definitely noisier now ;)

Guenter, thank you, I did open "https://bugs.freedesktop.org/show_bug.cgi?id=77003" for the NVIDIA fan speed issue.

Thank you guys,
Roman

Revision history for this message
In , jdelvare (jdelvare-linux-kernel-bugs) wrote :

It really all depends on what the automatic fan control setup expects. Unfortunately I don't think the it87 driver exposes its trip points to user-space so you'd have to poke at the registers directly.

Jernej Jakob (jjakob)
tags: added: kernel-bug-reported-upstream
Revision history for this message
In , jernej.jakob (jernej.jakob-linux-kernel-bugs) wrote :

Hello everyone,

I can confirm this bug as well on an HP Probook 4710s. So there are now at least 5 confirmed reports.

Please see:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1290110 (my bug report)
http://lkml.iu.edu//hypermail/linux/kernel/1404.0/02012.html (my archived post to the linux-kernel mailing list)

It would also be worth taking a look at the DSDT, as there are other minor quirks on my system that could point there (brightness always at max after reboot/suspend, coarse brightness steps).
I've already disassembled mine but am stumped on what to do next and how to debug it (this is my first look at anything ACPI-related)...

But as previous kernels worked okay with this same DSDT, maybe they didn't control the fan speed through ACPI but left it to the BIOS?

For info on disassembling the DSDT see https://wiki.archlinux.org/index.php/DSDT

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

I've now bisected twice, starting from two different kernel origins, just to be sure: I'm new to this tedious and lengthy method, and I wanted to make sure I hadn't given a false result in between out of boredom.

In the end it says each time:
# git bisect bad | tee -a /var/log/bisect.log
cc8ef52707341e67a12067d6ead991d56ea017ca is the first bad commit
commit cc8ef52707341e67a12067d6ead991d56ea017ca
Author: Zhang Rui <email address hidden>
Date: Wed Sep 25 20:39:45 2013 +0800

    ACPI / AC: convert ACPI ac driver to platform bus

    Signed-off-by: Zhang Rui <email address hidden>
    Signed-off-by: Rafael J. Wysocki <email address hidden>

:040000 040000 5a0d397cfcbf53c03390f2805b83754cb7837d84 4a2af1454f65d67f1d1a507c08e3b9ef3ffe57e7 M drivers

Please tell me how I can help debug this further, and please also read the latest at
https://bugzilla.kernel.org/show_bug.cgi?id=71711

Manuel Krause

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

Hi, Manuel,

nice report.

(In reply to Manuel Krause from comment #3)
>
> 3.13.5 -- 20140309 -- 20:52 -- bad
> =============================
> dir              /type      /cur_state  /max_state
> cooling_device0  Processor  0           10
> cooling_device1  Processor  0           10
> cooling_device2  Fan        0           1
> cooling_device3  Fan        1           1
> cooling_device4  Fan        0           1
> cooling_device5  Fan        0           1
> cooling_device6  Fan        0           1
> cooling_device7  LCD        0           24
>
> 3.12.13 -- 20140310 -- 00:26 -- good
> ==============================
> dir              /type      /cur_state  /max_state
> cooling_device0  Processor  0           10
> cooling_device1  Processor  0           10
> cooling_device2  Fan        0           1
> cooling_device3  Fan        1           1
> cooling_device4  Fan        1           1
> cooling_device5  Fan        1           1
> cooling_device6  Fan        1           1
> cooling_device7  LCD        0           24
>
>
> 3.13.5 -- 20140309 -- 20:52 -- bad
> =============================
> dir            /passive  /temp  |-   /cdev?_trip_point  /trip_point_?_temp  /trip_point_?_type
> thermal_zone0  0         68000  ?=0  n.a.               256000              critical
> thermal_zone1  n.a.      70000  |-
>                                 ?=0  6                  110000              critical
>                                 ?=1  5                  107000              passive
>                                 ?=2  4                  90000               active
>                                 ?=3  3                  75000               active
>                                 ?=4  2                  55000               active
>                                 ?=5  1                  45000               active
>                                 ?=6  1                  30000               active
> thermal_zone2  n.a.      54000  |-
>                                 ?=0  1                  105000              critical
>                                 ?=1  1                  95000               passive
> thermal_zone3  n.a.      25800  |-
>                                 ?=0  1                  110000              critical
>                                 ?=1  1                  60000               passive
> thermal_zone4  0         58000  ?=0  n.a.               110000              critical
>
>
> 3.12.13 -- 20140310 -- 00:26 -- good
> ==============================
> dir            /passive  /temp  |-   /cdev?_trip_point  /trip_point_?_temp  /trip_point_?_type
> thermal_zone0  0         50000  ?=0  n.a.               256000              critical
> thermal_zone1  n.a.      70000  |-
>                                 ?=0  1                  110000              critical
>                                 ?=1  1                  107000              passive
>                                 ?=2  2                  90000               active
>                                 ?=3  3                  67000               active
>                                 ?=4  4                  55000               active
>                                 ?=5  5                  45000               active
>                                 ?=6  6                  30000               active
> thermal_zon...


Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

Let's start with my actual GOOD kernel:

# uname -r
3.12.16-ck2
# grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
/sys/class/thermal/thermal_zone1/cdev0/device/path:\_TZ_.FAN4
/sys/class/thermal/thermal_zone1/cdev1/device/path:\_TZ_.FAN3
/sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN2
/sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1
/sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN0
/sys/class/thermal/thermal_zone1/cdev5/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone1/cdev6/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0
# grep . /sys/class/thermal/cooling_device*/device/path
/sys/class/thermal/cooling_device0/device/path:\_PR_.CPU0
/sys/class/thermal/cooling_device1/device/path:\_PR_.CPU1
/sys/class/thermal/cooling_device2/device/path:\_TZ_.FAN0
/sys/class/thermal/cooling_device3/device/path:\_TZ_.FAN1
/sys/class/thermal/cooling_device4/device/path:\_TZ_.FAN2
/sys/class/thermal/cooling_device5/device/path:\_TZ_.FAN3
/sys/class/thermal/cooling_device6/device/path:\_TZ_.FAN4
/sys/class/thermal/cooling_device7/device/path:\_SB_.PCI0.GFX0.DD02

And have a newer BAD kernel:

# uname -r
3.13.8-ck1
# grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
/sys/class/thermal/thermal_zone1/cdev0/device/path:\_TZ_.FAN4
/sys/class/thermal/thermal_zone1/cdev1/device/path:\_TZ_.FAN3
/sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN2
/sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1
/sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN0
/sys/class/thermal/thermal_zone1/cdev5/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone1/cdev6/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0
# grep . /sys/class/thermal/cooling_device*/device/path
/sys/class/thermal/cooling_device0/device/path:\_PR_.CPU0
/sys/class/thermal/cooling_device1/device/path:\_PR_.CPU1
/sys/class/thermal/cooling_device2/device/path:\_TZ_.FAN0
/sys/class/thermal/cooling_device3/device/path:\_TZ_.FAN1
/sys/class/thermal/cooling_device4/device/path:\_TZ_.FAN2
/sys/class/thermal/cooling_device5/device/path:\_TZ_.FAN3
/sys/class/thermal/cooling_device6/device/path:\_TZ_.FAN4
/sys/class/thermal/cooling_device7/device/path:\_SB_.PCI0.GFX0.DD02

The "grep . /sys/class/thermal/cooling_device*/device/path" results
always stay the same as above, so I omit them in the following.
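Because the cooling_deviceN to ACPI path mapping stays the same across kernels while the cdevN bindings of thermal_zone1 change, one way to compare a good and a bad kernel is to key each binding by the ACPI path. A minimal Python sketch, assuming each cdevN link in a zone leads to a cooling device with a device/path attribute, as in the listings above; binding_map() and changed_bindings() are hypothetical names. Saving binding_map() output under each kernel and passing both to changed_bindings() would show which \_TZ_.FANx moved to a different trip point.

```python
import re
from pathlib import Path


def binding_map(zone_dir):
    """Map the ACPI path of each cooling device bound to a thermal zone to the
    sorted list of trip point indices it is bound to (from cdevN_trip_point)."""
    zone = Path(zone_dir)
    bindings = {}
    for link in zone.iterdir():
        if not re.fullmatch(r"cdev\d+", link.name):
            continue  # skip cdevN_trip_point and other non-link attributes
        acpi_path = (link / "device" / "path").read_text().strip()
        trip = int((zone / f"{link.name}_trip_point").read_text())
        bindings.setdefault(acpi_path, []).append(trip)
    return {path: sorted(trips) for path, trips in sorted(bindings.items())}


def changed_bindings(good, bad):
    """Return {acpi_path: (good_trips, bad_trips)} where the two kernels differ."""
    paths = sorted(set(good) | set(bad))
    return {p: (good.get(p), bad.get(p)) for p in paths if good.get(p) != bad.get(p)}
```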

There are generally only two different recurring scenarios for
"grep . /sys/class/thermal/thermal_zone*/cdev*/device/path", so I
will abbreviate them in the following:

Scenario-1:
# grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
/sys/class/thermal/thermal_zone1/cdev0/device/path:\_PR_.CPU1
/sys/class/thermal/thermal_zone1/cdev1/device/path:\_PR_.CPU0
/sys/class/thermal/thermal_...

Revision history for this message
Jernej Jakob (jjakob) wrote :

Changed URL to HTTPS

Revision history for this message
In , tianyu.lan (tianyu.lan-linux-kernel-bugs) wrote :

ping Rui ... Please have a look at this bug.

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

There have been additional steps in the meantime, but unfortunately no solution so far.

You can read the related postings to lkml e.g. with:
http://marc.info/?l=linux-kernel&w=2&r=1&s=dangerous+fan+policy&q=b

Best regards, Manuel Krause

Revision history for this message
In , thohl (thohl-linux-kernel-bugs) wrote :

Hi!

I have a similar problem on an HP ProBook 4510s (firmware F.20, Intel T3000) running 64-bit kernel 3.13.9 (Kubuntu) or 64-bit kernels 3.13 - 3.14 on Arch. I noticed that the regulation of the fan (not necessarily the fan itself!) stops after boot. If the system is cold, the fan runs at 0% (= off) or at 20% (an unusual number, as the fan speed usually rises in 15% steps). On reboot, when the machine is warm, fan speeds of 30% or 45% are often observed, depending on the CPU temperature at boot time. After booting, the fan speed does not change anymore and stays constant. So when the machine was started cold, the fan is off until the temperature reaches critical values, then runs at 90% (= full speed) until the temperature drops, and then goes off completely again.

This is not nice, as the cooling might not be sufficient and my machine may shut down hard. It is also not in line with the idea of a 'laptop', because the machine gets so hot that I don't want to keep it on my lap, to avoid burning myself :-)
As I interpret it, the system ignores all active trip points but reacts to the passive and critical trip points.

I also found a not-so-perfect workaround after some trial and error with boot parameters: passing 'thermal.tzp=1' (or any higher number) to the kernel at boot time restores temperature-dependent fan speed regulation (unloading and reloading thermal with the tzp parameter does not help). Unfortunately, this workaround comes with the trade-off of two or three kworker processes that consume up to the full capacity of one CPU, which makes the system sluggish and raises power consumption.
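To confirm that the running kernel actually received the parameter, it can be read back from /proc/cmdline. A minimal sketch; the function name and the CMDLINE override (for testing) are hypothetical. According to the kernel's boot-parameter documentation, thermal.tzp sets a global default ACPI thermal zone polling interval in tenths of a second, so thermal.tzp=1 polls every 100 ms, which may account for the busy kworker threads described above.

```shell
# Hypothetical helper: print the value of thermal.tzp from the kernel
# command line, or nothing if the parameter was not passed.
# CMDLINE is an assumed override (default /proc/cmdline) for testing.
tzp_from_cmdline() {
    # One argument per line, then print only what follows "thermal.tzp=".
    tr ' ' '\n' < "${CMDLINE:-/proc/cmdline}" | sed -n 's/^thermal\.tzp=//p'
}
```

On Ubuntu, a persistent setting would typically go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by sudo update-grub and a reboot. A larger value such as thermal.tzp=300 (30 s) would poll far less often; whether that still regulates the fan quickly enough on these machines is untested here.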

I hope that this info on the problem helps in finding a real fix, which would be appreciated.

Regards, Thomas

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

(In reply to Pohjoistuuli from comment #19)
[...]
> I have similar problem on HP ProBook 4510s (Firmware F.20, Intel T3000)
> running 64bit-kernel 3.13.9 (Kubuntu) or 64-bit kernel 3.13 - 3.14 on arch.
[...]

@Pohjoistuuli // Thomas
Your machine has the same symptoms as mine with 3.13.x and later.
Have you tried a 3.12.y kernel from your distro (or even vanilla)?

BTW, you can run a command at runtime or from a startup script, e.g. "echo 1 > /sys/class/thermal/cooling_device3/cur_state" (my favourite). On my machine, cooling_device6 is the lowest of the cooling_device~ entries representing fan speed knobs. Just try it.
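A slightly more defensive form of that one-liner checks the device's max_state before writing. A sketch only: the function name and the SYSFS override are hypothetical, and the thermal core may overwrite the value the next time it re-evaluates the zone.

```shell
# Hypothetical helper: set cur_state of one cooling device (e.g.
# cooling_device3), refusing non-numeric values and values above the
# device's max_state. SYSFS is an assumed override (default /sys).
set_cdev_state() {
    dir="${SYSFS:-/sys}/class/thermal/$1"
    case $2 in
        ''|*[!0-9]*) echo "state must be a non-negative integer" >&2; return 1 ;;
    esac
    max=$(cat "$dir/max_state") || return 1
    if [ "$2" -gt "$max" ]; then
        echo "state $2 exceeds max_state $max for $1" >&2
        return 1
    fi
    # Needs root on a real system.
    echo "$2" > "$dir/cur_state"
}
```

For example, set_cdev_state cooling_device3 1 run as root mirrors the echo command above.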

@ Rui Zhang
I don't want this to be handled as an HP-laptop-only problem, as 3.12.x handles the fans and temperatures correctly.

Best regards, Manuel

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

(In reply to Pohjoistuuli from comment #19)
> Hi!
>
> I have similar problem on HP ProBook 4510s (Firmware F.20, Intel T3000)
> running 64bit-kernel 3.13.9 (Kubuntu) or 64-bit kernel 3.13 - 3.14 on arch.
> I remarked that the regulation of the fan (not necessarily the fan itself!)
> stops after boot. So, if the system is cold, the fan is running at 0% (=
> off) or at 20% (which is an unusual number as the fan speed rises usually in
> 15% steps).
> On reboot, when the machine is warm, fan speeds of 30% or 45% are often
> observed depending on the CPU temperature at boot time. After booting, the
> fan speed does not change anymore and keeps constant. So, when the machine
> was started cold, the fan is off until the temperature reaches critical
> values and runs then with 90% (= full speed) until the temperature drops. It
> goes then off again completely.

I've seen exactly the same behavior on one of my test laptops.
The problem is that ACPICA cannot handle some kinds of AML code well, PLUS, the fix for the problem ships in 3.13-rc1.
So the symptom I've seen is not a regression and exists in all previous Linux releases.
Anyway, please attach the acpidump of your machine so that I can check whether it is the same AML problem.

BTW, it would be nice if you could try a 3.12 kernel to verify whether this is a regression or not.

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

(In reply to Manuel Krause from comment #16)
> There are generally only two different re-occurring scenarios for
> "grep . /sys/class/thermal/thermal_zone*/cdev*/device/path", so that I
> want to abbreviate them in the following:
>
> Scenario-1:
> # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> /sys/class/thermal/thermal_zone1/cdev0/device/path:\_PR_.CPU1
> /sys/class/thermal/thermal_zone1/cdev1/device/path:\_PR_.CPU0
> /sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN0
> /sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1
> /sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN2
> /sys/class/thermal/thermal_zone1/cdev5/device/path:\_TZ_.FAN3
> /sys/class/thermal/thermal_zone1/cdev6/device/path:\_TZ_.FAN4
> /sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1
> /sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0
> /sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1
> /sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0
>
> Scenario-2:
> # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> /sys/class/thermal/thermal_zone1/cdev0/device/path:\_TZ_.FAN4
> /sys/class/thermal/thermal_zone1/cdev1/device/path:\_TZ_.FAN3
> /sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN2
> /sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1
> /sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN0
> /sys/class/thermal/thermal_zone1/cdev5/device/path:\_PR_.CPU1
> /sys/class/thermal/thermal_zone1/cdev6/device/path:\_PR_.CPU0
> /sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1
> /sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0
> /sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1
> /sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0
>
> Already, during bisecting this issue, I've found out, that these scenarios
> have something to do with rebooting: So, I've rebooted the new bisected
> kernel twice in the second roundup.
> But I haven't expected the following disorder:
>
> This is a row of results from last night, rebooting different kernels, one
> after the other, and capturing some relevant data.
>
>
> # uname -r
> 3.12.16
> # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> Scenario-2
>
> # uname -r
> 3.13.8
> # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> Scenario-2
>
> # uname -r
> 3.13.8
> # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> Scenario-1
>
> # uname -r
> 3.12.13
> # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> Scenario-2
>
> # uname -r
> 3.12.13
> # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> Scenario-1
>
> # uname -r
> 3.12.13
> # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> Scenario-2
>
I suppose these 3.12.13 kernels are exactly the same kernel, without any rebuilding, right?
Could you please change your config file to always build in the ACPI thermal and fan drivers, and see if this problem still exists?
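When rebooting the same kernel repeatedly, the two binding orders can be logged automatically instead of being copied by hand. A minimal sketch that applies the thread's own Scenario-1/Scenario-2 labels; it assumes, as on Manuel's machine, that the fans are bound to thermal_zone1, and the function name and SYSFS override are hypothetical.

```shell
# Hypothetical helper: classify the binding order of thermal_zone1 using
# the labels from this thread: cdev0 is a CPU -> Scenario-1,
# cdev0 is a fan -> Scenario-2. SYSFS is an assumed override (default /sys).
binding_scenario() {
    p="${SYSFS:-/sys}/class/thermal/thermal_zone1/cdev0/device/path"
    case $(cat "$p" 2>/dev/null) in
        *CPU*) echo Scenario-1 ;;
        *FAN*) echo Scenario-2 ;;
        *)     echo unknown ;;
    esac
}
```

Appending "$(uname -r) $(binding_scenario)" to a log file from a boot script would reproduce the sequence of results above without manual copying.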

Revision history for this message
In , Erbureth (erbureth-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #21)
> (In reply to Pohjoistuuli from comment #19)
> > Hi!
> >
> > I have similar problem on HP ProBook 4510s (Firmware F.20, Intel T3000)
> > running 64bit-kernel 3.13.9 (Kubuntu) or 64-bit kernel 3.13 - 3.14 on arch.
> > I remarked that the regulation of the fan (not necessarily the fan itself!)
> > stops after boot. So, if the system is cold, the fan is running at 0% (=
> > off) or at 20% (which is an unusual number as the fan speed rises usually in
> > 15% steps).
> > On reboot, when the machine is warm, fan speeds of 30% or 45% are often
> > observed depending on the CPU temperature at boot time. After booting, the
> > fan speed does not change anymore and keeps constant. So, when the machine
> > was started cold, the fan is off until the temperature reaches critical
> > values and runs then with 90% (= full speed) until the temperature drops. It
> > goes then off again completely.
>
> I've seen exactly the same behavior on one of my test laptop.
> And the problem is that ACPICA can not handle some kind of AML code well,
> PLUS, the fix for the problem ships in 3.13-rc1.
> So the symptom I've seen is not a regression and exists in all Linux
> previous release.
> Anyway, please attach the acpidump of your machine, so that I can check if
> they are the same AML problem.
>
> BTW, it would be nice if you can try 3.12 kernel to verify if this is a
> regression or not.

I can confirm having the same problem with an HP Compaq 6830s: the fan is off until the temperature reaches critical, then runs at full speed. When the temperature drops below 8x °C, the fan stops completely. This happens on both 3.13 and 3.14.

3.12 works fine

I'll post my acpidump when I get to the machine. Are there any more listings you are interested in?

Revision history for this message
In , jernej.jakob (jernej.jakob-linux-kernel-bugs) wrote :

These symptoms are exactly the ones I am experiencing. Please see comment 13 and my post to the mailing list: http://lkml.iu.edu//hypermail/linux/kernel/1404.0/02012.html

I have disassembled the DSDT from my machine, fixed most errors and warnings and tried booting with the fixed one, but nothing changed. I haven't dumped the other tables yet, but I will post them when I do.

3.12 is what was on this laptop until now (Ubuntu Saucy), and with it everything worked fine. No other changes, no fan control utilities, no negative temperatures (checked with lm-sensors). Just stock installs...

Revision history for this message
In , e.glorg (e.glorg-linux-kernel-bugs) wrote :

I got the same bug on Debian 7.4 with kernel 3.13-0 on an HP 4310s laptop. While 3.12 kernels worked correctly, after installing 3.13 the fan went off after boot and turned on only when the temperature reached 80 C, and then at very high speed. After cooling to ~75 C the fan went off again. The only thing I can state now is that this bug seems to be chipset-independent: it shows up on AMD and Intel laptops and even on an old Athlon-based desktop box.

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #22)
> (In reply to Manuel Krause from comment #16)
> > There are generally only two different re-occurring scenarios for
> > "grep . /sys/class/thermal/thermal_zone*/cdev*/device/path", so that I
> > want to abbreviate them in the following:
> >
> > Scenario-1:
> > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> > /sys/class/thermal/thermal_zone1/cdev0/device/path:\_PR_.CPU1
> > /sys/class/thermal/thermal_zone1/cdev1/device/path:\_PR_.CPU0
> > /sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN0
> > /sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1
> > /sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN2
> > /sys/class/thermal/thermal_zone1/cdev5/device/path:\_TZ_.FAN3
> > /sys/class/thermal/thermal_zone1/cdev6/device/path:\_TZ_.FAN4
> > /sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1
> > /sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0
> > /sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1
> > /sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0
> >
> > Scenario-2:
> > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> > /sys/class/thermal/thermal_zone1/cdev0/device/path:\_TZ_.FAN4
> > /sys/class/thermal/thermal_zone1/cdev1/device/path:\_TZ_.FAN3
> > /sys/class/thermal/thermal_zone1/cdev2/device/path:\_TZ_.FAN2
> > /sys/class/thermal/thermal_zone1/cdev3/device/path:\_TZ_.FAN1
> > /sys/class/thermal/thermal_zone1/cdev4/device/path:\_TZ_.FAN0
> > /sys/class/thermal/thermal_zone1/cdev5/device/path:\_PR_.CPU1
> > /sys/class/thermal/thermal_zone1/cdev6/device/path:\_PR_.CPU0
> > /sys/class/thermal/thermal_zone2/cdev0/device/path:\_PR_.CPU1
> > /sys/class/thermal/thermal_zone2/cdev1/device/path:\_PR_.CPU0
> > /sys/class/thermal/thermal_zone3/cdev0/device/path:\_PR_.CPU1
> > /sys/class/thermal/thermal_zone3/cdev1/device/path:\_PR_.CPU0
> >
> > Already, during bisecting this issue, I've found out, that these scenarios
> > have something to do with rebooting: So, I've rebooted the new bisected
> > kernel twice in the second roundup.
> > But I haven't expected the following disorder:
> >
> > This is a row of results from last night, rebooting different kernels, one
> > after the other, and capturing some relevant data.
> >
> >
> > # uname -r
> > 3.12.16
> > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> > Scenario-2
> >
> > # uname -r
> > 3.13.8
> > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> > Scenario-2
> >
> > # uname -r
> > 3.13.8
> > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> > Scenario-1
> >
> > # uname -r
> > 3.12.13
> > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> > Scenario-2
> >
> > # uname -r
> > 3.12.13
> > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> > Scenario-1
> >
> > # uname -r
> > 3.12.13
> > # grep . /sys/class/thermal/thermal_zone*/cdev*/device/path
> > Scenario-2
> >
> I suppose these 3.12.13 kernel are the exactly the same kernel without any
> rebuilding, right?

Yes, of course, without rebuilding. I only rebooted previously built kernels, to show you the obvious differences after rebooting.

Revision history for this message
In , thohl (thohl-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #21)
> (In reply to Pohjoistuuli from comment #19)

Sorry for answering so late. I am usually busy during the week, and testing this is surprisingly time-consuming (waiting for the system to reach the right starting temperature, then waiting for it to rise, etc.). I now use tmon, which makes testing the thermal behaviour of laptops much easier. It is also quite a handy tool for regulating the fan speed. I usually raise the CPU temperature with 'openssl speed'. Finding this 'technique' sped up testing considerably.
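For logging alongside tmon, the zone temperatures can also be read straight from sysfs, where each thermal_zone*/temp file reports millidegrees Celsius. A minimal sketch; the function name and SYSFS override are hypothetical, and only positive temperatures are handled.

```shell
# Hypothetical helper: print "<zone> <type> <degrees C>" for every thermal
# zone, converting sysfs millidegrees to one decimal place (positive
# temperatures assumed). SYSFS is an assumed override (default /sys).
zone_temps() {
    for z in "${SYSFS:-/sys}"/class/thermal/thermal_zone*; do
        [ -r "$z/temp" ] || continue
        t=$(cat "$z/temp")
        printf '%s %s %d.%d\n' "$(basename "$z")" "$(cat "$z/type")" \
            $((t / 1000)) $(( (t % 1000) / 100 ))
    done
}
```

Combined with the openssl load mentioned above, one could start openssl speed >/dev/null 2>&1 & in the background, run while sleep 5; do zone_temps; done to watch the zones, and kill the background job afterwards.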

> Anyway, please attach the acpidump of your machine, so that I can check if
> they are the same AML problem.

The acpidump is now on my hard drive, but I did not find a function to attach a file to this message. I also ran a check with fwts on my machine (on Ubuntu 14.04). fwts reported problems in the DSDT. I can also provide this log if needed (and when I know how ;-).

> BTW, it would be nice if you can try 3.12 kernel to verify if this is a
> regression or not.

I have tested ArchLinux kernels 3.12.9-2 and 3.13.1-1. 3.12.9-2 runs fine and 3.13.1-1 does not regulate the fan speed when passing an active trip point temperature. Other ArchLinux kernels that I have tested so far are 3.10.37-1 (lts), which works fine, and 3.14.1-1 (today's kernel), which does not regulate the fan speed.

Some other remarks:
- I can confirm Manuel's observations regarding cdev*_trip_point. I also see all three numbering versions on my laptop (version 3 on Ubuntu 14.04, which has the ACPI routines compiled into the kernel). tmon has no problems with this and shows the same setup under kernels 3.10, 3.12, 3.13 and 3.14, working without any differences. Additionally, checking dmesg did not reveal any relevant differences between 3.12 and 3.13 to me.
- My machine has a thermal zone GFXZ (acpitz0), which is not connected to any hardware because my computer has only chipset graphics. Its 'temperature' is constant at 16 °C. Is this perhaps a problem in this context? Is the ACPI system looking only at the wrong thermal zone?
- My machine behaves differently on battery and on AC. The reason is a BIOS setting that affects the lowest fan speed level. On battery, it is always 0% rpm (= completely off). On AC, the BIOS lets me choose between 0% rpm (as on battery) or 20% rpm as the minimum (my setup). This difference between AC and battery made noticing this error quite difficult in the beginning.
- For cooling my machine under normal CPU load, 20-30% rpm is often sufficient. Under full load, the CPU temperature rarely exceeds 60 °C when the fans run at 45% of max. rpm. Therefore, the overheating and fan regulation problems were quite confusing at first.
- tmon is really nice - including the user interface!!!
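Since the minimum fan level on this machine depends on the power source, it may help to record it with each test run. A minimal sketch, assuming the usual power_supply class layout in which the AC adapter reports type Mains and online 0 or 1; the function name and SYSFS override are hypothetical, and a machine with no Mains supply at all is also reported as battery.

```shell
# Hypothetical helper: print "ac" if any power supply of type Mains reports
# online=1, otherwise "battery" (also printed when no Mains supply exists).
# SYSFS is an assumed override (default /sys) for testing.
power_source() {
    for ps in "${SYSFS:-/sys}"/class/power_supply/*; do
        [ -r "$ps/type" ] && [ -r "$ps/online" ] || continue
        if [ "$(cat "$ps/type")" = "Mains" ] && [ "$(cat "$ps/online")" = "1" ]; then
            echo ac
            return 0
        fi
    done
    echo battery
}
```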

Thanks for looking into this, Thomas

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

Created attachment 134061
acpidump HP Compaq 6730b

Maybe an acpidump from my machine can help?

@Pohjoistuuli / Thomas: At the top, above the comments and below the header of this bugzilla page, there is the box "Attachment" with the function to add one. (I also needed a while to find it.) ;-)

I hope there's still someone working on this bug?!
Regards, Manuel

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

And kernel 3.15.0-rc2 also fails in (all) the same way(s). Regards, Manuel

Revision history for this message
In , rjw (rjw-linux-kernel-bugs) wrote :

Rui, care to prepare a revert of commit cc8ef5270734 (ACPI / AC: convert ACPI ac driver to platform bus) on top of 3.15-rc3 so that Manuel can test it?

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

Rui, the best thing for me would be a patch that applies to released kernels, as I don't want to go bisecting again for nothing. Thx!

Revision history for this message
In , rjw (rjw-linux-kernel-bugs) wrote :

It would be most useful to us to know if the revert on top of the current mainline (that is, 3.15-rc3) works, though. If it doesn't, we need to look somewhere else anyway.

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

O.K. You're right, indeed. 3.15-rc3 is here. So, please: Give me a patch!!!

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

Without any patch from you... :-(

3.14.3 fails and
3.15.0-rc4 fails, too.

Revision history for this message
In , linux (linux-linux-kernel-bugs) wrote :

I'll send a compile-tested-only patch in a minute. For the Brave ...

Revision history for this message
In , rjw (rjw-linux-kernel-bugs) wrote :

Patch to test: https://patchwork.kernel.org/patch/4124871/

Thanks Guenter!

Revision history for this message
In , rjw (rjw-linux-kernel-bugs) wrote :

Created attachment 135301
ACPI / AC: Use proper name for netlink event generation

Manuel, if Guenter's patch from the previous comment helps, can you please check whether this one helps too?

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

Thank you both to provide something to test finally!!! :-)))

I've now tested the two variants with 3.15.0-rc4; they apply and compile fine. (For now only with thermal, fan and processor _built into_ the kernel.)

Guenter's reverting patch works !!!
Rafael's does not: it does not change fan speeds when passing the trip point temperatures.

And now?

Revision history for this message
In , rjw (rjw-linux-kernel-bugs) wrote :

Well, I'll queue up the revert for 3.15 and then we'll need to figure out what was wrong with that commit.

Thanks!

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

Oh, and in the meantime I've patched my 3.14.3 with Guenter's reverting patch (with some fuzz and offsets, but OK), and it also works very well!

I'll stay tuned to this bug and would still like to help you figure it out.

Best regards to all participants, Manuel

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

Created attachment 136881
Guenter Roecks patch adapted for a 3.14.4 vanilla kernel

Unfortunately I haven't seen anyone add Guenter's reverting patch to 3.14.x kernels so far.
So I'd like to post something adapted for 3.14.4. Only cosmetic changes to Guenter's original version for 3.15-rcX were needed. And, yes, it works here.

Revision history for this message
In , linux (linux-linux-kernel-bugs) wrote :

Unless I am missing something, the patch is not yet upstream, so we cannot back-port it to 3.14.

Revision history for this message
In , angelo.compagnucci (angelo.compagnucci-linux-kernel-bugs) wrote :

I just compiled and installed kernel 3.15-rc6 on my Intel ICH9 laptop; the problem still remains and it's very dangerous.

With this kernel the fan at least runs at a very low speed, but it doesn't follow temperature changes, so the temperature can easily rise to 80C.

So this is not resolved for me.

Revision history for this message
In , linux (linux-linux-kernel-bugs) wrote :

Quite surprising, because 3.15-rc6 does include the fix,
as tested by Manuel.

Manuel, any chance you can re-test with 3.15-rc6 ?

Revision history for this message
In , angelo.compagnucci (angelo.compagnucci-linux-kernel-bugs) wrote :

Hi Guenter,

My fault, I was running 3.15-rc5 instead of rc6! rc6 works wonderfully:
the fan runs more smoothly than with any previous kernel's thermal
management. There is only one hiccup: the fan never reaches 100% (full
speed); even when the temperature rises above 77C, the fan runs at most at 70%.

I have to manually write 1 into
/sys/devices/virtual/thermal/cooling_device0/cur_state to cool the
CPU down to a normal level, which is particularly annoying when I'm
compiling, because I have to reissue the command occasionally.
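That manual write could in principle be automated by a small guard that is polled periodically. A sketch only, with loudly assumed values: the 75 °C threshold and the zone and device names are illustrative, the function name and SYSFS override are hypothetical, and the thermal core may override the forced state on its next update. It uses /sys/class/thermal, whose cooling_device entries link to the /sys/devices/virtual/thermal ones mentioned above.

```shell
# Hypothetical guard: if thermal zone $1 is at or above THRESHOLD_MC
# millidegrees (assumed default 75000 = 75 C), force cooling device $2
# to state 1 and print "forced". SYSFS is an assumed override (default /sys).
fan_guard_once() {
    t=$(cat "${SYSFS:-/sys}/class/thermal/$1/temp") || return 1
    if [ "$t" -ge "${THRESHOLD_MC:-75000}" ]; then
        echo 1 > "${SYSFS:-/sys}/class/thermal/$2/cur_state"
        echo forced
    fi
}
```

Run as root, for example: while sleep 10; do fan_guard_once thermal_zone1 cooling_device0; done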

Thank you for your support!

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

(In reply to Guenter Roeck from comment #44)
> Quite surprising, because 3.15-rc6 does include the fix,
> as tested by Manuel.
>
> Manuel, any chance you can re-test with 3.15-rc6 ?

Yes, I've just tested it -- and it works fine for me, as expected.

And, I'm not concerned about the temp. <-> fan levels as Angelo mentions. IIRC, this is the normal behaviour also known from kernels before 3.13 .

Thanks to you, Guenter!

Jernej Jakob (jjakob)
tags: added: kernel-fixed-upstream kernel-fixed-upstream-3.15-rc6
removed: kernel-bug-exists-upstream kernel-bug-reported-upstream
Revision history for this message
Jernej Jakob (jjakob) wrote :

The bug is fixed in 3.15-rc6. The patch is at https://patchwork.kernel.org/patch/4124871/ and a patch adapted for 3.14.4 is here: https://bugzilla.kernel.org/attachment.cgi?id=136881

A backport to 3.13 is needed for it to get into Trusty AFAIK...

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

3.14.5 is out now... without this fix... Can someone of you sleepy guys, please, ... begin to... at least think of... bringing Guenter's patch to the so-called "stable" kernel... finally ??!
My simply converted patch for 3.14.4 still works with 3.14.5. See Comment 41.

This is quite a disappointing thread. Has anyone begun to work on the original failure, i.e. why the conversion of AC to the platform bus didn't work?

Thanks, Manuel

Revision history for this message
In , linux (linux-linux-kernel-bugs) wrote :

On Sun, Jun 01, 2014 at 04:24:41PM +0000, <email address hidden> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>
> --- Comment #47 from Manuel Krause <email address hidden> ---
> 3.14.5 is out now... without this fix... Can someone of you sleepy guys,
> please, ... begin to... at least think of... bringing Guenters patch to the
> so called "stable" kernel... finally ??!
> My simply converted patch for 3.14.4 is still working with 3.14.5. See
> Comment 41.
>
Please chill down. You do have a working solution, don't you ?

The 3.14 maintainer mentioned a couple of days ago that he has more than 200
patches pending for 3.14, on top of 3.14.5. Greg is doing an excellent job
maintaining the stable kernel releases. Calling him sleepy is, to say it very
politely, not appropriate.

> This is a quite disappointig thread. Has someone begun to work on the
> original failure, why the conversion of AC to platform bus didn't work?
>

As far as I know no one who actually helped fix your problem is getting paid
for this task, including me. Actually, I am specifically _not_ paid for anything
I do in the upstream kernel. In addition to that, it occurs to me that you are
most likely not paying anything to anyone for providing you support either.
You might want to consider adjusting your expectations a bit, or switch to a
pay-for-use operating system.

Having said that, Linux being an open source operating system, I am sure the
responsible maintainer would be happy to get a patch from you to fix the
original failure.

Thanks,
Guenter

Revision history for this message
In , jza (jza-linux-kernel-bugs) wrote :

HP 2230s is also affected. A fresh kernel pulled from the Linus tree seems to work fine now.

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

First, I want to apologize a bit for my words in Comment 47. I'm not a native English speaker, so I may not have found the *right* words to express my disappointment with the progress of this thread since early March 2014. And I felt that I should not "chill down" until this is included in the current kernel series.

Of course, I did NOT want to question the work of the people *working* on this bug, nor of those helping me to help resolve it for other people, too. Guenter is a great helper.

I don't think my disappointment is worth a discussion about paid support or anything related. IIRC, I have provided the needed info ASAP and also invested some of my spare time in your debugging work, as did you and others. And I'd do it again in future, too.
Don't blame me for not having enough Linux programming knowledge, so far, to just provide a better "convert AC to platform bus" patch; that's a bit inappropriate, too.
---
According to yesterday's message from Greg and a look at the stable queue, Guenter's revert patch will be included in 4.14.6.
---
Cheers!
And thank you for your understanding,

Manuel

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

- revert patch would be included in 4.14.6.
+ revert patch would be included in 3.14.6.

Sorry for the typo.

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

HOUSTON, WE'VE GOT A PROBLEM...

I don't know why I haven't tested it thoroughly so far... Maybe due to the ambient temperatures and my usual workflow for testing this one, which only aims at high temperatures? (I used worldcommunitygrid to achieve this.)

This patch's settings DO NOT survive a SUSPEND TO DISK: the settings for the actually needed trip point <-> fan speed are then, unfortunately, forgotten?

For the suspend-to-disk path I've checked several kernels today:
3.15.0 pure vanilla NOGO
3.14.5 +BFQ +CK/BFS + revert patch NOGO
3.14.6 +BFQ +CK/BFS +TuxOnIce NOGO
3.14.7 +BFQ +CK/BFS +TuxOnIce NOGO
3.12.18 +BFQ +CK/BFS NOGO

It's a pity, to bother you again,

any ideas?!

Best regards, Manuel

Revision history for this message
In , linux (linux-linux-kernel-bugs) wrote :

On Thu, Jun 12, 2014 at 05:22:29PM +0000, <email address hidden> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=71711
>
> --- Comment #52 from Manuel Krause <email address hidden> ---
> HOUSTON, WE'VE GOT A PROBLEM...
>
> I don't know why I haven't tested it thoroughly so far... Maybe, due to the
> ambient temperatures and my usual workflow for testing this one, only aiming
> at high temperatures? (I used worldcommunitygrid to achieve this.)
>
> This patches' settings DO NOT surviwe a SUSPEND TO DISK: The settings for
> the actually needed trip point <-> fan speed are, unfortunately, then forgotten?
>
> For the suspend-to-disk way I've checked several kernels, today,
> 3.15.0 pure vanilla NOGO
> 3.14.5 +BFQ +CK/BFS + revert patch NOGO
> 3.14.6 +BFQ +CK/BFS +TuxOnIce NOGO
> 3.14.7 +BFQ +CK/BFS +TuxOnIce NOGO
> 3.12.18 +BFQ +CK/BFS NOGO
>
> It's a pity, to bother you again,
>
> any ideas?!
>
Unless I am missing something, looks like a separate problem.
Does this work with any earlier kernels ?

Guenter

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

To be more accurate: the last trip point triggered before suspend seems to be taken as the next one to act on after resume. But the lower fan speeds are no longer correlated with it. Are they lost, then?

I can pass this trip point going upwards and the fan goes to the related level. Going below it, the fan may drop to speed 0.

The higher fan numbers (these are the fan's speed levels here, in reverse order: 04 is 24% fan; 03 is 34%; 02 is 45%; 01 is 58%; 00 is 100%) then come up as 0 (B).

As shown by the "tmon" tool:

(A) At boot everything is ok (for all the mentioned kernels):

ID Cooling Dev Cur Max Thermal Zone Binding │
│00 Fan 0 1 │││││││││││ ││││*││││││ │││││││││││ │││││││││││ ││││││││││││ │
│01 Fan 1 1 │││││││││││ │││*│││││││ │││││││││││ │││││││││││ ││││││││││││ │
│02 Fan 1 1 │││││││││││ ││*││││││││ │││││││││││ │││││││││││ ││││││││││││ │
│03 Fan 1 1 │││││││││││ │*│││││││││ │││││││││││ │││││││││││ ││││││││││││ │
│04 Fan 1 1 │││││││││││ *││││││││││ │││││││││││ │││││││││││ ││││││││││││

(B) At resume NOT ok:

│00 Fan 0 1 │││││││││││ ││││*││││││ │││││││││││ │││││││││││ ││││││││││││ │
│01 Fan 1 1 │││││││││││ │││*│││││││ │││││││││││ │││││││││││ ││││││││││││ │
│02 Fan 0 1 │││││││││││ ││*││││││││ │││││││││││ │││││││││││ ││││││││││││ │
│03 Fan 0 1 │││││││││││ │*│││││││││ │││││││││││ │││││││││││ ││││││││││││ │
│04 Fan 0 1 │││││││││││ *││││││││││ │││││││││││ │││││││││││ ││││││││││││

This is affecting suspend-to-ram, too, on here.
(I've already reported this symptom at the beginning of this thread
~ Comment 3.)

@Guenter: Do I really need to dig out kernels from before 3.12?

Best regards, Manuel
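Since the fan cooling devices come back from resume with cur_state 0 (as in (B)), one stop-gap would be a sleep hook that saves their states before sleeping and writes them back on resume. This is an untested sketch, not a fix for the trip-point bookkeeping: it assumes the pm-utils hook convention used on Ubuntu 14.04 (first argument suspend, hibernate, resume or thaw) and also accepts the systemd-sleep pre/post convention; the function name, the /run/fan-states file and the SYSFS/STATE_FILE overrides are hypothetical.

```shell
# Hypothetical stop-gap sleep hook: remember the cur_state of every cooling
# device of type "Fan" before sleeping and write it back afterwards.
# Accepts pm-utils (suspend|hibernate|resume|thaw) and systemd-sleep
# (pre|post) first arguments. SYSFS and STATE_FILE are assumed overrides.
fan_sleep_hook() {
    base="${SYSFS:-/sys}/class/thermal"
    state_file="${STATE_FILE:-/run/fan-states}"
    case $1 in
        suspend|hibernate|pre)
            : > "$state_file"
            for d in "$base"/cooling_device*; do
                [ "$(cat "$d/type" 2>/dev/null)" = "Fan" ] || continue
                echo "$(basename "$d") $(cat "$d/cur_state")" >> "$state_file"
            done
            ;;
        resume|thaw|post)
            [ -r "$state_file" ] || return 0
            while read -r name state; do
                echo "$state" > "$base/$name/cur_state"
            done < "$state_file"
            ;;
    esac
}
```

A hook file would simply call fan_sleep_hook "$1" (for example under /etc/pm/sleep.d/ with pm-utils, or /lib/systemd/system-sleep/ on systemd-based releases). Whether the thermal core keeps the restored states is exactly what would need testing.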

Revision history for this message
Oliver Joos (oliver-joos) wrote :

My old HP Compaq nx8220 is affected, too.

I use kernel 3.13.0-27-generic. Since this is an LTS (Long Term Support) release, I hope that a fix will be backported to 3.13.
After reading the upstream report I think the current patches are not the final solution.

summary: Fan stops after resume from suspend leading to overheating; requires
- reboot to fix [HP Probook 4710s]
+ reboot to fix [HP Probook 4710s and many others]
Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

First of all, this seems to be a different problem.
Could you please file a new bug, build the latest upstream kernel (say 3.15), boot it and
1. attach the output of "grep . /sys/class/thermal/thermal_zone*/cdev*/device/path"
2. attach the output of "# grep . /sys/class/thermal/cdev*/device/path"
3. run "# echo 'module thermal_sys +fp' > /sys/kernel/debug/dynamic_debug/control"
4. reproduce the problem you showed in comment #54
5. attach the dmesg output and tmon output.
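Steps 1 to 3 can also be scripted so the output is saved in one go. A minimal Python sketch, assuming root access and a mounted debugfs; "grep_dot" only approximates "grep ." (it always prefixes the file name and silently skips unreadable files), the glob patterns are meant to be passed in exactly as written in the steps, and both helper names are hypothetical:

```python
import glob


def grep_dot(patterns):
    """Approximate `grep . PATTERN...`: return "file:line" for every non-empty line."""
    out = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path, errors="replace") as fh:
                    lines = fh.read().splitlines()
            except OSError:  # directories, write-only or vanished sysfs attributes
                continue
            out.extend(f"{path}:{line}" for line in lines if line)
    return out


def enable_dyndbg(query, control="/sys/kernel/debug/dynamic_debug/control"):
    """Same effect as `echo 'QUERY' > .../dynamic_debug/control` (needs root)."""
    with open(control, "w") as fh:
        fh.write(query + "\n")
```

Usage as root: print each line of grep_dot(["/sys/class/thermal/thermal_zone*/cdev*/device/path", "/sys/class/thermal/cdev*/device/path"]), then call enable_dyndbg("module thermal_sys +fp") before reproducing the problem.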

Revision history for this message
In , jza (jza-linux-kernel-bugs) wrote :

For my hardware both suspend and hibernate are OK.

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #55)
> First of all, this seems to be a different problem.
> could you please file a new bug, build the latest upstream kernel, say 3.15,
> boot and
> 1. attach the output of "grep .
> /sys/class/thermal/thermal_zone*/cdev*/device/path"
> 2. attach the output of "# grep . /sys/class/thermal/cdev*/device/path"
> 3. run "# echo 'module thermal_sys +fp' >
> /sys/kernel/debug/dynamic_debug/control"
> 4. reproduce the problem you showed in comment #54
> 5. attach the dmesg output and tmon output.

Thank you very much for pointing out the details that would be helpful. Of course, I can file a new bug.
But before I do, could you please have a look at
 https://bugzilla.kernel.org/show_bug.cgi?id=67101
 "weird fan control with 3.12, was ok in 3.9"
which I found by coincidence? The symptoms seem to be the same (except that my system does not need to shut down, as the thermal emergency cooling is very effective). Unfortunately, the original poster never followed up.
What do you think?
Please advise me whether it would be better to revive that bug and add my additional info there, or to file a new one.

Thank you in advance, Manuel

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

(In reply to Joonas Saarinen from comment #56)
> For my hardware both suspend and hibernate are OK.

Could you please tell me which BIOS version you're running? I'm running the second-to-latest one, as the latest can only be installed via Windows along with a lot of additional software.

Mine is a: (excerpt from 'dmesg | grep BIOS')
DMI: Hewlett-Packard HP Compaq 6730b (KU489ET#ABD)/30DD, BIOS 68PDD Ver. F.17 12/02/2010

Thank you in advance, Manuel
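The same BIOS details can be read without grepping dmesg: the kernel exports DMI fields as files under /sys/class/dmi/id (the vendor, product and BIOS fields are normally readable without root). A small Python sketch; the "read_dmi" helper is hypothetical, and missing fields are reported instead of raising:

```python
from pathlib import Path

# DMI attributes exported by the kernel under /sys/class/dmi/id.
FIELDS = ("sys_vendor", "product_name", "bios_vendor", "bios_version", "bios_date")


def read_dmi(root="/sys/class/dmi/id"):
    """Return {field: value} for the firmware fields compared in this thread."""
    info = {}
    for name in FIELDS:
        try:
            info[name] = (Path(root) / name).read_text().strip()
        except OSError:  # field absent or not readable on this machine
            info[name] = "<unavailable>"
    return info


if __name__ == "__main__":
    for field, value in read_dmi().items():
        print(f"{field}: {value}")
```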

Revision history for this message
In , jza (jza-linux-kernel-bugs) wrote :

DMI: Hewlett-Packard HP 2230s /3037, BIOS 68PHU Ver. F.20 12/10/2011

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

Manuel, please file a new bug.

Dhinak G (dhinak2004)
affects: linux → ubuntu
Changed in ubuntu:
status: Unknown → Confirmed
status: Confirmed → Fix Released
Revision history for this message
Dhinak G (dhinak2004) wrote :

I checked on the Linux Bug Tracker (or something related) and it says a patch is out.

Run "sudo apt-get update && sudo apt-get upgrade" and if that doesn't work,

install indicator-cpufreq ("sudo apt-get install indicator-cpufreq"), then log out and log back in.

Look at your menu bar. A CPU icon will appear there; click it and select Powersave from the menu.
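For reference, selecting Powersave presumably amounts to setting the cpufreq scaling governor of every CPU (an assumption about what the indicator does internally), which can also be done directly through sysfs. A minimal Python sketch; "set_governor" is a hypothetical helper, writing requires root, and the kernel only accepts governors listed in scaling_available_governors. This lowers heat output but does not restore fan control:

```python
import glob
import os


def set_governor(governor, cpu_root="/sys/devices/system/cpu"):
    """Write `governor` to every cpuN/cpufreq/scaling_governor; return the files written."""
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    written = []
    for path in sorted(glob.glob(pattern)):
        avail_path = os.path.join(os.path.dirname(path), "scaling_available_governors")
        try:
            with open(avail_path) as fh:
                available = fh.read().split()
        except OSError:  # some drivers do not expose the list; let the kernel decide
            available = []
        if available and governor not in available:
            raise ValueError(f"{governor!r} is not available for {path}: {available}")
        with open(path, "w") as fh:
            fh.write(governor)
        written.append(path)
    return written
```

Calling set_governor("powersave") as root mirrors the menu choice; writing the previously active governor back undoes it.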

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #60)
> Manuel, please file a new bug.

A BIOS update from F.17 to F.20 did not help.

By the way, some distro-specific bug reports (not filed by me) wrongly point here.

I've now filed a new bug based on my Comment 52 and the following ones:
https://bugzilla.kernel.org/show_bug.cgi?id=78201

Thank you all for your guidance,
Manuel

Revision history for this message
Oliver Joos (oliver-joos) wrote :

Hi Dhinak. Which kernel does work for you?

I tested 3.13.0-24, then 3.13.0-27, and since this week there is 3.13.0-29 to upgrade to. On my two HP nx8220 laptops, none of these 3 kernels controls the CPU fan correctly! Both laptops have run Ubuntu for 7 years without major problems.
My 2 laptops are: HP Compaq nx8220 (PY518EA#UUZ)/0934, BIOS 68DTV Ver. F.16 07/11/2007

I do not agree that this bug is "Fix Released". The upstream bug is "resolved - patch available", but there is a discussion that this patch only solves it for some hardware, not all! And for the latest Ubuntu 14.04 (LTS!) it is not solved until the patch has been backported to kernel 3.13.x, which has not been done yet either.

Therefore I think we should reset this report to "Confirmed" to prevent affected people from opening new reports about the same issue. Please write if you disagree.

BTW: the workaround of forcing Powersave does work, but makes our laptops unusably slow.

Dhinak G (dhinak2004)
Changed in linux (Ubuntu):
status: Triaged → Confirmed
Revision history for this message
Dhinak G (dhinak2004) wrote :

The problem is because it's the daily build. Try reinstalling with the stable release.

Revision history for this message
Dhinak G (dhinak2004) wrote :

For me, I have it working on every kernel.

Revision history for this message
Oliver Joos (oliver-joos) wrote :

I did not install a daily build of Ubuntu or of the kernel. But the fan has not worked since 14.04.

@Dhinak: could you write which kernel 3.13.0-?? works for you, and what hardware/BIOS you have, e.g. with Terminal command:
sudo dmidecode | head

Revision history for this message
Jernej Jakob (jjakob) wrote :

For me this bug is definitely fixed in 3.15-rc6, and AFAIK in every release from then on, so a daily build of mainline should also work. Of course the patch is not yet backported, which it urgently needs to be. For now, just install the latest mainline kernel following the instructions on the Ubuntu wiki.
If there is still a fault when you try 3.15-rc6 this must be an unrelated bug.

Revision history for this message
Jernej Jakob (jjakob) wrote :

I mean the kernel, of course. There is no need to reinstall Ubuntu 14.04 daily (in fact it would be pointless as the bug is still there). I've had it since alpha and it works fine with kernel 3.15-rc6 installed.

Dhinak G (dhinak2004)
Changed in linux (Ubuntu):
assignee: nobody → Dhinak G (dhinak2004)
affects: ubuntu → baltix
affects: baltix → ubuntu
no longer affects: linux (Debian)
Revision history for this message
Joonas Saarinen (jza) wrote :

Apparently this was reverted in v3.13.11.4.

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.11.4-trusty/CHANGES

The main 3.13.0-?? kernel (linux-image package of Ubuntu 14.04) is still affected though.

Revision history for this message
In , oliver.joos (oliver.joos-linux-kernel-bugs) wrote :

Our 3 laptops Compaq nx8220 run Mint 17 and I just upgraded to 3.13.0-30. They are still affected. After resume they heat up to 100°C until cpu throttling occurs. A quite serious issue.

Jörg-Karl Bösner did a reverse-bisect and may have found the evil commit: https://launchpad.net/bugs/1312860

Please backport the fix also to 3.13.x, since this kernel is part of many "Long Term Support" distros.

Revision history for this message
In , jza (jza-linux-kernel-bugs) wrote :

That 3.13.0-30 is an Ubuntu kernel and is always based on upstream 3.13.0 with Canonical's own selection of patches applied on top of it. From there the same kernel seems to trickle to Mint. So Ubuntu would have to apply the patch "ACPI / AC: convert ACPI ac driver to platform bus" to the 3.13.0-?? patch queue.

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

I don't know if it's still valid, but the patch has been picked up by Kamal Mostafa, who, as far as I know, maintains 3.13.y.z.
Patch: http://patchwork.ozlabs.org/patch/360895/

Maybe you'd also like to read
 https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable
and https://lkml.org/lkml/2014/4/23/516

Best regards,
Manuel Krause

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

(In reply to Oliver Joos from comment #62)
> Our 3 laptops Compaq nx8220 run Mint 17 and I just upgraded to 3.13.0-30.
> They are still affected. After resume they heat up to 100°C until cpu
> throttling occurs. A quite serious issue.
>
> Jörg-Karl Bösner did a reverse-bisect and may have found the evil commit:
> https://launchpad.net/bugs/1312860
>
> Please backport the fix also to 3.13.x, since this kernel is part of many
> "Long Term Support" distros.

This bug only covers the wrong fan speed after booting.

For the issue of high temperatures without fan action after resume from disk/RAM, please attach to https://bugzilla.kernel.org/show_bug.cgi?id=78201.

Thank you in advance,
Manuel Krause

Revision history for this message
In , jza (jza-linux-kernel-bugs) wrote :

> So Ubuntu would have to apply the patch "ACPI / AC: convert ACPI ac driver to
> platform bus" to the 3.13.0-?? patch queue.

Just to refine my message a bit...they obviously should apply the *revert* patch. :)

Here's also a direct link to the aforementioned "extended stable" Ubuntu kernel, where it already is reverted:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13.11.4-trusty/

But as Manuel says, in Oliver's case it might actually be a different bug if it reveals itself after suspend.

Revision history for this message
In , jza (jza-linux-kernel-bugs) wrote :

Looking at this changelog, the revert patch seems to be already part of upcoming 3.13.0-31 Ubuntu kernel.

https://launchpad.net/ubuntu/trusty/+source/linux/+changelog

Revision history for this message
Oliver Joos (oliver-joos) wrote :

For me it is solved with kernel 3.13.0-32 from https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa
Careful users should wait a few days to get the fixed kernel as a normal update!

From what I have read the problem was a kernel patch called "ACPI / AC: convert ACPI ac driver to platform bus" which has now been reverted in kernel 3.13.0-31 and higher.
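To check whether an installed Ubuntu kernel should already contain that revert, one can compare the "uname -r" string against 3.13.0-31, the first version named in this thread. A minimal Python sketch; the helper names are hypothetical, it only understands the Ubuntu "3.13.0-N" numbering, and a True result only means the revert is included, not that every machine is fixed (see the HP 620 report below):

```python
import re

# Threshold taken from this thread: the revert is reported in 3.13.0-31 and higher.
REVERT_IN = (3, 13, 0, 31)


def parse_ubuntu_kernel(release):
    """Parse an Ubuntu release string such as '3.13.0-32-generic' into (3, 13, 0, 32)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if not m:
        raise ValueError(f"unrecognized release string: {release!r}")
    return tuple(int(g) for g in m.groups())


def has_ac_revert(release):
    """True if a 3.13.0-N Ubuntu kernel should already contain the revert (N >= 31)."""
    version = parse_ubuntu_kernel(release)
    if version[:3] != REVERT_IN[:3]:
        raise ValueError("this check only applies to the 3.13.0-N series")
    return version >= REVERT_IN
```

On the machine itself, has_ac_revert(platform.release()) applies the check to the running kernel.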

Revision history for this message
Taleb Abdelhak (my-rk) wrote :

Running kernel 3.13.0-32, bug is STILL PRESENT
laptop : HP 620

Revision history for this message
Oliver Joos (oliver-joos) wrote :

@Taleb: I sympathize! But Jernej (the reporter of this bug) also wrote in comment #16 that it is solved for him. Therefore I would open a new report about exactly your issue, with all details of your system and symptoms.

Jernej Jakob (jjakob)
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Juan Carlos (arareka-ha) wrote :

Hi, I installed Ubuntu 14.10 on my HP Probook 4525s, and after resume from suspend the fan runs slowly. Normally the laptop runs at 53-55 °C,
but after resume from suspend the temperature goes up to 60-70 °C.

Revision history for this message
In , jose_wojnacki (josewojnacki-linux-kernel-bugs) wrote :

I'm running Archlinux (up to date) and I'm having the same issue. With the latest kernel 3.18.2 the problem is still there for me. Every time I unplug the AC power, the laptop fan stops until the temperature reaches 84°C, and then it runs at full speed until the temperature drops back to 74°C.
With kernel 3.11.4 I have no problem at all.
Do you guys still have this issue?

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

There are currently many HP/Compaq notebook owners having problems with kernel 3.18.x. We are waiting for Zhang Rui to wake up from his winter sleep and catch up. See: https://bbs.archlinux.org/viewtopic.php?id=192255&p=2 (Read from the first page to get the full info; also, some people there don't handle the full fan speed value correctly.)

Most probably you will need to file a new bug, but I'd soon add my logs to it.

Best regards,
 Manuel

Revision history for this message
In , manuelkrause (manuelkrause-linux-kernel-bugs) wrote :

You can also have a look at https://bugzilla.kernel.org/show_bug.cgi?id=78201, if that's something regarding your fan problem.

Best regards, Manuel

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

For my hardware, the problem seems to be resolved by installing the latest beta of OS X 10.10.2; it includes a firmware update that solves the issue under Linux and Windows.
Hope this helps.

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

Nope, the problem is still there: the temperature is fine (around 35 to 40 °C), but the fans kick up from 2000 to 5900 rpm and then drop back to 4100 rpm, at a CPU utilization of 10-13%.

kernel : 3.18.4
os : Archlinux
Hardware : Macbook Air 2013

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

PLEASE RESPOND,

the problem is solved by updating to the new firmware that comes with OS X 10.10.2.

In Linux 3.19, the patch you made makes the laptop very noisy, with the fans spinning at a very high rpm.

In Linux 3.14.33, everything is fine (thermal, fan rpm).

So can you please kindly revert or remove the patch, as it's not necessary any more after the OS X 10.10.2 update.

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

(In reply to step-ali from comment #73)
> PLEASE RESPOND ,
>
> the problem is solved by updating to a new firmware with osx 10.10.2 ,
>
> in linux 3.19 , the patch you have made make the laptop very noisy & fans
>
> spinning at a very high rpm .
>
Which patch are you referring to?

> in linux 3.14.33 , everything is fine ( thermal , fan rpm ) ,
>
>
> so can you please kindly revert or remove the patch , as it's not necessary
> any
>
> more after osx 10.10.2 update .

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #74)
> (In reply to step-ali from comment #73)
> > PLEASE RESPOND ,
> >
> > the problem is solved by updating to a new firmware with osx 10.10.2 ,
> >
> > in linux 3.19 , the patch you have made make the laptop very noisy & fans
> >
> > spinning at a very high rpm .
> >
> which patch are you referring to?
>
> > in linux 3.14.33 , everything is fine ( thermal , fan rpm ) ,
> >
> >
> > so can you please kindly revert or remove the patch , as it's not necessary
> > any
> >
> > more after osx 10.10.2 update .

The patch that made the fans spin harder.

All I know is that on 3.19 there is no heat, but the fans spin at high rpm at 10-15% CPU utilization.

On 3.14.33 the temperature rises up to 89 °C and the fans don't spin up at the same CPU utilization.

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

(In reply to step-ali from comment #75)
> (In reply to Zhang Rui from comment #74)
> > (In reply to step-ali from comment #73)
> > > PLEASE RESPOND ,
> > >
> > > the problem is solved by updating to a new firmware with osx 10.10.2 ,
> > >
> > > in linux 3.19 , the patch you have made make the laptop very noisy & fans
> > >
> > > spinning at a very high rpm .
> > >
> > which patch are you referring to?
> >
> > > in linux 3.14.33 , everything is fine ( thermal , fan rpm ) ,
> > >
> > >
> > > so can you please kindly revert or remove the patch , as it's not
> necessary
> > > any
> > >
> > > more after osx 10.10.2 update .
>
> the patch that made the fans spin harder ,
>
step-ali,
actually, I don't know which patch introduces this problem.
But there is indeed a bug report complaining that the fan speed never changes after boot, since 3.18.
So can you please refer to bug #93301 and check whether the same commit (6ab3430129e258ea31dd214adf1c760dfafde67a) introduces this problem for you?

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #76)
> (In reply to step-ali from comment #75)
> > (In reply to Zhang Rui from comment #74)
> > > (In reply to step-ali from comment #73)
> > > > PLEASE RESPOND ,
> > > >
> > > > the problem is solved by updating to a new firmware with osx 10.10.2 ,
> > > >
> > > > in linux 3.19 , the patch you have made make the laptop very noisy &
> fans
> > > >
> > > > spinning at a very high rpm .
> > > >
> > > which patch are you referring to?
> > >
> > > > in linux 3.14.33 , everything is fine ( thermal , fan rpm ) ,
> > > >
> > > >
> > > > so can you please kindly revert or remove the patch , as it's not
> necessary
> > > > any
> > > >
> > > > more after osx 10.10.2 update .
> >
> > the patch that made the fans spin harder ,
> >
> step-ali,
> actually, I don't think which patch introduces this problem.
> But there is indeed some bug report complaining that the fan speed never
> changes after boot, since 3.18.
> so can you please refer to bug #93301 and check if it is the same commit
> (6ab3430129e258ea31dd214adf1c760dfafde67a) that introduces this problem for
> you?

I don't think so.

Before 3.18 we had high CPU utilization (25 to 30%), which was fixed by the recent Apple OS X 10.10.2 update (the problem was solved temporarily by disabling some GPE), but there wasn't any fan or heat problem.

After the OS X 10.10.2 update (which came during Linux 3.18), the fan spins up (very high rpm) at very little CPU utilization (watching a video in Chrome) and then spins down when idling.

On 3.14.33 it's the reverse: the fan doesn't spin up, but the temperature rises to 90 degrees Celsius (also while watching videos in Chrome), which is harmful to the laptop.

The solution would be something in the middle.

BUT PLEASE HURRY, MY MACHINE IS FRYING.

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

Please
1. rebuild your kernel with the patches at https://bugzilla.kernel.org/show_bug.cgi?id=78201#c150 applied.
2. run echo 'module thermal_sys +fp' > /sys/kernel/debug/dynamic_debug/control after boot
3. attach the dmesg output after the problem is reproduced.

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #78)
> Please
> 1. rebuild your kernel with the patches at
> https://bugzilla.kernel.org/show_bug.cgi?id=78201#c150 applied.
> 2. run echo 'module thermal_sys +fp' >
> /sys/kernel/debug/dynamic_debug/control after boot
> 3. attach the dmesg output after the problem is reproduced.

Sorry, I don't know how to apply a patch and compile.

After weeks of testing, it looks like another firmware issue that needs to be fixed by an update from Apple, like the GPE66 issue, because the issue (high temperature) also occurs in Windows.

When I first bought the laptop it ran fine with Linux; I wish I had never updated OS X, I never use it anyway.

I will submit a bug report to Apple and see what happens.

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

As there is a firmware update, can you please try the 3.14 kernel again with your new firmware?

(In reply to step-ali from comment #77)
> (In reply to Zhang Rui from comment #76)
> > (In reply to step-ali from comment #75)
> > > (In reply to Zhang Rui from comment #74)
> > > > (In reply to step-ali from comment #73)
> > > > > PLEASE RESPOND ,
> > > > >
> > > > > the problem is solved by updating to a new firmware with osx 10.10.2
> ,
> > > > >
> > > > > in linux 3.19 , the patch you have made make the laptop very noisy &
> fans
> > > > >
> > > > > spinning at a very high rpm .
> > > > >
> > > > which patch are you referring to?
> > > >
> > > > > in linux 3.14.33 , everything is fine ( thermal , fan rpm ) ,
> > > > >
> > > > >
> > > > > so can you please kindly revert or remove the patch , as it's not
> necessary
> > > > > any
> > > > >
> > > > > more after osx 10.10.2 update .
> > >
> > > the patch that made the fans spin harder ,
> > >
> > step-ali,
> > actually, I don't think which patch introduces this problem.
> > But there is indeed some bug report complaining that the fan speed never
> > changes after boot, since 3.18.
> > so can you please refer to bug #93301 and check if it is the same commit
> > (6ab3430129e258ea31dd214adf1c760dfafde67a) that introduces this problem for
> > you?
>
> I don't think so ,
>
> before 3.18 we had a high cpu utilization (25 to 30%) that was fixed by
> recent
>
> apple osx 10.10.2 update , ( the problem was solved temporarily by disabling
>
> some gpe ) but there wasn't any fan or heat problem .
>
>
> After the osx 10.10.2 update ( was during linux 3.18 ) the fan spins up (
> very
>
> high rpm )on very little cpu utilization ( watching a video in chrome ) &
> then
>
> spins down when idling .
>
>
> on 3.14.33 it's the reverse , the fan doesn't spin up but the temperature
> rises
>
> to 90 degree celsius ( also while watching videos on chrome ) , which is
>
> harmful to the laptop.
>
Did you get this symptom with the updated firmware?

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

It's after kernel 3.18 and the OS X firmware update.

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

> >
> > on 3.14.33 it's the reverse , the fan doesn't spin up but the temperature
> > rises
> >
> > to 90 degree celsius ( also while watching videos on chrome ) , which is
> >
> > harmful to the laptop.
> >
> is this symptom got with updated firmware?

I mean: did you get this symptom with the 3.14 kernel after the firmware was updated?

Please do the following test on a 4.0-rc kernel:
1. apply the patches at
https://patchwork.kernel.org/patch/6077231/
https://patchwork.kernel.org/patch/6077241/
https://patchwork.kernel.org/patch/6077251/
2. apply the two patches I will attach in the following comments
3. after building, boot with the kernel parameters module.dyndbg="module thermal_sys +fp" dyndbg="file thermal_core.c +fp; file step_wise.c +fp"
4. attach the acpidump output of your MacBook
5. attach the output of "grep . /sys/class/thermal/*/*/path" after boot
6. attach the dmesg output after the bug is reproduced
7. attach the output of "grep . /sys/class/thermal/thermal*/*" after the bug is reproduced
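When reading the step 7 output, it helps to know which trip points a zone has already crossed. In the kernel's thermal sysfs interface, "temp" and "trip_point_N_temp" are in millidegrees Celsius and "trip_point_N_type" holds values such as "active", "passive" or "critical". A minimal Python sketch that lists the crossed trips for one zone directory; it ignores hysteresis, and "crossed_trips" is a hypothetical helper. Roughly, cooling devices bound to a crossed active trip would be expected to be on, which is the comparison the tmon tables earlier in this thread make by eye:

```python
from pathlib import Path


def crossed_trips(zone_dir):
    """Return sorted (index, type, trip_mC) for trip points at or below the zone temp.

    Sysfs reports temperatures in millidegrees Celsius; hysteresis is ignored.
    """
    zone = Path(zone_dir)
    current = int((zone / "temp").read_text())
    crossed = []
    for temp_file in zone.glob("trip_point_*_temp"):
        index = int(temp_file.name.split("_")[2])  # "trip_point_3_temp" -> 3
        trip_temp = int(temp_file.read_text())
        trip_type = (zone / f"trip_point_{index}_type").read_text().strip()
        if current >= trip_temp:
            crossed.append((index, trip_type, trip_temp))
    return sorted(crossed)
```

For example, iterate over sorted(Path("/sys/class/thermal").glob("thermal_zone*")) and print each zone name with crossed_trips(zone); zones whose attributes cannot be read will raise OSError.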

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

Created attachment 171921
patch 4

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

Created attachment 171931
patch-5

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

ping...

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #82)
> > >
> > > on 3.14.33 it's the reverse , the fan doesn't spin up but the temperature
> > > rises
> > >
> > > to 90 degree celsius ( also while watching videos on chrome ) , which is
> > >
> > > harmful to the laptop.
> > >
> > is this symptom got with updated firmware?
>
> I mean did you get this symptom with 3.14 kernel, after firmware updated?
>
> Please do the following test on 4.0-rc kernel
> 1. apply the patches at
> https://patchwork.kernel.org/patch/6077231/
> https://patchwork.kernel.org/patch/6077241/
> https://patchwork.kernel.org/patch/6077251/
> 2. please apply the two patches attached later
> 3. after build, please boot with kernel parameter module.dyndbg="module
> thermal_sys +fp" dyndbg="file thermal_core.c +fp; file step_wise.c +fp"
> 4. attach the acpidump output of your mac book
> 5. attach the output of "grep . /sys/class/thermal/*/*/path" after boot
> 6. attach the dmesg output after the bug reproduced
> 7. attach the output of "grep . /sys/class/thermal/thermal*/*" after the bug
> reproduced

Yes, the symptom is there after the firmware update, on both 3.14 LTS and 3.19.

Revision history for this message
In , e.glorg (e.glorg-linux-kernel-bugs) wrote :

Upgraded to kernel 3.16 from the Debian Jessie repos. I did not upgrade any firmware, just the OS. Strange, but the problem is gone.
Here's uname output:
$ uname -srvom
Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt7-1 (2015-03-01) x86_64 GNU/Linux
$ cat /etc/issue
Debian GNU/Linux 8
Installation of 3.16 on Debian 7.x still gives that old problem.

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

(In reply to E.Glorg from comment #87)
> Upgraded to kernel 3.16 from Debian Jessie repos. Having performed no
> firmware upgrade, just upgraded OS. Strange, but problem has gone.
> Here's uname output:
> $ uname -srvom
> Linux 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt7-1 (2015-03-01) x86_64
> GNU/Linux
> $ cat /etc/issue
> Debian GNU/Linux 8
> Installation of 3.16 on Debian 7.x still gives that old problem.

Yep, upgrade to any kernel above 3.17 and you will have the problem again.

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

Ha,

I discovered something strange:

under Wayland everything runs normally at 69 degrees Celsius (where it was around 85-90 under Xorg), and the fan rpm is 1300 (5000 under Xorg), all using the same kernel 3.14.37 LTS running under Archlinux.

Could it be an xorg-server issue?!

If so, then how come the problem disappears with kernels below 3.17?!

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

(In reply to step-ali from comment #86)
> (In reply to Zhang Rui from comment #82)
> > > >
> > > > on 3.14.33 it's the reverse , the fan doesn't spin up but the
> temperature
> > > > rises
> > > >
> > > > to 90 degree celsius ( also while watching videos on chrome ) , which
> is
> > > >
> > > > harmful to the laptop.
> > > >
> > > is this symptom got with updated firmware?
> >
> > I mean did you get this symptom with 3.14 kernel, after firmware updated?
> >
> > Please do the following test on 4.0-rc kernel
> > 1. apply the patches at
> > https://patchwork.kernel.org/patch/6077231/
> > https://patchwork.kernel.org/patch/6077241/
> > https://patchwork.kernel.org/patch/6077251/
> > 2. please apply the two patches attached later
> > 3. after build, please boot with kernel parameter module.dyndbg="module
> > thermal_sys +fp" dyndbg="file thermal_core.c +fp; file step_wise.c +fp"
> > 4. attach the acpidump output of your mac book
> > 5. attach the output of "grep . /sys/class/thermal/*/*/path" after boot
> > 6. attach the dmesg output after the bug reproduced
> > 7. attach the output of "grep . /sys/class/thermal/thermal*/*" after the
> bug
> > reproduced
>
> yes , the symptom is htere after firmware update on 3.14 lts & 3.19

please do the test and attach the debug information requested above.

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

ping...

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

Sorry, I don't know how to apply patches to the kernel, but the problem is still there with kernel 4.0.

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

Do you know how to build a customized kernel?
Please download the patches and run "patch -p1 < foo.patch" to apply each of them in ascending order, and then build the kernel.
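The apply step can be scripted as well. A minimal Python sketch, assuming GNU patch is installed; "ascending order" is interpreted here as a natural sort of the file names (so patch-10 comes after patch-9), and each patch is first checked with "--dry-run" before it is applied. The helper names are hypothetical, and building the kernel is left out:

```python
import re
import subprocess
from pathlib import Path


def natural_key(path):
    """Sort key that puts 'patch-10' after 'patch-9' rather than after 'patch-1'."""
    parts = re.split(r"(\d+)", Path(path).name)
    return [int(part) if part.isdigit() else part.lower() for part in parts]


def apply_in_order(src_dir, patch_files):
    """Apply each patch with `patch -p1` inside src_dir, dry-running it first."""
    for patch_file in sorted(patch_files, key=natural_key):
        patch_path = str(Path(patch_file).resolve())  # cwd changes to src_dir below
        for extra in (["--dry-run"], []):
            subprocess.run(["patch", "-p1", *extra, "-i", patch_path],
                           cwd=src_dir, check=True)
```

Because check=True is set, the loop stops at the first patch that does not apply cleanly instead of leaving a half-patched tree behind later patches.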

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

ping...

Revision history for this message
In , rui.zhang (rui.zhang-linux-kernel-bugs) wrote :

Bug closed, as we cannot make any more progress without the bug reporter's response and help.
Please feel free to reopen it if you can build customized kernel to help debug the issue.

Revision history for this message
In , sunmooon15 (sunmooon15-linux-kernel-bugs) wrote :

(In reply to Zhang Rui from comment #95)
> bug closed as we can more make any progress w/o bug reporter' response and
> help.
> Please feel free to reopen it if you can build customized kernel to help
> debug the issue.

Sorry, I just don't have the time to build a customized kernel.

I will test with 4.1.

Revision history for this message
derWalter (walter-derwalter) wrote :

Happens regularly, but not every time, on Ubuntu 16.04 on a ThinkPad X201.

Revision history for this message
Julien Olivier (julo) wrote :

The bug is still very present in Ubuntu 19.04.

Brad Figg (brad-figg)
tags: added: cscc
Revision history for this message
In , peterek355 (peterek355-linux-kernel-bugs) wrote :

Mr Zhang Rui,

I notice that this bug still affects HP notebooks with all new kernels. I want to reopen this bug report.

I am not an experienced user, but I tried popular distributions such as Fedora 34 with the 5.11 kernel, Ubuntu 20.04 with the 5.4 kernel, and SUSE Linux Enterprise Desktop 15 SP3 with the 5.3 kernel.

My fans are constantly at idle speed. It doesn't matter whether my CPU usage is 100% or 0%; they keep the same low speed. Sometimes my notebook shuts down because it is overheating. Sometimes the fans run at 100% speed for a few seconds when the hardware is very hot, and then they return to idle speed.

So I have a question: why were the patches from this bug not applied to the final upstream kernel?

Can I fix my issue without compiling a modified kernel? I thought about the thermald daemon, but I don't know whether it is compatible with AMD processors. If it is, how can I configure it to fix the issue? I also found a program on GitHub for the HP 625, but I am still not a coder (I'm learning), so I don't know whether that program works or whether it is safe.

I hope that you can help.

Yours faithfully,
PeterQ

Changed in ubuntu:
importance: Unknown → High
status: Fix Released → Expired
Revision history for this message
In , o3ouo4yip (o3ouo4yip-linux-kernel-bugs) wrote :

Hello. I am still experiencing this bug on 6.1. Could you reopen this bug, so we will be able to solve it?
