15.10 20150826 snapshot -- MacBookPro3,1 has high load (and temperature) due to unhandled external power state GPE

Bug #1489622 reported by Jeremy Sequoia
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Confirmed
High
Unassigned
Wily
Confirmed
High
Unassigned

Bug Description

I just put the 20150826 15.10 snapshot onto my MacBookPro3,1. I blacklisted nouveau due to https://bugs.freedesktop.org/show_bug.cgi?id=91779 and updated systemd to start in text mode instead of launching X11.

After booting into this configuration and letting the system idle, it is consuming only about 3% CPU but has a load average of 1.5

$ uptime
 14:23:00 up 11 min, 3 users, load average: 1.65, 1.55, 0.95

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 1 0 0 3803976 25900 109412 0 0 247 3 5519 6495 1 4 92 2 0
 0 0 0 3803928 25900 109412 0 0 0 0 9652 11390 0 5 95 0 0
 0 0 0 3803928 25900 109412 0 0 0 0 9680 11429 0 2 98 0 0
 0 0 0 3803928 25900 109412 0 0 0 0 9651 11344 0 2 97 1 0
 1 0 0 3803896 25900 109412 0 0 0 0 9659 11269 1 2 97 0 0
 0 0 0 3803928 25900 109412 0 0 0 0 9660 11329 0 4 96 0 0
 0 0 0 3803928 25900 109412 0 0 0 0 9589 11236 0 4 96 0 0
 1 0 0 3803960 25900 109412 0 0 0 0 9531 11203 1 3 96 0 0
 0 0 0 3803928 25900 109412 0 0 0 0 9590 11217 0 4 96 0 0
 0 0 0 3803896 25900 109412 0 0 0 0 9601 11390 0 3 97 0 0
 0 0 0 3803864 25900 109412 0 0 0 0 9527 11221 0 3 97 0 0
 1 0 0 3803864 25900 109412 0 0 0 0 9652 11380 0 4 96 0 0
 0 0 0 3803832 25900 109412 0 0 0 0 9661 11447 0 4 96 0 0
 1 0 0 3803800 25900 109412 0 0 0 0 9595 11372 1 4 96 0 0
 1 0 0 3803800 25900 109412 0 0 0 0 9569 11232 0 1 99 0 0
 1 0 0 3803864 25900 109412 0 0 0 0 9538 11400 0 5 95 0 0
 0 0 0 3803896 25900 109412 0 0 0 0 9520 11205 0 5 95 0 0
 0 0 0 3803864 25900 109412 0 0 0 0 9468 11121 0 2 98 0 0
 0 0 0 3803864 25900 109412 0 0 0 0 9678 11436 0 4 96 0 0
 1 0 0 3803864 25900 109412 0 0 0 0 9639 11229 1 4 96 0 0
 0 0 0 3803896 25900 109412 0 0 0 0 9776 11504 0 3 97 0 0
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 1 0 0 3803912 25900 109412 0 0 0 0 9493 11253 0 2 97 1 0
 1 0 0 3803944 25900 109412 0 0 0 0 9640 11303 0 3 97 0 0
 1 0 0 3803880 25900 109412 0 0 0 0 9708 11373 0 5 95 0 0
 1 0 0 3803944 25900 109412 0 0 0 0 9824 11603 0 3 97 0 0
 2 0 0 3803912 25900 109412 0 0 0 0 9584 11216 1 4 95 0 0
 0 0 0 3803076 25904 109416 0 0 8 0 9757 11438 1 1 96 2 0
 1 0 0 3803076 25904 109416 0 0 0 0 9741 11548 0 2 98 0 0
 1 0 0 3803076 25904 109416 0 0 0 0 9635 11241 0 3 97 0 0
 0 0 0 3803076 25904 109416 0 0 0 0 9739 11475 0 3 97 0 0
 0 0 0 3803076 25904 109416 0 0 0 0 9786 11563 0 3 97 0 0
 1 0 0 3803044 25912 109408 0 0 0 48 9509 11125 0 4 95 1 0
 1 0 0 3803076 25912 109416 0 0 0 0 9696 11428 0 3 97 0 0
 0 0 0 3803076 25912 109416 0 0 0 0 9434 11053 0 1 99 0 0
 0 0 0 3803044 25912 109416 0 0 0 0 9613 11296 0 4 96 0 0
 0 0 0 3803044 25912 109416 0 0 0 0 9450 11133 0 4 96 0 0
 0 0 0 3803076 25912 109416 0 0 0 0 9661 11377 0 4 96 0 0
 0 0 0 3803044 25912 109416 0 0 0 0 9601 11241 0 2 98 0 0
 1 0 0 3803076 25912 109416 0 0 0 0 9640 11357 0 2 98 0 0
 1 0 0 3803076 25912 109416 0 0 0 0 9682 11393 0 3 97 0 0
 1 0 0 3803076 25912 109416 0 0 0 0 9744 11451 1 5 94 1 0
 0 0 0 3803076 25912 109416 0 0 0 0 9542 11174 0 3 97 0 0
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r b swpd free buff cache si so bi bo in cs us sy id wa st
 1 0 0 3803076 25912 109416 0 0 0 0 9659 11393 0 3 97 0 0
 0 0 0 3803076 25912 109416 0 0 0 0 9759 11476 0 2 98 0 0
 1 0 0 3803076 25912 109416 0 0 0 0 9629 11271 0 3 97 0 0
 1 0 0 3803108 25912 109416 0 0 0 0 9710 11420 0 4 96 0 0
 0 0 0 3803076 25912 109416 0 0 0 0 9502 11166 1 3 96 0 0
 1 0 0 3803076 25912 109416 0 0 0 0 9754 11423 0 3 97 0 0
 1 0 0 3803108 25912 109416 0 0 0 0 9849 11575 0 4 96 0 0
 1 0 0 3803076 25912 109416 0 0 0 0 9645 11384 0 5 95 0 0
 1 0 0 3803108 25912 109416 0 0 0 0 9580 11277 0 2 98 0 0
 0 0 0 3803108 25912 109416 0 0 0 0 9282 10918 0 4 96 0 0
 1 0 0 3803076 25912 109416 0 0 0 0 9537 11203 0 4 96 0 0
 1 0 0 3803108 25912 109416 0 0 0 0 9793 11524 0 3 97 0 0
 0 0 0 3803076 25912 109416 0 0 0 0 9838 11556 0 4 96 0 0
 1 0 0 3803076 25912 109416 0 0 0 0 9410 11037 1 2 97 0 0
 1 0 0 3803108 25912 109416 0 0 0 0 9604 11378 0 2 98 0 0
 0 0 0 3803076 25912 109416 0 0 0 0 9603 11305 0 3 97 0 0
 0 0 0 3803108 25912 109416 0 0 0 0 9404 11070 0 4 96 0 0
 0 0 0 3803076 25912 109416 0 0 0 0 9736 11475 0 3 97 0 0
 1 0 0 3803172 25912 109416 0 0 0 0 9672 11279 0 4 96 1 0
 0 0 0 3803172 25912 109416 0 0 0 8 9655 11373 0 2 98 0 0
 0 0 0 3803172 25912 109416 0 0 0 0 9598 11301 0 4 96 0 0

iotop similarly shows nothing interesting:

Total DISK READ : 0.00 B/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 0.00 B/s

ProblemType: Bug
DistroRelease: Ubuntu 15.10
Package: linux-image-4.1.0-3-generic 4.1.0-3.3 [modified: boot/vmlinuz-4.1.0-3-generic]
ProcVersionSignature: Ubuntu 4.1.0-3.3-generic 4.1.3
Uname: Linux 4.1.0-3-generic x86_64
ApportVersion: 2.18-0ubuntu7
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/pcmC0D0c: jeremy 2048 F...m pulseaudio
 /dev/snd/pcmC0D0p: jeremy 2048 F...m pulseaudio
 /dev/snd/controlC0: jeremy 2048 F.... pulseaudio
Date: Thu Aug 27 14:26:33 2015
HibernationDevice: RESUME=UUID=2ede38d6-1e71-41fe-9c70-4e9cb81059f3
InstallationDate: Installed on 2015-08-27 (0 days ago)
InstallationMedia: Ubuntu 15.10 "Wily Werewolf" - Alpha amd64 (20150826)
MachineType: Apple Inc. MacBookPro3,1
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.1.0-3-generic.efi.signed root=/dev/mapper/ubuntu--vg-root ro text
RelatedPackageVersions:
 linux-restricted-modules-4.1.0-3-generic N/A
 linux-backports-modules-4.1.0-3-generic N/A
 linux-firmware 1.147
SourcePackage: linux
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/05/08
dmi.bios.vendor: Apple Inc.
dmi.bios.version: MBP31.88Z.0070.B07.0803051658
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: Mac-F4238BC8
dmi.board.vendor: Apple Inc.
dmi.board.version: PVT
dmi.chassis.asset.tag: Asset Tag#
dmi.chassis.type: 2
dmi.chassis.vendor: Apple Inc.
dmi.chassis.version: Mac-F4238BC8
dmi.modalias: dmi:bvnAppleInc.:bvrMBP31.88Z.0070.B07.0803051658:bd03/05/08:svnAppleInc.:pnMacBookPro3,1:pvr1.0:rvnAppleInc.:rnMac-F4238BC8:rvrPVT:cvnAppleInc.:ct2:cvrMac-F4238BC8:
dmi.product.name: MacBookPro3,1
dmi.product.version: 1.0
dmi.sys.vendor: Apple Inc.

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :
Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Jeremy Sequoia (jeremyhu) wrote : Re: 15.10 20150826 snapshot -- MacBookPro3,1 has high load (and temperature) despite low CPU and low IO

I wonder if it is ACPI related:

$ uptime && cat /sys/firmware/acpi/interrupts/gpe17
 16:22:48 up 0 min, 1 user, load average: 2.26, 0.74, 0.26
  376509 enabled

that's a lot of interrupts for a system that has been up less than a minute.

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

Although even with gpe17, I'm still seeing the load average around 1.25 with perf reporting a lot of time in acpi_processor_ffh_cstate_enter

- 43.32% 43.32% swapper [kernel.kallsyms] [k] acpi_processor_ffh_cstate_enter

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :
Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :
Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

On one particular boot with the external power connected, there were only 1176 interrupts for this event a few minutes after boot. It was stable at 1176. Upon disconnecting external power, it started its rapid cycling. Reconnecting the external power source did not solve the problem.

On another boot when running on battery, the same thing occurred upon connecting the external power source.

When in such a stable state for the interrupts, behavior matches them being disabled (which one would likely expect), with load averaging around 1.15-1.25 (compared to 1.65-1.95 when the interrupts are constantly firing).

So we probably need a quirk to just ignore this power-state-change GPE and still need to figure out the rest of the issue.

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

Upon trying it again, I'm actually seeing the load drop down in the "steady gpe17 after boot" case. I just didn't wait long enough for the boot load spike to get out of the average. Please disregard that second half of my previous comment.

Here are some more experiments:

With processor.max_cstate=0 in boot args and without changing battery/external power state, I'm seeing a steady gpe17 interrupt count, and the 1-minute load after being booted for 5 minutes at 0.01. Upon disconnecting external power, I do not see the load jump and gpe17 remains steady.

With idle=halt in boot args and without changing battery/external power state, I'm seeing a steady gpe17 interrupt count, and the 1-minute load after being booted for 5 minutes at 0.02. Upon disconnecting external power, I do see gpe17 spin and the 1-minute load jump to about 1.50 after two minutes and perf top showing:

Samples: 21K of event 'cycles', Event count (approx.): 2163359670
Overhead Shared Object Symbol
  45.50% [kernel] [k] acpi_os_read_port
   5.73% [kernel] [k] acpi_ec_read_status.isra.4

With idle=mwait in boot args and without changing battery/external power state, I'm seeing a steady gpe17 interrupt count, and the 1-minute load after being booted for 5 minutes at 0.02. Upon disconnecting external power, I do see gpe17 spin and the 1-minute load jump to about 1.45 after two minutes and perf top showing:

Samples: 100K of event 'cycles', Event count (approx.): 7259748699
Overhead Shared Object Symbol
  34.33% [kernel] [k] acpi_os_read_port
  20.85% [kernel] [k] read_hpet
   6.56% [kernel] [k] acpi_processor_ffh_cstate_enter
   4.28% [kernel] [k] acpi_ec_read_status.isra.4

In the two cases where gpe17 is spinning, disabling it drops the load back down to about 1.0 with perf top then showing:

Samples: 127K of event 'cycles', Event count (approx.): 1193854375
Overhead Shared Object Symbol
  40.05% [kernel] [k] acpi_processor_ffh_cstate_enter
   8.25% [kernel] [k] read_hpet
   5.06% [kernel] [k] _raw_spin_lock
   2.46% [kernel] [k] acpi_os_read_port
   0.99% [kernel] [k] hpet_legacy_next_event

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

However, now on other attempts, I am seeing that processor.max_cstate=0 does not always eliminate the issue as I saw in the previous comment. Puzzling.

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

I don't see this issue on the 12.04.5 Live CD (3.13.0-32-generic kernel)

Changed in linux (Ubuntu):
importance: Undecided → Low
importance: Low → Undecided
importance: Undecided → High
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.2 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.2-unstable/

tags: added: kernel-da-key
Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

Will do.

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

The issue persists with the current 4.2 build.

tags: added: kernel-bug-exists-upstream
Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

v3.16.7-ck16-utopic and v3.17.8-vivid do not contain the issue.

v3.18-rc1-utopic contains the issue.

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

Comparing the commit log for driver/acpi changes between 3.17 and 3.18-rc1, I notice this commit standing out as the most likely related commit:

commit 9faf6136ff4647452580b019f4b16f8c5082e589
Author: Matthew Garrett <email address hidden>
Date: Sat Sep 20 13:19:46 2014 +0200

    ACPI / SBS: Disable smart battery manager on Apple

    Touching the smart battery manager at all on Apple hardware appears to
    make it unhappy - unplugging the AC adapter triggers accesses that hang
    the controller for several minutes. Quirk it out via DMI in order to
    avoid this. Compensate by changing battery presence if we fail to
    communicate with the battery.

    Signed-off-by: Matthew Garrett <email address hidden>
    Signed-off-by: Andreas Noever <email address hidden>
    Signed-off-by: Rafael J. Wysocki <email address hidden>

summary: 15.10 20150826 snapshot -- MacBookPro3,1 has high load (and temperature)
- despite low CPU and low IO
+ due to unhandled external power state GPE
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since you tested the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream Developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

I'd be happy to, but first I want to reproduce it in a kernel I built myself instead of one that I got from Ubuntu. I am not able to do that.

The issue does not reproduce for me when I use linux 3.18-rc1 built myself, using the same kernel configuration as v3.18-rc1-utopic.

I installed linux-image-3.18.0-031800rc1-generic_3.18.0-031800rc1.201410192135_amd64.deb and copied its configuration from /boot and then used it to build my own kernel, and the resulting kernel that I built does not have this issue. What I did to build my kernel:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
cd linux-stable
git remote add ubuntu-trusty git://kernel.ubuntu.com/ubuntu/ubuntu-trusty.git
git checkout ubuntu-trusty/master

cp -a /usr/share/kernel-package ../ubuntu-package
cp debian/control-scripts/{postinst,postrm,preinst,prerm} ../ubuntu-package/pkg/image/
cp debian/control-scripts/headers-postinst ../ubuntu-package/pkg/headers/

git checkout v3.18-rc1

sudo make-kpkg clean
sudo make-kpkg --initrd --overlay-dir=../ubuntu-package kernel_image kernel_headers

sudo dpkg -i ../*deb

I rebooted, verified it was my kernel with 'uname -a' and attempted to reproduce the issue, but it did not reproduce.

What would be differnet between the way I built the kernel and the way Ubuntu built linux-image-3.18.0-031800rc1-generic_3.18.0-031800rc1.201410192135_amd64.deb ?

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

Still reproduces on 2016.04.1's kernel.

I narrowed it down to the sbs kernel module. If I move the sbs module out of /lib/modules (or just build my kernel without it), the issue does not reproduce.

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

Which exactly matches my comment above about this being the likely culprit:

commit 9faf6136ff4647452580b019f4b16f8c5082e589
Author: Matthew Garrett <email address hidden>
Date: Sat Sep 20 13:19:46 2014 +0200

    ACPI / SBS: Disable smart battery manager on Apple

    Touching the smart battery manager at all on Apple hardware appears to
    make it unhappy - unplugging the AC adapter triggers accesses that hang
    the controller for several minutes. Quirk it out via DMI in order to
    avoid this. Compensate by changing battery presence if we fail to
    communicate with the battery.

    Signed-off-by: Matthew Garrett <email address hidden>
    Signed-off-by: Andreas Noever <email address hidden>
    Signed-off-by: Rafael J. Wysocki <email address hidden>

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

I just sent a ping email to Matthew about this. I emailed him last year without response.

Can we just revert that change in the Ubuntu kernel to get things back to a usable state?

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

The last hunk looks quite wrong to me. The manager_present logic looks inverted.

- result = acpi_manager_get_info(sbs);
- if (!result) {
- sbs->manager_present = 1;
- for (id = 0; id < MAX_SBS_BAT; ++id)
- if ((sbs->batteries_supported & (1 << id)))
- acpi_battery_add(sbs, id);
- } else
+ result = 0;
+
+ if (!sbs_manager_broken) {
+ result = acpi_manager_get_info(sbs);
+ if (!result) {
+ sbs->manager_present = 0;
+ for (id = 0; id < MAX_SBS_BAT; ++id)
+ if ((sbs->batteries_supported & (1 << id)))
+ acpi_battery_add(sbs, id);
+ }
+ }
+
+ if (!sbs->manager_present)
                acpi_battery_add(sbs, 0);

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

Actually, the obvious logic bug was fixed by 61f8ff69 (which is already in 4.4), but the other issue still remains.

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Did you ever receive any feedback from Matthew regarding that commit?

Revision history for this message
Jeremy Sequoia (jeremyhu) wrote :

Nope. But I did do a manual change to remove the Apple quirk (the revert isn't clean due to additional changes, but it's simple enough to remove the few lines that setup the quirk). I was running fine for a while, but eventually started seeing the same behavior (/sys/firmware/acpi/interrupts/gpe17 forever increasing and load spiking), so maybe something else is going south and this change just made it trigger reliably sooner.

Either way, I'm basically running without SBS on my system for now. Luckily, the MBP3,1 has an LED battery level indicator on the battery itself, but users of newer models aren't so fortunate.

Revision history for this message
bcowan (bradly-cowan) wrote :

Any update to this bug? I narrowed this down to sbs and the same "kernel fix" several years ago and have been running without sbs since. It's still present on 4.11. Sucks not having battery and charging info available. Btw this is on a MacBook 2,1. I also didn't see this on systems without systemd if something in that code or module loading/parallel has anything to do with it.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.