Power consumption regression after upgrade to 20.10 5.8.0-31-generic

Bug #1907212 reported by George Kapetanos on 2020-12-08
This bug affects 2 people
Affects: linux (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

After upgrading to Ubuntu 20.10, which ships kernel 5.8.0-31-generic, battery life is considerably shorter and idle power consumption is about 10 watts higher (idling at 14-15 watts) while using the integrated graphics. This is a hybrid-GPU laptop (Intel HD 630 / GP107M), and based on past experience a possible cause is that the dedicated GPU is not powered off completely while the integrated one is in use.

Running sudo powertop --auto-tune doesn't eliminate the issue.
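
For anyone trying to reproduce the numbers, here is a minimal sketch of reading the discharge rate directly from sysfs while on battery. It assumes the battery is exposed as `BAT0` (the name differs between machines) and that it reports either `power_now` in microwatts or `current_now`/`voltage_now` in microamps/microvolts. The function name `battery_watts` is only illustrative.

```shell
#!/bin/sh
# Print the current battery discharge rate in watts.
# Assumption: the battery directory is /sys/class/power_supply/BAT0;
# pass another directory as $1 if the battery has a different name.
battery_watts() {
    bat_dir="${1:-/sys/class/power_supply/BAT0}"
    if [ -r "$bat_dir/power_now" ]; then
        # power_now is reported in microwatts
        awk '{ printf "%.2f\n", $1 / 1000000 }' "$bat_dir/power_now"
    elif [ -r "$bat_dir/current_now" ] && [ -r "$bat_dir/voltage_now" ]; then
        # microamps * microvolts = 1e-12 watts
        awk -v ua="$(cat "$bat_dir/current_now")" \
            -v uv="$(cat "$bat_dir/voltage_now")" \
            'BEGIN { printf "%.2f\n", (ua * uv) / 1e12 }'
    else
        echo "no power readings under $bat_dir" >&2
        return 1
    fi
}
```

Calling `battery_watts` a few times at idle, with the charger unplugged, should give figures comparable to the 14-15 watt and 5-6 watt readings quoted in this report.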

$ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_status
suspended
$ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_suspended_time
1815948

$ cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_status
active
$ cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_suspended_time
3827

Device "01:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller" appears active. Does this mean that the audio function of the dGPU keeps it powered on, causing the extra 10 watts of consumption?
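
The checks above can be gathered into one helper. This is a rough sketch, not an official tool: `pci_runtime_pm` is a made-up name, and the optional second argument exists only so the function can be pointed at a copy of the sysfs tree. It prints the runtime PM attributes for every function of a slot (`runtime_suspended_time` is in milliseconds).

```shell
#!/bin/sh
# Print runtime-PM state for every function of a PCI slot.
# $1: slot address without the function number, e.g. 0000:01:00
# $2: PCI device directory (assumption: defaults to the real sysfs path)
pci_runtime_pm() {
    slot="$1"
    root="${2:-/sys/bus/pci/devices}"
    for dev in "$root/$slot".*; do
        # Skip the unexpanded glob if no function matches
        [ -d "$dev/power" ] || continue
        printf '%s status=%s control=%s suspended_ms=%s\n' \
            "${dev##*/}" \
            "$(cat "$dev/power/runtime_status")" \
            "$(cat "$dev/power/control")" \
            "$(cat "$dev/power/runtime_suspended_time")"
    done
}
```

On this laptop, `pci_runtime_pm 0000:01:00` would report both the VGA function (`.0`) and the audio function (`.1`) in one go.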

George Kapetanos (kapgeorge) wrote :
summary: - Power consumption regression after ugrade to 20.10 5.8.0-31-generic
+ Power consumption regression after upgrade to 20.10 5.8.0-31-generic
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
tags: added: groovy
George Kapetanos (kapgeorge) wrote :

This error in dmesg looks relevant: `snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)`. If I unbind the nvidia audio device function with `echo 0000:01:00.1 > /sys/module/snd_hda_intel/drivers/pci:snd_hda_intel/unbind`, power consumption returns to normal, idling at 5-6 watts, and runtime_status becomes suspended.
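
As a workaround sketch, the unbind can be wrapped in a small function that goes through the device's own `driver` symlink, so the driver name does not have to be spelled out. The name `unbind_pci_function`, the `SETTLE_SECONDS` variable and the 2-second default wait are my own assumptions. The write needs root, which is why a plain `echo ... >` only works from a root shell.

```shell
#!/bin/sh
# Unbind a PCI function from whatever driver is bound to it, then print
# its runtime PM status. Must run as root on a real system.
# $1: full PCI address, e.g. 0000:01:00.1
# $2: PCI device directory (assumption: defaults to the real sysfs path)
unbind_pci_function() {
    addr="$1"
    root="${2:-/sys/bus/pci/devices}"
    dev="$root/$addr"
    if [ ! -e "$dev/driver/unbind" ]; then
        echo "$addr: no driver bound" >&2
        return 1
    fi
    # Writing the address to the driver's unbind file detaches the device
    printf '%s' "$addr" > "$dev/driver/unbind" || return 1
    # Give runtime PM a moment to react (the delay is a guess)
    sleep "${SETTLE_SECONDS:-2}"
    cat "$dev/power/runtime_status"
}
```

The unbind does not survive a reboot, so it would have to be repeated on every boot until the kernel handles this properly.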

Kai-Heng Feng (kaihengfeng) wrote :

What was the last working version?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
George Kapetanos (kapgeorge) wrote :

It was certainly working with 5.4.0-* from 20.04 at some point. I have mainly been using the dGPU for the last 2-3 months, so I wouldn't have noticed a regression during that time.

Strangely, I tested with 5.4.0-56-generic, which remained installed after the upgrade, and it also does not power down the dGPU.

20.10 didn't upgrade the nvidia driver; I currently have 450.80.02.

Something important I just noticed: on 19.10 and 20.04 the nvidia VGA device and its audio function weren't visible in lspci when booting with the Intel GPU only, but after upgrading to 20.10 they are visible again when booting with the Intel GPU. I suspect this mechanism was essential for power management and that it changed with 20.10.
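
Whether the dGPU functions are enumerated can also be checked from sysfs without lspci. The sketch below lists every PCI function with a given vendor ID; `0x10de` is NVIDIA's PCI vendor ID, while the function name `list_pci_by_vendor` and the optional directory argument are illustrative assumptions.

```shell
#!/bin/sh
# List PCI functions that belong to a vendor, with their class code.
# $1: vendor ID as printed by sysfs, e.g. 0x10de for NVIDIA
# $2: PCI device directory (assumption: defaults to the real sysfs path)
list_pci_by_vendor() {
    vendor="$1"
    root="${2:-/sys/bus/pci/devices}"
    for dev in "$root"/*; do
        [ -r "$dev/vendor" ] || continue
        [ "$(cat "$dev/vendor")" = "$vendor" ] || continue
        # Class 0x03xxxx is a display controller, 0x0403xx an HD audio device
        printf '%s class=%s\n' "${dev##*/}" "$(cat "$dev/class")"
    done
}
```

If `list_pci_by_vendor 0x10de` prints nothing while booted with the Intel GPU only, the NVIDIA functions are hidden as they were on 19.10 and 20.04; if it prints `01:00.0` and `01:00.1`, they are enumerated as on 20.10.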

Kai-Heng Feng (kaihengfeng) wrote :

Please test latest mainline kernel:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.10-rc6/amd64/

And attach dmesg here.

George Kapetanos (kapgeorge) wrote :

$ uname -r
5.10.0-051000rc6-generic

Dmesg attached after booting with the Intel GPU and 5.10. High power consumption remains; again,
`echo 0000:01:00.1 | sudo tee /sys/module/snd_hda_intel/drivers/pci:snd_hda_intel/unbind` works.

Kai-Heng Feng (kaihengfeng) wrote :

Can you please test this kernel:
https://people.canonical.com/~khfeng/lp1907212/

George Kapetanos (kapgeorge) wrote :

Dmesg after booting with the provided kernel from https://people.canonical.com/~khfeng/lp1907212/

High power consumption remains; `echo 0000:01:00.1 | sudo tee /sys/module/snd_hda_intel/drivers/pci:snd_hda_intel/unbind` still works.

Kai-Heng Feng (kaihengfeng) wrote :

What's the "runtime_status" and "control" of device 0000:01:00.1?

George Kapetanos (kapgeorge) wrote :

$ cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_status
active
$ cat /sys/bus/pci/devices/0000:01:00.1/power/control
auto

They are the same with both 5.8.0-31-generic and 5.8.0-35-generic from https://people.canonical.com/~khfeng/lp1907212/

Changed in linux (Ubuntu):
assignee: nobody → Kai-Heng Feng (kaihengfeng)
status: Incomplete → In Progress
Kai-Heng Feng (kaihengfeng) wrote :
George Kapetanos (kapgeorge) wrote :

With this kernel (5.8.0-35-generic #37~lp1907212+2) the nvidia GPU is successfully powered down, and idle power consumption is normal at 5-6 watts.

$ cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_status
suspended
$ cat /sys/bus/pci/devices/0000:01:00.1/power/control
auto

In the dmesg output there are many more `snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)` complaints.

I also rebooted in nvidia performance mode. Everything is normal, but `0000:01:00.1/power/runtime_status` is again suspended. I don't know if it is supposed to be like this, but with 5.8.0-31-generic `0000:01:00.1/power/runtime_status` is active when booting in nvidia performance mode.

Kai-Heng Feng (kaihengfeng) wrote :
George Kapetanos (kapgeorge) wrote :

With this one (5.8.0-35-generic #37~lp1907212+3), the nvidia GPU isn't powered down:
$ cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_status
active
$ cat /sys/bus/pci/devices/0000:01:00.1/power/control
auto
At ~242 in the attached dmesg there are also snd_hda_intel errors, which are then repeated every ~120 seconds.
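
To confirm the repeat interval without reading the log by eye, something like the following could be used. It is a sketch that assumes the default dmesg timestamp format (`[  seconds.micros]`, not `dmesg -T`); the name `dmesg_gaps` is hypothetical. It reads a log on stdin and, for each line containing the given text, prints the timestamp and the gap to the previous match.

```shell
#!/bin/sh
# Read dmesg output on stdin and print, for every line that contains
# the text in $1, its timestamp and the gap since the previous match.
dmesg_gaps() {
    pattern="$1"
    awk -v pat="$pattern" '
        index($0, pat) && match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Strip the surrounding brackets and convert to a number
            ts = substr($0, 2, RLENGTH - 2) + 0
            if (seen) {
                printf "%.1f +%.1f\n", ts, ts - prev
            } else {
                printf "%.1f\n", ts
            }
            prev = ts
            seen = 1
        }'
}
```

For example, `sudo dmesg | dmesg_gaps snd_hda_intel` on the affected kernel should show whether the gaps really cluster around 120 seconds.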

Kai-Heng Feng (kaihengfeng) wrote :

Thanks. Please bear with me and test this one:
https://people.canonical.com/~khfeng/lp1907212+4/

George Kapetanos (kapgeorge) wrote :

5.8.0-35-generic #37~lp1907212+4
Nvidia gpu successfully powered down.
$ cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_status
suspended

The previous snd_hda_intel error messages aren't printed anymore, yet "snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)" is again printed about 20 times at boot.

Kai-Heng Feng (kaihengfeng) wrote :

Does this happen when booting with HDMI/DP connected?

George Kapetanos (kapgeorge) wrote :

Those errors aren't printed when booting with HDMI connected. The related dmesg output is attached. Device `01:00.1` is suspended.

According to the documentation for this laptop (Acer Aspire A715-71G), the embedded display (eDP) is connected to the Intel GPU and the HDMI port is connected to the nvidia GPU; this is probably the most common layout on these laptops. Otherwise, the `01:00.1` device might not exist at all.

When booting with the Intel GPU only, the HDMI interface isn't listed in xrandr and the HDMI screen gets no input. This is expected, since HDMI is wired to the nvidia GPU and that GPU should be powered off, right? It also happens with previous kernels and should not be related to this bug.

Kai-Heng Feng (kaihengfeng) wrote :

Turns out this is going to be really hard. It requires changes in both PCI and HDA, and potentially in nouveau and the proprietary nvidia driver.

I'll revisit this once upstream devs have a concrete plan.

Changed in linux (Ubuntu):
status: In Progress → Confirmed
assignee: Kai-Heng Feng (kaihengfeng) → nobody