High power consumption using 5.0.0-25-generic

Bug #1840835 reported by George Kapetanos
This bug affects 1 person
Affects          Status         Importance   Assigned to   Milestone
linux (Ubuntu)   Fix Released   Undecided    Unassigned
  Disco          Fix Released   Medium       Unassigned
  Eoan           Fix Released   Medium       Unassigned

Bug Description

=== SRU Justification ===
[Impact]
Some Nvidia graphics cards have an audio function in addition to the video
function. When the video function is not bound to any driver, the audio
function can't be runtime suspended, which prevents the power
resources from being turned off.

[Fix]
Allow the Nvidia audio function to be runtime suspended.

[Test]
The user reported the fix works.

[Regression Potential]
Low. It only allows the audio function of a discrete Nvidia GPU to be
runtime suspended. Regular audio devices, and audio functions whose
discrete GPU is already bound to a driver, are unaffected.
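
The condition described under [Impact] can be inspected from sysfs. The following is a minimal sketch, assuming the standard PCI sysfs layout (a `driver` symlink when a driver is bound, and a `power/runtime_status` attribute). The function names are illustrative and not part of any existing tool.

```shell
#!/bin/sh
# Print the name of the driver bound to a PCI function, or "none" if unbound.
# $1 = the function's sysfs directory (e.g. under /sys/bus/pci/devices/).
pci_bound_driver() {
    if [ -L "$1/driver" ]; then
        basename "$(readlink "$1/driver")"
    else
        echo none
    fi
}

# Print the runtime PM status of a PCI function ("active", "suspended", ...),
# or "unknown" if the attribute cannot be read.
pci_runtime_status() {
    cat "$1/power/runtime_status" 2>/dev/null || echo unknown
}
```

For example, `pci_bound_driver /sys/bus/pci/devices/0000:01:00.0` would report on the video function and `pci_runtime_status /sys/bus/pci/devices/0000:01:00.1` on the audio function, using the addresses that appear in the comments on this bug.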

=== Original Bug Report ===
After updating to the 5.0.0-25-generic kernel, battery life is considerably shorter and idle power consumption is about 10 watts higher, even when using the integrated graphics. This is a hybrid-GPU laptop (Intel HD 630 / GP107M); a possible cause is that the dedicated GPU is not powered off completely while the integrated GPU is in use.

nvidia-driver-430
GP107M [GeForce GTX 1050 Mobile]


lotuspsychje (lotuspsychje) wrote :

Thank you for filing the bug and making Ubuntu better!

To file bugs in the future, please use: ubuntu-bug packagename so that relevant system info
gets pulled into your bug and the developers can help you debug more effectively.

You can still use: apport-collect bugID at this stage to add your info afterwards.

Could you also add the Nvidia driver version and the card/chipset you use? (nvidia-smi and/or ubuntu-drivers list)

Does this occur on other kernel versions too?

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1840835

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: disco
George Kapetanos (kapgeorge) wrote :

This has only occurred on 5.0.0-25-generic.

nvidia-driver-430
GP107M [GeForce GTX 1050 Mobile]

description: updated
George Kapetanos (kapgeorge) wrote :

apport-collect has this issue:

$ apport-collect 1840835
dpkg-query: no packages found matching linux

affects: linux (Ubuntu) → ubuntu
affects: ubuntu → linux (Ubuntu)
George Kapetanos (kapgeorge) wrote :

I noticed that on 5.0.0-25-generic the module snd_hda_codec_hdmi is loaded, which is not true for 5.0.0-23-generic.

http://changelogs.ubuntu.com/changelogs/pool/main/l/linux-hwe/linux-hwe_5.0.0-25.26~18.04.1/changelog

  * Unhide Nvidia HDA audio controller (LP: #1836308)
    - PCI: Enable NVIDIA HDA controllers

That release included a fix to make the Nvidia audio controller visible, which causes that module to be loaded and keeps the dedicated video card awake.

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1836308

Workaround: adding

blacklist snd_hda_codec_hdmi
options snd_hda_intel enable=1,0

to /etc/modprobe.d/blacklist.conf fixes the problem.
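
Whether the workaround took effect after a reboot can be checked by looking for the module in the kernel's module list. A minimal sketch, assuming the usual /proc/modules format with the module name in the first column; `module_loaded` is a hypothetical helper name:

```shell
#!/bin/sh
# Succeed (exit status 0) if the named module appears in a
# /proc/modules-style listing, fail otherwise.
# $1 = module name, $2 = listing file (defaults to /proc/modules).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}
```

For example, `module_loaded snd_hda_codec_hdmi && echo "still loaded"` should print nothing once the blacklist entry is in effect. `lsmod` presents the same information in a formatted table.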

Kai-Heng Feng (kaihengfeng) wrote :

Does enabling HDA's runtime power management help? For example, with a command like `sudo powertop --auto-tune`.

George Kapetanos (kapgeorge) wrote :

No, running sudo powertop --auto-tune doesn't eliminate the issue. Even with all powertop tunables enabled, powertop reports ~10 watts more on 5.0.0-25.

Kai-Heng Feng (kaihengfeng) wrote :

Can you please attach dmesg? Thanks.

Kai-Heng Feng (kaihengfeng) wrote :

[ 4.753032] snd_hda_intel 0000:01:00.1: Handle vga_switcheroo audio client
So a device link is established.

Can you please attach the following outputs:
`cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_suspended_time`
`cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_suspended_time`
`sudo lspci -vv -s 01:00.1`

Changed in linux (Ubuntu):
assignee: nobody → Kai-Heng Feng (kaihengfeng)
Kai-Heng Feng (kaihengfeng) wrote :

I assume "power/control" is already "auto" after `powertop --auto-tune`.
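
That assumption can be verified, and corrected if needed, directly in sysfs. A minimal sketch, assuming the standard `power/control` attribute, which accepts "auto" or "on"; the helper name is hypothetical, and writing to real sysfs requires root:

```shell
#!/bin/sh
# Set a device's runtime PM policy to "auto" unless it already is.
# $1 = the device's sysfs directory.
set_runtime_pm_auto() {
    ctl="$1/power/control"
    if [ "$(cat "$ctl")" = "auto" ]; then
        echo "already auto"
    else
        echo auto > "$ctl" && echo "changed to auto"
    fi
}
```

For the audio function in this report, that would be `set_runtime_pm_auto /sys/bus/pci/devices/0000:01:00.1`, run as root.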

George Kapetanos (kapgeorge) wrote :

$ cat /sys/bus/pci/devices/0000:01:00.0/power/runtime_suspended_time
0
$ cat /sys/bus/pci/devices/0000:01:00.1/power/runtime_suspended_time
4432
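
To make these readings easier to compare, the values can be summarized per device. A minimal sketch, assuming that `runtime_suspended_time` holds the cumulative suspended time in milliseconds (my understanding of the kernel's sysfs power ABI); the helper name is illustrative:

```shell
#!/bin/sh
# Print a one-line summary of a device's cumulative runtime-suspended time.
# $1 = label to print, $2 = the device's sysfs directory.
report_suspended_time() {
    ms=$(cat "$2/power/runtime_suspended_time")
    if [ "$ms" -eq 0 ]; then
        echo "$1: never runtime suspended"
    else
        echo "$1: runtime suspended for ${ms} ms in total"
    fi
}
```

With the values above, the video function (01:00.0) would be reported as never runtime suspended, while the audio function (01:00.1) has accumulated 4432 ms.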

Kai-Heng Feng (kaihengfeng) wrote :

My proposed patch to fix the issue:
https://lkml.org/lkml/2019/8/27/956

Kai-Heng Feng (kaihengfeng) wrote :

A test kernel can be found here:
https://people.canonical.com/~khfeng/lp1840835/

George Kapetanos (kapgeorge) wrote :

I installed the test kernel:
linux-headers-5.0.0-28_5.0.0-28.29~lp1840835_all.deb
linux-headers-5.0.0-28-generic_5.0.0-28.29~lp1840835_amd64.deb
linux-image-unsigned-5.0.0-28-generic_5.0.0-28.29~lp1840835_amd64.deb
linux-modules-5.0.0-28-generic_5.0.0-28.29~lp1840835_amd64.deb
linux-modules-extra-5.0.0-28-generic_5.0.0-28.29~lp1840835_amd64.deb

uname -r: 5.0.0-28-generic

Power consumption was completely normal, idling at less than 4 watts as expected. While using the iGPU, I noticed that each time I run lspci, it lags for less than a second and the following messages appear in dmesg:
[ 7.587195] snd_hda_codec_hdmi hdaudioC1D0: Unable to sync register 0x7f0800. -5
[ 7.587200] snd_hda_codec_hdmi hdaudioC1D0: HDMI: invalid ELD buf size -1

The dmesg output is in the attached file dmesg50028.

I switched to the Nvidia GPU for testing. Everything is normal: those messages only appear at GPU initialization, and lspci didn't lag at all. The dmesg output after switching to the Nvidia GPU is in the attached file dmesg50028nvidia.

Apart from those messages, I conclude that the test kernel fixes the power consumption problem.

description: updated
Changed in linux (Ubuntu):
assignee: Kai-Heng Feng (kaihengfeng) → nobody
Seth Forshee (sforshee)
Changed in linux (Ubuntu):
status: Incomplete → Fix Committed
Stefan Bader (smb)
Changed in linux (Ubuntu Disco):
importance: Undecided → Medium
Changed in linux (Ubuntu Eoan):
importance: Undecided → Medium
Changed in linux (Ubuntu Disco):
status: New → Fix Committed
Changed in linux (Ubuntu Eoan):
status: New → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-disco' to 'verification-done-disco'. If the problem still exists, change the tag 'verification-needed-disco' to 'verification-failed-disco'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-disco
George Kapetanos (kapgeorge) wrote :

Enabling proposed on eoan and upgrading the kernel to 5.3.0-24-generic fixes the issue.

In eoan (19.10), the way graphics switch between Nvidia and Intel changed, specifically:
If the system booted with Intel, the Nvidia card is not detectable in lspci and a reboot is required to use it, so the Nvidia card is effectively disabled and the issue doesn't exist.
If the system booted with Nvidia, you can switch to Intel by logging out and in, but the Nvidia card remains powered on until the next reboot, so this bug applies in this case.

In the second case, after upgrading the kernel from proposed, the Nvidia card was disabled when not in use, as expected. So I conclude that the problem is solved.

It is worth noting that after upgrading, snd_hda_codec_hdmi displayed errors in dmesg about "hdaudioC1D0: HDMI: invalid ELD buf size -1". Also, powerstat -D reports idle CPU power consumption from RAPL elevated by about a watt (from ~1.3 to 2.3), though this may be due to an unrelated reason.

tags: added: verification-done-eoan
removed: verification-needed-eoan
Sultan Alsawaf (kerneltoast) wrote :

Could you please verify that disco is fixed as well? Thanks.

George Kapetanos (kapgeorge) wrote :

Unfortunately, the installation was upgraded to eoan, and as this is my daily-use system, I can't provide a disco installation for the foreseeable future. However, I verified a test kernel on disco on 2019-08-27 (comment #19).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.0.0-37.40

---------------
linux (5.0.0-37.40) disco; urgency=medium

  * disco/linux: 5.0.0-37.40 -proposed tracker (LP: #1852253)

  * System hangs at early boot (LP: #1851216)
    - x86/timer: Skip PIT initialization on modern chipsets

  * drm/i915: Add support for another CMP-H PCH (LP: #1848491)
    - drm/i915/cml: Add second PCH ID for CMP

  * Some EFI systems fail to boot in efi_init() when booted via maas
    (LP: #1851810)
    - efi: efi_get_memory_map -- increase map headroom

  * seccomp: fix SECCOMP_USER_NOTIF_FLAG_CONTINUE test (LP: #1849281)
    - SAUCE: seccomp: avoid overflow in implicit constant conversion
    - SAUCE: seccomp: rework define for SECCOMP_USER_NOTIF_FLAG_CONTINUE
    - SAUCE: seccomp: fix SECCOMP_USER_NOTIF_FLAG_CONTINUE test

  * dkms artifacts may expire from the pool (LP: #1850958)
    - [Packaging] dkms -- try launchpad librarian for pool downloads
    - [Packaging] dkms -- dkms-build quieten wget verbiage

  * update ENA driver to version 2.1.0 (LP: #1850175)
    - net: ena: fix swapped parameters when calling
      ena_com_indirect_table_fill_entry
    - net: ena: fix: Free napi resources when ena_up() fails
    - net: ena: fix incorrect test of supported hash function
    - net: ena: fix return value of ena_com_config_llq_info()
    - net: ena: improve latency by disabling adaptive interrupt moderation by
      default
    - net: ena: fix ena_com_fill_hash_function() implementation
    - net: ena: add handling of llq max tx burst size
    - net: ena: ethtool: add extra properties retrieval via get_priv_flags
    - net: ena: replace free_tx/rx_ids union with single free_ids field in
      ena_ring
    - net: ena: arrange ena_probe() function variables in reverse christmas tree
    - net: ena: add newline at the end of pr_err prints
    - net: ena: documentation: update ena.txt
    - net: ena: allow automatic fallback to polling mode
    - net: ena: add support for changing max_header_size in LLQ mode
    - net: ena: optimise calculations for CQ doorbell
    - net: ena: add good checksum counter
    - net: ena: use dev_info_once instead of static variable
    - net: ena: add MAX_QUEUES_EXT get feature admin command
    - net: ena: enable negotiating larger Rx ring size
    - net: ena: make ethtool show correct current and max queue sizes
    - net: ena: allow queue allocation backoff when low on memory
    - net: ena: add ethtool function for changing io queue sizes
    - net: ena: remove inline keyword from functions in *.c
    - net: ena: update driver version from 2.0.3 to 2.1.0
    - net: ena: Fix bug where ring allocation backoff stopped too late
    - Revert "net: ena: ethtool: add extra properties retrieval via
      get_priv_flags"
    - net: ena: don't wake up tx queue when down
    - net: ena: clean up indentation issue

  * Add Intel Comet Lake ethernet support (LP: #1848555)
    - SAUCE: e1000e: Add support for Comet Lake

  * Intel Wireless AC 3168 on Eoan complaints FW error in SYNC CMD
    GEO_TX_POWER_LIMIT (LP: #1846016)
    - iwlwifi: exclude GEO SAR support for 3168

  * tsc marked unstable after entered PC10 on Intel CoffeeLake (LP: #1840239...

Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.3.0-24.26

---------------
linux (5.3.0-24.26) eoan; urgency=medium

  * eoan/linux: 5.3.0-24.26 -proposed tracker (LP: #1852232)

  * Eoan update: 5.3.9 upstream stable release (LP: #1851550)
    - io_uring: fix up O_NONBLOCK handling for sockets
    - dm snapshot: introduce account_start_copy() and account_end_copy()
    - dm snapshot: rework COW throttling to fix deadlock
    - Btrfs: fix inode cache block reserve leak on failure to allocate data space
    - btrfs: qgroup: Always free PREALLOC META reserve in
      btrfs_delalloc_release_extents()
    - iio: adc: meson_saradc: Fix memory allocation order
    - iio: fix center temperature of bmc150-accel-core
    - libsubcmd: Make _FORTIFY_SOURCE defines dependent on the feature
    - perf tests: Avoid raising SEGV using an obvious NULL dereference
    - perf map: Fix overlapped map handling
    - perf script brstackinsn: Fix recovery from LBR/binary mismatch
    - perf jevents: Fix period for Intel fixed counters
    - perf tools: Propagate get_cpuid() error
    - perf annotate: Propagate perf_env__arch() error
    - perf annotate: Fix the signedness of failure returns
    - perf annotate: Propagate the symbol__annotate() error return
    - perf annotate: Fix arch specific ->init() failure errors
    - perf annotate: Return appropriate error code for allocation failures
    - perf annotate: Don't return -1 for error when doing BPF disassembly
    - staging: rtl8188eu: fix null dereference when kzalloc fails
    - RDMA/siw: Fix serialization issue in write_space()
    - RDMA/hfi1: Prevent memory leak in sdma_init
    - RDMA/iw_cxgb4: fix SRQ access from dump_qp()
    - RDMA/iwcm: Fix a lock inversion issue
    - HID: hyperv: Use in-place iterator API in the channel callback
    - kselftest: exclude failed TARGETS from runlist
    - selftests/kselftest/runner.sh: Add 45 second timeout per test
    - nfs: Fix nfsi->nrequests count error on nfs_inode_remove_request
    - arm64: cpufeature: Effectively expose FRINT capability to userspace
    - arm64: Fix incorrect irqflag restore for priority masking for compat
    - arm64: ftrace: Ensure synchronisation in PLT setup for Neoverse-N1 #1542419
    - tty: serial: owl: Fix the link time qualifier of 'owl_uart_exit()'
    - tty: serial: rda: Fix the link time qualifier of 'rda_uart_exit()'
    - serial/sifive: select SERIAL_EARLYCON
    - tty: n_hdlc: fix build on SPARC
    - misc: fastrpc: prevent memory leak in fastrpc_dma_buf_attach
    - RDMA/core: Fix an error handling path in 'res_get_common_doit()'
    - RDMA/cm: Fix memory leak in cm_add/remove_one
    - RDMA/nldev: Reshuffle the code to avoid need to rebind QP in error path
    - RDMA/mlx5: Do not allow rereg of a ODP MR
    - RDMA/mlx5: Order num_pending_prefetch properly with synchronize_srcu
    - RDMA/mlx5: Add missing synchronize_srcu() for MW cases
    - gpio: max77620: Use correct unit for debounce times
    - fs: cifs: mute -Wunused-const-variable message
    - arm64: vdso32: Fix broken compat vDSO build warnings
    - arm64: vdso32: Detect binutils support for dmb ishld
    - serial: mctrl_gpio: Check for NULL pointer
    - serial: 8250_...

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released