power off stress test will hang on the TGL machines

Bug #1919930 reported by Hui Wang
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
HWE Next                  Fix Released  Undecided   Unassigned
linux (Ubuntu)            Fix Released  High        Hui Wang
  Groovy                  Fix Released  High        Unassigned
  Hirsute                 Fix Released  High        Hui Wang
linux-oem-5.10 (Ubuntu)   Invalid       High        Unassigned
  Focal                   Fix Released  High        Unassigned

Bug Description

Intel suggested two actions to fix this problem. The first is to merge
5 kernel patches; this applies only to Hirsute (H) and OEM-5.10, since
the Groovy kernel does not have tgl.c yet. The second is to change a
kernel config option; this change applies to Hirsute, Groovy (G) and
OEM-5.10.

https://github.com/thesofproject/linux/issues/2781

[Impact]
When we run a power off/on stress test on some Lenovo TGL laptops, the
system randomly hangs, and when this happens, dmesg shows that the SOF
audio driver has failed.
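
The report does not quote the failing dmesg lines. A minimal sketch for spotting them in a saved log, assuming the SOF errors come from a device whose name starts with "sof-audio-pci" and contain the word "error" (both are assumptions; the exact driver name and message text vary between kernel versions):

```python
import re
from typing import Iterable, List

# Assumption: SOF failures are logged by a device named "sof-audio-pci..."
# and the message contains the word "error" (matched case-insensitively).
SOF_ERROR = re.compile(r"sof-audio-pci.*\berror\b", re.IGNORECASE)


def find_sof_errors(dmesg_lines: Iterable[str]) -> List[str]:
    """Return the dmesg lines that look like SOF audio driver errors."""
    return [line.rstrip("\n") for line in dmesg_lines if SOF_ERROR.search(line)]
```

Usage: feed it the output of dmesg (for example, the lines of a log collected after a failed cycle) and treat a non-empty result as a sign the SOF driver failed.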

[Fix]
Intel recommends that we backport 5 kernel patches and change a
kernel config option.

[Test]
After applying the changes and testing on TGL/CML/WHL machines, audio
works as well as before, and the power-off stress test no longer hangs.
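
The report does not say which tool drove the power-off stress test. One common way to automate such cycles is sketched here, assuming a hypothetical boot-time service that calls run_cycle() once per boot, a hypothetical state file path, and firmware that can wake from S5 on an RTC alarm (rtcwake -m off). If a power-off hangs, the boot counter stops short of the target.

```python
import subprocess
from pathlib import Path

# Hypothetical state file; assumed to be written once per boot by a
# boot-time service (for example a systemd unit) that calls run_cycle().
STATE_FILE = Path("/var/lib/poweroff-stress/count")


def record_boot(state_file: Path) -> int:
    """Increment and persist the number of completed boots; return the new count."""
    count = int(state_file.read_text()) if state_file.exists() else 0
    count += 1
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(str(count))
    return count


def poweroff_cmd(wake_after_s: int) -> list:
    """rtcwake sets an RTC alarm, then powers the machine off (mode 'off', S5)."""
    return ["rtcwake", "-m", "off", "-s", str(wake_after_s)]


def run_cycle(state_file: Path = STATE_FILE, total: int = 30, wake_after_s: int = 60) -> bool:
    """Record this boot; power off again unless `total` boots have been recorded."""
    count = record_boot(state_file)
    if count >= total:
        return False
    # Needs root; the machine powers off here and the RTC alarm turns it back on.
    subprocess.run(poweroff_cmd(wake_after_s), check=True)
    return True
```

Keeping the counter on disk, rather than in the harness process, is what makes a hang visible: the machine never comes back to record the next boot.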

[Where problems could occur]
The kernel patches could introduce issues when the system powers off
or reboots on TGL machines, but the likelihood is low since we have
tested these patches on different TGL machines.

The kernel config change could introduce a power consumption
regression, but it only affects power saving and package_cstate values
while a capture stream is active; it has no impact when all capture
streams are inactive. In theory, then, it will not affect power
consumption in short idle or long idle. I also checked that the system
can still enter package_c10 after this change.
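
The report does not say how the package_c10 check was done. One way to do it is to sample the package C10 residency counter before and after an idle period. A minimal sketch, assuming an Intel client CPU that implements MSR_PKG_C10_RESIDENCY (0x632, as in the kernel's msr-index.h), root access and the msr module loaded (modprobe msr):

```python
import os
import struct
import time

# MSR_PKG_C10_RESIDENCY on recent Intel client CPUs; assumed present on TGL.
MSR_PKG_C10_RESIDENCY = 0x632


def read_msr(reg: int, cpu: int = 0) -> int:
    """Read a 64-bit MSR through the /dev/cpu/N/msr interface (needs root)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, reg))[0]
    finally:
        os.close(fd)


def entered_pc10(before: int, after: int) -> bool:
    """The residency counter only advances while the package is in C10."""
    return after > before


def check_pc10(idle_s: float = 30.0) -> bool:
    before = read_msr(MSR_PKG_C10_RESIDENCY)
    time.sleep(idle_s)  # leave the machine idle, with no capture stream open
    after = read_msr(MSR_PKG_C10_RESIDENCY)
    return entered_pc10(before, after)
```

turbostat, from the kernel source tree, reports the same package C10 residency and is the more usual tool for a quick manual check.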

Hui Wang (hui.wang)
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux-oem-5.10 (Ubuntu):
importance: Undecided → High
no longer affects: linux-oem-5.10 (Ubuntu Groovy)
no longer affects: linux (Ubuntu Focal)
no longer affects: linux-oem-5.10 (Ubuntu Hirsute)
Changed in linux (Ubuntu Groovy):
importance: Undecided → High
Changed in linux-oem-5.10 (Ubuntu Focal):
importance: Undecided → High
Changed in linux (Ubuntu Groovy):
status: New → In Progress
Changed in linux (Ubuntu Hirsute):
status: New → In Progress
Changed in linux-oem-5.10 (Ubuntu):
status: New → In Progress
Changed in linux-oem-5.10 (Ubuntu Focal):
status: New → In Progress
Hui Wang (hui.wang)
description: updated
Hui Wang (hui.wang)
tags: added: originate-from-1906747
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.10 (Ubuntu Focal):
status: In Progress → Fix Committed
Tim Gardner (timg-tpi)
Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Hirsute):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Hui Wang (hui.wang) wrote :

Tested on the Lenovo machine; the power-off stress test no longer hangs.

tags: added: verification-done-focal
removed: verification-needed-focal
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.10 (Ubuntu):
status: In Progress → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.10 - 5.10.0-1021.22

---------------
linux-oem-5.10 (5.10.0-1021.22) focal; urgency=medium

  * focal/linux-oem-5.10: 5.10.0-1021.22 -proposed tracker (LP: #1922921)

  * Display abnormal on the TGL+4k panel machines (LP: #1922885)
    - drm/i915/display: Do not allow DC3CO if PSR SF is enabled
    - SAUCE: drm/i915/display/psr: Disable DC3CO when the PSR2 is used

  * Fix mic on P620 after S3 resume (LP: #1921757)
    - ALSA: usb-audio: Carve out connector value checking into a helper
    - ALSA: usb-audio: Check connector value on resume

 -- Timo Aaltonen <email address hidden> Wed, 07 Apr 2021 18:07:55 +0300

Changed in linux-oem-5.10 (Ubuntu Focal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.11.0-14.15

---------------
linux (5.11.0-14.15) hirsute; urgency=medium

  * hirsute/linux: 5.11.0-14.15 -proposed tracker (LP: #1923103)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Include Infiniband Peer Memory interface (LP: #1923104)
    - SAUCE: RDMA/core: Introduce peer memory interface

  * Hirsute update: v5.11.12 upstream stable release (LP: #1923069)
    - arm64: mm: correct the inside linear map range during hotplug check
    - virtiofs: Fail dax mount if device does not support it
    - ext4: shrink race window in ext4_should_retry_alloc()
    - ext4: fix bh ref count on error paths
    - fs: nfsd: fix kconfig dependency warning for NFSD_V4
    - rpc: fix NULL dereference on kmalloc failure
    - iomap: Fix negative assignment to unsigned sis->pages in
      iomap_swapfile_activate
    - ASoC: rt1015: fix i2c communication error
    - ASoC: rt5640: Fix dac- and adc- vol-tlv values being off by a factor of 10
    - ASoC: rt5651: Fix dac- and adc- vol-tlv values being off by a factor of 10
    - ASoC: sgtl5000: set DAP_AVC_CTRL register to correct default value on probe
    - ASoC: es8316: Simplify adc_pga_gain_tlv table
    - ASoC: soc-core: Prevent warning if no DMI table is present
    - ASoC: cs42l42: Fix Bitclock polarity inversion
    - ASoC: cs42l42: Fix channel width support
    - ASoC: cs42l42: Fix mixer volume control
    - ASoC: cs42l42: Always wait at least 3ms after reset
    - NFSD: fix error handling in NFSv4.0 callbacks
    - ASoC: mediatek: mt8192: fix tdm out data is valid on rising edge
    - kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing
    - vhost: Fix vhost_vq_reset()
    - io_uring: fix ->flags races by linked timeouts
    - io_uring: halt SQO submission on ctx exit
    - scsi: st: Fix a use after free in st_open()
    - scsi: qla2xxx: Fix broken #endif placement
    - staging: comedi: cb_pcidas: fix request_irq() warn
    - staging: comedi: cb_pcidas64: fix request_irq() warn
    - ASoC: rt5659: Update MCLK rate in set_sysclk()
    - ASoC: rt711: add snd_soc_component remove callback
    - thermal/core: Add NULL pointer check before using cooling device stats
    - locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling
    - locking/ww_mutex: Fix acquire/release imbalance in
      ww_acquire_init()/ww_acquire_fini()
    - nvmet-tcp: fix kmap leak when data digest in use
    - io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls
    - Revert "PM: ACPI: reboot: Use S5 for reboot"
    - nouveau: Skip unvailable ttm page entries
    - static_call: Align static_call_is_init() patching condition
    - ext4: do not iput inode under running transaction in ext4_rename()
    - io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with
      MSG_WAITALL
    - net: mvpp2: fix interrupt mask/unmask skip condition
    - mptcp: deliver ssk errors to msk
    - mptcp: fix poll after shutdown
    - mptcp: init mptcp request socket earlier
    - mptcp: add a missing retransmission timer scheduling
    - flow_dissector: fix TTL and TOS dissection on IPv4 fragments
    - mptcp: fix DATA_FIN processing f...

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-groovy
Hui Wang (hui.wang) wrote :

Installed the Groovy -proposed kernel and ran the power-off stress test 30 times with no hang. Verification done for Groovy.

tags: added: verification-done-groovy
removed: verification-needed-groovy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-53.60

---------------
linux (5.8.0-53.60) groovy; urgency=medium

  * CVE-2021-3491
    - io_uring: fix provide_buffers sign extension
    - io_uring: fix overflows checks in provide buffers
    - SAUCE: proc: Avoid mixing integer types in mem_rw()
    - SAUCE: io_uring: truncate lengths larger than MAX_RW_COUNT on provide
      buffers

  * CVE-2021-3490
    - bpf: Fix a verifier failure with xor
    - SAUCE: bpf: verifier: fix ALU32 bounds tracking with bitwise ops

  * CVE-2021-3489
    - SAUCE: bpf: ringbuf: deny reserve of buffers larger than ringbuf
    - SAUCE: bpf: prevent writable memory-mapping of read-only ringbuf pages

 -- Stefan Bader <email address hidden> Thu, 06 May 2021 07:43:20 +0200

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
Changed in hwe-next:
status: New → Fix Released