SRU: Fix system hang when stress S3 on radeon with TTM

Bug #1893609 reported by AaronMa
Affects                  Status         Importance   Assigned to   Milestone
HWE Next                 Fix Released   Undecided    Unassigned
linux-oem-5.6 (Ubuntu)   Invalid        Undecided    Unassigned
  Focal                  Fix Released   Undecided    Unassigned

Bug Description

SRU Justification:

[Impact]
The system hangs after more than 90 cycles of S3 stress testing.

[Fix]
The upstream kernel is not affected. Bisecting shows that the bad commit is
an Ubuntu SAUCE patch: "vfio -- release device lock before userspace requests".
The 5.8 kernel already contains this fix.
The 5.4 kernel received this commit through stable update LP: #1888560.

[Test Case]
Verified over 500 S3 cycles; the system runs without problems.

[Regression Potential]
Low.
This is an upstream fix for a specific commit, verified with positive results.

==================================================
[Summary]
The system hangs while executing the S3-30-cycle test case.

[Steps to reproduce]
$ sudo checkbox-support-fwts_test -l /home/u/suspend_30_cycles -f none -s s3 --s3-device-check --s3-device-check-delay=60 --s3-sleep-delay=60 --s3-multiple=30
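
The [Impact] and [Test Case] sections mention runs of 90 and 500 cycles, while the command above runs 30. As a rough sketch, the same command line can be assembled with a different cycle count. Every flag is copied from the command above; the helper name `build_s3_stress_cmd` is hypothetical, and it is an assumption that `--s3-multiple` accepts other positive values.

```python
import shlex
from typing import List


def build_s3_stress_cmd(cycles: int = 30,
                        log_path: str = "/home/u/suspend_30_cycles") -> List[str]:
    """Return the argv for the S3 stress command from this report.

    Hypothetical helper: only the cycle count and the log path vary,
    every other flag is taken verbatim from the reproduction step.
    """
    if cycles < 1:
        raise ValueError("cycles must be a positive integer")
    return [
        "sudo", "checkbox-support-fwts_test",
        "-l", log_path,
        "-f", "none",
        "-s", "s3",
        "--s3-device-check",
        "--s3-device-check-delay=60",
        "--s3-sleep-delay=60",
        f"--s3-multiple={cycles}",
    ]


if __name__ == "__main__":
    # Print the command instead of running it; suspending is left to the user.
    print(shlex.join(build_s3_stress_cmd(cycles=30)))
```

The sketch only prints the command, so it can be reviewed before anything is run as root.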

[Expected result]
The test script should finish without errors.

[Actual result]
The system hangs when resuming from the 7th or 8th S3 cycle.

[Failure rate]
100%

[Additional information]
CPU: Intel(R) Core(TM) i5-10600 CPU @ 3.30GHz (12x)
GPU: 00:02.0 Display controller [0380]: Intel Corporation Device [8086:9bc8]
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Oland [Radeon HD 8570 / R7 240/340 OEM] [1002:6611] (rev 87)
kernel-version: 5.6.0-1020-oem


AaronMa (mapengyu) wrote :

Reverting the commit in Focal and oem-5.6 fixes this issue.

From b1cc82bb294939c76dc6b61146e960bd9f810222 Mon Sep 17 00:00:00 2001
From: Aaron Ma <email address hidden>
Date: Fri, 28 Aug 2020 11:03:46 +0800
Subject: [PATCH] Revert "UBUNTU: SAUCE: vfio -- release device lock before
 userspace requests"

This reverts part of commit 240766c8b029a4445f632f822a744c6ae34c48e3.

Part of the changes breaks the S3 stress test on the ttm driver with radeon.

Error log:
[TTM] Erroneous page count. Leaking pages.

Signed-off-by: Aaron Ma <email address hidden>
---
 drivers/base/dd.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index b4f9e99f7372..b25bcab2a26b 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -1135,13 +1135,6 @@ static void __device_release_driver(struct device *dev, struct device *parent)
                        dev->bus->remove(dev);
                else if (drv->remove)
                        drv->remove(dev);
-               /*
-                * A concurrent invocation of the same function might
-                * have released the driver successfully while this one
-                * was waiting, so check for that.
-                */
-               if (dev->driver != drv)
-                       return;

                device_links_driver_cleanup(dev);

--
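
The patch description quotes one kernel error line. When checking a saved kernel log after a stress run, a small script along these lines could count how often that message appears. This is a sketch only: the function name `count_ttm_errors` and the file name `dmesg.txt` are hypothetical, and the message string is the one quoted in the patch description.

```python
# Message copied verbatim from the "Error log" line in the patch description.
TTM_ERROR = "[TTM] Erroneous page count. Leaking pages."


def count_ttm_errors(log_text: str) -> int:
    """Count log lines that contain the TTM error message.

    A plain substring match is used so that timestamp prefixes added
    by the kernel log do not matter.
    """
    return sum(1 for line in log_text.splitlines() if TTM_ERROR in line)


if __name__ == "__main__":
    # "dmesg.txt" is a hypothetical file holding saved `dmesg` output.
    with open("dmesg.txt", encoding="utf-8", errors="replace") as fh:
        print(count_ttm_errors(fh.read()))
```

A count of zero does not prove the fix works; it only shows that this particular message was not logged.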

tags: added: oem-priority originate-from-1890075 sutton
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1893609

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
AaronMa (mapengyu)
description: updated
AaronMa (mapengyu)
summary: - System hang when stress S3 on radeon with TTM
+ SRU: System hang when stress S3 on radeon with TTM
summary: - SRU: System hang when stress S3 on radeon with TTM
+ SRU: Fix system hang when stress S3 on radeon with TTM
AaronMa (mapengyu)
no longer affects: linux (Ubuntu Focal)
AaronMa (mapengyu)
no longer affects: linux (Ubuntu)
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.6 (Ubuntu Focal):
status: New → Fix Committed
Timo Aaltonen (tjaalton)
tags: added: verification-done verification-done-focal
tags: removed: verification-done verification-done-focal
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
AaronMa (mapengyu) wrote :

Verified on 5.6.0-1028-oem.

Good for 300 S3 cycles.

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.6 - 5.6.0-1028.28

---------------
linux-oem-5.6 (5.6.0-1028.28) focal; urgency=medium

  * focal/linux-oem-5.6: 5.6.0-1028.28 -proposed tracker (LP: #1894630)

  * Cannot probe sata disk on sata controller behind VMD: ata1.00: failed to
    IDENTIFY (I/O error, err_mask=0x4) (LP: #1894778)
    - SAUCE: PCI: vmd: Add AHCI to fast interrupt list

  * SRU: Fix system hang when stress S3 on radeon with TTM (LP: #1893609)
    - mei: bus: don't clean driver pointer

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts
    - update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
    - [packaging] add signed modules for the 450 nvidia driver

  * CVE-2020-12888
    - vfio/type1: Support faulting PFNMAP vmas
    - vfio-pci: Fault mmaps to enable vma tracking
    - vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

  * Missing id 8086:a0bc for VMD quirk PCI_DEV_FLAGS_ENABLE_ASPM (LP: #1893194)
    - SAUCE: PCI/ASPM: VMD: add ASPM quirk for 8086:a0bc

  * The DP/HDMI audio via USB-C to DP dongle or Dell Zeus adapter can't work
    after suspend (LP: #1893290)
    - ALSA: hda/hdmi: always check pin power status in i915 pin fixup

  * Comet Lake PCH-H RAID not support on Ubuntu20.04 (LP: #1892288)
    - ahci: Add Intel Comet Lake PCH-H PCI ID

  * device doesn't boot with kernel older than v5.7.7 on a usb key: hang at
    efi_tpm_eventlog_init (LP: #1892827)
    - efi/tpm: Verify event log header before parsing

 -- Timo Aaltonen <email address hidden> Tue, 08 Sep 2020 11:40:14 +0300

Changed in linux-oem-5.6 (Ubuntu Focal):
status: Fix Committed → Fix Released
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.6 (Ubuntu):
status: New → Invalid
Changed in hwe-next:
status: New → Fix Released