System crash on resume from sleep

Bug #2065838 reported by Darin Miller
This bug affects 4 people
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Operating System: Ubuntu 24.04
KDE Plasma Version: 5.27.11
KDE Frameworks Version: 5.115.0
Qt Version: 5.15.13
Kernel Version: 6.8.0-31-generic (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 5800X 8-Core Processor
Memory: 31.2 GiB of RAM
Graphics Processor: AMD Radeon RX 6750 XT
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: X570S AORUS ELITE AX
System Version: -CF

1.
No LSB modules are available.
Description: Ubuntu 24.04 LTS
Release: 24.04

2. Initiate sleep mode and wake computer immediately
3. Expect resume to desktop
4. System crashes to a power off state. Power button press required to initiate a fresh boot process.

My system resumes from sleep successfully with:
 - 6.5 ubuntu kernel
 - 6.6 Xanmod kernel

But fails to resume with:
 - Ubuntu linux-image-6.8.0-31-generic/noble
 - any 6.8 or 6.9 Xanmod kernel

I swapped the AMD graphics card for an NVIDIA card, but the issue persists.

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-6.8.0-31-generic 6.8.0-31.31
ProcVersionSignature: Ubuntu 6.8.0-31.31-generic 6.8.1
Uname: Linux 6.8.0-31-generic x86_64
ApportVersion: 2.28.1-0ubuntu3
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: KDE
Date: Wed May 15 20:32:57 2024
InstallationDate: Installed on 2022-12-15 (518 days ago)
InstallationMedia: Kubuntu 22.10 "Kinetic Kudu" - Release amd64 (20221020)
MachineType: Gigabyte Technology Co., Ltd. X570S AORUS ELITE AX
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.8.0-31-generic root=UUID=2893bf6a-4841-4926-9ed9-a83a1e0739ea ro amdgpu.ppfeaturemask=0xffffffff vt.handoff=7
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-6.8.0-31-generic N/A
 linux-backports-modules-6.8.0-31-generic N/A
 linux-firmware 20240318.git3b128b60-0ubuntu2
SourcePackage: linux
UpgradeStatus: Upgraded to noble on 2024-04-27 (19 days ago)
dmi.bios.date: 08/09/2023
dmi.bios.release: 5.17
dmi.bios.vendor: American Megatrends International, LLC.
dmi.bios.version: F7b
dmi.board.asset.tag: Default string
dmi.board.name: X570S AORUS ELITE AX
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInternational,LLC.:bvrF7b:bd08/09/2023:br5.17:svnGigabyteTechnologyCo.,Ltd.:pnX570SAORUSELITEAX:pvr-CF:rvnGigabyteTechnologyCo.,Ltd.:rnX570SAORUSELITEAX:rvrx.x:cvnDefaultstring:ct3:cvrDefaultstring:skuDefaultstring:
dmi.product.family: X570 MB
dmi.product.name: X570S AORUS ELITE AX
dmi.product.sku: Default string
dmi.product.version: -CF
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Darin Miller (darinmiller) wrote :
Jake (ubuntujake) wrote :

I am having the exact same issue.

OS: Ubuntu 24.04 LTS
Kernel: 6.8.0-31-generic #31-Ubuntu SMP PREEMPT_DYNAMIC
CPU: 12th Gen Intel(R) Core(TM) i7-12700H
-------------------------------------
jake@CodeBox:~/temp$ cat lshw.txt
H/W path Device Class Description
==================================================================
                                        system 21DCCTO1WW (LENOVO_MT_21DC_BU_Think_FM_ThinkPad P1 Gen 5)
/0 bus 21DCCTO1WW
/0/1 memory 64GiB System Memory
/0/1/0 memory 32GiB SODIMM Synchronous 4800 MHz (0.2 ns)
/0/1/1 memory 32GiB SODIMM Synchronous 4800 MHz (0.2 ns)
/0/c memory 288KiB L1 cache
/0/d memory 192KiB L1 cache
/0/e memory 7680KiB L2 cache
/0/f memory 24MiB L3 cache
/0/10 memory 256KiB L1 cache
/0/11 memory 512KiB L1 cache
/0/12 memory 4MiB L2 cache
/0/13 memory 24MiB L3 cache
/0/14 processor 12th Gen Intel(R) Core(TM) i7-12700H
/0/15 memory 128KiB BIOS
/0/100 bridge 12th Gen Core Processor Host Bridge/DRAM Registers
/0/100/1 bridge 12th Gen Core Processor PCI Express x16 Controller #1
/0/100/1/0 /dev/fb0 display GA107GLM [RTX A1000 Laptop GPU]
/0/100/1/0.1 card0 multimedia NVIDIA Corporation
/0/100/1/0.1/0 input13 input HDA NVidia HDMI/DP,pcm=3
/0/100/1/0.1/1 input14 input HDA NVidia HDMI/DP,pcm=7
/0/100/1/0.1/2 input15 input HDA NVidia HDMI/DP,pcm=8
/0/100/1/0.1/3 input16 input HDA NVidia HDMI/DP,pcm=9
/0/100/2 /dev/fb0 display Alder Lake-P GT2 [Iris Xe Graphics]
/0/100/4 generic Alder Lake Innovation Platform Framework Processor Participant
/0/100/6 bridge 12th Gen Core Processor PCI Express x4 Controller #0
/0/100/6/0 /dev/nvme0 storage WD PC SN810 SDCQNRY-1T00-1001
/0/100/6/0/0 hwmon3 disk NVMe disk
/0/100/6/0/2 /dev/ng0n1 disk NVMe disk
/0/100/6/0/1 /dev/nvme0n1 disk 1024GB NVMe disk
/0/100/6/0/1/1 /dev/nvme0n1p1 volume 99MiB Windows FAT volume
/0/100/6/0/1/2 /dev/nvme0n1p2 volume 15MiB reserved partition
/0/100/6/0/1/3 /dev/nvme0n1p3 volume 464GiB Windows NTFS volume
/0/100/6/0/1/4 /dev/nvme0n1p4 volume 624MiB Windows NTFS volume
/0/100/6/0/1/5 /dev/nvme0n1p5 ...

Mario Limonciello (superm1) wrote :

As a random guess; could this be the same as https://bugzilla.kernel.org/show_bug.cgi?id=218849?

Try reverting d410ee5109a1 ("ACPICA: avoid "Info: mapping multiple BARs. Your kernel is fine."")

Darin Miller (darinmiller) wrote :

My bug symptoms are different as my PC successfully reaches a suspend state. After additional testing, the PC will successfully resume about 50% of the time with the 6.8.0-31-generic kernel. Another side note, when the PC sleep is initiated by the user, PC resume seems to fail more consistently. PC sleep induced by the system power management timeout (almost?) always resumes successfully. This resume observation could be purely coincidental however.

Reverting d410ee5109a1 is beyond my skillset without explicit instructions.

Mario Limonciello (superm1) wrote :

Yeah I know they're different symptoms but the reason for that revert might have a similar root cause.
I'm saying this because I've got a different system that fails to boot up that reverting that helps.

In terms of specific instructions, I'd start with this:

https://itsfoss.com/compile-linux-kernel/

Once you can get that compiling on your own, I can help you with applying a revert patch to see if it helps.

Darin Miller (darinmiller) wrote :

OK, kernel successfully compiled sans reverting the commit. Googling kernel patching, I found these commands to revert based on git repo:

git checkout d410ee5109a1

However, since I downloaded the file from here, https://mirrors.edge.kernel.org/pub/linux/kernel/v6.x/, I don't know how to initialize my download as a git repo.

Please advise.

Darin Miller (darinmiller) wrote :

Correction. To revert, the command should be:

git revert d410ee5109a1
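
For readers unsure of the difference between the two commands, here is a minimal sketch in a throwaway repository. It assumes only that git is installed; the file name, commit messages, and the `git_c` helper are made up for illustration and have nothing to do with the kernel tree. In a real clone of the kernel repository the equivalent step would be the `git revert d410ee5109a1` quoted above.

```shell
#!/bin/sh
# Toy demonstration of `git revert` in a throwaway repository.
# Nothing here touches a real kernel tree; all names are hypothetical.
set -e

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .

# Identity is passed inline so the sketch runs without a global git config.
git_c() { git -c user.name="Demo" -c user.email="demo@example.invalid" "$@"; }

echo "original line" > file.txt
git add file.txt
git_c commit -q -m "first commit"

echo "changed line" > file.txt
git_c commit -q -a -m "second commit (the one to undo)"
bad_commit=$(git rev-parse HEAD)

# `git revert` adds a NEW commit that undoes the named commit.
# `git checkout <commit>` would only switch the working tree to that commit.
git_c revert --no-edit "$bad_commit" >/dev/null

reverted_content=$(cat file.txt)
commit_count=$(git rev-list --count HEAD)
echo "content after revert: $reverted_content"
echo "commits in history:   $commit_count"
```

The history ends up one commit longer rather than shorter, which is why `git revert` needs the repository history and cannot be run on a plain source tarball.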

Mario Limonciello (superm1) wrote :

Yes, since you didn't clone using git, you can't use git revert.

Once you can successfully build and test that kernel, I'll post you a revert patch with an explanation of how to use it.

Darin Miller (darinmiller) wrote :

Kernel compiled and I am familiar with installing my own kernel. I just have never applied or reverted a patch.

Please post the reversion and respective instructions.

Mario Limonciello (superm1) wrote :

Save the below as a patch file and then apply using "patch -p1 < FILE". Build your kernel and see if it has helped.

diff --git a/drivers/acpi/acpica/exregion.c b/drivers/acpi/acpica/exregion.c
index 8907b8bf4267..ca060ec6936e 100644
--- a/drivers/acpi/acpica/exregion.c
+++ b/drivers/acpi/acpica/exregion.c
@@ -44,7 +44,6 @@ acpi_ex_system_memory_space_handler(u32 function,
  struct acpi_mem_mapping *mm = mem_info->cur_mm;
  u32 length;
  acpi_size map_length;
- acpi_size page_boundary_map_length;
 #ifdef ACPI_MISALIGNMENT_NOT_SUPPORTED
  u32 remainder;
 #endif
@@ -138,25 +137,8 @@ acpi_ex_system_memory_space_handler(u32 function,
   map_length = (acpi_size)
       ((mem_info->address + mem_info->length) - address);

- /*
- * If mapping the entire remaining portion of the region will cross
- * a page boundary, just map up to the page boundary, do not cross.
- * On some systems, crossing a page boundary while mapping regions
- * can cause warnings if the pages have different attributes
- * due to resource management.
- *
- * This has the added benefit of constraining a single mapping to
- * one page, which is similar to the original code that used a 4k
- * maximum window.
- */
- page_boundary_map_length = (acpi_size)
- (ACPI_ROUND_UP(address, ACPI_DEFAULT_PAGE_SIZE) - address);
- if (page_boundary_map_length == 0) {
- page_boundary_map_length = ACPI_DEFAULT_PAGE_SIZE;
- }
-
- if (map_length > page_boundary_map_length) {
- map_length = page_boundary_map_length;
+ if (map_length > ACPI_DEFAULT_PAGE_SIZE) {
+ map_length = ACPI_DEFAULT_PAGE_SIZE;
   }

   /* Create a new mapping starting at the address given */
--
2.34.1
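
As an illustration of the "patch -p1 < FILE" step, the sketch below walks through a dry run, a real apply, and a reverse apply on a toy tree. It assumes GNU patch and diff are installed; the directory layout and `demo.c` are hypothetical stand-ins, not the kernel source.

```shell
#!/bin/sh
# Toy demonstration of checking and applying a patch with GNU patch.
# The directory layout and file names are made up for illustration.
set -e

work=$(mktemp -d)
cd "$work"
mkdir -p a/src b/src
printf 'line one\nline two\nline three\n' > a/src/demo.c
printf 'line one\nline 2\nline three\n'   > b/src/demo.c

# Produce a unified diff with a/ and b/ path prefixes, as `git diff` does.
# diff exits with status 1 when the files differ, so tolerate that.
diff -u a/src/demo.c b/src/demo.c > fix.patch || true

# Stand-in for the source tree to be patched.
mkdir -p tree/src
cp a/src/demo.c tree/src/demo.c
cd tree

# 1) Dry run: reports whether every hunk applies, changes nothing.
patch -p1 --dry-run < ../fix.patch
# 2) Real run: -p1 strips the leading "a/" or "b/" path component.
patch -p1 < ../fix.patch
patched_line=$(sed -n '2p' src/demo.c)

# 3) -R applies the same patch in reverse, restoring the original.
patch -p1 -R < ../fix.patch
restored_line=$(sed -n '2p' src/demo.c)

echo "after apply:   $patched_line"
echo "after reverse: $restored_line"
```

A patch copied out of a web page may have had its tabs converted to spaces. In that case GNU patch's `-l` (`--ignore-whitespace`) option may let the hunks match, though whether it is needed depends on how the file was saved.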

Darin Miller (darinmiller) wrote :

I have tried building an unpatched kernel a couple different ways including following the direction from the recommended link (https://itsfoss.com/compile-linux-kernel/) but failed to produce a bootable kernel. The kernel appeared to build correctly as I did not notice any compile errors, but once installed, it failed to boot. I am unsure how/where to check the build logs (I still have the failed session so I can review if needed.)

Following my notes from previous successful kernel builds (to fix sound issues on my Lenovo laptop):

*Ubuntu version:

1) install kernel build tools:
(https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel):

* sudo apt install libncurses-dev gawk flex bison openssl libssl-dev dkms libelf-dev libudev-dev libpci-dev libiberty-dev autoconf git

2) clone the kernel from GitHub and check out a release tag (rather large, multiple GB):

* git clone https://github.com/torvalds/linux.git
* cd linux
* git checkout v6.xx

3) load the current kernel config, adjust the configuration, then run the following scripts/config commands:

* make olddefconfig

4) build the kernel

* make -j 16

But both the v6.8 and v6.9 kernels fail after ~10 to 15 min of compiling with this same error:

   make[1]: *** [/home/darin/kernel/linux/Makefile:1919: .] Error 2
   make: *** [Makefile:240: __sub-make] Error 2

I inspected the Makefile at lines 240 and 1919 but failed to see anything obvious to edit or delete.

Not sure how to proceed from here. I am more than happy to keep trying, but I don't want to waste anyone's time with painful remote troubleshooting.

A side note: I did try patching just for the experience, but the provided patch did not apply to either the 6.8 or 6.9 kernel version of exregion.c. However, the patch is clear enough to perform the edit manually, and I could find the lines that required updating.
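
The two `make: ***` lines quoted above are only the final summary; with a parallel build the message that actually caused the failure is usually printed much earlier in the output. The sketch below shows one way to find it in a captured log. The `first_build_error` helper is hypothetical, and the sample log is synthetic (only its last two lines come from this report), so it says nothing about what the real cause was here.

```shell
#!/bin/sh
# Sketch: locate the first real error in a captured kernel build log.
# A log could be captured with:  make -j 16 2>&1 | tee build.log
# The sample log below is synthetic; its messages are illustrative only.

first_build_error() {
    # Print the first line that looks like a compiler/linker error or a
    # missing make target, skipping make's trailing "Error N" summaries.
    grep -n -E 'error:|No rule to make target|undefined reference' "$1" | head -n 1
}

log=$(mktemp)
cat > "$log" <<'EOF'
  CC      kernel/fork.o
  CC      kernel/exit.o
kernel/example.c:12:5: error: hypothetical failure for illustration
make[3]: *** [scripts/Makefile.build:243: kernel/example.o] Error 1
make[2]: *** [scripts/Makefile.build:481: kernel] Error 2
make[1]: *** [/home/darin/kernel/linux/Makefile:1919: .] Error 2
make: *** [Makefile:240: __sub-make] Error 2
EOF

first_error=$(first_build_error "$log")
echo "$first_error"
```

On configs copied from an Ubuntu kernel, a commonly reported culprit is a "No rule to make target" message for a certificate file referenced by CONFIG_SYSTEM_TRUSTED_KEYS; whether that applies to this build is not known from the thread.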

Mario Limonciello (superm1) wrote :

Do you have secure boot enabled? If so, turn it off and hopefully the kernel you built should be bootable.
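
One way to check the Secure Boot state from a running system, sketched under the assumption that the machine booted via UEFI and efivarfs is mounted at the usual location. The `secure_boot_state` function name is hypothetical; it reads the standard `SecureBoot` EFI variable, which holds four attribute bytes followed by one data byte.

```shell
#!/bin/sh
# Sketch: report the Secure Boot state from the EFI variable exposed by
# the kernel. The variable holds 4 attribute bytes followed by 1 data
# byte (1 = enabled, 0 = disabled). The function name is hypothetical.

SB_VAR=/sys/firmware/efi/efivars/SecureBoot-8be4df61-93ca-11d2-aa0d-00e098032b8c

secure_boot_state() {
    var_file=${1:-$SB_VAR}
    if [ ! -r "$var_file" ]; then
        echo "unknown"      # legacy BIOS boot, or efivarfs not mounted
        return 0
    fi
    # Skip the 4 attribute bytes and read the single data byte.
    value=$(od -An -t u1 -j 4 -N 1 "$var_file" | tr -d ' \n')
    case "$value" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown" ;;
    esac
}

secure_boot_state
```

Where the mokutil package is installed, `mokutil --sb-state` reports the same information.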

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed