(Nvidia) system freezes when called to suspend since Linux 6.7.0 on Nvidia hardware with modeset

Bug #2078553 reported by Jani Uusitalo
This bug affects 1 person
Affects: linux (Ubuntu)
Status: New
Importance: Medium
Assigned to: Vinicius Peixoto
Milestone: (none)

Bug Description

== Summary ==
On my computer with an ancient Nvidia chipset (GeForce 7025/nForce 630a), running `systemctl suspend` (or suspending from the GNOME menu) causes the system to start suspending, but it freezes halfway, leaving fans and hard drives spinning. There's no way to resume from this frozen state apart from forcing a reboot (with a hardware reset button/poweroff).

== Steps to reproduce ==
* boot with modeset enabled
* run `systemctl suspend`

== What I expect to happen ==
For the system to suspend, shutting down all fans and hard drives.

== What happens ==
The system begins to suspend, but freezes halfway, leaving the display on and fans and hard drives spinning, but the keyboard unresponsive.

== Workaround ==
Disable kernel modesetting by adding "nomodeset" to the kernel command line.
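To confirm that the workaround is actually in effect after a reboot, the running kernel's command line can be inspected. The following is a minimal sketch, assuming Python 3 is available and that the command line is exposed at `/proc/cmdline`; the function names `has_kernel_param` and `running_cmdline` are hypothetical helpers chosen for illustration.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str = "nomodeset") -> bool:
    """Return True if `param` is a standalone token on the kernel command line."""
    # Kernel parameters are whitespace-separated, so compare whole tokens
    # rather than substrings (avoids matching e.g. "xnomodeset=1").
    return param in cmdline.split()


def running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the currently running kernel (Linux only)."""
    return Path(path).read_text().strip()
```

On the affected machine, `has_kernel_param(running_cmdline())` would be expected to return `True` after booting with the workaround, and `False` on a boot where the freeze can be reproduced.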

== Affected kernels ==
Prior to upgrades the system was running HWE kernel 5.15.0, so I tried the 5.15 series, and found that I could now suspend and wake the machine again just as before. I worked my way up the versions:

* 5.15.50: unaffected
* 5.15.165: unaffected
* 5.19.17: unaffected
* 6.4.0: unaffected
* 6.6.0: unaffected
* 6.6.48: unaffected
* 6.7.0: first to fail

I also tried the current newest mainline kernel 6.10.7, and the issue is still present there.
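The test results above can be summarized as a regression window. This is a minimal sketch, assuming plain `major.minor.patch` version strings; `TESTED`, `version_key` and `regression_window` are hypothetical names used only for illustration, and `True` means suspend and wake worked.

```python
from typing import Dict, Optional, Tuple

# Results as listed in this report: True = suspend worked, False = freeze.
TESTED: Dict[str, bool] = {
    "5.15.50": True,
    "5.15.165": True,
    "5.19.17": True,
    "6.4.0": True,
    "6.6.0": True,
    "6.6.48": True,
    "6.7.0": False,
    "6.10.7": False,
}


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '5.15.165' into (5, 15, 165) so versions sort numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(
    results: Dict[str, bool],
) -> Optional[Tuple[Optional[str], str]]:
    """Return (last good version, first bad version) in version order,
    or None if no tested version failed."""
    last_good = None
    for version in sorted(results, key=version_key):
        if results[version]:
            last_good = version
        else:
            return last_good, version
    return None
```

For the data above, `regression_window(TESTED)` returns `("6.6.48", "6.7.0")`. Since 6.6.48 is a stable point release, a mainline bisection would more likely start from the 6.6.0 and 6.7.0 releases.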

== Background ==
I have an old desktop machine now functioning as a NAS, and yesterday I upgraded it from Ubuntu 20.04 first to 22.04, and then all the way up to 24.04. The upgrade went smoothly, and this is the only issue I've come across since.

In the BIOS settings of the affected machine there are three "suspend mode" alternatives to choose from: "S1 (POS) only", "S3 only" and "Auto". I've always had it on "Auto", but with this issue I also tested both "S1 only" and "S3 only", with no effect.

The issue is also present when booting from the installation media (USB) into a live environment.

I've previously upgraded my laptop to 24.04, and there suspending still works as it did before the upgrade, so this is probably hardware-specific; the laptop is a modern one with all-Intel hardware.

Googling around, I could smell hints of this being once again related to the troublesome Nvidia chipset, so I tried nomodeset with the stock 6.8.0 kernel (6.8.0-41 currently) and voilà! Suspend and wake were working again.

Well, except for the display, which stayed black. But I couldn't say if this was the way it was before, because the NAS is normally running headless.

ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: linux-image-6.8.0-41-generic 6.8.0-41.41
ProcVersionSignature: Ubuntu 6.8.0-41.41-generic 6.8.12
Uname: Linux 6.8.0-41-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version k6.8.0-41-generic.
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.28.1-0ubuntu3.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/timer', '/dev/snd/seq', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D2c', '/dev/snd/hwC0D0', '/dev/snd/controlC0', '/dev/snd/by-path'] failed with exit code 1:
CRDA: N/A
Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
CasperMD5CheckResult: unknown
Date: Sat Aug 31 14:16:51 2024
HibernationDevice: RESUME=UUID=2faf9ef0-28e5-490f-8c62-376d78bf29b3
MachineType: System manufacturer System Product Name
ProcFB: 0 simpledrmdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.8.0-41-generic root=UUID=f2668afb-32fb-4af1-9f29-5b4665b43cf3 ro consoleblank=0 nomodeset
RelatedPackageVersions:
 linux-restricted-modules-6.8.0-41-generic N/A
 linux-backports-modules-6.8.0-41-generic N/A
 linux-firmware 20240318.git3b128b60-0ubuntu2.2
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: no
  Hard blocked: no
SourcePackage: linux
UpgradeStatus: Upgraded to noble on 2024-08-30 (1 days ago)
dmi.bios.date: 08/23/2010
dmi.bios.release: 8.14
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1804
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: M2N68-AM Plus
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1804:bd08/23/2010:br8.14:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKComputerINC.:rnM2N68-AMPlus:rvrRevX.0x:cvnChassisManufacture:ct3:cvrChassisVersion:skuToBeFilledByO.E.M.:
dmi.product.family: To Be Filled By O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: To Be Filled By O.E.M.
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer
mtime.conffile..etc.init.d.apport: 2024-07-22T17:59:07

Jani Uusitalo (uusijani) wrote :
Changed in linux (Ubuntu):
assignee: nobody → Vinicius Peixoto (vpeixoto)
Vinicius Peixoto (vpeixoto) wrote :

Hi Jani,

Thanks for taking the time to submit the bug report. I suspect this is happening due to a kernel panic in the nouveau driver (which I can't reproduce on my Nvidia GPUs). Could you please try booting without `nomodeset` to reproduce the issue when suspending, then reboot (with `nomodeset`) and collect the logs from the failed boot with:

    journalctl --boot=-1 > log.txt

However, I suspect the panic log might not make it to the disk in time (since you're halfway through the suspend process), so it would be great if we had access to a serial console. Do you happen to have a serial port PCIe card, or an RS232-USB adapter, by any chance?
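Once `log.txt` has been collected, it can be checked for a suspend attempt that never completed. This is a minimal sketch, assuming the kernel logged a line containing `PM: suspend entry` when the suspend began and would log `PM: suspend exit` on a successful resume; the exact wording may differ between kernel versions, and `unfinished_suspend` is a hypothetical helper name.

```python
from typing import Iterable, Optional

# Assumed marker strings; adjust if the kernel in use words them differently.
SUSPEND_ENTRY = "PM: suspend entry"
SUSPEND_EXIT = "PM: suspend exit"


def unfinished_suspend(lines: Iterable[str]) -> Optional[str]:
    """Return the last 'suspend entry' line with no later 'suspend exit',
    or None if every suspend attempt in the log completed."""
    pending = None
    for line in lines:
        if SUSPEND_ENTRY in line:
            pending = line.rstrip("\n")
        elif SUSPEND_EXIT in line:
            pending = None
    return pending
```

It could be used as `unfinished_suspend(open("log.txt"))`. A non-`None` result marks where the log should be read from; anything after that line is what made it to disk before the freeze.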

Also, I noticed you tried booting 6.10.7 (latest upstream stable, I'm assuming). Can you also try 6.11-rc6 [1][2], just in case?

Thanks,
Vinicius

[1] https://wiki.ubuntu.com/Kernel/MainlineBuilds
[2] https://kernel.ubuntu.com/mainline/?C=N;O=D

Changed in linux (Ubuntu):
importance: Undecided → Medium
Jani Uusitalo (uusijani) wrote :

Hi Vinicius, thanks for looking into this!

I'll have to postpone further testing until the next time there's an issue, as my backups rely on the NAS (and it's now back in operation with the nomodeset workaround). Taking it down is also physically fairly involved, as I prefer to unplug all the data drives, to spare them from repeated power cycling during testing.

However, I'm about 80 % sure I checked the journal for the failed suspends, and it just stopped there, resuming thereafter with just the boot messages from the next boot onward.

Also, I have TTY1 configured to display the journal "live" (I'll attach my /<email address hidden>/override.conf below), and I *did* try suspending with that TTY showing the journal (`sleep 5 && systemctl suspend` on TTY2, then switching to TTY1 before the timeout); I'll attach a photo of that view when the suspend had resulted in this freezing issue.

Unfortunately I don't have a cable to do debugging over a serial console.

Jani Uusitalo (uusijani) wrote :
Vinicius Peixoto (vpeixoto) wrote :

Hi Jani, thanks for the update. Glad to hear that the `nomodeset` workaround is working for you. Please let me know if you run into other problems and/or are able to run tests again, as we will need to bisect the cause of this issue on your hardware (since I couldn't reproduce it on any of my NVIDIA cards, and yours is a little bit tricky to find nowadays).

Juerg Haefliger (juergh)
tags: added: kernel-daily-bug