Screen does not wake up from suspend/does not wake up

Bug #1953674 reported by Ciro Santilli 六四事件 法轮功
This bug affects 4 people
Affects                               Status     Importance  Assigned to  Milestone
nvidia-graphics-drivers-470 (Ubuntu)  Confirmed  Undecided   Unassigned
nvidia-graphics-drivers-510 (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

After upgrading to 21.10 from 21.04 this problem started happening.

It does not happen every time I suspend, but after a longer suspend it happens most of the time.

I can SSH into the computer, so it is just the display that is not turning back on correctly.

Lenovo ThinkPad P51 with an NVIDIA Quadro M1200.

journalctl -o short-precise -k -b -1 contained the following messages of interest, which I could not find in other reports:

```
nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
[24307.640014] NVRM: GPU at PCI:0000:01:00: GPU-18af74bb-7c72-ff70-e447-87d48378ea20
[24307.640018] NVRM: Xid (PCI:0000:01:00): 79, pid=8828, GPU has fallen off the bus.
[24307.640021] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
[24328.054022] nvidia-modeset: ERROR: GPU:0: The requested configuration of display devices (LGD (DP-4)) is not supported on this GPU.
[repeats several more times]
[24328.056767] nvidia-modeset: ERROR: GPU:0: The requested configuration of display devices (LGD (DP-4)) is not supported on this GPU.
[24328.056951] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:0:0:0x0000000f
[24328.056955] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:1:0:0x0000000f
[24328.056959] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:2:0:0x0000000f
[24328.056962] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000927c:3:0:0x0000000f
[24328.056983] nvidia-modeset: ERROR: GPU:0: DP-4: Failed to disable DisplayPort audio stream-0
[24328.056992] nvidia-modeset: ERROR: GPU:0: Failed to query display engine channel state: 0x0000947d:0:0:0x0000000f
```
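For anyone triaging similar logs, the Xid line above can be pulled out mechanically. This is a minimal Python sketch and not part of the original report: the function name `find_xid_events` and the regular expression are assumptions modeled only on the message format quoted above, and other driver versions may print the line differently.

```python
import re

# Pattern modeled on the NVRM line quoted in this report, e.g.
# "NVRM: Xid (PCI:0000:01:00): 79, pid=8828, GPU has fallen off the bus."
# Assumption: other driver versions may format this line differently.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>[^,]+), (?P<msg>.*)"
)


def find_xid_events(log_text):
    """Return a list of (pci, xid, pid, message) tuples found in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (match["pci"], int(match["xid"]), match["pid"], match["msg"].strip())
            )
    return events


if __name__ == "__main__":
    # Two lines copied from the log excerpt above; only the first is an Xid event.
    sample = (
        "[24307.640018] NVRM: Xid (PCI:0000:01:00): 79, pid=8828, "
        "GPU has fallen off the bus.\n"
        "[24307.640021] NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.\n"
    )
    for event in find_xid_events(sample):
        print(event)
```

The text passed in could be the saved output of the `journalctl -o short-precise -k -b -1` command mentioned above.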

I also noticed a gdm3 crash marker under /var/crash. Running

sudo cat /var/crash/_usr_sbin_gdm3.0.uploaded

prints:

5bed8a34-5852-11ec-b48c-fa163e102db1

Ciro Santilli 六四事件 法轮功 (cirosantilli) wrote :
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-470 (Ubuntu):
status: New → Confirmed
Ciro Santilli 六四事件 法轮功 (cirosantilli) wrote :

Reproduced on 510. Stack traces that sometimes appear:

```
Feb 09 06:30:47.610121 ciro-p51 kernel: WARNING: CPU: 0 PID: 18016 at /var/lib/dkms/nvidia/510.47.03/build/nvidia/nv.c:3935 nv_restore_user_channels+0xce/0xe0 [nvidia]

Feb 09 06:30:47.610508 ciro-p51 kernel: Call Trace:
Feb 09 06:30:47.610527 ciro-p51 kernel: <TASK>
Feb 09 06:30:47.610547 ciro-p51 kernel: nv_set_system_power_state+0x22b/0x3e0 [nvidia]
Feb 09 06:30:47.610566 ciro-p51 kernel: nv_procfs_write_suspend+0xe9/0x140 [nvidia]
Feb 09 06:30:47.610598 ciro-p51 kernel: proc_reg_write+0x5a/0x90
Feb 09 06:30:47.610619 ciro-p51 kernel: ? __cond_resched+0x1a/0x50
Feb 09 06:30:47.610637 ciro-p51 kernel: vfs_write+0xc3/0x250
Feb 09 06:30:47.610658 ciro-p51 kernel: ksys_write+0x67/0xe0
Feb 09 06:30:47.610675 ciro-p51 kernel: __x64_sys_write+0x19/0x20
Feb 09 06:30:47.610693 ciro-p51 kernel: do_syscall_64+0x61/0xb0
Feb 09 06:30:47.610709 ciro-p51 kernel: ? exit_to_user_mode_prepare+0x37/0xb0
Feb 09 06:30:47.610728 ciro-p51 kernel: ? syscall_exit_to_user_mode+0x27/0x50
Feb 09 06:30:47.610746 ciro-p51 kernel: ? __x64_sys_newfstatat+0x1c/0x20
Feb 09 06:30:47.610765 ciro-p51 kernel: ? do_syscall_64+0x6e/0xb0
Feb 09 06:30:47.610796 ciro-p51 kernel: ? syscall_exit_to_user_mode+0x27/0x50
Feb 09 06:30:47.610813 ciro-p51 kernel: ? do_syscall_64+0x6e/0xb0
Feb 09 06:30:47.610842 ciro-p51 kernel: ? asm_exc_page_fault+0x8/0x30
Feb 09 06:30:47.610860 ciro-p51 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae

Feb 09 06:30:47.611093 ciro-p51 kernel: WARNING: CPU: 0 PID: 18016 at /var/lib/dkms/nvidia/510.47.03/build/nvidia/nv.c:4152 nv_set_system_power_state+0x2d0/0x3e0 [nvidia]

Feb 09 06:30:47.611469 ciro-p51 kernel: nv_procfs_write_suspend+0xe9/0x140 [nvidia]
Feb 09 06:30:47.611484 ciro-p51 kernel: proc_reg_write+0x5a/0x90
Feb 09 06:30:47.611499 ciro-p51 kernel: ? __cond_resched+0x1a/0x50
Feb 09 06:30:47.611517 ciro-p51 kernel: vfs_write+0xc3/0x250
Feb 09 06:30:47.611533 ciro-p51 kernel: ksys_write+0x67/0xe0
Feb 09 06:30:47.611550 ciro-p51 kernel: __x64_sys_write+0x19/0x20
Feb 09 06:30:47.611568 ciro-p51 kernel: do_syscall_64+0x61/0xb0
Feb 09 06:30:47.611597 ciro-p51 kernel: ? exit_to_user_mode_prepare+0x37/0xb0
Feb 09 06:30:47.611615 ciro-p51 kernel: ? syscall_exit_to_user_mode+0x27/0x50
Feb 09 06:30:47.611632 ciro-p51 kernel: ? __x64_sys_newfstatat+0x1c/0x20
Feb 09 06:30:47.611647 ciro-p51 kernel: ? do_syscall_64+0x6e/0xb0
Feb 09 06:30:47.611665 ciro-p51 kernel: ? syscall_exit_to_user_mode+0x27/0x50
Feb 09 06:30:47.611680 ciro-p51 kernel: ? do_syscall_64+0x6e/0xb0
Feb 09 06:30:47.611698 ciro-p51 kernel: ? asm_exc_page_fault+0x8/0x30
Feb 09 06:30:47.611715 ciro-p51 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
```
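To compare traces like these across suspend cycles, it can help to reduce each kernel WARNING to its source location and function. This is a rough Python sketch, not something from the original report: `warning_sites` is a hypothetical helper, and the regular expression assumes the `at <file>:<line> <function>+0x<offset>/0x<size>` layout seen in the two WARNING lines above.

```python
import re

# Pattern modeled on the two WARNING lines quoted above. Assumption: the
# kernel prints "at <file>:<line> <function>+0x<offset>/0x<size>".
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: (?P<pid>\d+) at (?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def warning_sites(log_text):
    """Return (source_file, line_number, function) for each kernel WARNING."""
    sites = []
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            sites.append((match["file"], int(match["line"]), match["func"]))
    return sites


if __name__ == "__main__":
    # One WARNING line copied from the trace above.
    sample = (
        "Feb 09 06:30:47.610121 ciro-p51 kernel: WARNING: CPU: 0 PID: 18016 at "
        "/var/lib/dkms/nvidia/510.47.03/build/nvidia/nv.c:3935 "
        "nv_restore_user_channels+0xce/0xe0 [nvidia]\n"
    )
    for site in warning_sites(sample):
        print(site)
```

Identical (file, line, function) tuples across boots would suggest the same failure path in the resume code is being hit each time.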

Ciro Santilli 六四事件 法轮功 (cirosantilli) wrote :

I should also mention that I'm seeing a bunch of non-NVIDIA ACPI errors like these:

```
Feb 09 06:58:43.863108 ciro-p51 kernel: ACPI Warning: \_SB.PCI0.PEG0.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20210331/nsarguments-61)
Feb 09 06:58:44.131354 ciro-p51 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.LPCB.EC.HKEY.DEVT.PEGS], AE_NOT_FOUND (20210331/psargs-330)
Feb 09 06:58:44.131544 ciro-p51 kernel:
Feb 09 06:58:44.131575 ciro-p51 kernel: No Local Variables are initialized for Method [DEVT]
Feb 09 06:58:44.131609 ciro-p51 kernel:
Feb 09 06:58:44.131631 ciro-p51 kernel: Initialized Arguments for Method [DEVT]: (1 arguments defined for method invocation)
Feb 09 06:58:44.131663 ciro-p51 kernel: Arg0: 00000000e4c8db84 <Obj> Integer 00000000000000D3
Feb 09 06:58:44.131689 ciro-p51 kernel:
Feb 09 06:58:44.131715 ciro-p51 kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.EC.HKEY.DEVT due to previous error (AE_NOT_FOUND) (20210331/psparse-529)
Feb 09 06:58:44.131739 ciro-p51 kernel: ACPI Error: Aborting method \_SB.PCI0.LPCB.EC._Q4F due to previous error (AE_NOT_FOUND) (20210331/psparse-529)
Feb 09 06:58:44.335085 ciro-p51 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.LPCB.EC.HKEY.DEVT.PEGS], AE_NOT_FOUND (20210331/psargs-330)
Feb 09 06:58:44.335230 ciro-p51 kernel:
Feb 09 06:58:44.335459 ciro-p51 kernel: No Local Variables are initialized for Method [DEVT]
```

These could be a potential root cause. I'm going to try disabling NVIDIA now and see.

Ciro Santilli 六四事件 法轮功 (cirosantilli) wrote :

Dumping a bit more debug log before I forget:

I disabled NVIDIA from Software & Updates. If I try to suspend:

- it immediately wakes up into the lock screen
- I can't log in anymore: after entering the password, it goes to a black kernel log screen like the one at boot

That was in Wayland, the new default. So I blacklisted Wayland, rebooted to X11, and now when trying to suspend:

- it immediately wakes up into the lock screen
- but I can log in

Possibly related threads:

- https://askubuntu.com/questions/1151709/suspend-not-working-in-ubuntu-18-04-and-19-04
- https://askubuntu.com/questions/1133919/ubuntu-18-04-2-immediately-wakes-up-from-suspend

Notably https://askubuntu.com/a/1363662/52975 says CUDA might still be a problem! So I purged everything from NVIDIA. I noticed that I had nvidia-opencl-dev:amd64 still installed (dpkg state ii) and the graphics packages in state rc (removed, config files remaining), and immediately after the purge suspend works again on X11!
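For others checking for leftover NVIDIA packages, the `ii` and `rc` codes are the first column of `dpkg -l` output: `ii` means installed, `rc` means removed with configuration files still present. The sketch below groups NVIDIA-named packages by that code. It is only an illustration: `nvidia_packages_by_state` is a hypothetical helper, and the sample rows in the demo (including `nvidia-driver-510` and the version numbers) are placeholders, not the actual package list from this machine.

```python
def nvidia_packages_by_state(dpkg_output):
    """Group package names containing 'nvidia' by their dpkg -l state code."""
    by_state = {}
    for line in dpkg_output.splitlines():
        fields = line.split(None, 4)
        if len(fields) < 4:
            continue
        state, name = fields[0], fields[1]
        # Package rows start with a 2-3 letter code such as "ii" or "rc";
        # header rows start with "Desired=", "|" or "+++" and are skipped.
        if not (2 <= len(state) <= 3
                and state[0] in "uihrp"
                and state[1] in "ncHUFWti"):
            continue
        if "nvidia" in name:
            by_state.setdefault(state, []).append(name)
    return by_state


if __name__ == "__main__":
    # Placeholder rows for illustration only; versions are made up.
    sample = """\
Desired=Unknown/Install/Remove/Purge/Hold
||/ Name                     Version  Architecture Description
+++-========================-========-============-===========
ii  nvidia-opencl-dev:amd64  0.0      amd64        placeholder row
rc  nvidia-driver-510        0.0      amd64        placeholder row
ii  bash                     0.0      amd64        placeholder row
"""
    print(nvidia_packages_by_state(sample))
```

Anything still listed under `ii` or `rc` after a purge would be a candidate for the kind of leftover described above.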

Ciro Santilli 六四事件 法轮功 (cirosantilli) wrote :

Then I re-enabled Wayland with NVIDIA purged, and suspend works there too.

Then I re-installed and re-enabled NVIDIA 510 (which blacklists Wayland and puts me in X11, since there is no Wayland support yet), and boom, the problem is back exactly as before, with the same nv_* traces and ACPI errors (I didn't see the ACPI errors when I had NVIDIA disabled, so it looks like it is causing those as well).

So I'm now 100% certain that the bug is in the NVIDIA drivers.

Ciro Santilli 六四事件 法轮功 (cirosantilli) wrote :

OK, I've now noticed that my error lines are exactly the same as those in https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-470/+bug/1946303. I just wasn't getting the logs consistently before I updated my firmware. Let's close this as a duplicate.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-510 (Ubuntu):
status: New → Confirmed