[amdgpu] Locked Ubuntu is frozen and cannot be unlocked

Bug #1964711 reported by Philipp
This bug affects 2 people
Affects: linux-hwe-5.13 (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

After locking Ubuntu and leaving it unattended for a while, the operating system cannot be unlocked anymore. No user input (e.g. typing on the keyboard or moving the mouse) brings up the lock screen, and I have to shut down my computer by pressing down the power button. The attached monitor does not receive a signal from the computer during these attempts.

This started to happen several weeks ago and worked without a problem before then.

Due to the nature of this issue, I can neither report any error messages nor observe anything else out of the ordinary. Thus, I don’t know which package is causing the issue. The issue happens every time I lock the computer and leave it unattended for a longer time. However, locking it and triggering an input immediately afterwards does produce the lock screen, so the lock screen seems to work in principle.

Description: Ubuntu 20.04.4 LTS
Release: 20.04

Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Libera.chat.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1964711/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Chris Guiver (guiverc) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better.

You've not given details as to your actual install; was it an Ubuntu Desktop 20.04 LTS or Ubuntu Server 20.04 LTS?

You do mention moving the mouse, a device not normally attached to servers, so I assume (rightly or wrongly) you're asking about a desktop system, but it's best if these details are provided.

Please execute the following command only once, as it will automatically gather debugging information, in a terminal:

apport-collect 1964711

When reporting bugs in the future please use apport by using 'ubuntu-bug' and the name of the package affected. You can learn more about this functionality at https://wiki.ubuntu.com/ReportingBugs.

(Note: this issue could be related to https://bugs.launchpad.net/bugs/1851992)

affects: ubuntu → gdm3 (Ubuntu)
Daniel van Vugt (vanvugt) wrote :

Thanks for the bug report.

Next time the freeze happens please:

1. Wait 10 seconds.

2. Shut down by pressing the power button.

3. Power on and then run:

     journalctl -b-1 > prevboot.txt

4. Attach the resulting text file here.

5. Look in /var/crash for crash files and if found run:

     ubuntu-bug YOURFILE.crash

   Then tell us the ID of the newly-created bug.

6. If step 5 failed then look at https://errors.ubuntu.com/user/ID where ID is the content of file /var/lib/whoopsie/whoopsie-id on the machine. Do you find any links to recent problems on that page? If so then please send the links to us.
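
Steps 5 and 6 above can be sketched as two small shell helpers. This is only an illustration: the function names `list_crash_files` and `whoopsie_url` and their optional path arguments are made up for this sketch, and reading the whoopsie ID file may require root on some systems.

```shell
#!/bin/sh
# Sketch of steps 5 and 6. Function names and optional arguments are
# illustrative only and not part of any Ubuntu tool.

# Print any .crash files in the given directory (default: /var/crash).
list_crash_files() {
    dir="${1:-/var/crash}"
    for f in "$dir"/*.crash; do
        # If the glob matches nothing it stays literal, so check existence.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Print the per-machine errors.ubuntu.com URL built from the whoopsie ID file.
whoopsie_url() {
    id_file="${1:-/var/lib/whoopsie/whoopsie-id}"
    [ -r "$id_file" ] || { echo "cannot read $id_file (try sudo)" >&2; return 1; }
    printf 'https://errors.ubuntu.com/user/%s\n' "$(cat "$id_file")"
}
```

Calling `list_crash_files` with no argument lists candidates for `ubuntu-bug YOURFILE.crash`, and `whoopsie_url` prints the page to check in step 6.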

tags: added: focal
Changed in gdm3 (Ubuntu):
status: New → Incomplete
affects: gdm3 (Ubuntu) → ubuntu
Philipp (phrudloff) wrote :

Around 8 this morning (an hour before writing this), I started my computer and logged in. I then locked the computer without starting any programs manually. On returning an hour later, I observed the same issue. I shut down the computer manually (by pressing down the power button), started it again and produced the prevboot.txt file as instructed.

The /var/crash directory has no files for this time frame.

Daniel van Vugt (vanvugt) wrote (edited) :

Thanks. It *looks* like the amdgpu kernel driver didn't survive sleeping/resuming and then nothing could use the GPU anymore:

Mär 17 08:00:57 MOONBASE kernel: [drm] free PSP TMR buffer
Mär 17 08:01:02 MOONBASE kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
Mär 17 08:01:02 MOONBASE kernel: [drm] PSP is resuming...
Mär 17 08:01:02 MOONBASE kernel: [drm] reserve 0xa00000 from 0x800f400000 for PSP TMR
Mär 17 08:01:02 MOONBASE kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
Mär 17 08:01:02 MOONBASE kernel: amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
Mär 17 08:01:02 MOONBASE kernel: amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Mär 17 08:01:02 MOONBASE kernel: amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
Mär 17 08:01:05 MOONBASE gnome-shell[2532]: amdgpu: The CS has been rejected, see dmesg for more information (-62).
Mär 17 08:01:05 MOONBASE gnome-shell[2532]: amdgpu: The CS has been rejected, see dmesg for more information (-22).
Mär 17 08:01:05 MOONBASE kernel: amdgpu 0000:03:00.0: amdgpu: RunBtc failed!
Mär 17 08:01:05 MOONBASE kernel: amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
Mär 17 08:01:05 MOONBASE kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
Mär 17 08:01:05 MOONBASE kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
Mär 17 08:01:06 MOONBASE tracker-store[6301]: OK
Mär 17 08:01:06 MOONBASE systemd[2205]: tracker-store.service: Succeeded.
Mär 17 08:01:06 MOONBASE /usr/lib/gdm3/gdm-x-session[2319]: amdgpu: The CS has been rejected, see dmesg for more information (-22).
Mär 17 08:02:00 MOONBASE gnome-shell[2532]: amdgpu: The CS has been rejected, see dmesg for more information (-22).
Mär 17 08:02:01 MOONBASE /usr/lib/gdm3/gdm-x-session[2319]: amdgpu: The CS has been rejected, see dmesg for more information (-22).

So I suggest this is a kernel bug in the 'amdgpu' driver. Your best options might be to try:

 * Ubuntu 22.04 (preview) http://cdimage.ubuntu.com/daily-live/current/ or
 * Other kernels: https://kernel.ubuntu.com/~kernel-ppa/mainline/?C=M;O=D
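
The kernel-side failure lines quoted above can be pulled out of a saved journal with a filter along these lines. It is a rough sketch, not an official diagnostic: the function name `amdgpu_resume_errors` is hypothetical, and the pattern is only tuned to the messages seen in this log.

```shell
#!/bin/sh
# Print kernel amdgpu/drm lines that report a failure.
# The function name is illustrative; the pattern is based only on the
# messages quoted in this bug and may miss other failure modes.
# Usage: amdgpu_resume_errors prevboot.txt
amdgpu_resume_errors() {
    grep -iE 'kernel: .*(amdgpu|\[drm).*(failed|\*ERROR\*)' "$1"
}
```

The userspace "CS has been rejected" messages from gnome-shell are deliberately left out, since they appear to be a consequence of the failed resume rather than its cause.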

summary: - Locked Ubuntu is frozen and cannot be unlocked
+ [amdgpu] Locked Ubuntu is frozen and cannot be unlocked
tags: added: amdgpu
affects: ubuntu → linux-hwe-5.13 (Ubuntu)
Changed in linux-hwe-5.13 (Ubuntu):
status: Incomplete → New
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-hwe-5.13 (Ubuntu):
status: New → Confirmed
Philipp (phrudloff) wrote :

I’ve done the following things now:

Unsuccessful:

Upgrade from Ubuntu 20.04 to 21.04. This did not solve the issue.

Successful:

Manually uninstall and install the latest amdgpu package from https://www.amd.com/en/support/kb/release-notes/rn-amdgpu-unified-linux-21-20. Specifically, I downloaded amdgpu-pro-21.20-1271047-ubuntu-20.04.tar.xz and installed it via the amdgpu-install binary. This seems to have solved the issue. I suspect the fix came from uninstalling and reinstalling the driver rather than from installing this specific version, but I can’t tell with certainty.

Daniel van Vugt (vanvugt) wrote :

Thanks, that's unsurprising but also inconclusive. I think the goal is for us to use the built-in amdgpu kernel driver in future.

If you have time then please try:
 * Ubuntu 22.04 (preview) http://cdimage.ubuntu.com/daily-live/current/ or
 * Newer kernels: https://kernel.ubuntu.com/~kernel-ppa/mainline/?C=M;O=D

Changed in linux-hwe-5.13 (Ubuntu):
status: Confirmed → Incomplete
tags: added: resume suspend-resume
Philipp (phrudloff) wrote (edited) :

After the things I did in https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.13/+bug/1964711/comments/7, I am now facing a worse problem.

My computer no longer recognizes my display correctly. From the boot screen onward, the resolution is the minimum available (1024x768). I think my system no longer selects the correct driver for my graphics card.

Running "sudo lshw -c video" yields the following:

```
  *-display UNCLAIMED
       description: VGA compatible controller
       product: Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
       vendor: Advanced Micro Devices, Inc. [AMD/ATI]
       physical id: 0
       bus info: pci@0000:03:00.0
       version: ca
       width: 64 bits
       clock: 33MHz
       capabilities: pm pciexpress msi vga_controller bus_master cap_list
       configuration: latency=0
       resources: memory:40000000-4fffffff memory:50000000-501fffff ioport:3000(size=256) memory:50300000-5037ffff memory:c0000-dffff
```

Running "lsmod | grep amd" prints nothing (it used to print output before) and "dkms status amdgpu" also prints nothing. I tried uninstalling the drivers I installed and also tried installing drivers from a different source. I tried to look for issues in my BIOS settings. All to no avail. If I don’t find a solution to this, I will resort to completely re-installing Ubuntu 20.04. Luckily, I completed a full backup beforehand.

**Update**:

Running "sudo modprobe amdgpu" seems to load the amdgpu driver correctly. Afterwards, I can log in with the display correctly detected and at the right resolution. I then checked the "/etc/modprobe.d" directory and found a "/etc/modprobe.d/blacklist-amdgpu.conf" file. I moved it out of that directory and restarted. While the boot screen for my encrypted disk still uses the lower resolution, at least my login screen is back to its previous state.

I’d love to know what happened here and whether I can get the correct driver to be loaded on boot, too.
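
For anyone hitting the same symptom, a check along these lines should list every modprobe configuration file that blacklists the module. It is a minimal sketch: the function name `find_amdgpu_blacklist` is hypothetical, and it assumes a grep that supports `-r` (GNU grep does).

```shell
#!/bin/sh
# List modprobe config files that blacklist the amdgpu module.
# The function name is illustrative; the directory argument is optional
# and defaults to /etc/modprobe.d. Commented-out entries are ignored.
find_amdgpu_blacklist() {
    grep -rlE '^[[:space:]]*blacklist[[:space:]]+amdgpu([[:space:]]|$)' "${1:-/etc/modprobe.d}"
}
```

The function exits non-zero when no such file is found, so it can also be used in a conditional before running "sudo modprobe amdgpu".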

Daniel van Vugt (vanvugt) wrote :

blacklist-amdgpu.conf doesn't appear to be installed by default in Ubuntu. My best guess is that it might have come from installing a proprietary driver package from AMD at some point?

Philipp (phrudloff) wrote :

Yes, I believe having installed a proprietary driver could’ve caused this. I didn’t realize this at the time of installing the driver, but I think the file name "amdgpu-pro-21.20-1271047-ubuntu-20.04" clearly indicates that this is in fact AMD’s proprietary driver. Curiously, the issue didn’t happen immediately after installing this driver and rebooting. I used the computer for at least two sessions after this and everything was seemingly fine.

In any case, things are mostly in order now. I have yet to test whether my initially reported issue (the driver crashing after the computer is locked) is still resolved. I will get back with an updated report on this.

Philipp (phrudloff) wrote :

I have now tested whether the issue is still resolved and it luckily is. So both after installing a version of amdgpu-pro and also after uninstalling this proprietary driver again, unlocking the system works again.

Once the stable version of Ubuntu 22.04 is published, I will upgrade and try again. At this point, I don’t want to try a pre-release version to avoid having my only personal computer become inoperable.

Also, I have yet to find a way to restore the resolution during the boot process. Specifically, the default resolution is used during the cryptsetup decryption stage and before (when only the mainboard manufacturer logo is shown). The BIOS configuration seems to be in order and is set to use the primary PCIe slot to which my monitor is connected. This is not a serious problem, but I’d like to fix it. My internet searches have been unsuccessful so far.

Launchpad Janitor (janitor) wrote :

[Expired for linux-hwe-5.13 (Ubuntu) because there has been no activity for 60 days.]

Changed in linux-hwe-5.13 (Ubuntu):
status: Incomplete → Expired