[nvidia] [amdgpu] Xorg randomly freezes

Bug #1906449 reported by Kuroš Taheri-Golværzi
This bug affects 1 person
Affects                               Status  Importance  Assigned to  Milestone
nvidia-graphics-drivers-440 (Ubuntu)  New     Undecided   Unassigned
xfce4 (Ubuntu)                        New     Undecided   Unassigned
xorg-server (Ubuntu)                  New     Undecided   Unassigned
xserver-xorg-video-amdgpu (Ubuntu)    New     Undecided   Unassigned

Bug Description

Initially, I thought it only happened during file transfers (because that's when it happens most often), but I later found that it can happen at any time. I had just turned my computer on and gone to make a coffee and have a cigarette, and when I came back, the entire screen, desktop, and system had frozen again. I always need to hold the power button down to force a restart, because I can't even use Ctrl+Alt+F# to switch to a TTY.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: xorg 1:7.7+19ubuntu14
ProcVersionSignature: Ubuntu 5.4.0-56.62-generic 5.4.73
Uname: Linux 5.4.0-56-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
.proc.driver.nvidia.gpus.0000.01.00.0: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/gpus/0000:01:00.0'
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.suspend: suspend hibernate resume
.proc.driver.nvidia.suspend_depth: default modeset uvm
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 440.95.01 Thu May 28 07:03:08 UTC 2020
 GCC version: gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)
ApportVersion: 2.20.11-0ubuntu27.13
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: skip
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
CompositorRunning: None
CurrentDesktop: XFCE
Date: Tue Dec 1 12:28:44 2020
DistUpgraded: Fresh install
DistroCodename: focal
DistroVariant: ubuntu
DkmsStatus: nvidia, 440.95.01, 5.4.0-56-generic, x86_64: installed
ExtraDebuggingInterest: Yes
GpuHangFrequency: Continuously
GpuHangReproducibility: Seems to happen randomly
GpuHangStarted: Immediately after installing this version of Ubuntu
GraphicsCard:
 NVIDIA Corporation TU106 [GeForce RTX 2060] [10de:1f15] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: ASUSTeK Computer Inc. Device [1043:1e21]
 Advanced Micro Devices, Inc. [AMD/ATI] Renoir [1002:1636] (rev f0) (prog-if 00 [VGA controller])
   Subsystem: ASUSTeK Computer Inc. Renoir [1043:1e21]
InstallationDate: Installed on 2020-12-01 (0 days ago)
InstallationMedia: Xubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
MachineType: ASUSTeK COMPUTER INC. TUF Gaming FA506IV_TUF506IV
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-56-generic root=UUID=11577769-570c-4297-a9b2-35fb3a30f6bf ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 07/02/2020
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: FA506IV.309
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: FA506IV
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: 1.0
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK COMPUTER INC.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrFA506IV.309:bd07/02/2020:svnASUSTeKCOMPUTERINC.:pnTUFGamingFA506IV_TUF506IV:pvr1.0:rvnASUSTeKCOMPUTERINC.:rnFA506IV:rvr1.0:cvnASUSTeKCOMPUTERINC.:ct10:cvr1.0:
dmi.product.family: TUF Gaming FA506IV
dmi.product.name: TUF Gaming FA506IV_TUF506IV
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK COMPUTER INC.
nvidia-settings:
 ERROR: Unable to load info from any available system

 ERROR: Unable to load info from any available system
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.101-2
version.libgl1-mesa-dri: libgl1-mesa-dri 20.0.8-0ubuntu1~20.04.1
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.20.8-2ubuntu2.6
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20200226-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.16-1

Revision history for this message
Kuroš Taheri-Golværzi (ktaherig) wrote :
Daniel van Vugt (vanvugt) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It sounds like some part of the system has crashed. To help us find the cause of the crash please follow these steps:

1. Look in /var/crash for crash files and if found run:
    ubuntu-bug YOURFILE.crash
Then tell us the ID of the newly-created bug.

2. If step 1 failed then look at https://errors.ubuntu.com/user/ID where ID is the content of file /var/lib/whoopsie/whoopsie-id on the machine. Do you find any links to recent problems on that page? If so then please send the links to us.

3. If step 2 also failed then apply the workaround from bug 994921, reboot, reproduce the crash, and retry step 1.

Please take care to avoid attaching .crash files to bugs as we are unable to process them as file attachments. It would also be a security risk for yourself.
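
Steps 1 and 2 above can be sketched as two small shell helpers. This is only an illustration, not part of the official triage procedure: the function names list_crash_files and errors_url are hypothetical, and their optional arguments default to the paths named in the steps. Reading the whoopsie ID file may require sudo if it is not readable by your user.

```shell
#!/bin/sh
# Hypothetical helpers illustrating steps 1 and 2; the names are not Ubuntu tooling.

# Step 1: print every .crash file in the crash directory (default /var/crash).
list_crash_files() {
    crash_dir="${1:-/var/crash}"
    for f in "$crash_dir"/*.crash; do
        # When the glob matches nothing, the pattern stays literal; skip it.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Step 2: build the errors.ubuntu.com URL from the whoopsie ID file.
errors_url() {
    id_file="${1:-/var/lib/whoopsie/whoopsie-id}"
    [ -r "$id_file" ] || { echo "cannot read $id_file" >&2; return 1; }
    printf 'https://errors.ubuntu.com/user/%s\n' "$(cat "$id_file")"
}
```

Each path printed by list_crash_files can then be passed to ubuntu-bug as in step 1, and the URL printed by errors_url is the page to open in step 2.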

affects: xorg (Ubuntu) → xorg-server (Ubuntu)
Changed in xorg-server (Ubuntu):
status: New → Incomplete
Kuroš Taheri-Golværzi (ktaherig) wrote :

When I tried Step 1, I got an error message saying that Ubuntu 20.04 has experienced an error and asking if I'd like to report it to the developers. When I clicked Yes, it gave another error message saying that "the problem happened with /usr/lib/xorg/Xorg which changed since the crash occurred". I copied the *.crash file to a USB drive so that I can send it from my older computer (which I'm typing this report on now). Would you like me to paste it here?

Daniel van Vugt (vanvugt) wrote :

Please either go to step 2 in comment #2, or update the system and wait until a new crash file is produced.

Kuroš Taheri-Golværzi (ktaherig) wrote :
Revision history for this message
Kuroš Taheri-Golværzi (ktaherig) wrote :

Update (which I hope turns out to be helpful):

While I was reading through my own error logs, I saw "Guake", so I decided to check and see if there have been any problems reported with Guake Terminal. I found this:
https://github.com/Guake/guake/issues/1753
and this:
https://github.com/Guake/guake/issues/1749#issuecomment-632189285
and so I tried the tips in the comments. I disabled "Automatically save session..." in the "Preferences" and I disabled the activation of Guake at startup in my system settings, and the problem still persisted. So now, I can say with certainty that Guake isn't the problem. I'm almost entirely sure that it's an issue with Xorg, because the exact same problem happened with the screen/computer freezing/locking up about 5 minutes into having it turned on without my doing anything at all.

Also, this is unrelated to the bug, but I can't find contact info on the official Ubuntu site, so I thought perhaps I'd ask here:

How can I become a Linux Developer and contribute to Ubuntu? I really love this project, and I've developed a huge love affair with Linux as a whole. I've gathered that I need to become fully fluent in C, since Linux itself is written in C:
https://github.com/torvalds/linux
Plus, I'm currently studying for, first, the LFCSA, and then the LFCE.
I've tried looking for answers at:
> https://askubuntu.com/questions/419042/how-do-i-register-as-a-ubuntu-developer
> https://wiki.ubuntu.com/UbuntuDevelopers
> https://discourse.ubuntu.com/
> https://wiki.ubuntu.com/Kernel/SourceCode
> https://code.launchpad.net/ubuntu
but I'm having difficulty finding my way around. What skill set should I learn first in order to prepare and become worthy of working on Ubuntu as a proper developer? How should I get started?

Daniel van Vugt (vanvugt) wrote :

It might be a kernel driver problem. To find out please:

1. Reproduce a freeze again.

2. Reboot.

3. Open a Terminal and run:

   journalctl -b-1 > prevboot.txt

   and attach the resulting text file here.
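
Once prevboot.txt exists, a quick first pass is to search it for kernel messages that often accompany a driver crash. The sketch below is only an assumption about what is worth looking for: the function name scan_prevboot and the pattern list are illustrative and not from this thread, and an empty result does not prove the kernel was healthy.

```shell
#!/bin/sh
# Hypothetical first-pass filter for a saved journal such as prevboot.txt.
# The pattern list is an illustrative assumption, not an exhaustive set.
scan_prevboot() {
    log_file="${1:-prevboot.txt}"
    # Keep only kernel messages, then match common crash markers (case-sensitive).
    grep ' kernel: ' "$log_file" |
        grep -E 'exception|Oops|BUG:|Call Trace|general protection fault|segfault'
}
```

Running scan_prevboot prevboot.txt prints the matching lines, or nothing (with a non-zero exit status) when none of the patterns occur.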

tags: added: amdgpu hybrid nvidia
summary: - Xorg randomly freezes
+ [nvidia] [amdgpu] Xorg randomly freezes
Kuroš Taheri-Golværzi (ktaherig) wrote :
Revision history for this message
Kuroš Taheri-Golværzi (ktaherig) wrote :

I apologize. I'd misread your instructions as "-b-l" (as in, L, instead of the number one) and I got an error, so I assumed that I was supposed to pass "-b -l". When I realized my mistake, I reproduced the error again, and then rebooted again, and this time, I passed "-b-1" (with the number one). Here's the correct file with the correct output.

Daniel van Vugt (vanvugt) wrote :

Thanks. Good news at least that I don't see any kernel driver crashes there.

Changed in xorg-server (Ubuntu):
status: Incomplete → New
affects: xorg-server (Ubuntu) → nvidia-graphics-drivers-440 (Ubuntu)
Robert Jansen (rj6603) wrote :

Same problem here. Reproducible by switching virtual terminals a few times with <ctrl><alt><F#>.
The only thing the system responds to is a ping (but no ssh) and a <sysrq>REISUB.
This only happens to me on kernel 5.4.0-56-generic. If I revert to the previous kernel, 5.4.0-54-generic, the freeze does not happen.

After every freeze journalctl always shows a SIMD error that seems to cause the freeze.
Below are two examples:

dec 03 17:23:05 zen kernel: simd exception: 0000 [#1] SMP NOPTI
dec 03 17:23:05 zen kernel: CPU: 5 PID: 1195 Comm: Xorg Tainted: G OE 5.4.0-56-generic #62-Ubuntu
dec 03 17:23:05 zen kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C52/B450M-A PRO MAX (MS-7C52), BIOS 3.40 01/22/2020
dec 03 17:23:05 zen kernel: RIP: 0010:CalculateDelayAfterScaler.constprop.0+0xb2/0x320 [amdgpu]
dec 03 17:23:05 zen kernel: Code: 10 45 b0 f2 45 0f 59 84 24 08 0d 00 00 f2 44 0f 5e 45 d0 f2 48 0f 2a c8 f2 0f 59 c8 66 0f ef c0 f2 41 0f 5a 84 24 60 17 00 00 <f2> 41 0f 5e c8 f2 0f 5c 4d c8 f2 0f 5a c9 e8 7b df f7 ff f2 0f 10
dec 03 17:23:05 zen kernel: RSP: 0018:ffffa135419cf708 EFLAGS: 00010202
dec 03 17:23:05 zen kernel: RAX: 0000000000003c00 RBX: 0000000000000002 RCX: 0000000000000000
dec 03 17:23:05 zen kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff8e2cb16c1f38
dec 03 17:23:05 zen kernel: RBP: ffffa135419cf780 R08: 0000000000000002 R09: 0000000000000f00
dec 03 17:23:05 zen kernel: R10: 0000000000000001 R11: ffff8e2cb16c53b8 R12: ffff8e2cb16c1f38
dec 03 17:23:05 zen kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
dec 03 17:23:05 zen kernel: FS: 00007f43ee050a80(0000) GS:ffff8e2d6e940000(0000) knlGS:0000000000000000
dec 03 17:23:05 zen kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
dec 03 17:23:05 zen kernel: CR2: 00007fc926c8b3e0 CR3: 00000007e6050000 CR4: 0000000000340ee0
dec 03 17:23:05 zen kernel: Call Trace:

and

dec 04 06:53:51 zen kernel: simd exception: 0000 [#1] SMP NOPTI
dec 04 06:53:51 zen kernel: CPU: 0 PID: 5889 Comm: kworker/0:2 Tainted: G OE 5.4.0-56-generic #62-Ubuntu
dec 04 06:53:51 zen kernel: Hardware name: Micro-Star International Co., Ltd. MS-7C52/B450M-A PRO MAX (MS-7C52), BIOS 3.40 01/22/2020
dec 04 06:53:51 zen kernel: Workqueue: events console_callback
dec 04 06:53:51 zen kernel: RIP: 0010:CalculateDelayAfterScaler.constprop.0+0xb2/0x320 [amdgpu]
dec 04 06:53:51 zen kernel: Code: 10 45 b0 f2 45 0f 59 84 24 08 0d 00 00 f2 44 0f 5e 45 d0 f2 48 0f 2a c8 f2 0f 59 c8 66 0f ef c0 f2 41 0f 5a 84 24 60 17 00 00 <f2> 41 0f 5e c8 f2 0f 5c 4d c8 f2 0f 5a c9 e8 7b df f7 ff f2 0f 10
dec 04 06:53:51 zen kernel: RSP: 0018:ffffad2c42a77618 EFLAGS: 00010202
dec 04 06:53:51 zen kernel: RAX: 0000000000000f00 RBX: 0000000000000002 RCX: 0000000000000000
dec 04 06:53:51 zen kernel: RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9e845b851f38
dec 04 06:53:51 zen kernel: RBP: ffffad2c42a77690 R08: 0000000000000002 R09: 0000000000000f00
dec 04 06:53:51 zen kernel: R10: 0000000000000001 R11: ffff9e845b8553b8 R12: ffff9e845b851f38
dec 04 06:53:51 zen kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
dec 04 06:53:51 zen kernel: ...
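
Both traces above fault at the same RIP inside the amdgpu module. To check whether that holds across many freezes, the RIP lines of a saved journal can be tallied by module and function. This is a rough sketch: the function name tally_rip is hypothetical, and the pattern assumes the "RIP: seg:symbol+off/len [module]" layout shown in the excerpts.

```shell
#!/bin/sh
# Hypothetical helper: count kernel RIP lines per "module function" pair.
# RIP lines without a trailing [module] (code built into the kernel) are skipped.
tally_rip() {
    log_file="$1"
    sed -n -E 's/.* RIP: [0-9a-f]+:([^+ ]+)\+[^ ]+ \[([^]]+)\].*/\2 \1/p' "$log_file" |
        sort | uniq -c | sed -E 's/^ +//'
}
```

Each output line has the form "count module function", so a single line with a high count suggests that every recorded freeze hit the same code path.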


Kuroš Taheri-Golværzi (ktaherig) wrote :

I tried installing nvidia 450 and 455, and on both tries, I get the same issue. When I try setting the driver settings to use nouveau instead, the GUI doesn't even load. It's just a black screen (that resembles a TTY) with the text:

such-and-such / such-and-such blocks
ucsi_acpi usbc000:00: ppm init failed -110

And so then I need to use <Ctrl+Alt+F#> to go into the TTY and reinstall nvidia, which at least solves *that* problem, and I can get back into the GUI. But then it's the same screen-freeze issue again.

Daniel van Vugt (vanvugt) wrote :

I just noticed your AMD GPU is a "Renoir" which isn't fully supported until kernel 5.5, and you have kernel 5.4. So it might be useful to try a newer kernel. Please try:

  sudo apt install linux-generic-hwe-20.04-edge

and then reboot.
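
After the reboot it is worth confirming that the newer kernel is actually the one running. The sketch below compares a kernel release string against the 5.5 threshold mentioned above. The function name kernel_at_least is hypothetical, and the comparison relies on sort -V (version sort) from GNU coreutils.

```shell
#!/bin/sh
# Hypothetical check: is a kernel release string at least a given version?
# Relies on `sort -V` (version sort), as provided by GNU coreutils.
kernel_at_least() {
    release="$1"   # e.g. the output of `uname -r`, such as 5.4.0-56-generic
    minimum="$2"   # e.g. 5.5
    # Strip the ABI/flavour suffix so only the dotted version is compared.
    version="${release%%-*}"
    # The minimum must sort first (or tie) for the release to be new enough.
    lowest=$(printf '%s\n%s\n' "$minimum" "$version" | sort -V | head -n 1)
    [ "$lowest" = "$minimum" ]
}
```

For example, kernel_at_least "$(uname -r)" 5.5 && echo "new enough" prints the message only when the running kernel is 5.5 or later. With the 5.4.0-56-generic kernel from this report it prints nothing.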
