[nvidia] Screen freeze (GPU MMU faults reported by the kernel driver) on Quadro K620

Bug #1894454 reported by Murat Gokmen on 2020-09-06
This bug affects 3 people
Affects | Importance | Assigned to
nvidia-graphics-drivers-440 (Ubuntu) | Undecided | Unassigned
nvidia-graphics-drivers-450 (Ubuntu) | Undecided | Unassigned

Bug Description

Ubuntu 20.04 freezes randomly

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: xorg 1:7.7+19ubuntu14
ProcVersionSignature: Ubuntu 5.4.0-45.49-generic 5.4.55
Uname: Linux 5.4.0-45-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
.proc.driver.nvidia.capabilities.gpu0: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/capabilities/gpu0'
.proc.driver.nvidia.capabilities.mig: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/capabilities/mig'
.proc.driver.nvidia.gpus.0000.01.00.0: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/gpus/0000:01:00.0'
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.suspend: suspend hibernate resume
.proc.driver.nvidia.suspend_depth: default modeset uvm
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.66 Wed Aug 12 19:42:48 UTC 2020
 GCC version: gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)
ApportVersion: 2.20.11-0ubuntu27.8
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: skip
CompositorRunning: None
CurrentDesktop: ubuntu:GNOME
Date: Mon Sep 7 01:22:46 2020
DistUpgraded: Fresh install
DistroCodename: focal
DistroVariant: ubuntu
DkmsStatus:
 nvidia, 450.66, 5.4.0-45-generic, x86_64: installed
 virtualbox, 6.1.10, 5.4.0-42-generic, x86_64: installed
 virtualbox, 6.1.10, 5.4.0-45-generic, x86_64: installed
ExtraDebuggingInterest: Yes, if not too technical
GpuHangFrequency: Several times a day
GpuHangReproducibility: Seems to happen randomly
GpuHangStarted: Since a couple weeks or more
GraphicsCard:
 NVIDIA Corporation GM107GL [Quadro K620] [10de:13bb] (rev a2) (prog-if 00 [VGA controller])
   Subsystem: NVIDIA Corporation GM107GL [Quadro K620] [10de:1098]
InstallationDate: Installed on 2020-04-26 (133 days ago)
InstallationMedia: Ubuntu 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
MachineType: Dell Inc. Precision Tower 3620
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=tr_TR.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-45-generic root=UUID=5abcce6c-0eb6-4ef4-bf1b-0a62c2924a2d ro quiet splash vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/25/2020
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.15.0
dmi.board.name: 0MWYPT
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 3
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.15.0:bd03/25/2020:svnDellInc.:pnPrecisionTower3620:pvr:rvnDellInc.:rn0MWYPT:rvrA00:cvnDellInc.:ct3:cvr:
dmi.product.family: Precision
dmi.product.name: Precision Tower 3620
dmi.product.sku: 06B7
dmi.sys.vendor: Dell Inc.
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.101-2
version.libgl1-mesa-dri: libgl1-mesa-dri 20.0.8-0ubuntu1~20.04.1
version.libgl1-mesa-glx: libgl1-mesa-glx 20.0.8-0ubuntu1~20.04.1
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.20.8-2ubuntu2.3
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20200226-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.16-1

Daniel van Vugt (vanvugt) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It sounds like some part of the system has crashed. To help us find the cause of the crash please follow these steps:

1. Look in /var/crash for crash files and if found run:
    ubuntu-bug YOURFILE.crash
Then tell us the ID of the newly-created bug.

2. If step 1 failed then look at https://errors.ubuntu.com/user/ID where ID is the content of file /var/lib/whoopsie/whoopsie-id on the machine. Do you find any links to recent problems on that page? If so then please send the links to us.

3. If step 2 also failed then apply the workaround from bug 994921, reboot, reproduce the crash, and retry step 1.

Please take care to avoid attaching .crash files to bugs as we are unable to process them as file attachments. It would also be a security risk for yourself.
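
Steps 1 and 2 above can be sketched as a small script. This is only an illustration, not an official Ubuntu tool: the function names `find_crash_files` and `errors_url` are made up for this example, and it assumes the default paths mentioned in the steps. Reading the whoopsie ID file may require root.

```python
from pathlib import Path


def find_crash_files(crash_dir="/var/crash"):
    """Return the sorted .crash files in crash_dir (empty list if there are none)."""
    directory = Path(crash_dir)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.crash"))


def errors_url(id_file="/var/lib/whoopsie/whoopsie-id"):
    """Build the errors.ubuntu.com URL from the content of the whoopsie ID file."""
    ident = Path(id_file).read_text().strip()
    return f"https://errors.ubuntu.com/user/{ident}"


if __name__ == "__main__":
    crashes = find_crash_files()
    if crashes:
        # Step 1: print the command to run for each crash file found.
        for crash in crashes:
            print(f"ubuntu-bug {crash}")
    else:
        # Step 2: fall back to the per-machine error report page.
        try:
            print(errors_url())
        except OSError as exc:
            print(f"Could not read the whoopsie ID: {exc}")
```

The script only prints the `ubuntu-bug` commands rather than running them, so nothing is submitted without you seeing it first.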

tags: added: nvidia
summary: - Xorg freeze
+ [nvidia] Screen freeze
affects: xorg (Ubuntu) → ubuntu
Changed in ubuntu:
status: New → Incomplete

Also, next time the freeze happens, please reboot and then run:

  journalctl -b-1 > prevboot.txt

and attach the resulting text file here.
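
If you want to skim the saved journal yourself before attaching it, a rough filter like the one below may help. It is a sketch only: the marker strings are an assumption about what is worth looking at, the sample lines are invented for the demo, and `interesting_lines` is a hypothetical helper name.

```python
# Substrings assumed to be worth a look; adjust as needed.
MARKERS = ("NVRM:", "(EE)", "segfault")

# Invented sample lines, used only to demonstrate the filter.
SAMPLE = [
    "Sep 08 14:05:50 host systemd[1]: Started Session 2 of user demo.",
    "Sep 08 14:05:52 host kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=2071",
    "Sep 08 14:05:52 host gdm-x-session[1925]: (EE) NVIDIA(0): error",
]


def interesting_lines(lines, markers=MARKERS):
    """Keep only the lines that contain at least one of the marker substrings."""
    return [ln.rstrip("\n") for ln in lines if any(m in ln for m in markers)]


def filter_file(path, markers=MARKERS):
    """Apply interesting_lines to a saved journal file such as prevboot.txt."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return interesting_lines(fh, markers)


if __name__ == "__main__":
    for line in interesting_lines(SAMPLE):
        print(line)
```

Calling `filter_file("prevboot.txt")` would apply the same filter to the file produced by the `journalctl` command above. The full file is still what should be attached to the bug.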

Murat Gokmen (gokmen-murat) wrote :

Thank you for your quick action. I don't think I was able to obtain an ID in step 1. However, step 2 led me to the following page:

https://errors.ubuntu.com/user/fe7ac6f4c2155acaa814a070f28ac3f52faf07f0e1118b3ae913313b06bb0e49a210212314e57aefd531035386d863432d61c727ae689af99164acf22dc685dc

The prevboot.txt file is attached as requested. Thanks.

Daniel van Vugt (vanvugt) wrote :

Thanks. It appears your previous session crashed due to an Nvidia hardware/driver problem:

Eyl 08 14:05:52 gokmen-precision-tower-3620 kernel: NVRM: GPU at PCI:0000:01:00: GPU-e87e8c4e-fc92-319a-6813-3f0cdc7d250e
Eyl 08 14:05:52 gokmen-precision-tower-3620 kernel: NVRM: GPU Board Serial Number: 0420617011626
Eyl 08 14:05:52 gokmen-precision-tower-3620 kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=2071, Ch 00000020, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_RAST faulted @ 0x0_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_READ
Eyl 08 14:05:52 gokmen-precision-tower-3620 kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=478, Ch 00000002, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_RAST faulted @ 0x0_00000000. Fault is of type FAULT_PDE ACCESS_TYPE_READ
Eyl 08 14:05:52 gokmen-precision-tower-3620 /usr/lib/gdm3/gdm-x-session[1925]: (EE) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
Eyl 08 14:05:52 gokmen-precision-tower-3620 kernel: NVRM: Xid (PCI:0000:01:00): 69, pid=1925, Class Error: ChId 0018, Class 0000b097, Offset 000008d0, Data 00343410, ErrorCode 0000000c
Eyl 08 14:05:52 gokmen-precision-tower-3620 /usr/lib/gdm3/gdm-x-session[1925]: (EE) NVIDIA(0): recover...
Eyl 08 14:05:57 gokmen-precision-tower-3620 /usr/lib/gdm3/gdm-x-session[1925]: (WW) NVIDIA: Wait for channel idle timed out.
Eyl 08 14:05:58 gokmen-precision-tower-3620 kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=1925, Ch 0000001a, intr 10000000. MMU Fault: ENGINE GRAPHICS HUBCLIENT_FE faulted @ 0x1_01fa0000. Fault is of type FAULT_UNSUPPORTED_APERTURE ACCESS_TYPE_WRITE
Eyl 08 14:05:59 gokmen-precision-tower-3620 kernel: NVRM: Xid (PCI:0000:01:00): 32, pid=1226, Channel ID 00000001 intr 80060000
Eyl 08 14:05:59 gokmen-precision-tower-3620 kernel: NVRM: Xid (PCI:0000:01:00): 32, pid=1226, Channel ID 00000001 intr 00020000
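
To see which Xid events appear in a journal and how often, lines like the ones above can be tallied with a short script. This is a sketch under assumptions: the regular expression is modeled only on the line format visible in this excerpt, the sample lines are abbreviated copies of it, and `parse_xid` and `xid_counts` are hypothetical helper names.

```python
import re
from collections import Counter

# Modeled on lines such as:
#   NVRM: Xid (PCI:0000:01:00): 31, pid=2071, Ch 00000020, ...
XID_RE = re.compile(r"NVRM: Xid \((PCI:[0-9A-Fa-f:.]+)\): (\d+), pid=(\d+)")

# Abbreviated copies of lines from the excerpt above.
SAMPLE = [
    "kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=2071, Ch 00000020, intr 10000000.",
    "kernel: NVRM: Xid (PCI:0000:01:00): 31, pid=478, Ch 00000002, intr 10000000.",
    "kernel: NVRM: Xid (PCI:0000:01:00): 69, pid=1925, Class Error: ChId 0018",
    "kernel: NVRM: Xid (PCI:0000:01:00): 32, pid=1226, Channel ID 00000001",
    "gdm-x-session[1925]: (WW) NVIDIA: Wait for channel idle timed out.",
]


def parse_xid(line):
    """Return (pci_address, xid_number, pid) for an Xid line, or None otherwise."""
    match = XID_RE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))


def xid_counts(lines):
    """Count how many times each Xid number occurs in the given lines."""
    events = (parse_xid(line) for line in lines)
    return dict(Counter(event[1] for event in events if event is not None))


if __name__ == "__main__":
    for xid, count in sorted(xid_counts(SAMPLE).items()):
        print(f"Xid {xid}: {count} event(s)")
```

The tally only groups events by number; it does not interpret what each Xid means.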

affects: ubuntu → nvidia-graphics-drivers-450 (Ubuntu)
Changed in nvidia-graphics-drivers-450 (Ubuntu):
status: Incomplete → New
Daniel van Vugt (vanvugt) wrote :

I suggest opening the 'Additional Drivers' app and trying to downgrade to driver version 440 instead of 450.
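
After switching drivers and rebooting, it may be worth confirming which kernel module version is actually loaded. The sketch below parses text in the format shown under `.proc.driver.nvidia.version` in the bug description; the regular expression is an assumption based on that single example, and `driver_version` and `major_series` are hypothetical helper names.

```python
import re

# Modeled on the "NVRM version:" line from the bug description.
VERSION_RE = re.compile(r"NVRM version:.*?Kernel Module\s+(\d+(?:\.\d+)+)")

# Copied from the bug description for the demo.
SAMPLE = (
    "NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.66 "
    "Wed Aug 12 19:42:48 UTC 2020\n"
    "GCC version: gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)\n"
)


def driver_version(text):
    """Extract the kernel module version string, or None if it is not found."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None


def major_series(version):
    """Return the driver series (for example 440 or 450) as an integer."""
    return int(version.split(".")[0])


if __name__ == "__main__":
    version = driver_version(SAMPLE)
    print(f"Loaded driver: {version} (series {major_series(version)})")
```

On a live system the input text would come from reading `/proc/driver/nvidia/version` instead of the embedded sample.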

Murat Gokmen (gokmen-murat) wrote :

Thank you for your suggestion. I was using driver version 440 when the desktop freezes first started. Based on user experiences and recommendations in various forums, I disabled all power-saving features, set the NVIDIA GPU performance mode to maximum, and upgraded the driver to the latest stable version.

I didn't pay attention to exactly which action the problem started after, which was my mistake. I update my system constantly, and I suspect it started after a kernel update.

I've downgraded the driver as you suggested. All power-saving features are still disabled and GPU performance is still set to maximum. I'll keep observing and update this bug if the desktop freezes continue.

Meanwhile, I have attached a video showing the moment of a freeze. The desktop freezes like this: the keyboard and mouse become unresponsive, while the sound of any playing media file continues to be heard for a while.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-450 (Ubuntu):
status: New → Confirmed
Diego Rodriguez (chili-man) wrote :

I just updated to the Nvidia 450 drivers today as part of the regular updates released today.

My computer froze when the Chrome and Brave browsers both segfaulted:

```
Sep 21 18:55:50 nopal /usr/lib/gdm3/gdm-x-session[6431]: (WW) NVIDIA(0): WAIT (2-S, 17, 0x643f79, 0x00003f70, 0x000049d4)
Sep 21 18:55:53 nopal kernel: GpuWatchdog[12519]: segfault at 0 ip 000055f768df0182 sp 00007f910e3d3390 error 6 in brave[55f763d38000+8372000]
Sep 21 18:55:53 nopal kernel: Code: 89 de e8 81 a4 75 ff 80 7d c7 00 79 09 48 8b 7d b0 e8 52 2e 62 fe 41 8b 84 24 e0 00 00 00 89 45 b0 48 8d 7d b0 e8 4e 36 4e fb <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 48 5b 41 5c 41 5d 41 5e
Sep 21 18:55:53 nopal systemd[1]: Starting Process error reports when automatic reporting is enabled...
Sep 21 18:55:57 nopal /usr/lib/gdm3/gdm-x-session[6431]: (WW) NVIDIA(0): WAIT (1-S, 17, 0x643f79, 0x00003f70, 0x000049d4)
Sep 21 18:55:58 nopal whoopsie-upload-all[72517]: Collecting info for /var/crash/_opt_brave.com_brave_brave.1000.crash...
Sep 21 18:55:58 nopal whoopsie-upload-all[72517]: Marking /var/crash/_opt_brave.com_brave_brave.1000.crash for whoopsie upload
Sep 21 18:55:58 nopal whoopsie-upload-all[72517]: All reports processed
Sep 21 18:55:58 nopal systemd[1]: apport-autoreport.service: Succeeded.
Sep 21 18:55:58 nopal systemd[1]: Finished Process error reports when automatic reporting is enabled.
Sep 21 18:55:59 nopal whoopsie[2931]: [18:55:59] Parsing /var/crash/_opt_brave.com_brave_brave.1000.crash.
Sep 21 18:55:59 nopal whoopsie[2931]: [18:55:59] Uploading /var/crash/_opt_brave.com_brave_brave.1000.crash.
Sep 21 18:56:00 nopal whoopsie[2931]: [18:56:00] Sent; server replied with: No error
Sep 21 18:56:00 nopal whoopsie[2931]: [18:56:00] Response code: 200
Sep 21 18:56:00 nopal whoopsie[2931]: [18:56:00] Reported OOPS ID 6197b784-fc6e-11ea-a8fa-fa163e102db1
Sep 21 18:56:00 nopal systemd[1]: Starting Process error reports when automatic reporting is enabled...
Sep 21 18:56:00 nopal whoopsie-upload-all[73389]: /var/crash/_opt_brave.com_brave_brave.1000.crash already marked for upload, skipping
Sep 21 18:56:00 nopal whoopsie-upload-all[73389]: All reports processed
Sep 21 18:56:00 nopal systemd[1]: apport-autoreport.service: Succeeded.
Sep 21 18:56:00 nopal systemd[1]: Finished Process error reports when automatic reporting is enabled.
Sep 21 18:56:00 nopal /usr/lib/gdm3/gdm-x-session[6431]: (WW) NVIDIA(0): WAIT (2-S, 17, 0x643f5b, 0x00003f70, 0x000049d4)
Sep 21 18:56:02 nopal kernel: GpuWatchdog[10901]: segfault at 0 ip 000055bcb0e42a02 sp 00007fac3fb8a490 error 6 in chrome[55bcac5a8000+7bf3000]
Sep 21 18:56:02 nopal kernel: Code: 89 de e8 c1 8e 6f ff 80 7d c7 00 79 09 48 8b 7d b0 e8 42 e9 6b fe 41 8b 84 24 e0 00 00 00 89 45 b0 48 8d 7d b0 e8 ce df 9c fb <c7> 04 25 00 00 00 00 37 13 00 00 48 83 c4 48 5b 41 5c 41 5d 41 5e
Sep 21 18:56:02 nopal systemd[1]: Starting Process error reports when automatic reporting is enabled...
Sep 21 18:56:06 nopal whoopsie-upload-all[73402]: Collecting info for /var/crash/_opt_google_chrome_chrome.1000.crash...
Sep 21 18:56:06 nopal whoopsie-upload-all[73402]: Marking /var/crash/_opt_google_chrome_chrome.1000.crash for whoopsie upload
Sep 21 18...

```

Daniel van Vugt (vanvugt) wrote :

Diego,

Please open your own new bug about that.

Daniel (m4hakala) wrote :

Bump, any solution? I'm experiencing the same issue.

Daniel van Vugt (vanvugt) wrote :

Daniel,

Please open your own bug by running:

  ubuntu-bug gnome-shell

so we can investigate your issue.

Changed in nvidia-graphics-drivers-440 (Ubuntu):
status: New → Confirmed
summary: - [nvidia] Screen freeze
+ [nvidia] Screen freeze (GPU MMU faults reported by the kernel driver)
summary: - [nvidia] Screen freeze (GPU MMU faults reported by the kernel driver)
+ [nvidia] Screen freeze (GPU MMU faults reported by the kernel driver) on
+ Quadro K620