[nvidia] Display freeze and kernel crash on GeForce GTX 1060 6GB

Bug #1887294 reported by fa2k on 2020-07-12
This bug affects 2 people
Affects                               Status  Importance  Assigned to  Milestone
nvidia-graphics-drivers-440 (Ubuntu)          Undecided   Unassigned
nvidia-graphics-drivers-450 (Ubuntu)          Undecided   Unassigned

Bug Description

The display is completely frozen, and no signal is output to the monitors. There is also no signal when switching to a console (Ctrl-Alt-F1). I still have normal access via SSH.

It happens about once every 14 days, and only when turning the display off or on (roughly once in 15 screen power cycles).

nvidia-smi still works (run over SSH, without any options), but this command hangs indefinitely:

DISPLAY=:1 xset dpms force on
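A command that can hang forever is easier to probe from an SSH session if it runs under a deadline, so a check script can report the freeze and keep going. Below is a minimal sketch. It assumes GNU coreutils "timeout", which exits with status 124 when the time limit is reached. "run_with_deadline" is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# run_with_deadline SECONDS COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND under GNU coreutils "timeout". Prints "hung" and returns 1 if
# the deadline passes (timeout exits with status 124 in that case). Otherwise
# prints "ok" for exit status 0, or "failed (N)" and returns N.
run_with_deadline() {
    local secs="$1" status
    shift
    timeout "$secs" "$@"
    status=$?                      # capture before any other command runs
    if [ "$status" -eq 124 ]; then
        echo "hung"
        return 1
    elif [ "$status" -eq 0 ]; then
        echo "ok"
        return 0
    fi
    echo "failed ($status)"
    return "$status"
}
```

For the case above, this would be run as: run_with_deadline 10 env DISPLAY=:1 xset dpms force on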

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: xorg 1:7.7+19ubuntu14
ProcVersionSignature: Ubuntu 5.4.0-40.44-generic 5.4.44
Uname: Linux 5.4.0-40-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair nvidia_modeset nvidia
.proc.driver.nvidia.gpus.0000.01.00.0: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/gpus/0000:01:00.0'
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.suspend: suspend hibernate resume
.proc.driver.nvidia.suspend_depth: default modeset uvm
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 440.100 Fri May 29 08:45:51 UTC 2020
 GCC version:
ApportVersion: 2.20.11-0ubuntu27.3
Architecture: amd64
BootLog:

CasperMD5CheckResult: skip
CompizPlugins: No value set for `/apps/compiz-1/general/screen0/options/active_plugins'
Date: Sun Jul 12 17:47:58 2020
DistUpgraded: 2020-05-17 13:48:43,365 DEBUG entry '# deb [arch=amd64] https://repo.skype.com/deb stable main # disabled on upgrade to focal' was disabled (unknown mirror)
DistroCodename: focal
DistroVariant: ubuntu
DkmsStatus:
 nvidia, 440.100, 5.4.0-39-generic, x86_64: installed
 nvidia, 440.100, 5.4.0-40-generic, x86_64: installed (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!) (WARNING! Diff between built and installed module!)
 v4l2loopback, 0.12.3, 5.4.0-39-generic, x86_64: installed
 v4l2loopback, 0.12.3, 5.4.0-40-generic, x86_64: installed
ExtraDebuggingInterest: Yes
GpuHangFrequency: Very infrequently
GraphicsCard:
 NVIDIA Corporation GP106 [GeForce GTX 1060 6GB] [10de:1c03] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: ASUSTeK Computer Inc. GP106 [GeForce GTX 1060 6GB] [1043:85ae]
InstallationDate: Installed on 2016-07-27 (1445 days ago)
InstallationMedia: Ubuntu 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.1)
MachineType: System manufacturer System Product Name
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.4.0-40-generic root=/dev/mapper/ubuntu--vg-root ro splash vt.handoff=7
SourcePackage: xorg
Symptom: display
Title: Xorg freeze
UpgradeStatus: Upgraded to focal on 2020-05-17 (56 days ago)
dmi.bios.date: 04/17/2014
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 3601
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: P8C WS
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr3601:bd04/17/2014:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnP8CWS:rvrRev1.xx:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer
version.compiz: compiz 1:0.9.14.1+20.04.20200211-0ubuntu1
version.libdrm2: libdrm2 2.4.101-2
version.libgl1-mesa-dri: libgl1-mesa-dri 20.0.8-0ubuntu1~20.04.1
version.libgl1-mesa-glx: libgl1-mesa-glx 20.0.8-0ubuntu1~20.04.1
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.20.8-2ubuntu2.1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati N/A
version.xserver-xorg-video-intel: xserver-xorg-video-intel N/A
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau N/A
xserver.bootTime: Tue Jul 7 22:02:38 2020
xserver.configfile: default
xserver.logfile: /var/log/Xorg.0.log
xserver.outputs:

xserver.version: 2:1.20.8-2ubuntu2.1

fa2k (pmb) wrote :
Daniel van Vugt (vanvugt) wrote :

It appears you have a kernel crash in the Nvidia driver:

[412515.698236] Call Trace:
[412515.698244] ? _nv002628kms+0x3aa/0x1f70 [nvidia_modeset]
[412515.698246] ? __alloc_pages_nodemask+0x173/0x320
[412515.698248] ? alloc_pages_current+0x87/0xe0
...

tags: added: nvidia
summary: - Display freeze
+ [nvidia] Display freeze
affects: xorg (Ubuntu) → nvidia-graphics-drivers-440 (Ubuntu)

It occurred once more, after about 19 days of uptime.

There's an Xorg entry in dmesg that could be interesting (the full dmesg is attached):

[1638952.386233] Xorg: page allocation failure: order:4, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
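The "order:4" in this line is the buddy-allocator order. Xorg asked the kernel for 2^4 = 16 physically contiguous pages, which is 64 KiB with the usual 4 KiB x86_64 page size. The sketch below turns an order into a size. It also sums how many free blocks of at least a given order each zone currently has, using the per-order counts in /proc/buddyinfo. It assumes the standard buddyinfo layout ("Node N, zone NAME" followed by counts for orders 0, 1, 2, ...). "order_to_kib" and "free_blocks_at_least" are hypothetical helper names. This only shows how scarce large blocks are; it does not by itself explain the crash.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading "page allocation failure: order:N" lines.

# order_to_kib ORDER [PAGE_KIB]
# Size of one buddy-allocator block of the given order: 2^ORDER pages.
# PAGE_KIB defaults to 4 (the usual x86_64 page size).
order_to_kib() {
    local order="$1" page_kib="${2:-4}"
    echo $(( (1 << order) * page_kib ))
}

# free_blocks_at_least MIN_ORDER [FILE]
# For each line of /proc/buddyinfo ("Node N, zone NAME c0 c1 c2 ..."),
# sum the free-block counts for orders >= MIN_ORDER. The count for order k
# is field 5 + k in awk's numbering.
free_blocks_at_least() {
    local min="$1" file="${2:-/proc/buddyinfo}"
    awk -v min="$min" '{
        node = $2
        sub(/,$/, "", node)          # "0," -> "0"
        n = 0
        for (i = 5 + min; i <= NF; i++) n += $i
        printf "node %s zone %s: %d\n", node, $4, n
    }' "$file"
}
```

For example, "free_blocks_at_least 4" prints one line per zone. A count of 0 means no free block of 64 KiB or larger is immediately available in that zone.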

Daniel van Vugt (vanvugt) wrote :
fa2k (pmb) wrote :

Done. There have been no more crashes on 440, and I'll post here if I get a crash on 450.

fa2k (pmb) wrote :

It happened again. I've attached dmesg and the output of "modinfo nvidia" to show the current nvidia version.

fa2k (pmb) wrote :
summary: - [nvidia] Display freeze
+ [nvidia] Display freeze and kernel crash on GeForce GTX 1060 6GB
fa2k (pmb) wrote :

One more crash on driver 450.57-0ubuntu0~0.20.04.4. I lost the dmesg file, but it's pretty similar. Now I will update to 450.66-0ubuntu0~0.20.04.2.

fa2k (pmb) wrote :

I got the same issue on 455.23.05-0ubuntu1.

I also have some other problems on that new driver, mainly related to games. Sorry for the off-topic question, but would it be useful to report those problems, and if so, where?

Having freezes with a GTX 650 Ti.
I can't even start the session most of the time.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-440 (Ubuntu):
status: New → Confirmed
Changed in nvidia-graphics-drivers-450 (Ubuntu):
status: New → Confirmed
Aleksander Demko (ademko) wrote :

Can confirm on 20.04 (not new, though; this happened on 18.04 too):

- GTX 1050 Ti, driver 450.80.02
- Kernel 5.4.0-52-generic #57-Ubuntu SMP Thu Oct 15 10:57:00 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
- dmesg attached
- I have to SSH into the machine and shut it down.

For me, it always seems to happen when I walk away after locking my screen and turning off my monitor (by switching it off at the power bar, an old workaround for a monitor that failed to go to sleep, of all things).

Note that my keyboard and mice are plugged into the hub, so you'll see USB disconnects in the dmesg.

fa2k (pmb) wrote :

I don't know how clear it was before, but it's the same for me -- it happens when I connect or disconnect a monitor. It's a TV screen, and doesn't turn off automatically, so I make it turn on and off (using an RS232-based protocol over a separate cable). When turned off, the TV appears as disconnected from the computer, the desktop layout changes, etc.

The problem happens whether or not I have another, normal monitor connected.

One difference is that my machine won't actually shut down if I issue the shutdown command over SSH (or press the power button). It starts the shutdown process but then hangs.

This should be a pretty normal use case for laptops, so I'm surprised that more people don't see it. Maybe it only appears on desktop GPUs, or the problem is reset when the computer is suspended (as often happens on laptops after disconnecting a monitor).

fa2k (pmb) wrote :

It's been 21 days without a crash. I think this problem is solved for me.

I may have updated system files since the last reboot, so here are the versions of the code that is actually running:

$ cat /sys/module/nvidia/version
450.102.04

$ uname -a
Linux blackhole 5.4.0-60-generic #67-Ubuntu SMP Tue Jan 5 18:31:36 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
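The version in /sys/module/nvidia/version belongs to the module currently loaded. "modinfo -F version nvidia" reads the module file installed for the running kernel. The two can differ after a driver update that hasn't been followed by a reboot, which is the situation described above. The sketch below prints both in one line. It assumes the module is named "nvidia", as in this report. "describe_versions" and "nvidia_version_report" are hypothetical helper names.

```shell
#!/usr/bin/env bash
# describe_versions RUNNING INSTALLED   (hypothetical helper)
# Prints one line saying whether the loaded and on-disk versions agree.
describe_versions() {
    if [ "$1" = "$2" ]; then
        echo "running and installed nvidia module: $1"
    else
        echo "versions differ (reboot may be pending): running $1, installed $2"
    fi
}

# nvidia_version_report   (hypothetical helper)
# Reads the real values; needs the nvidia module to be present.
nvidia_version_report() {
    local running installed
    # Version of the module currently loaded into the kernel.
    running=$(cat /sys/module/nvidia/version 2>/dev/null) || running="not loaded"
    # Version of the module file for the running kernel (uname -r).
    installed=$(modinfo -F version nvidia 2>/dev/null) || installed="not found"
    describe_versions "$running" "$installed"
}
```

Running nvidia_version_report prints a single line. A "versions differ" line means the driver in memory is not the one currently installed on disk for this kernel.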

Thanks to the people who have fixed it!

Changed in nvidia-graphics-drivers-450 (Ubuntu):
status: Confirmed → Fix Released
Changed in nvidia-graphics-drivers-440 (Ubuntu):
status: Confirmed → Won't Fix