Kernel crash in nvidia_modeset hangs the whole graphic system (page allocation failure)

Bug #1897659 reported by Facundo Batista on 2020-09-29
This bug affects 5 people
Affects: nvidia-graphics-drivers-450 (Ubuntu)
 Status: Confirmed
 Importance: Undecided
 Assigned to: Unassigned
Affects: nvidia-graphics-drivers-455 (Ubuntu)
 Status: Confirmed
 Importance: Undecided
 Assigned to: Unassigned

Bug Description

This happens to me around once per week. The first line I found in syslog is:

Xorg: page allocation failure: order:5, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0

Attached is the whole syslog extract for the problem.
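
For readers unfamiliar with the format of that syslog line, here is a minimal sketch that pulls its fields apart. It is not part of the original report. `order:5` means the kernel was asked for 2^5 physically contiguous pages; the 4 KiB page size used below is an assumption (the usual x86_64 default), and the function name `parse_alloc_failure` is made up for illustration.

```python
import re

# Assumption: 4 KiB base pages, the usual default on x86_64.
PAGE_SIZE = 4096

# The line quoted in the bug description.
LINE = ("Xorg: page allocation failure: order:5, "
        "mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null),"
        "cpuset=/,mems_allowed=0")

PATTERN = re.compile(
    r"(\S+): page allocation failure: "
    r"order:(\d+), mode:(0x[0-9a-f]+)\(([^)]*)\)"
)

def parse_alloc_failure(line):
    """Return the process, order, size and GFP flags of a failure line."""
    m = PATTERN.search(line)
    if m is None:
        return None
    order = int(m.group(2))
    return {
        "process": m.group(1),
        "order": order,
        "pages": 1 << order,                # 2**order contiguous pages
        "bytes": (1 << order) * PAGE_SIZE,  # size of the failed request
        "mode": m.group(3),
        "flags": m.group(4).split("|"),
    }

info = parse_alloc_failure(LINE)
print(info["process"], info["pages"], info["bytes"])  # Xorg 32 131072
```

Under that page-size assumption, the failed request was for a single contiguous block of 32 pages (128 KiB) made on behalf of the Xorg process.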

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: xorg 1:7.7+19ubuntu14
ProcVersionSignature: Ubuntu 5.4.0-48.52-generic 5.4.60
Uname: Linux 5.4.0-48-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair nvidia_modeset nvidia
.proc.driver.nvidia.capabilities.gpu0: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/capabilities/gpu0'
.proc.driver.nvidia.capabilities.mig: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/capabilities/mig'
.proc.driver.nvidia.gpus.0000.07.00.0: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/gpus/0000:07:00.0'
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.suspend: suspend hibernate resume
.proc.driver.nvidia.suspend_depth: default modeset uvm
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86_64 Kernel Module 450.66 Wed Aug 12 19:42:48 UTC 2020
 GCC version: gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)
ApportVersion: 2.20.11-0ubuntu27.9
Architecture: amd64
BootLog: Error: [Errno 13] Permission denied: '/var/log/boot.log'
CasperMD5CheckResult: skip
CompositorRunning: None
CurrentDesktop: KDE
Date: Mon Sep 28 23:58:11 2020
DistUpgraded: Fresh install
DistroCodename: focal
DistroVariant: ubuntu
DkmsStatus: nvidia, 450.66, 5.4.0-48-generic, x86_64: installed
ExtraDebuggingInterest: No
GraphicsCard:
 NVIDIA Corporation GK107 [GeForce GT 740] [10de:0fc8] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: NVIDIA Corporation GK107 [GeForce GT 740] [10de:1099]
InstallationDate: Installed on 2020-07-11 (79 days ago)
InstallationMedia: Kubuntu 20.04 LTS "Focal Fossa" - Release amd64 (20200423)
MachineType: System manufacturer System Product Name
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-48-generic root=UUID=3d184b61-094a-475b-b817-3f588547fea1 ro quiet splash vt.handoff=7
SourcePackage: xorg
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 03/07/2019
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 4602
dmi.board.asset.tag: Default string
dmi.board.name: PRIME A320M-K
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 3
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr4602:bd03/07/2019:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKCOMPUTERINC.:rnPRIMEA320M-K:rvrRevX.0x:cvnDefaultstring:ct3:cvrDefaultstring:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: System Product Name
dmi.product.sku: SKU
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.101-2
version.libgl1-mesa-dri: libgl1-mesa-dri 20.0.8-0ubuntu1~20.04.1
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.nvidia-graphics-drivers: nvidia-graphics-drivers-* N/A
version.xserver-xorg-core: xserver-xorg-core 2:1.20.8-2ubuntu2.4
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20200226-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.16-1

Facundo Batista (facundo) wrote :
summary: - Xorg memory issues hangs the whole graphic system
+ Memory issues hangs the whole graphic system
affects: xorg (Ubuntu) → nvidia-graphics-drivers-450 (Ubuntu)
tags: added: nvidia

The behaviour is that I step away from my machine and come back a couple of hours later (desktop machine, no sleep or hibernation involved). When I come back it is unresponsive: it sends no signal to the monitor, and even the keyboard's Num Lock light doesn't work.

However, I can ssh into it and work normally from there. But there is no graphic system, and no way to restart any process to get it back (nothing I tried worked, at least).

I normally end up issuing a `sudo shutdown now` (after sshing in), which keeps it busy for a while but does NOT turn the machine off (it looks like whatever is hung prevents that).
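
Since the machine stays reachable over ssh, one thing that could be checked at that point is how many large contiguous free blocks the kernel still has. A high-order allocation can fail even with plenty of free memory if no large enough contiguous block is available; whether that is what happens in this bug is not established here. The sketch below parses text in the `/proc/buddyinfo` layout, where each row lists the number of free blocks of order 0, 1, 2, and so on for one zone. The sample text and the function names are hypothetical and were not taken from the affected machine.

```python
# Hypothetical sample in /proc/buddyinfo layout, NOT data from this bug.
SAMPLE = """\
Node 0, zone      DMA      1      1      0      0      2      1      1      0      1      1      3
Node 0, zone    DMA32    120     80     40     10      3      0      0      0      0      0      0
Node 0, zone   Normal   5000   2100    300     20      1      0      0      0      0      0      0
"""

def free_blocks_by_order(buddyinfo_text):
    """Sum the free-block counts per order across all nodes and zones."""
    totals = []
    for line in buddyinfo_text.splitlines():
        fields = line.split()
        # Expected row: "Node", "0,", "zone", "<name>", count0, count1, ...
        if len(fields) < 5 or fields[2] != "zone":
            continue
        for order, count in enumerate(int(x) for x in fields[4:]):
            if order >= len(totals):
                totals.append(0)
            totals[order] += count
    return totals

def blocks_at_or_above(totals, order):
    """Number of free blocks big enough for a request of this order."""
    return sum(totals[order:])

totals = free_blocks_by_order(SAMPLE)
usable = blocks_at_or_above(totals, 5)
print(totals)
print("free blocks of order >= 5:", usable)
```

On a live system the same functions could be fed the contents of `/proc/buddyinfo` read over ssh; a count near zero for order 5 and above would be consistent with the allocation failure in the syslog, though it would not by itself identify the cause.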

We can't debug or fix the Nvidia driver directly because it is closed source. I suggest the quickest solution might be to downgrade to the 440 driver in the 'Additional Drivers' app.

summary: - Memory issues hangs the whole graphic system
+ Kernel crash in nvidia_modeset hangs the whole graphic system
summary: - Kernel crash in nvidia_modeset hangs the whole graphic system
+ Kernel crash in nvidia_modeset hangs the whole graphic system (GeForce
+ GT 740)
Facundo Batista (facundo) wrote :

I've installed nvidia-graphics-drivers-455 from https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa and I'm still getting the same crash.

As requested by Alberto Milone, I'm attaching the nvidia report generated when it crashed.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-450 (Ubuntu):
status: New → Confirmed
Changed in nvidia-graphics-drivers-455 (Ubuntu):
status: New → Confirmed
summary: - Kernel crash in nvidia_modeset hangs the whole graphic system (GeForce
- GT 740)
+ Kernel crash in nvidia_modeset hangs the whole graphic system (page
+ allocation failure)