[nvidia] Repeated screen freezes with GeForce GT 640 (GK107)

Bug #1892973 reported by ebsf on 2020-08-26
This bug affects 2 people
Affects | Importance | Assigned to
gnome-shell (Ubuntu) | Undecided | Unassigned
nvidia-graphics-drivers-440 (Ubuntu) | Undecided | Unassigned
nvidia-graphics-drivers-450 (Ubuntu) | Undecided | Unassigned

Bug Description

Ubuntu 20.04.1, gnome-shell 3.36.4, nvidia-driver-450 and nvidia-driver-440. Both drivers expressly support the hardware. The problem occurs in every available desktop session (Xorg, GNOME Classic, Ubuntu).

Bug summary

Screen hangs randomly and irretrievably, with the only possibility of recovery being a hard system reset. Sometimes this is triggered by activity (launching or using gnome-settings; resizing a window; using Nautilus), sometimes not. No session lasts more than five minutes.
No combination of keystrokes will yield a terminal of any kind.
The hang also interrupts System Monitor output, so no information regarding use of system resources is available.

Steps to reproduce

1. Install gnome-session.
2. Reboot.
3. Use graphical applications.

What happened

The screen froze irretrievably.

A hard system reset and consequent reboot yields a variety of outcomes. Sometimes, a normal gdm login. Sometimes, a black screen with an inverted-black mouse cursor/pointer/arrow that moves. Sometimes an entirely black screen. Sometimes, not even a hard system reset is sufficient and the system has to be booted into recovery mode, the existing driver removed, and another installed, before rebooting again.

With nvidia-driver-440 there are fewer hangs, but the monitor cannot be set to its full resolution.
I don't get it. GNOME hangs have been extensively documented for ten years. The drivers expressly support the hardware. This package is part of an Ubuntu LTS release. Is gnome-session intended to be serious, functional, production software subjected to rigorous and competent quality control? This is a serious question because I have a business to run and can't be sucked down some random technical rabbit hole just to do daily work.

Is GNOME just a cute code project intended as a resume line for people looking for real work?
What kind of quality control processes are in place that allow ancient failures, extensively reported, to persist?

Is anyone at GNOME able and willing to put on their big-boy pants to get a reliable package suitable for production deployments, released?

The command

cat /var/log/syslog | grep gnome-session

reveals fundamental errors in implementing systemd syntax and references/calls to nonexistent binaries. See the gnome-session units in /usr/lib/systemd/user. Maybe getting this right would be a first step to helping the developers understand the environment better. Please see man systemd.
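
One way to check the claim about nonexistent binaries is to read the ExecStart= lines of the unit files and test whether each referenced program exists. The following is only a minimal sketch: the function name check_unit_execs is hypothetical, and it looks only at the first word of each ExecStart= line after stripping systemd's optional prefix characters.

```shell
#!/bin/sh
# Report ExecStart= programs in unit files that are missing or not
# executable. check_unit_execs is an illustrative helper, not part of
# systemd or gnome-session.
# Usage: check_unit_execs /usr/lib/systemd/user 'gnome-session*'
check_unit_execs() {
    dir=$1
    pattern=${2:-*.service}
    status=0
    for unit in "$dir"/$pattern; do
        [ -f "$unit" ] || continue
        # First word after ExecStart=, minus prefix characters (-, @, +, !, :).
        for bin in $(sed -n 's/^ExecStart=[-@+!:]*\([^ ]*\).*/\1/p' "$unit"); do
            case $bin in
                /*) [ -x "$bin" ] && continue ;;
                *)  command -v "$bin" >/dev/null 2>&1 && continue ;;
            esac
            echo "MISSING $bin ($unit)"
            status=1
        done
    done
    return $status
}
```

The function prints one line per missing program and returns a non-zero status if it found any. For syntax problems in the units themselves, systemd-analyze verify can be run against individual unit files.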

Source is, astonishingly, a hybrid of C and Javascript, which doubtless presents a QC nightmare. Has anyone given this any thought?

What does it take to get this package to work today?
- What is the procedure? What commands must be run?
- Why doesn't GNOME simply include these commands in a script, just to make it easier? Does anyone there know how?
- Why doesn't GNOME simply compile these commands into the package so it works in the first place?
- Will a different graphics card matter?
- Which one?
- Why should it, if the drivers support the existing hardware (according to the documentation, if not in practice)?

Can this package be made to work today?

If not, will this package be ready for production use in the next Ubuntu release (i.e., 20.04.2)?

If not, what alternatives exist for production use?

These are all serious questions. It is astonishing that this package has been released to the public. Its getting out the door poses an existential reputational threat to the GNOME project.
Any help in getting this package to work would be most gratefully appreciated.

What did you expect to happen

I expected the screen not to freeze.

ebsf (eb-9) on 2020-08-26
information type: Private Security → Public
tags: added: 20.04 gnome-session gnome-shell nvidia xorg
Daniel van Vugt (vanvugt) wrote :

Next time the hang occurs, please reboot and then immediately run:

  journalctl -b-1 > prevboot.txt
  lspci -kv > lspci.txt
  gsettings list-recursively org.gnome.shell > settings.txt

and attach the resulting text files here.

Please also run this command to complete the bug report:

  apport-collect 1892973

And in future be careful to only discuss a single topic per bug report. You can also save time by reporting future bugs with the 'ubuntu-bug' command that will do much of this automatically.
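
The commands requested above can be gathered into one small script so that they run in a single step right after the reboot. This is a rough sketch only: the function name collect_diag and the output directory are assumptions, and each step is skipped if the tool is not installed.

```shell
#!/bin/sh
# Collect the requested diagnostics into one directory.
# collect_diag and the default directory name are illustrative choices.
collect_diag() {
    out=${1:-./bug-1892973}
    mkdir -p "$out" || return 1
    if command -v journalctl >/dev/null 2>&1; then
        # Journal of the previous boot (the one that hung).
        journalctl -b -1 > "$out/prevboot.txt" 2>&1 || true
    fi
    if command -v lspci >/dev/null 2>&1; then
        lspci -kv > "$out/lspci.txt" 2>&1 || true
    fi
    if command -v gsettings >/dev/null 2>&1; then
        gsettings list-recursively org.gnome.shell > "$out/settings.txt" 2>&1 || true
    fi
    # Listing of crash reports, if any exist.
    ls -l /var/crash > "$out/var-crash.txt" 2>&1 || true
    echo "$out"
}
```

Running collect_diag with no argument writes the files to ./bug-1892973 and prints that path; the directory can then be archived and attached as a single file.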

tags: added: focal
Changed in gnome-shell (Ubuntu):
status: New → Incomplete
Daniel van Vugt (vanvugt) wrote :

Please also check /var/crash for any crash files. If you find some then please turn those into new bugs using the 'ubuntu-bug' command. And tell us the new bug IDs.

Daniel van Vugt (vanvugt) wrote :

Also tracking upstream in gnome-shell#3105 but I won't link it here so this bug can still expire.

no longer affects: gnome-shell
ebsf (eb-9) wrote :

Thanks for responding.

Some delays in responding because of client demands, then because the storage configuration of the machine was incomplete and the debug files couldn't be networked, then because the incessant gnome-session hangs required a hard-reset at nearly every step along the way.

The delays provided some information, however.

First, I booted to recovery mode, removed nvidia-driver-440, installed nvidia-driver-450 (the recommended one, per the ubuntu-drivers devices output), exited recovery, and continued the boot (NOT another hard-reset). Here, the resolution was 1024x768, not the full resolution of the monitor, when logging in under all available desktop sessions (Xorg, Classic, and Ubuntu). The machine experienced NO gnome-session freezes over 8-10 hours with occasional interaction in between client calls. THEN, I did a normal reboot at the end of the day and specified Xorg on login. The resolution now was back to normal, but I immediately right-clicked to launch display settings and this triggered a gnome-session freeze. Through a series of hard-resets, the machine continued to experience several other random freezes.
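
For reference, the recommended driver can be pulled out of the ubuntu-drivers devices output with a one-line filter. This is a sketch under the assumption that the output uses lines of the form "driver   : nvidia-driver-450 - distro non-free recommended"; the function name recommended_driver is hypothetical.

```shell
#!/bin/sh
# Print the package that 'ubuntu-drivers devices' marks as recommended.
# Reads the command's output on stdin; the line layout is assumed.
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

# Usage (on the affected machine):
#   ubuntu-drivers devices | recommended_driver
```

The result can be compared with the version of the kernel module actually in use, for example via modinfo -F version nvidia.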

Today, I had reconfigured storage to, among other things, mount /tmp as tmpfs, the consequence of which obviously is to flush /tmp on every reboot. The machine experienced no gnome-session hangs. I then ran apt update and apt full-upgrade, which triggered updates of what appeared to be nvidia-driver-450 and all of its dependencies. The machine then promptly experienced a gnome-session hang.

Hard-reset, black screen.

Hard-reset, Xorg login, generate the debug files (attached). Run apport-collect 1892973. Launching Firefox to authorize triggers a gnome-session hang.

Hard-reset, Xorg login. Launching Firefox triggers a gnome-session hang.

Hard-reset, Xorg login. Copy debug files to this machine (a functional Windows laptop) and make this entry.

The requested debug files, to the extent of the machine's ability to generate them, are attached. I ended up generating several prior-boot journal logs for all of the foregoing gnome-session hangs, and attach them in a ZIP archive because the interface will only let me attach one file. The files include:
- prevboot-1 (first prevboot)
- prevboot-2 (the boot preceding prevboot-1)
- prevboot-1128-* (boots involving Firefox-induced hangs and one preceding)
- prevboot-1135-1 (boot involving a Firefox-induced hang).

Obviously, the machine is incapable of authorizing for apport-collect. I should add that attempting to launch Google Chrome also triggers a gnome-session hang.

No files exist in /var/crash.

ebsf (eb-9) wrote :

Some additional information:

When the boot sequence is
- GRUB boot to recovery mode
- Drop to root prompt
- Remove a driver and install a driver
- Exit root prompt (Ctrl-D)
- Resume normal boot as per the recovery mode menu

Then the display comes up at a degraded resolution (1024x768) but no gnome-session hang occurs in response to some of the reliable triggers (typically, attempting to launch Firefox, Google Chrome, or navigate gnome-settings).

Booting normally (via restart or hard-reset) brings up the display at maximum resolution and gnome-session hangs immediately on one of the reliable triggers.
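
The driver-swap step of that sequence might look like the sketch below when run from the recovery root prompt. The exact commands used were not stated in this report, so the apt-get invocations are an assumption, and swap_driver is a hypothetical helper that only prints the commands unless RUN=1 is set.

```shell
#!/bin/sh
# Print (or, with RUN=1, execute) the commands for replacing one NVIDIA
# driver package with another. The apt-get commands are assumed, not
# taken from the report.
swap_driver() {
    old=$1
    new=$2
    for cmd in "apt-get remove --purge -y $old" \
               "apt-get install -y $new"; do
        if [ "${RUN:-0}" = 1 ]; then
            $cmd || return 1
        else
            echo "$cmd"
        fi
    done
}

# Dry run:
#   swap_driver nvidia-driver-440 nvidia-driver-450
```

Printing by default makes it possible to review the commands before anything is removed; in recovery mode the root filesystem may also need to be remounted read-write and networking enabled first.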

I noted this previously.

To test whether resolution is the issue, I booted normally. gnome-settings hangs gnome-session, so instead, I ran

  xrandr -s 1024x768

which was immediately effective. I then accessed gnome-settings and it triggered another gnome-session hang.

The conclusion is that degraded screen resolution itself does not prevent gnome-session hangs.
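
To record which resolution was actually in effect for each test boot, the current screen size can be read from xrandr's first output line. This is a small sketch that assumes the usual "Screen 0: minimum ..., current W x H, maximum ..." layout; current_resolution is a hypothetical name.

```shell
#!/bin/sh
# Extract the current screen size (WIDTHxHEIGHT) from xrandr output
# supplied on stdin. The "Screen N: ... current W x H, ..." layout is
# assumed.
current_resolution() {
    sed -n 's/^Screen [0-9]*:.* current \([0-9]*\) x \([0-9]*\),.*/\1x\2/p' | head -n 1
}

# Usage (inside the X session):
#   xrandr | current_resolution
```

Logging this value next to the boot path taken (normal boot, or recovery mode with a driver change) keeps the comparison between test runs unambiguous.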

Reviewing /var/log/syslog and the systemd journal, as well as /var/log/Xorg.0.log reveals that gdm3 recognizes the monitor by manufacturer and model and correctly picks up its configuration capabilities when the system boots normally. This does not occur, however, when one boots first to recovery mode, changes drivers, and resumes normal boot.

To test whether a recovery mode session alone prevented a gnome-session hang, I then booted to recovery mode, did NOT change drivers, then resumed normal boot. Interestingly, the display came up in high resolution notwithstanding the prior xrandr reconfiguration. /var/log/Xorg.0.log reflected that gdm3 did identify the monitor. And, most importantly, the reliable triggers immediately caused a gnome-session hang.

The conclusion, then, is that a recovery mode session alone does not prevent a gnome-session hang. One must change the driver during that recovery mode session, to prevent it (albeit at the cost of degraded screen resolution).

To test whether simply removing and installing the same driver during a recovery mode session would be sufficient to avoid a gnome-session hang, I experimented with that. On resuming normal boot, the screen came up at full resolution and the usual triggers produced a gnome-session hang.

The conclusion, then, is that a change of drivers is necessary during a recovery mode session to prevent a gnome-session hang after resuming normal boot from recovery mode. Again, this is at the cost of degraded screen resolution, and the protection will not persist through a normal boot.

Daniel van Vugt (vanvugt) wrote :

Thanks. This appears to be an Nvidia hardware or driver fault.

Since this is the first time I've ever seen this (other Nvidia users don't seem to encounter the same issue) I would suggest the simplest solution would be to change the graphics card. Otherwise you will need to seek support from Nvidia themselves.

The recurring problems are:

Aug 28 09:42:35 greystone /usr/lib/gdm3/gdm-x-session[2279]: (WW) NVIDIA(0): WAIT (2-S, 17, 0x00e4, 0x00003fb8, 0x00004a1c)
Aug 28 09:42:42 greystone /usr/lib/gdm3/gdm-x-session[2279]: (WW) NVIDIA(0): WAIT (1-S, 17, 0x00e4, 0x00003fb8, 0x00004a1c)
Aug 28 09:42:45 greystone /usr/lib/gdm3/gdm-x-session[2279]: (WW) NVIDIA(0): WAIT (2-S, 17, 0x00e4, 0x00003fb8, 0x00004a1c)

and

Aug 28 09:08:05 greystone /usr/lib/gdm3/gdm-x-session[1681]: (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x000009a8, 0x00001f44)
Aug 28 09:08:12 greystone /usr/lib/gdm3/gdm-x-session[1681]: (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x000009a8, 0x00001f44)
Aug 28 09:08:19 greystone kernel: nvidia-modeset: WARNING: GPU:0: Lost display notification (0:0x00000000); continuing.
Aug 28 09:08:33 greystone kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0:954
Aug 28 09:08:44 greystone /usr/lib/gdm3/gdm-x-session[1681]: (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x000009a8, 0x00001f4c)
Aug 28 09:08:51 greystone /usr/lib/gdm3/gdm-x-session[1681]: (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x000009a8, 0x00001f4c)
Aug 28 09:10:03 greystone kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0:954
Aug 28 09:12:54 greystone rsyslogd[1394]: -- MARK --
Aug 28 09:13:04 greystone kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0:954
Aug 28 09:16:05 greystone kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000917d:0:0:954
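
Messages like these can be pulled out of a saved journal or syslog dump with a simple filter. The sketch below assumes the message formats shown in the excerpts above; the function names are hypothetical.

```shell
#!/bin/sh
# Print NVIDIA X-driver WAIT messages and nvidia-modeset kernel
# warnings/errors from the given log files.
nvidia_faults() {
    grep -E 'NVIDIA\((GPU-)?[0-9]+\): WAIT|nvidia-modeset: (WARNING|ERROR)' "$@"
}

# Count how many times the display-engine timeout was logged.
count_idle_timeouts() {
    grep -c 'Idling display engine timed out' "$@"
}

# Usage:
#   nvidia_faults prevboot.txt
#   count_idle_timeouts prevboot.txt
```

Running the filter over each saved prevboot file shows whether every hang is accompanied by the same pair of symptoms.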

Changed in gnome-shell (Ubuntu):
status: Incomplete → Invalid
summary: - gnome-session fails, and fails, and fails yet again
+ [nvidia] Repeated screen freezes with GeForce GT 640 (GK107)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nvidia-graphics-drivers-440 (Ubuntu):
status: New → Confirmed
Changed in nvidia-graphics-drivers-450 (Ubuntu):
status: New → Confirmed

I am having the same issues with a GTX 650 Ti on Kubuntu Focal Fossa, with both KDE and GNOME.
Most of the time it happens at the start of the session and I am forced to hard-reboot. If I can start the session, it freezes within a few minutes.

ebsf (eb-9) wrote :

Try nvidia-graphics-driver-390.

I am using 390, but I need the most up-to-date drivers to play my games.
And it was working a week ago.
