[nvidia] HomeOffice & Un-/Docking: X11 crash on resume when docked to multi monitor

Bug #1935951 reported by sva42c8
This bug affects 1 person
Affects: nvidia-graphics-drivers-460 (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

X11 crashes on resume: my current desktop session is lost and the login manager appears.

I posted a bug report with configuration details here: https://gitlab.freedesktop.org/xorg/xserver/-/issues/1197

The cause seems to be the NVIDIA driver, but maybe a workaround is possible in Ubuntu?
This bug prevents me from using Linux for daily docking/undocking cycles for HomeOffice (this works fine with Windows).

[ 1272.639] (II) NVIDIA(0): Setting mode "DP-4: nvidia-auto-select @1920x1080 +0+0 {ViewPortIn=1920x1080, ViewPortOut=1920x1080+0+0}"
[ 1272.639] (WW) NVIDIA(0): Failed to set the display configuration
[ 1272.639] (WW) NVIDIA(0): - Setting a mode on head 1 failed: Invalid surface parameters
[ 1272.639] (WW) NVIDIA(0): were specified
[ 1272.639] (EE) NVIDIA(0): Failed to enter VT (mode initialization failed)
[ 1272.639] (EE)
Fatal server error:
[ 1272.639] (EE) EnterVT failed for screen 0
[ 1272.639] (EE)
[ 1272.639] (EE)
Please consult the The X.Org Foundation support
  at http://wiki.x.org
 for help.
[ 1272.639] (EE) Please also check the log file at "/var/log/Xorg.0.log" for additional information.
[ 1272.639] (EE)
[ 1274.034] (EE) Server terminated with error (1). Closing log file.
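
The failure signature in the excerpt above can be picked out of an Xorg log mechanically. The following is a minimal Python sketch, not part of the original report. The function name `find_entervt_failures` and the `SIGNATURES` list are assumptions chosen for illustration; the strings themselves are taken from the messages quoted above.

```python
import re

# Matches warning/error lines such as
# "[  1272.639] (EE) NVIDIA(0): Failed to enter VT (mode initialization failed)"
LINE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+\((?P<level>EE|WW)\)\s*(?P<msg>.*)$"
)

# Substrings copied from the log excerpt in this report (illustrative, not exhaustive).
SIGNATURES = (
    "Failed to set the display configuration",
    "Invalid surface parameters",
    "Failed to enter VT",
    "EnterVT failed",
)


def find_entervt_failures(log_text):
    """Return (timestamp, level, message) tuples for lines matching a signature."""
    hits = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # informational lines and untimestamped lines are skipped
        msg = match.group("msg")
        if any(sig in msg for sig in SIGNATURES):
            hits.append((float(match.group("ts")), match.group("level"), msg))
    return hits
```

To use it, read the previous session's log (the file the report calls Xorg.log.old) into a string and pass it to `find_entervt_failures`; an empty result suggests that session did not end with this particular failure.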

I found something similar that might handle the "EnterVT failed" issue:
https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/443/
but I am not sure whether it is already in Ubuntu 20.04 LTS.

Hardware used:
Dell Precision 15" 7540, Xeon E-2286M@2.4GHz, 128 GB ECC RAM, NVIDIA Quadro RTX 5000 16GB VRAM
Docking stations at home and offices: WD19DC (240W)
Monitors used for Linux migration Office1: 2x 4K U4320Q, 42.5" @ 96dpi font / No scaling a
Monitors used for Linux migration Office2: 2xU3818DW, 37.5"
Monitors used for Linux migration at home: 3x FullHD Dell monitors

Revision history for this message
sva42c8 (sva42c8) wrote :
description: updated
Daniel van Vugt (vanvugt) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It sounds like some part of the system has crashed. To help us find the cause of the crash please follow these steps:

1. Look in /var/crash for crash files and if found run:
    ubuntu-bug YOURFILE.crash
Then tell us the ID of the newly-created bug.

2. If step 1 failed then look at https://errors.ubuntu.com/user/ID where ID is the content of file /var/lib/whoopsie/whoopsie-id on the machine. Do you find any links to recent problems on that page? If so then please send the links to us.

3. If step 2 also failed then apply the workaround from bug 994921, reboot, reproduce the crash, and retry step 1.

Please take care to avoid attaching .crash files to bugs as we are unable to process them as file attachments. It would also be a security risk for yourself.
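
Steps 1 and 2 above can be sketched in Python as follows. This is only an illustrative helper, not an official Ubuntu tool: the function names `list_crash_files` and `errors_url` are hypothetical, and the paths and URL pattern are the ones quoted in the steps. Reading the whoopsie ID may require root privileges, in which case the sketch returns `None`.

```python
from pathlib import Path


def list_crash_files(crash_dir="/var/crash"):
    """Return the .crash files in crash_dir, sorted by name (step 1)."""
    directory = Path(crash_dir)
    if not directory.is_dir():
        return []
    return sorted(directory.glob("*.crash"))


def errors_url(whoopsie_id_path="/var/lib/whoopsie/whoopsie-id"):
    """Build the errors.ubuntu.com user URL from the whoopsie ID (step 2).

    Returns None if the file is missing, unreadable, or empty.
    """
    try:
        ident = Path(whoopsie_id_path).read_text().strip()
    except OSError:
        return None
    return f"https://errors.ubuntu.com/user/{ident}" if ident else None
```

Any file returned by `list_crash_files()` would then be passed to `ubuntu-bug` as described in step 1, rather than attached to the bug.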

affects: xorg (Ubuntu) → xorg-server (Ubuntu)
Changed in xorg-server (Ubuntu):
status: New → Incomplete
summary: - HomeOffice & Un-/Docking: X11 crash on resume when docked to multi
- monitor
+ [nvidia] HomeOffice & Un-/Docking: X11 crash on resume when docked to
+ multi monitor
tags: added: focal multimonitor nvidia
Daniel van Vugt (vanvugt) wrote :

Looks like this is going to be one of {bug 1421808, bug 1787332, bug 1787338} but we will need you to follow the instructions in comment #2 to find out which.

Changed in nvidia-graphics-drivers-460 (Ubuntu):
status: New → Incomplete
no longer affects: xorg-server
no longer affects: xorg-server (Ubuntu)
sva42c8 (sva42c8) wrote :

Here are the IDs:
https://errors.ubuntu.com/oops/6efe7396-e5f6-11eb-a87b-fa163e983629
https://errors.ubuntu.com/oops/356671ca-e5f7-11eb-a87b-fa163e983629

Just guessing:
The crash files do not seem to point to the root cause, which according to Xorg.log.old seems to be "(WW) NVIDIA(0): - Setting a mode on head 1 failed: Invalid surface parameters".

In the above crash files, the processes crashed because X11 had crashed beforehand.

The X11 crash on resume happens sporadically and I am not able to find a pattern, but it occurs e.g. after 3-7 docking/undocking cycles.

Daniel van Vugt (vanvugt) wrote :

Those links don't appear to be related to this bug, they are KDE crashes.

sva42c8 (sva42c8) wrote :

I cannot find any X11-specific crash files.

But I think these KDE files are related to this bug:
As soon as X11 logs these messages (the crash; see Xorg.log.old):
[ 1272.639] (WW) NVIDIA(0): Failed to set the display configuration
[ 1272.639] (WW) NVIDIA(0): - Setting a mode on head 1 failed: Invalid surface parameters
...
"[ 1272.639] (EE) EnterVT failed for screen 0"

the KDE session crashes, then a new Xorg.log is created and the login manager starts.

I am out of ideas ... might the NVIDIA beta drivers help?

sva42c8 (sva42c8) wrote :

Just an update:

I can reproduce this problem with Ubuntu 20.04, stock GNOME and the latest NVIDIA beta driver 470.42.01, as suggested by the X11 maintainers.

GNOME doesn't crash, but just restarts the desktop.

The /var/crash folder is also empty, and EnterVT errors are printed in syslog.

I posted the issue at NVIDIA:
https://forums.developer.nvidia.com/t/linux-for-homeoffice-docking-undocking-produces-ee-nvidia-0-failed-to-enter-vt-mode-initialization-failed/183481/2

I observed that when the monitors and docking stations are power-cycled, the sporadic desktop crash on resume happens almost every second time. Maybe the monitors just need more time to be ready for X11/nvidia?

Daniel van Vugt (vanvugt) wrote :

Per comment #3 we really just need to know more about the call stack to identify which of the existing bugs this is. If you can't get a formal crash report from the Xorg process then you can just subscribe to {bug 1421808, bug 1787332, bug 1787338}.

Communicating with NVIDIA will also help, thanks!

sva42c8 (sva42c8) wrote :

Is there anything I can do to move this bug from "Incomplete" to a complete state?

The problem is that X11 is neither crashing nor leaving any crash files. It just restarts. From the user's view it is a crash.
In the case of KDE, there are at least some KDE crash files.

The post in the NVIDIA forum has also been created; see my older comment #7.
Are there more commands I should execute?

The three bugs seem to be related to Intel video according to lspci.txt, whereas mine is an X11 restart originating from NVIDIA.

Daniel van Vugt (vanvugt) wrote :

It is a crash but also Xorg tries to catch its own crashes which creates this headache for maintainers whereby you often lose all trace of where it crashed. Often but not always, so we do get plenty of proper crash reports from Xorg. In my experience it just takes more attempts and eventually one will leave a file in /var/crash/

sva42c8 (sva42c8) wrote :

Today I also tested with Ubuntu 21.10 and NVIDIA 470.57.02.
I tried docking/undocking more than ten times (for each cycle I suspend; no hotplugging).

Same behavior:
1) Not a single crash file, but the same Xorg.log error from NVIDIA
2) Docking/undocking is unreliable (X11 restarts and the session is lost)

I found bug https://bugs.launchpad.net/oem-priority/+bug/1879893 with exactly the same behavior and very similar hardware (WD19DC<->WD19TB), and it is marked as complete.

sva42c8 (sva42c8) wrote :

Update:
On Monday I tried a fresh Ubuntu 21.04 with NVIDIA 470.57.02.

Same X11 restart/crash with
"- Setting a mode on head 1 failed: Invalid surface parameters"
(EE) NVIDIA(0): Failed to enter VT (mode initialization failed)

and still no crash file from X11.

The system is very stable, as with Windows, when not docking (multiple hibernates, suspends with 90% of the 128 GB RAM occupied, and some Steam games left open), but it is useless for homeoffice<->office1<->office2 travel for all Linux employees.

Reliable docking is the only remaining blocker to finalizing the Linux migration :-(

Daniel van Vugt (vanvugt) wrote :

Try checking the value of:

Settings > Privacy > Diagnostics > Send error reports to Canonical

sva42c8 (sva42c8) wrote :

Settings > Privacy > Diagnostics > Send error reports to Canonical:

It was set to Manual; I set it to Automatic.

Meanwhile, I performed more than fifty docking cycles: not a single X11 crash file, but the bug is reproducible (just a silent X11 restart with the NVIDIA errors).

It is just frustrating to need an X11 crash file in order to have a complete bug report.

Daniel van Vugt (vanvugt) wrote :

I know it's frustrating. Realistically I was only trying to figure out if this is a duplicate of one of {bug 1421808, bug 1787332, bug 1787338}, or if this is a 4th unique bug. Regardless, it looks like an Nvidia driver bug so starting a conversation with Nvidia (as you have) is the way forward:

https://forums.developer.nvidia.com/t/linux-for-homeoffice-docking-undocking-produces-ee-nvidia-0-failed-to-enter-vt-mode-initialization-failed/183481

Launchpad Janitor (janitor) wrote :

[Expired for nvidia-graphics-drivers-460 (Ubuntu) because there has been no activity for 60 days.]

Changed in nvidia-graphics-drivers-460 (Ubuntu):
status: Incomplete → Expired