[modeset][nvidia] X Server session crash with "No space left on device" and then "EnterVT failed for gpu screen 0"

Bug #1897530 reported by Kent Lin
This bug affects 17 people
Affects                          Status        Importance  Assigned to
xorg-server (Ubuntu)             Fix Released  Undecided   Unassigned
  Bionic                         Won't Fix     Undecided   Unassigned
  Focal                          Fix Released  Undecided   Timo Aaltonen
  Groovy                         Fix Released  Undecided   Unassigned
xorg-server-hwe-18.04 (Ubuntu)
  Bionic                         Confirmed     Undecided   Timo Aaltonen

Bug Description

[Impact]

On a hybrid machine where NVIDIA is rendering the screen, if a display attached to a dock is removed while the system is suspended, X crashes on resume.

[Test case]

Hot-unplug the dock display while the system is suspended, then resume; X should not crash.

[Regression potential]

This replaces a patch we had with two commits that got merged upstream after a fairly extensive review process:

https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/443/

We also ship it in groovy. If it caused any issues, they would be around hot-unplugging a dock during suspend and similar scenarios.

--

Expectation: if the external display is removed during S3, NVIDIA should still be rendering to the eDP panel after resume.
Symptom: unplugging the DP cable while the system is suspended makes X crash on resume.

1. Before S3 entry, the eDP panel and the external panel (via the dock) were driven by Intel, while NVIDIA was rendering for the eDP panel.
2. After S3 entry, the dock was removed.
3. On resume, the Linux kernel tried to find the dock but could not, so resuming the TB3 devices failed; the logs confirm this.
4. X fails to enter the VT due to a NULL MetaMode.

Regarding that explanation: X is not failing to enter the VT because of NVIDIA’s NULL MetaMode. It’s the Intel (modesetting) driver that is failing to enter the VT. A NULL MetaMode on NVIDIA is expected, since no displays are connected to it. Our driver is pretty robust to head state changes: even if we have to fall back to a NULL MetaMode, we will still enter the VT successfully.

The logs clearly show that the failure is originating from the Intel (modesetting) driver, not the NVIDIA driver. The relevant path in X is driver.c:EnterVT() => drmmode_display.c:drmmode_set_desired_modes(). In this function, the Intel driver tries to set a mode on each of the previously connected/enabled displays, and it tries to find the closest possible mode using xf86OutputFindClosestMode(). This is similar to our driver’s functionality, but unlike us, they don’t support a NULL fallback. If there are no modes supported on one of the displays, it simply fails, which causes X to terminate:

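/* Look up the mode on this output closest to the current screen mode */
DisplayModePtr mode = xf86OutputFindClosestMode(output, pScrn->currentMode);

/* No mode at all (e.g. the display went away with the dock): EnterVT fails */
if (!mode)
    return FALSE;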

Since one of the displays was removed along with the dock, this check fails. It doesn’t matter that the internal panel is still available: Intel will fail if it can’t re-enable ANY of the previously connected displays.

Technically the modesetting driver isn’t specifically an Intel driver, it’s a generic driver that can be used by any DRM device – it’s just that Intel has chosen to use the modesetting driver as their standard Linux X driver going forward.

Kent Lin (kent-jclin) wrote :

The “real” fix has been released by Intel and has also been merged.
https://gitlab.freedesktop.org/xorg/xserver/-/issues/1010

Could Canonical consider reverting the following temporary patches?

0001-WIP-modesetting-check-the-kms-state-on-EnterVT.patch

0002-WIP-modesetting-do-not-reset-the-mode-on-disconnecte.patch

Timo Aaltonen (tjaalton)
Changed in xorg-server (Ubuntu Bionic):
status: New → Invalid
Changed in xorg-server-hwe-18.04 (Ubuntu Focal):
status: New → Invalid
Changed in xorg-server-hwe-18.04 (Ubuntu):
status: New → Invalid
Changed in xorg-server-hwe-18.04 (Ubuntu Bionic):
assignee: nobody → Timo Aaltonen (tjaalton)
Changed in xorg-server (Ubuntu Focal):
assignee: nobody → Timo Aaltonen (tjaalton)
Daniel van Vugt (vanvugt) wrote :

There are lots of similar bug reports and crash reports that might be related to this. Please help to link up and de-duplicate them as appropriate...

https://bugs.launchpad.net/ubuntu/+source/xorg-server?field.searchtext=EnterVT

https://errors.ubuntu.com/?release=Ubuntu%2020.04&package=xorg-server&period=year

tags: added: bionic
Timo Aaltonen (tjaalton) wrote :

This is a follow-up for bug 1879893: it fixes an issue found there and replaces the old commit with what was added upstream.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xorg-server - 2:1.20.9-2ubuntu1

---------------
xorg-server (2:1.20.9-2ubuntu1) groovy; urgency=medium

  * Merge from Debian.
    - xfree86-add-drm-modes-on-non-GTF-panels.patch: Dropped, upstream
    - CVE patches dropped, upstream
  * modesetting-do-not-stop-on-entervt.diff: Dropped in favor of two
    upstream commits that got merged. (LP: #1897530)

 -- Timo Aaltonen <email address hidden> Wed, 07 Oct 2020 08:46:52 +0300

Changed in xorg-server (Ubuntu):
status: New → Fix Released
Timo Aaltonen (tjaalton)
description: updated
description: updated
Timo Aaltonen (tjaalton) wrote :

Kent, I've pushed this for an SRU, but I need to know whether it actually fixes the original bug. Please pre-test the patched xserver from ppa:canonical-x/x-staging.

Kent Lin (kent-jclin) wrote :
Daniel van Vugt (vanvugt) wrote :

Bug 1857443 appears to be a duplicate of bug 1791981, so this should also have been a duplicate of bug 1791981...

However since this bug is more advanced we can keep tracking it here.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in xorg-server (Ubuntu Focal):
status: New → Confirmed
Changed in xorg-server-hwe-18.04 (Ubuntu Bionic):
status: New → Confirmed
summary: - The modesetting driver does not gracefully handle missing connectors on
- EnterVT
+ [nvidia] X Server session crash with "No space left on device" and then
+ "EnterVT failed for gpu screen 0"
summary: - [nvidia] X Server session crash with "No space left on device" and then
- "EnterVT failed for gpu screen 0"
+ [modeset][nvidia] X Server session crash with "No space left on device"
+ and then "EnterVT failed for gpu screen 0"
tags: added: nvidia
tags: added: focal
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Kent, or anyone else affected,

Accepted xorg-server into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/xorg-server/2:1.20.8-2ubuntu2.5 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in xorg-server (Ubuntu Focal):
status: Confirmed → Fix Committed
tags: added: verification-needed verification-needed-focal
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (xorg-server/2:1.20.8-2ubuntu2.5)

All autopkgtests for the newly accepted xorg-server (2:1.20.8-2ubuntu2.5) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

camitk/unknown (amd64)
software-properties/0.98.9.3 (armhf)
aptdaemon/1.1.1+bzr982-0ubuntu32.2 (amd64, armhf)
libsoup2.4/2.70.0-1 (arm64)
openjdk-8/8u275-b01-0ubuntu1~20.04 (armhf)
ubuntu-release-upgrader/1:20.04.28 (armhf)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#xorg-server

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Kent Lin (kent-jclin) wrote :

Tested a WD19TB dock with a Dell Precision 7750:
Kernel: 5.6.0-1035-OEM#37
xorg-server (2:1.20.8-2ubuntu2.5)
NVIDIA driver: 450.80.02

The issue is fixed.

Kent Lin (kent-jclin) wrote :

The test in comment #12 was done on Focal.

Kent Lin (kent-jclin)
tags: added: verification-done-focal
removed: verification-needed-focal
Mathew Hodson (mhodson)
Changed in xorg-server (Ubuntu Bionic):
status: Invalid → Won't Fix
no longer affects: xorg-server-hwe-18.04 (Ubuntu Focal)
no longer affects: xorg-server-hwe-18.04 (Ubuntu)
Łukasz Zemczak (sil2100) wrote :

There are still some outstanding autopkgtest issues (as mentioned in comment #11). Can anyone take a look at those? I can't release it until the test issues are resolved.

Timo Aaltonen (tjaalton) wrote :

Actually, there's a regression filed upstream:

https://gitlab.freedesktop.org/xorg/xserver/-/issues/1105

and it's fixed by this:

https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/559

So we'd need to amend this SRU with that one-liner, and put it in hirsute/groovy too.

Timo Aaltonen (tjaalton)
Changed in xorg-server (Ubuntu Groovy):
status: New → In Progress
Changed in xorg-server (Ubuntu):
status: Fix Released → In Progress
status: In Progress → Fix Released
Changed in xorg-server (Ubuntu Groovy):
status: In Progress → Invalid
status: Invalid → Fix Released
Timo Aaltonen (tjaalton) wrote :

Oh, never mind, that regression is not related to this fix! But I already uploaded hirsute with the wrong bug link :/

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xorg-server - 2:1.20.8-2ubuntu2.6

---------------
xorg-server (2:1.20.8-2ubuntu2.6) focal-security; urgency=medium

  * SECURITY UPDATE: out of bounds memory accesses on too short request
    - debian/patches/CVE-2020-14360.patch: check SetMap request length
      carefully in xkb/xkb.c.
    - CVE-2020-14360
  * SECURITY UPDATE: multiple heap overflows
    - debian/patches/CVE-2020-25712.patch: add bounds checks in xkb/xkb.c.
    - CVE-2020-25712

 -- Marc Deslauriers <email address hidden> Mon, 30 Nov 2020 12:56:33 -0500

Changed in xorg-server (Ubuntu Focal):
status: Fix Committed → Fix Released
Chris Bainbridge (chris-bainbridge) wrote :

Are there any known regressions or issues from this fix? The crash itself seems to be fixed, but there still seems to be some confusion between the laptop display and the external monitors: the internal display is active when the lid is closed, and suspending with the lid closed messes up the desktop/monitor "fullscreen" size. I'm using a Razer Blade 15 with 3 external monitors.

If I boot the laptop with the lid closed, the "Screen Display" settings show the laptop display as active, and it's possible to drag a window off the screen to where the laptop display would be (but the lid is closed, so it's off).

If I then suspend and resume, the screen sizing gets messed up: pressing F11 to fullscreen a window on the 2nd monitor makes it expand to fill 2 monitors instead of 1. However, if I open the laptop lid and suspend with the lid open, F11 fullscreen correctly fills 1 monitor as it should.

Ubuntu 20.04 LTS has the problem. Ubuntu 20.10 works OK (with the lid closed at boot there is no active laptop display, and F11 still works after suspend).
