Alienware 15 R3 boots to black after upgrade to 22.04, Cannot use other TTY interfaces either (ctrl+alt+f2, etc)

Bug #2031198 reported by Bruce Goodwin
This bug affects 1 person
Affects: nvidia-graphics-drivers-535 (Ubuntu)
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

System info:
Alienware 15 R3 laptop,
Nvidia GeForce GTX 1060 + Intel HD Graphics 630
full specs @ https://dl.dell.com/content/manual38481186-alienware-15-r3-setup-and-specifications.pdf?language=en-us

I had been using the GNOME GUI on this laptop for years, and I dual boot into Win10, which still works as well as before (insert snark about Windows here). Ubuntu/GNOME was working with the proprietary nvidia drivers before the upgrade. But...

Disclaimer: I don't know much about the state before the upgrade. It happened overnight while I was away. Sadly, I don't know the previous Ubuntu version (probably 20.04?), the nvidia driver version, whether it was Wayland or X11, or the kernel version. I wish I had this info for you :(

Current Behavior:

Any attempt to start a GUI turns off the display completely: the backlight flickers, then turns off. I thought maybe I couldn't see anything because the backlight was off, but I tried shining a flashlight at the screen and still can't see any dim image, so that's not it.

When in this state, I cannot use ctrl+alt+F# to switch to a shell.

I can ssh to the machine in this state, so it is booted! But the only way to use the machine directly is to boot in safe mode and get to a root shell (continuing boot in safe mode just leaves me in the same black-screen state).

Things I've tried:
 * switching to nvidia-driver 525, 470, and 390 (non-server variants)
   * Same behavior. - No graphics, backlight or ability to switch to another TTY
 * making Intel graphics primary with `nvidia-prime`
   * Same behavior. - No graphics, backlight or ability to switch to another TTY
 * Disabled wayland in `/etc/gdm3/custom.conf`
   * Same behavior. - No graphics, backlight or ability to switch to another TTY
 * uninstalling nvidia packages (`apt remove --purge '*nvidia*'` and `apt autoremove`) and attempting to use the nouveau driver
   * Backlight stays on, but otherwise the same behavior. - No graphics, or ability to switch to another TTY
   * This makes me think the issue is something more than just the nvidia driver. Perhaps the 6.2.0 kernel? But I'm not sure which component to file this bug against if that's the case. I'm happy to refile wherever is appropriate.
 * Reinstalling ubuntu completely (this required using safe graphics mode - normal mode yielded the same behavior in the installer)
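For reference, the Wayland step in the list above is usually done by uncommenting `WaylandEnable=false` in the `[daemon]` section of `/etc/gdm3/custom.conf`. Here is a minimal bash sketch of that edit. It assumes GNU sed and the stock Ubuntu file layout. The function name is hypothetical, and the file path is a parameter so the edit can be tried on a copy first:

```bash
#!/usr/bin/env bash
# Hypothetical helper: force GDM to use Xorg by setting WaylandEnable=false.
# Assumes GNU sed (-i, -E, one-line 'a' command). Pass a copy of the file to
# try it out; the real file on Ubuntu 22.04 is /etc/gdm3/custom.conf (sudo).
disable_gdm_wayland() {
  local conf="${1:-/etc/gdm3/custom.conf}"
  local key_re='^[[:space:]]*#?[[:space:]]*WaylandEnable[[:space:]]*='
  if grep -Eq "$key_re" "$conf"; then
    # Uncomment (or overwrite) the existing WaylandEnable line.
    sed -i -E "s/${key_re}.*/WaylandEnable=false/" "$conf"
  elif grep -q '^\[daemon\]' "$conf"; then
    # No such line yet: add it directly under the [daemon] section header.
    sed -i '/^\[daemon\]/a WaylandEnable=false' "$conf"
  else
    # No [daemon] section at all: append one.
    printf '[daemon]\nWaylandEnable=false\n' >> "$conf"
  fi
}
```

GDM only reads the file at startup, so a reboot (or GDM restart) is needed afterwards. As the list above shows, this change alone made no difference on this laptop.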

ProblemType: Bug
DistroRelease: Ubuntu 22.04
Package: nvidia-driver-535 535.86.05-0ubuntu0.22.04.1
ProcVersionSignature: Ubuntu 6.2.0-26.26~22.04.1-generic 6.2.13
Uname: Linux 6.2.0-26-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: pass
Date: Fri Aug 11 22:59:41 2023
InstallationDate: Installed on 2023-08-12 (0 days ago)
InstallationMedia: Ubuntu 22.04.3 LTS "Jammy Jellyfish" - Release amd64 (20230807.2)
ProcEnviron:
 TERM=linux
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: nvidia-graphics-drivers-535
UpgradeStatus: No upgrade log present (probably fresh install)
---
ProblemType: Bug
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
CasperMD5CheckResult: pass
DistUpgraded: Fresh install
DistroCodename: jammy
DistroRelease: Ubuntu 22.04
DistroVariant: ubuntu
ExtraDebuggingInterest: Yes
GraphicsCard:
 Subsystem: Dell HD Graphics 630 [1028:0774]
 NVIDIA Corporation GP106BM [GeForce GTX 1060 Mobile 6GB] [10de:1c60] (rev a1) (prog-if 00 [VGA controller])
   Subsystem: Dell GP106BM [GeForce GTX 1060 Mobile 6GB] [1028:07c0]
InstallationDate: Installed on 2023-08-12 (5 days ago)
InstallationMedia: Ubuntu 22.04.3 LTS "Jammy Jellyfish" - Release amd64 (20230807.2)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 004: ID 0bda:58c2 Realtek Semiconductor Corp. Integrated Webcam HD
 Bus 001 Device 003: ID 0cf3:e301 Qualcomm Atheros Communications
 Bus 001 Device 002: ID 187c:0530 Alienware Corporation AW1517
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Alienware Alienware 15 R3
Package: xorg 1:7.7+23ubuntu2
PackageArchitecture: amd64
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-6.2.0-26-generic root=UUID=c2da9d36-62d1-49d4-97f6-139ed7798bca ro splash nvidia-drm.modeset=0 vt.handoff=7
ProcVersionSignature: Ubuntu 6.2.0-26.26~22.04.1-generic 6.2.13
Tags: jammy ubuntu
Uname: Linux 6.2.0-26-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin lxd plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 07/21/2020
dmi.bios.release: 1.10
dmi.bios.vendor: Alienware
dmi.bios.version: 1.10.0
dmi.board.name: Alienware 15 R3
dmi.board.vendor: Alienware
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Alienware
dmi.chassis.version: Not Specified
dmi.modalias: dmi:bvnAlienware:bvr1.10.0:bd07/21/2020:br1.10:svnAlienware:pnAlienware15R3:pvr1.10.0:rvnAlienware:rnAlienware15R3:rvrA00:cvnAlienware:ct10:cvrNotSpecified:sku0774:
dmi.product.family: Alienware
dmi.product.name: Alienware 15 R3
dmi.product.sku: 0774
dmi.product.version: 1.10.0
dmi.sys.vendor: Alienware
version.compiz: compiz N/A
version.libdrm2: libdrm2 2.4.113-2~ubuntu0.22.04.1
version.libgl1-mesa-dri: libgl1-mesa-dri 23.0.4-0ubuntu1~22.04.1
version.libgl1-mesa-glx: libgl1-mesa-glx N/A
version.xserver-xorg-core: xserver-xorg-core 2:21.1.4-2ubuntu1.7~22.04.1
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev N/A
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:19.1.0-2ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.99.917+git20210115-1
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:1.0.17-2build1

Bruce Goodwin (bgoodwin) wrote :

Sigh. I started this bug with `ubuntu-bug`, so I hoped there would be more logs attached... Here's the output from `journalctl`:

Bruce Goodwin (bgoodwin) wrote :

and `dmesg`

Bruce Goodwin (bgoodwin) wrote :
  • syslog (831.4 KiB, application/octet-stream)

and `/var/log/syslog`

Bruce Goodwin (bgoodwin) wrote :

New behavior! + additions to the "things I've tried" list!
* adding nvidia-drm.modeset=0 to the kernel args at the grub menu
  * Backlight stays on, and I can switch with ctrl+alt+F2
  * On TTY 2, it looks like a bit of kernel log is displayed: "ACPI region does not cover the entire command/response buffer. [mem 0xfed40000-0xfed4087f flags 0x200] vs fed40080 f80"
  * On a whim, I went back to ctrl+alt+F1, which no longer had the backlight on, pretended there was a (gdm3?) login screen there, and hit enter + <password for default user>.... THIS got me into X11/GNOME!
  * Possibly this would have worked all along? Not great, because nothing on screen suggests you can do anything, but better than...

* adding nvidia-drm.modeset=1 to the kernel args at the grub menu
  * the login screen appears! hit enter + <password for default user> to log in...
  * Back to the old behavior. - No graphics, backlight or ability to switch to another TTY
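When flipping `nvidia-drm.modeset` at the GRUB menu like this, it is easy to lose track of which value a given boot actually used. The apport data above, for example, shows `nvidia-drm.modeset=0` in `ProcKernelCmdLine`. Here is a minimal bash sketch, with a hypothetical function name, that reads a parameter from `/proc/cmdline` or from a string passed in:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of kernel parameter NAME from a
# command line (default: the running kernel's /proc/cmdline).
# The kernel treats '-' and '_' in parameter names as equivalent, so
# nvidia-drm.modeset and nvidia_drm.modeset are matched alike.
# Flag-style parameters (e.g. nomodeset) print "(set, no value)".
# If a parameter is repeated, this sketch prints the last occurrence.
kparam() {
  local want="${1//-/_}" cmdline tok name val=""
  if [ $# -ge 2 ]; then cmdline="$2"; else cmdline="$(cat /proc/cmdline)"; fi
  for tok in $cmdline; do
    name="${tok%%=*}"
    if [ "${name//-/_}" = "$want" ]; then
      if [[ "$tok" == *=* ]]; then val="${tok#*=}"; else val="(set, no value)"; fi
    fi
  done
  [ -n "$val" ] && printf '%s\n' "$val"
}
```

Run over SSH, `kparam nvidia-drm.modeset` prints the value the current boot was started with, or nothing if the parameter was not given.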

Bruce Goodwin (bgoodwin) wrote :

Extra behavior note:
In the `nvidia-drm.modeset=0` case, after switching to TTY2 (or any other TTY, really), with the snippet of kernel log displayed, I cannot actually use that TTY for anything. There's no login prompt, and typing at the keyboard does nothing.

I can switch to the other ttys before and after logging in with `nvidia-drm.modeset=0`, but cannot actually use them as terminals; that no-prompt behavior occurs both before and after X login.

Bruce Goodwin (bgoodwin) wrote :

In the comment #5 `nvidia-drm.modeset=1` case (the one where the login screen appears, but there are no more graphics after logging in), I noticed that the fans on the laptop go nuts after logging in. So I sshed in and found what looks like Xorg getting stuck in nvidia_modeset. This repeats in the `dmesg` log:

```
[ 140.254729] watchdog: BUG: soft lockup - CPU#2 stuck for 86s! [Xorg:1059]
[ 140.254735] Modules linked in: rfcomm ccm snd_ctl_led snd_hda_codec_realtek snd_hda_codec_generic nvidia_uvm(POE) cmac algif_hash algif_skcipher af_alg bnep intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic nvidia_drm(POE) ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd nvidia_modeset(POE) cryptd binfmt_misc nls_iso8859_1 rapl snd_soc_avs snd_soc_hda_codec snd_hda_ext_core nvidia(POE) hid_generic mei_hdcp mei_pxp intel_rapl_msr snd_soc_core snd_compress ac97_bus snd_hda_codec_hdmi snd_pcm_dmaengine i915 ath10k_pci snd_hda_intel ath10k_core snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec ath snd_hda_core uvcvideo snd_hwdep drm_buddy videobuf2_vmalloc ttm videobuf2_memops snd_pcm videobuf2_v4l2 snd_seq_midi joydev intel_cstate mac80211 snd_seq_midi_event drm_display_helper input_leds snd_rawmidi videodev usbhid btusb dell_wmi btrtl snd_seq cec dell_smbios btbcm dcdbas btintel btmtk
[ 140.254841] videobuf2_common snd_seq_device ledtrig_audio intel_wmi_thunderbolt mxm_wmi wmi_bmof ee1004 dell_wmi_descriptor serio_raw hid mc bluetooth snd_timer processor_thermal_device_pci_legacy cfg80211 rc_core processor_thermal_device snd ecdh_generic processor_thermal_rfim ecc drm_kms_helper libarc4 soundcore processor_thermal_mbox i2c_algo_bit syscopyarea processor_thermal_rapl mei_me intel_rapl_common sysfillrect intel_pch_thermal sysimgblt mei intel_soc_dts_iosf int3403_thermal int340x_thermal_zone intel_hid int3400_thermal mac_hid sparse_keymap acpi_thermal_rel acpi_pad sch_fq_codel msr parport_pc ppdev lp ramoops parport reed_solomon pstore_blk pstore_zone drm efi_pstore ip_tables x_tables autofs4 nvme ahci nvme_core i2c_i801 alx xhci_pci crc32_pclmul psmouse i2c_smbus nvme_common mdio libahci xhci_pci_renesas video wmi
[ 140.254885] CPU: 2 PID: 1059 Comm: Xorg Tainted: P OEL 6.2.0-26-generic #26~22.04.1-Ubuntu
[ 140.254887] Hardware name: Alienware Alienware 15 R3/Alienware 15 R3, BIOS 1.10.0 07/21/2020
[ 140.254889] RIP: 0010:_nv001596kms+0x0/0x80 [nvidia_modeset]
[ 140.254924] Code: 48 48 8b 53 28 e9 e5 fd ff ff 45 31 c0 e9 e6 fc ff ff 49 c7 44 24 48 00 00 00 00 48 8b 53 28 e9 96 fd ff ff 66 0f 1f 44 00 00 <f3> 0f 1e fa 55 48 89 e5 41 55 49 89 fd 41 54 49 89 f4 53 48 8d 5f
[ 140.254925] RSP: 0018:ffffab6d43513a00 EFLAGS: 00000282
[ 140.254927] RAX: ffffffffc55edce0 RBX: ffff8aeb044a0e08 RCX: ffff8aeb14df7608
[ 140.254928] RDX: ffff8aeb0434c808 RSI: ffff8aeb0434c808 RDI: ffff8aeb044a0e08
[ 140.254929] RBP: ffffab6d43513a48 R08: 0000000000000000 R09: 0000000000000000
[ 140.254930] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8aeb0434c808
[ 140.254931] R13: ffff8aeb049c2808 R14: ffff8aeb049c2968 R15: 0000000000000000
[ 140.254933] FS:...
```

no longer affects: xorg (Ubuntu)
Daniel van Vugt (vanvugt) wrote (last edit ):

In your original log, plymouthd hit the same Nvidia kernel bug:

  Aug 11 22:41:26 voldemort kernel: watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [plymouthd:322]

So I think there are two problems here:

1. A buggy Nvidia kernel driver that keeps crashing (at least on this system); and

2. An incomplete Nvidia driver install that's trying to use the wrong DDX driver for the Nvidia GPU.

In both cases, please uninstall the nvidia driver and try a different (older) one. You can do so using the 'Additional Drivers' app.
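To see whether a given boot hit this soft lockup, and in which process, the watchdog lines can be pulled out of the kernel log. Per the logs in this report, the process is plymouthd at boot and Xorg after login. Here is a minimal sketch; the function name and output format are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize watchdog soft-lockup lines read on stdin,
# e.g. from `journalctl -k -b0` or `dmesg`. Kernel lines look like:
#   watchdog: BUG: soft lockup - CPU#2 stuck for 86s! [Xorg:1059]
# and are printed as: "<comm> pid=<pid> cpu=<cpu> stuck=<seconds>s".
soft_lockups() {
  # ([^]]+) is greedy and backtracks to the last ':' before the PID, so
  # command names containing ':' (e.g. kworker/0:1) are still split correctly.
  sed -nE 's/.*soft lockup - CPU#([0-9]+) stuck for ([0-9]+)s! \[([^]]+):([0-9]+)].*/\3 pid=\4 cpu=\1 stuck=\2s/p'
}
```

Usage: `journalctl -k -b0 | soft_lockups`. Here `-k` limits the output to kernel messages and `-b0` to the current boot.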

Bruce Goodwin (bgoodwin) wrote (last edit ):

Thanks for looking at this, Daniel!

I tried these things with the following results.

 * switching to nvidia-driver 525, 470, and 390 (non-server variants)
   * Same behavior. - No graphics, backlight or ability to switch to another TTY
 * uninstalling nvidia packages (`apt remove --purge '*nvidia*'` and `apt autoremove`) and attempting to use the nouveau driver
   * Result: On boot, the screen was black with the backlight on. loginctl reported that whatever was there was running on Wayland. This is different from all my previous experiments, where it reported X11. I was unable to use the "just pretend there's a login screen there" trick described in comment #5. I don't know whether that's because the Wayland login screen is different or because there simply wasn't one this time.

Since completely removing all nvidia drivers and relying on the nouveau driver also didn't give me a working login screen, I think something else is going on (possibly in addition to messed-up nvidia drivers).

Bruce Goodwin (bgoodwin) wrote : BootLog.txt

apport information

tags: added: apport-collected ubuntu
description: updated
Bruce Goodwin (bgoodwin) wrote : CurrentDmesg.txt

apport information

Bruce Goodwin (bgoodwin) wrote : Dependencies.txt

apport information

Bruce Goodwin (bgoodwin) wrote : DpkgLog.txt

apport information

Bruce Goodwin (bgoodwin) wrote : Lspci.txt

apport information

Bruce Goodwin (bgoodwin) wrote : Lspci-vt.txt

apport information

Bruce Goodwin (bgoodwin) wrote : Lsusb-t.txt

apport information

Bruce Goodwin (bgoodwin) wrote : Lsusb-v.txt

apport information

Bruce Goodwin (bgoodwin) wrote : ProcCpuinfo.txt

apport information

Bruce Goodwin (bgoodwin) wrote : ProcCpuinfoMinimal.txt

apport information

Bruce Goodwin (bgoodwin) wrote : ProcInterrupts.txt

apport information

Bruce Goodwin (bgoodwin) wrote : ProcModules.txt

apport information

Bruce Goodwin (bgoodwin) wrote : UdevDb.txt

apport information

Bruce Goodwin (bgoodwin) wrote : XorgLog.txt

apport information

Bruce Goodwin (bgoodwin) wrote : XorgLogOld.txt

apport information

Bruce Goodwin (bgoodwin) wrote : acpidump.txt

apport information

Bruce Goodwin (bgoodwin) wrote : Re: Boots to black after upgrade to 22.04, Cannot use other TTY interfaces either (ctrl+alt+f2, etc)

The apport updates above were dumped on boot after removing all nvidia packages (`apt remove --purge '*nvidia*'` and `apt autoremove`) thus attempting to use the nouveau driver. The laptop booted directly to a black screen with the backlight on, and I immediately ssh-ed in and collected logs with `apport-collect`

Then I wanted to see whether I can actually log in in this state and the user session is merely invisible. Here's the procedure:

1: Check for the login screen's session:
```
publius@voldemort:~$ loginctl list-sessions
SESSION UID USER SEAT TTY
      2 1000 publius pts/0
     c1 128 gdm seat0 tty1

2 sessions listed.
publius@voldemort:~$ loginctl show-session c1
Id=c1
User=128
Name=gdm
Timestamp=Thu 2023-08-17 18:17:43 EDT
TimestampMonotonic=14305206
VTNr=1
Seat=seat0
TTY=tty1
Remote=no
Service=gdm-launch-environment
Scope=session-c1.scope
Leader=1083
Audit=4294967295
Type=wayland
Class=greeter
Active=yes
State=active
IdleHint=yes
IdleSinceHint=1692310965050340
IdleSinceHintMonotonic=316001633
LockedHint=no
```

2: Attempted the "pretend there's a login screen that I just can't see" workaround (comment #5).

3: Then look to see if the sessions have changed
```
publius@voldemort:~$ loginctl list-sessions
SESSION UID USER SEAT TTY
      2 1000 publius pts/0
      4 1000 publius seat0 tty2

2 sessions listed.
publius@voldemort:~$ loginctl show-session 4
Id=4
User=1000
Name=publius
Timestamp=Thu 2023-08-17 18:24:30 EDT
TimestampMonotonic=421274657
VTNr=2
Seat=seat0
TTY=tty2
Remote=no
Service=gdm-password
Scope=session-4.scope
Leader=1800
Audit=4
Type=wayland
Class=user
Active=yes
State=active
IdleHint=no
IdleSinceHint=0
IdleSinceHintMonotonic=0
LockedHint=no
```

So it looks like yes,
* There is an invisible login screen present
* I can log in at that screen
* the login screen's wayland session (owner `gdm`) is replaced with the user's (owner `publius`) wayland session.
* And neither session can put graphics on the monitor, even in the complete absence of nvidia packages/drivers/kernel modules.
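The session check in steps 1 and 3 can be scripted so it is quick to rerun over SSH after each experiment. Here is a minimal sketch, assuming systemd's `loginctl` as shipped in 22.04; the function names are hypothetical. `loginctl show-session` prints `KEY=VALUE` lines, and the helper picks out the interesting ones:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from KEY=VALUE lines on stdin
# (the format `loginctl show-session` prints).
session_prop() {
  awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }'
}

# Hypothetical helper: one line per session with its user, type and class,
# e.g. to see whether the greeter (class=greeter) runs on wayland or x11.
summarize_sessions() {
  local id props
  for id in $(loginctl list-sessions --no-legend | awk '{ print $1 }'); do
    props=$(loginctl show-session "$id" -p Name -p Type -p Class -p Active)
    printf '%s user=%s type=%s class=%s active=%s\n' "$id" \
      "$(session_prop Name <<<"$props")" "$(session_prop Type <<<"$props")" \
      "$(session_prop Class <<<"$props")" "$(session_prop Active <<<"$props")"
  done
}
```

Run before and after the blind login, `summarize_sessions` should show the `gdm` greeter session being replaced by the user's session, as in the manual output above.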

Daniel van Vugt (vanvugt) wrote :

If you're stuck on a black screen instead of the login screen then it's possible the system has booted to the wrong VT. To work around that try:

Ctrl+Alt+F4, Ctrl+Alt+F1, and then wait a few seconds
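The same VT switching can also be done from an SSH session, which is handy when the screen shows nothing at all. Here is a minimal sketch, assuming `chvt` from the `kbd` package (it needs root). The function name and the `CHVT`/`VT_DELAY` overrides are hypothetical; they exist so the sequence can be dry-run:

```bash
#!/usr/bin/env bash
# Hypothetical helper: switch to VT $1 (default 4), then back to VT $2
# (default 1), pausing VT_DELAY seconds (default 3) after each switch.
# CHVT is deliberately expanded unquoted so it can be "sudo chvt",
# or "echo" for a dry run.
vt_bounce() {
  local away="${1:-4}" home="${2:-1}" delay="${VT_DELAY:-3}"
  local chvt="${CHVT:-chvt}"
  $chvt "$away" && sleep "$delay" && $chvt "$home" && sleep "$delay"
}
```

For example, `CHVT="sudo chvt" vt_bounce 4 1`, followed by `sudo fgconsole` to confirm which VT is now in the foreground.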

no longer affects: xorg (Ubuntu)
Daniel van Vugt (vanvugt) wrote (last edit ):

Also the kernel parameter 'nvidia-drm.modeset=0' is expected to break things. Please remove it or change it to nvidia-drm.modeset=1

Bruce Goodwin (bgoodwin) wrote (last edit ):

Pretty sure it's booting to the VT with the login screen, because the login screen receives keyboard input without switching VTs (I can log in using the "pretend there's a login screen there, even though it is not displayed" workaround from comment #5). But I did the experiment anyway to be sure. Never assume anything, right?

So I just tried switching around between VTs on boot (while the login was invisible on VT1). Here are the results:

With 'nvidia-drm.modeset=1': Booted to a black screen with no backlight. I tried VTs 2-12, going back to VT1 in between. The backlight was off on all VTs. I sshed in, and plymouthd was blocked in nvidia_modeset, so apparently I didn't even get as far as the login screen.

With 'nvidia-drm.modeset=0' : Booted to a black screen with the backlight on. I tried VTs 2-12, going back to VT1 in-between. On VTs 2-12 I can see the tail end of the kernel log, but it is not a usable terminal. There's no login prompt and nothing shows up on typing. Each time I went back to VT1 there was just blackness (backlight stayed on).

Regarding `nvidia-drm.modeset`: I'm happy to switch to `nvidia-drm.modeset=1` for individual experiments. However, `nvidia-drm.modeset=0` is the only way I've found so far to get to a working logged-in X session (no login screen is displayed, but I can log in and the logged-in X session displays fine). If I use `nvidia-drm.modeset=1` (or don't specify `nvidia-drm.modeset` at all), I get the soft lockups in `nvidia_modeset`, in either plymouthd or the logged-in X session.

Daniel van Vugt (vanvugt) wrote :

The laptop screen should be managed by the Intel GPU so I suspect 'nvidia-drm.modeset=1' should give you a working login screen if only we can get the plymouthd problem out the way. To do that either try:

 * Booting with the 'nosplash' kernel parameter; and/or
 * Booting with no external monitors connected.

Please also provide a current log file:

  journalctl -b0 > journal.txt

because the latest ones attached are at least 4 days old.

Changed in nvidia-graphics-drivers-535 (Ubuntu):
status: New → Incomplete
Bruce Goodwin (bgoodwin) wrote :

Methodology note: all of my tests have been with no external monitor plugged in.

OH yes, I can usually get a working *login* screen with 'nvidia-drm.modeset=1' and nvidia driver version 535. Occasionally plymouthd will get blocked in nvidia_modeset, but usually I get past that and the login screen appears. But once I log in, the logged-in X session hangs in nvidia_modeset. (Why not the login screen, too?! I dunno!)

" * Booting with the 'nosplash' kernel parameter..."

AHH! I'll try that right now and report back shortly, with a journalctl output...

Bruce Goodwin (bgoodwin) wrote :

No change after replacing `splash` with `nosplash`.

Here's a journalctl output...

Daniel van Vugt (vanvugt) wrote :

It appears the laptop screen is currently wired to the Nvidia GPU, so that's the first problem. But it's probably switchable (a "hardware mux"). Please look in the BIOS for settings that allow the Intel GPU (iGPU / integrated GPU) to be the primary GPU instead of Nvidia.

The second issue seems to be that the Nvidia driver is rejecting the laptop screen's supported modes:

  (WW) NVIDIA(GPU-0): Mode "1920x1080_60" is invalid.

But that too should be solved by making the Intel GPU the default GPU in the BIOS.
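To see which modes the NVIDIA X driver accepted or rejected on a given boot, the validation lines can be filtered out of the Xorg log. Here is a minimal sketch. The function name is hypothetical, and the log location is an assumption: typically `/var/log/Xorg.0.log` for a root-run Xorg, or `~/.local/share/xorg/Xorg.0.log` for a rootless one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read an Xorg log on stdin and print each mode the
# driver reported as valid or invalid, once, as "<mode> <verdict>", e.g.
#   (WW) NVIDIA(GPU-0): Mode "1920x1080_60" is invalid.
#   -> 1920x1080_60 invalid
mode_verdicts() {
  sed -nE 's/.*Mode "([^"]+)" is (valid|invalid).*/\1 \2/p' | sort -u
}
```

Usage: `mode_verdicts < /var/log/Xorg.0.log`. Comparing the output between `nvidia-drm.modeset=0` and `=1` boots shows which modes each configuration accepts.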

Bruce Goodwin (bgoodwin) wrote :

I can't find a way to change the primary GPU in the BIOS of this machine (I also updated it just now to see if that feature was added, but nope). Since this laptop dual-boots Windows, I checked the Windows nvidia control panel, which should have a way to do it on devices that support it, but the option was not there.

I tried `prime-select intel` (the prime-select tool is installed with the nvidia drivers). On reboot, every VT showed the unusable tail end of the kernel log, and the GNOME login screen never displayed. I've attached a journalctl log.

As for the 1920x1080_60 screen mode, that's weird. The specs say both the GPU and the display support 60 Hz, but the DisplayPort connection failed a bandwidth check? Apparently this is systemic: when logged in to a GNOME session (comment #5 workaround) and in Windows, this display runs at 48 Hz and there's no option to change it, even at lower resolutions.

Daniel van Vugt (vanvugt) wrote :

Some googling suggests:

1. Press Fn + F7

2. Reboot.

If the problem is still not fixed after that then it may be because you have a G-Sync panel. Some people report such models cannot be switched to use the integrated GPU but I would still check Nvidia Settings to see if you can disable G-Sync. Overall this is a Dell hardware design issue, and also an Nvidia driver bug for not offering better support for this hardware arrangement.

Changed in nvidia-graphics-drivers-535 (Ubuntu):
status: Incomplete → Triaged
importance: Undecided → Medium
summary: - Boots to black after upgrade to 22.04, Cannot use other TTY interfaces
- either (ctrl+alt+f2, etc)
+ Alienware 15 R3 boots to black after upgrade to 22.04, Cannot use other
+ TTY interfaces either (ctrl+alt+f2, etc)
Daniel van Vugt (vanvugt) wrote :

Does the BIOS let you disable G-Sync?

Bruce Goodwin (bgoodwin) wrote (last edit ):

>> Some googling suggests:
>>
>> 1. Press Fn + F7
>>
>> 2. Reboot.

Yep, I found that one too. It doesn't seem to do anything (tried in linux and windows)

>> Does the BIOS let you disable G-Sync?

I haven't noticed that, but I'll look right now, and I'll also check the Windows Nvidia control panel...

>> ...and also an Nvidia driver bug...

I'll see if I can file an Nvidia bug, too. If so, I'll post it here.

Bruce Goodwin (bgoodwin) wrote :

>> Does the BIOS let you disable G-Sync?

Nope. I can't find any video settings at all in the BIOS on this machine. It has a system info listing that shows the Intel and nvidia GPUs, but I can't change anything about graphics.

I tried disabling G-Sync in both the Windows and Linux nvidia control panels, but I'm getting the same behavior :(

I thought it could be helpful to grab `journalctl -b0` output for 2 logins:
* Using proprietary nvidia v535 drivers, with drm *en*abled, captured after a full login (the login screen displayed; the X session blocked in nvidia_modeset)
* Using proprietary nvidia v535 drivers, with drm *dis*abled, captured after a full login (the login screen didn't display; I used the comment #5 hack to log in, and the X session displayed fine)

The only thing I've spotted so far in these logs is that the nvidia driver seems to complete its setup a little earlier with drm enabled. I don't see any explanation for why the user X session hangs in nvidia_modeset in this configuration when the gdm3 X session doesn't :(

Bruce Goodwin (bgoodwin) wrote :

(and the no-drm variant)

Daniel van Vugt (vanvugt) wrote :

I don't see "soft lockup" happening anymore in those logs. Also 'Mode "1920x1080_48" is valid.' should be enough to give you working graphics like it is in Windows. So I don't think the black screen part of this bug is something we can blame on Nvidia. More likely I think the issue is Xorg/gnome-shell/mutter preferring to use the first graphics card /dev/dri/card0 which is Intel and not connected to the display. Hence the black screen...

As a workaround please try adding kernel parameter: i915.modeset=0
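To check which DRM card node belongs to which GPU, the kernel driver bound to each card can be read from sysfs. The theory here is that Xorg/mutter picks `card0`, the Intel GPU, which is not wired to the panel. Here is a minimal sketch. The function name is hypothetical, and the optional root argument exists only so the function can be tried against a fake tree:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<cardN> <driver>" for each DRM card node,
# reading the driver symlink under /sys/class/drm/cardN/device/driver.
# $1 is an optional prefix for the sysfs root (empty = the real system).
drm_card_drivers() {
  local root="${1:-}" card name drv
  for card in "$root"/sys/class/drm/card*; do
    name="${card##*/}"
    # Skip connector entries such as card0-eDP-1 (and an unmatched glob).
    [[ "$name" =~ ^card[0-9]+$ ]] || continue
    drv=$(readlink "$card/device/driver") || drv="(no driver bound)"
    printf '%s %s\n' "$name" "${drv##*/}"
  done
}
```

On this laptop one would expect `i915` on one card and `nvidia` on the other. If `i915.modeset=0` leaves the Intel GPU without a driver, only the NVIDIA card should be listed.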

Bruce Goodwin (bgoodwin) wrote :

>> I don't see "soft lockup" happening anymore in those logs

Hmmm. I probably didn't wait long enough for the soft lockup messages to start rolling in; I did the capture a few moments after the fans went nuts. Here's a new capture where I monitored dmesg until the soft lockup messages began and then ran journalctl :)

Bruce Goodwin (bgoodwin) wrote :

>> As a workaround please try adding kernel parameter: i915.modeset=0

No change in user-visible behavior. Here are journalctl captures of the same scenarios, but with the additional i915.modeset=0 kernel param:

Bruce Goodwin (bgoodwin) wrote :

And the nodrm variant

Bruce Goodwin (bgoodwin) wrote :

Opened an nvidia dev forum thread, closest thing I could find to opening a bug.

https://forums.developer.nvidia.com/t/x-hangs-blocked-on-nvidia-modeset-in-ubuntu-22-02-nvidia-drivers-535/264035

Daniel van Vugt (vanvugt) wrote :

It's the same kernel bug again crashing in nvidia_modeset:

Aug 22 11:13:19 voldemort kernel: watchdog: BUG: soft lockup - CPU#7 stuck for 26s! [Xorg:1027]

But i915.modeset=0 appears to be working as intended -- the unused Intel GPU has been hidden (not given a driver) so please keep using i915.modeset=0

Daniel van Vugt (vanvugt) wrote :

Please try Nvidia driver 525 again, with i915.modeset=0

I'm hoping that will avoid both causes of black screens:
 * 525 might avoid the kernel crashes
 * i915.modeset=0 will avoid the unusable GPU problem

Bruce Goodwin (bgoodwin) wrote :

Same behavior with 525/i915.modeset=0, still getting the nvidia_modeset hang with drm enabled

Bruce Goodwin (bgoodwin) wrote :

and the nvidia-drm.modeset=0 scenario

Bruce Goodwin (bgoodwin) wrote (last edit ):

OH! I ended up updating to kernel 6.4.12, since the nvidia folks didn't wanna bother with anything as old as 6.2, and ...
**UPDATE**
I've rebooted again, and now the behavior is the same as before the kernel update, except that I can switch to other VTs. I'm getting hangs in nvidia_modeset again with 6.4.12/535.

**ORIGINAL/DEPRECATED**
For a few reboots after updating the kernel, I got a different error in the logs instead of the nvidia_modeset hang. But I can't repro that anymore. I don't think I changed anything other than rebooting.

```
[drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to grab modeset ownership

```

But I almost always get no splash -> a login screen -> a logged-in X session. "Almost" because on one reboot there was some kind of pegged-CPU situation before the login screen appeared, and it was so hosed I couldn't even ssh in to grab logs. It hasn't repro'd again yet.

Daniel van Vugt (vanvugt) wrote :

Interesting that mode "1920x1080_60" becomes valid when you have nvidia-drm.modeset=0. This just means the KMS part of the Nvidia driver is missing G-Sync support. Actually, it's probably to be expected that KMS can't support G-Sync. So this tells me that for your particular laptop these are mandatory:

  nvidia-drm.modeset=0 i915.modeset=0

or could be simplified now as just:

  nomodeset

Can you provide a log from the working 6.4 kernel?

Daniel van Vugt (vanvugt) wrote :

The "Failed to grab modeset ownership" message is tracked in bug 1963805 but won't happen if you stick to the above kernel parameters.

Daniel van Vugt (vanvugt) wrote :

In summary, I think you need a newer kernel AND 'nvidia-drm.modeset=0 i915.modeset=0'. Or a newer kernel and 'nomodeset'.
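To make these parameters persistent instead of typing them at the GRUB menu on every boot, they go in `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `sudo update-grub`. Here is a minimal sketch of that edit. The function name is hypothetical, and it assumes a single double-quoted `GRUB_CMDLINE_LINUX_DEFAULT` line. It takes the file path as an argument so it can be tried on a copy first:

```bash
#!/usr/bin/env bash
# Hypothetical helper: add kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT in
# a grub defaults file. Any existing token with the same name is dropped
# (so nvidia-drm.modeset=0 replaces nvidia-drm.modeset=1), and each new
# parameter ends up exactly once, at the end. Assumes GNU sed and a single
# GRUB_CMDLINE_LINUX_DEFAULT="..." line. Run `sudo update-grub` afterwards.
add_grub_params() {
  local file="$1"; shift
  local current p t kept esc
  current=$(sed -nE 's/^GRUB_CMDLINE_LINUX_DEFAULT="(.*)"$/\1/p' "$file")
  for p in "$@"; do
    kept=""
    for t in $current; do
      # Keep every existing token whose name differs from the new one.
      [ "${t%%=*}" = "${p%%=*}" ] || kept="${kept:+$kept }$t"
    done
    current="${kept:+$kept }$p"
  done
  # Escape characters that are special in a sed replacement ('|' delimiter).
  esc=$(printf '%s' "$current" | sed 's/[\\|&]/\\&/g')
  sed -i -E "s|^GRUB_CMDLINE_LINUX_DEFAULT=\".*\"\$|GRUB_CMDLINE_LINUX_DEFAULT=\"$esc\"|" "$file"
}
```

For example, back up `/etc/default/grub`, then run the function as root against it with `nvidia-drm.modeset=0 i915.modeset=0` (or just `nomodeset`). Finally run `sudo update-grub` and reboot.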

Bruce Goodwin (bgoodwin) wrote :

I did happen to capture an nvidia-bug-report.sh log on one occasion when I got the `Failed to grab modeset ownership` error (and in that case both the login and user X sessions worked!)

Attached ;)

Daniel van Vugt (vanvugt) wrote :

Since your system works best without KMS "modeset" support you should keep it disabled using:

  nvidia-drm.modeset=0 i915.modeset=0

or

  nomodeset

and then "Failed to grab modeset ownership" will never happen. The log file attached in comment #53 shows nvidia-drm.modeset is incorrectly set to 1 when it should be 0.
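To confirm which values the running kernel actually ended up with, rather than what was typed at GRUB, the module parameters can be read back from sysfs. `nvidia_drm`'s `modeset` is a boolean, so it should read back as `Y` or `N`. Here is a minimal sketch. The function name is hypothetical, and reading these files may require root. The optional root argument is only there for testing against a fake tree:

```bash
#!/usr/bin/env bash
# Hypothetical helper: report the modeset parameter the running kernel is
# using for nvidia_drm and i915, read from /sys/module/<mod>/parameters/.
# $1 is an optional prefix for the sysfs root (empty = the real system).
modeset_status() {
  local root="${1:-}" mod f val
  for mod in nvidia_drm i915; do
    f="$root/sys/module/$mod/parameters/modeset"
    if [ ! -e "$f" ]; then
      printf '%s: not loaded (no such parameter file)\n' "$mod"
    elif val=$(cat "$f" 2>/dev/null); then
      printf '%s.modeset=%s\n' "$mod" "$val"
    else
      printf '%s: cannot read %s (try sudo)\n' "$mod" "$f"
    fi
  done
}
```

With the recommended parameters in effect, `sudo bash -c "$(declare -f modeset_status); modeset_status"` should report `nvidia_drm.modeset=N`.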
