nvidia driver issues

Bug #1822682 reported by Frank on 2019-04-01
This bug affects 2 people
Affects                        Importance   Assigned to
linux-firmware (Ubuntu)        Undecided    Unassigned
  Bionic                       Undecided    Unassigned
  Cosmic                       Undecided    Unassigned
linux-signed-hwe (Ubuntu)      Undecided    Unassigned
  Bionic                       Undecided    Unassigned
  Cosmic                       Undecided    Unassigned

Bug Description

Description: Ubuntu 18.04.2 LTS
Release: 18.04
Kernel: Linux 4.18.0-15-generic
Intel Core i7-7700HQ + GeForce 1050Ti
After running sudo update-initramfs -u (I run this command after a fresh Ubuntu install because otherwise the system can't reboot correctly),
I get:
update-initramfs: Generating /boot/initrd.img-4.18.0-16-generic
W: Possible missing firmware /lib/firmware/nvidia/gv100/sec2/sig.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/sec2/image.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/sec2/desc.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/nvdec/scrubber.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/gr/sw_method_init.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/gr/sw_bundle_init.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/gr/sw_nonctx.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/gr/sw_ctx.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/gr/gpccs_sig.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/gr/gpccs_data.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/gr/gpccs_inst.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/gr/gpccs_bl.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/gr/fecs_sig.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/gr/fecs_data.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/gr/fecs_inst.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/gr/fecs_bl.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/acr/ucode_unload.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/acr/ucode_load.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/acr/unload_bl.bin for module nouveau
W: Possible missing firmware /lib/firmware/nvidia/gv100/acr/bl.bin for module nouveau
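
A minimal Python sketch for summarising warnings like the ones above: it groups the reported firmware paths by kernel module and can then check which of them are really absent on disk. The function names are hypothetical, and the line format is assumed to be exactly the one shown in this report.

```python
import os
import re

# Matches lines such as:
# W: Possible missing firmware /lib/firmware/nvidia/gv100/acr/bl.bin for module nouveau
WARNING_RE = re.compile(r"^W: Possible missing firmware (\S+) for module (\S+)$")


def parse_missing_firmware(log_text):
    """Return a dict mapping module name -> list of firmware paths it warned about."""
    by_module = {}
    for line in log_text.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            path, module = match.groups()
            by_module.setdefault(module, []).append(path)
    return by_module


def still_missing(paths):
    """Keep only the paths that do not exist on this machine."""
    return [p for p in paths if not os.path.exists(p)]
```

Saving the update-initramfs output to a file and passing its contents to parse_missing_firmware() gives one list per module; still_missing() then shows whether a later firmware update has filled the gaps.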

I had no such trouble on Ubuntu 18.04.1 with the older kernel.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: linux-image-4.18.0-15-generic 4.18.0-15.16~18.04.1
ProcVersionSignature: Ubuntu 4.18.0-15.16~18.04.1-generic 4.18.20
Uname: Linux 4.18.0-15-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.6
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Tue Apr 2 00:18:33 2019
InstallationDate: Installed on 2019-04-01 (0 days ago)
InstallationMedia: Ubuntu 18.04.2 LTS "Bionic Beaver" - Release amd64 (20190210)
SourcePackage: linux-signed-hwe
UpgradeStatus: No upgrade log present (probably fresh install)

Frank (alittlebit) wrote :

In addition, I got warnings about some missing NVIDIA firmware files for nouveau.
There was no problem on Ubuntu 18.04.1 with kernel 4.15, and after updating to Ubuntu 18.04.2 (with the same kernel) there is still no problem.

Anthony Wong (anthonywong) wrote :

It is because of this commit in the 4.18 kernel:

commit d521097f58bdfdc9966b8d10754074c8524133dd
Author: Ben Skeggs <email address hidden>
AuthorDate: Tue May 8 20:39:48 2018 +1000
Commit: Ben Skeggs <email address hidden>
CommitDate: Fri May 18 15:01:47 2018 +1000

    drm/nouveau/gr/gv100: initial support

    Signed-off-by: Ben Skeggs <email address hidden>

But the messages you got are only warnings. What problems do you actually have: just the warning messages, or are there any video issues?
Which NVIDIA GPU do you have? Can you run 'lspci -vvnn' and attach the result?

Anthony Wong (anthonywong) wrote :

I see you mentioned a GeForce 1050Ti in your bug. The commit I posted above is for GV100 "Volta", so the warning messages shouldn't affect you.
That said, it might be a good idea to add the relevant firmware to Bionic.
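
To see which kernel driver is actually bound to each GPU, the 'lspci -vvnn' output can be filtered for display-class devices. The following is a rough Python sketch under the assumption that device header lines start in column 0 and detail lines are indented, as in lspci's usual verbose output; the function name is hypothetical.

```python
import re

# Header line, e.g.:
# 00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:591b] (rev 04)
HEADER_RE = re.compile(
    r"^(?P<slot>\S+) (?P<cls>[^\[]+) \[(?P<code>[0-9a-f]{4})\]: (?P<name>.*)$"
)
# Indented detail line, e.g. "\tKernel driver in use: i915"
DRIVER_RE = re.compile(r"^\s+Kernel driver in use: (\S+)$")


def display_devices(lspci_text):
    """List display-class (PCI class 03xx) devices with the driver bound to each."""
    devices = []
    current = None
    for line in lspci_text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = None  # a new device starts; forget the previous one
            if header.group("code").startswith("03"):
                current = {
                    "slot": header.group("slot"),
                    "name": header.group("name"),
                    "driver": None,  # stays None if no driver line follows
                }
                devices.append(current)
            continue
        driver = DRIVER_RE.match(line)
        if driver and current is not None:
            current["driver"] = driver.group(1)
    return devices
```

On a hybrid laptop this would normally list two entries, one for the Intel GPU and one for the NVIDIA GPU; a "driver" value of nouveau on the NVIDIA entry would mean the proprietary driver is not the one in use.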

Changed in linux-firmware (Ubuntu):
status: New → Invalid
Frank (alittlebit) wrote :

The problem appears when I start CS:GO in Steam: I get an error about GLXContext and the game doesn't start.
The NVIDIA drivers installed were the same ones I installed on the previous Ubuntu version (from the default Software & Updates -> Additional Drivers, version 390).

Sometimes after rebooting I get messages about CPU tasks:
rcu_sched self-detected stall on CPU, like this: https://askubuntu.com/questions/1114612/rcu-sched-self-detected-stall-on-cpu-watchdog-bug-soft-lockup-cpu3-stuck

My system is encrypted, and when I ran 'sudo update-initramfs -u', after the missing-firmware warnings I also got warnings like these:
W: initramfs-tools configuration sets RESUME=LABEL=swap
W: but no matching swap device is available.
I: The initramfs will attempt to resume from /dev/sdb1
I: (UUID=4dbd12ed-75ac-445e-933a-93df34314795)
I: Set the RESUME variable to override this.

But it can be fixed by: sudo swapoff -a
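
The last line of that output suggests another route: setting RESUME explicitly. The Python sketch below only extracts the UUID from the message and formats the setting; the function name is hypothetical. Writing the result to /etc/initramfs-tools/conf.d/resume is an assumption based on the usual initramfs-tools layout, and whether that partition is the right swap device on an encrypted setup is not something this thread establishes.

```python
import re

# Matches the "(UUID=...)" part of the informational line printed by update-initramfs.
UUID_RE = re.compile(r"\(UUID=([0-9a-fA-F-]+)\)")


def resume_setting(initramfs_output):
    """Build a RESUME= line from update-initramfs output, or None if no UUID is found."""
    match = UUID_RE.search(initramfs_output)
    if match is None:
        return None
    return "RESUME=UUID=" + match.group(1)
```

After changing the setting, update-initramfs -u has to be run again for it to take effect.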

Frank (alittlebit) wrote :

After updating the kernel to 4.18.5 there are no more warnings or rcu_sched errors, but when I launch CS:GO from Steam I get the same error:
Failed to create GL context; Could not make GL context current: GLXBadContext

Frank (alittlebit) wrote :

lspci -vvnn
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5910] (rev 05)
 Subsystem: Dell Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [1028:07fa]
 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort+ <MAbort+ >SERR- <PERR- INTx-
 Latency: 0
 Capabilities: <access denied>

00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05) (prog-if 00 [Normal decode])
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0
 Interrupt: pin A routed to IRQ 16
 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
 I/O behind bridge: 0000e000-0000efff
 Memory behind bridge: dc000000-dd0fffff
 Prefetchable memory behind bridge: 00000000b0000000-00000000c1ffffff
 Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
 BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
  PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
 Capabilities: <access denied>
 Kernel driver in use: pcieport

00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:591b] (rev 04) (prog-if 00 [VGA controller])
 Subsystem: Dell Device [1028:07fa]
 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Latency: 0
 Interrupt: pin A routed to IRQ 130
 Region 0: Memory at db000000 (64-bit, non-prefetchable) [size=16M]
 Region 2: Memory at 70000000 (64-bit, prefetchable) [size=256M]
 Region 4: I/O ports at f000 [size=64]
 [virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
 Capabilities: <access denied>
 Kernel driver in use: i915
 Kernel modules: i915

00:04.0 Signal processing controller [1180]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903] (rev 05)
 Subsystem: Dell Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [1028:07fa]
 Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
 Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
 Interrupt: pin A routed to IRQ 16
 Region 0: Memory at dd320000 (64-bit, non-prefetchable) [size=32K]
 Capabilities: <access denied>
 Kernel driver in use: proc_thermal
 Kernel modules: processor_thermal_device

00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31) (prog-if 30 [XHCI])
 Subsystem: Dell Sunrise Point-H USB 3.0 xHCI Controller [1028:07fa]
 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
 Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <...

Frank (alittlebit) wrote :

Sometimes after a reboot the system freezes and I can do nothing except a hard reboot.
The same issue occurs on Manjaro Linux after installing the NVIDIA drivers:
Kernel 4.20.17-1
From the logs:
[TTM] Buffer eviction failed
nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
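
The log line above comes from nouveau even though the NVIDIA drivers had been installed, so it may be worth checking which GPU modules are loaded. A small Python sketch, assuming the usual /proc/modules layout where the module name is the first field on each line; the function name and the default candidate list are illustrative choices.

```python
def loaded_gpu_modules(proc_modules_text, candidates=("nouveau", "nvidia", "i915")):
    """Return the candidate module names that appear in /proc/modules content."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()  # skip blank lines
    }
    return [name for name in candidates if name in loaded]
```

Reading /proc/modules with open("/proc/modules").read() and passing the text in gives the list for the running system; seeing nouveau there alongside or instead of nvidia would be consistent with the messages above.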

Frank (alittlebit) wrote :

Fixed by downloading the Linux firmware from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
After copying the files from its nvidia folder to /lib/firmware/nvidia/, everything started working.
A small story about "warnings", not even related to my video card, breaking everything around them.
The NVIDIA drivers now work, the system reboots as it should, and I have had no errors so far. This proves it was a kernel issue. I hope it's done.
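
The copy step can be sketched in Python as follows, assuming a local checkout of the linux-firmware repository. The function name and the dry-run default are hypothetical choices for illustration; the sketch copies only files that are not already present at the destination and never overwrites anything.

```python
import shutil
from pathlib import Path


def copy_missing_firmware(src_dir, dest_dir, dry_run=True):
    """Copy files under src_dir that are absent under dest_dir.

    Returns the relative paths that were (or, with dry_run, would be) copied.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    copied = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue  # skip directories
        rel = src.relative_to(src_dir)
        target = dest_dir / rel
        if target.exists():
            continue  # never overwrite firmware that is already installed
        if not dry_run:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, target)
        copied.append(rel.as_posix())
    return copied
```

For example, copy_missing_firmware("linux-firmware/nvidia", "/lib/firmware/nvidia") lists what would be copied, and passing dry_run=False (as root) performs the copy. Running sudo update-initramfs -u afterwards regenerates the initramfs with the new files.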

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-firmware (Ubuntu Bionic):
status: New → Confirmed
Changed in linux-firmware (Ubuntu Cosmic):
status: New → Confirmed
Changed in linux-signed-hwe (Ubuntu Bionic):
status: New → Confirmed
Changed in linux-signed-hwe (Ubuntu Cosmic):
status: New → Confirmed
Changed in linux-signed-hwe (Ubuntu):
status: New → Confirmed