[nvidia] GPU has fallen off the bus

Bug #2023585 reported by Umayr Saghir
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | Incomplete | Undecided | Unassigned | |
| nvidia-graphics-drivers-525 (Ubuntu) | New | Undecided | Unassigned | |

Bug Description

When playing Assassin's Creed Unity through Steam, the game runs fine for a short while, but then, fairly quickly in my experience, the screen goes blank, the lights on the GPU turn off, and the GPU fans spin at maximum RPM.

I checked the dmesg logs from that session and saw at the bottom:

```
Jun 12 19:25:09 pikachu kernel: NVRM: GPU at PCI:0000:0b:00: GPU-f888943b-327b-82af-03dd-7c4213dc4788
Jun 12 19:25:09 pikachu kernel: NVRM: Xid (PCI:0000:0b:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Jun 12 19:25:09 pikachu kernel: NVRM: GPU 0000:0b:00.0: GPU has fallen off the bus.
Jun 12 19:25:09 pikachu kernel: nvidia-gpu 0000:0b:00.3: Unable to change power state from D3hot to D0, device inaccessible
Jun 12 19:25:09 pikachu kernel: xhci_hcd 0000:0b:00.2: Unable to change power state from D3hot to D0, device inaccessible
Jun 12 19:25:09 pikachu kernel: xhci_hcd 0000:0b:00.2: Unable to change power state from D3cold to D0, device inaccessible
Jun 12 19:25:09 pikachu kernel: xhci_hcd 0000:0b:00.2: Controller not ready at resume -19
Jun 12 19:25:09 pikachu kernel: xhci_hcd 0000:0b:00.2: PCI post-resume error -19!
Jun 12 19:25:09 pikachu kernel: xhci_hcd 0000:0b:00.2: HC died; cleaning up
Jun 12 19:25:09 pikachu kernel: audit: type=1400 audit(1686594309.980:429): apparmor="DENIED" operation="open" class="file" profile="snap.keepassxc.keepassxc" name="/sys/devices/pci00>
Jun 12 19:25:10 pikachu kernel: nvidia-gpu 0000:0b:00.3: i2c timeout error ffffffff
Jun 12 19:25:10 pikachu kernel: ucsi_ccg 0-0008: i2c_transfer failed -110
```
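To pick the relevant lines out of a long kernel log, the `NVRM: Xid` entries can be filtered with a short script. The following is only a minimal sketch: the function name `find_xid_events` and the regular expression are illustrative assumptions based on the line format shown in the excerpt above, not part of any NVIDIA or Ubuntu tool.

```python
import re

# Assumed line format, taken from the excerpt above:
#   NVRM: Xid (PCI:0000:0b:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), (?P<rest>.*)"
)


def find_xid_events(lines):
    """Return a (pci_address, xid_code, remainder) tuple for each NVRM Xid line."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match:
            events.append(
                (match.group("pci"), int(match.group("xid")), match.group("rest").strip())
            )
    return events


# Two lines copied from the log above; only the first one is an Xid entry.
sample = [
    "Jun 12 19:25:09 pikachu kernel: NVRM: Xid (PCI:0000:0b:00): 79, "
    "pid='<unknown>', name=<unknown>, GPU has fallen off the bus.",
    "Jun 12 19:25:09 pikachu kernel: NVRM: GPU 0000:0b:00.0: GPU has fallen off the bus.",
]

for pci, xid, rest in find_xid_events(sample):
    print(pci, xid, rest)
```

The same function could be fed the lines of a saved dmesg or `journalctl -k` dump instead of the hard-coded sample.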

Further up in the logs I also see the following (in case it's related):

```
[drm:nv_drm_master_set [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000b00] Failed to grab modeset ownership
```

I am using an RTX 2080 Ti with driver version 525.105.17.

I have attached the full dmesg log.

ProblemType: Bug
DistroRelease: Ubuntu 23.04
Package: nvidia-driver-525 525.105.17-0ubuntu1
ProcVersionSignature: Ubuntu 6.2.0-20.20-generic 6.2.6
Uname: Linux 6.2.0-20-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair nvidia_modeset nvidia
ApportVersion: 2.26.1-0ubuntu2
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Mon Jun 12 19:35:37 2023
InstallationDate: Installed on 2022-12-06 (187 days ago)
InstallationMedia: Ubuntu 22.10 "Kinetic Kudu" - Release amd64 (20221020)
SourcePackage: nvidia-graphics-drivers-525
UpgradeStatus: Upgraded to lunar on 2023-04-21 (51 days ago)

Umayr Saghir (nightmayr) wrote :

Daniel van Vugt (vanvugt) wrote :

From what I can tell, this is most likely to be a hardware issue like:

 * The graphics card not making clean contact with the slot.
 * Hardware failure of the motherboard.
 * Hardware failure of the graphics card.

But a bit of googling suggests that other people encountering the same error message over the years have sometimes been able to avoid it by tweaking kernel/driver parameters.

The message "Failed to grab modeset ownership" is unrelated and can be ignored here.
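For the hardware causes listed above, one inexpensive check while the card is still visible to the kernel is to compare the negotiated PCIe link width with the maximum the device reports, since a poorly seated card may train at a reduced width. The sketch below reads the standard Linux sysfs attributes `current_link_width`, `max_link_width`, `current_link_speed` and `max_link_speed`; the helper names `read_link_status` and `link_is_degraded` are hypothetical, and a normal result here does not rule out a hardware fault.

```python
from pathlib import Path

# Standard PCI sysfs attributes describing the negotiated and maximum link.
LINK_ATTRS = (
    "current_link_speed",
    "max_link_speed",
    "current_link_width",
    "max_link_width",
)


def read_link_status(device, sysfs_root="/sys/bus/pci/devices"):
    """Read the PCIe link attributes of one device.

    `device` is a full PCI address such as "0000:0b:00.0" (the GPU in this
    report). Attributes that are missing or unreadable map to None.
    """
    dev_dir = Path(sysfs_root) / device
    status = {}
    for attr in LINK_ATTRS:
        try:
            status[attr] = (dev_dir / attr).read_text().strip()
        except OSError:
            status[attr] = None
    return status


def link_is_degraded(status):
    """Return True if the negotiated width is below the reported maximum.

    Returns None when either width is unavailable. Link speed is not
    compared, because GPUs commonly lower it at idle to save power.
    """
    current = status.get("current_link_width")
    maximum = status.get("max_link_width")
    if current is None or maximum is None:
        return None
    return int(current) < int(maximum)


if __name__ == "__main__":
    status = read_link_status("0000:0b:00.0")
    print(status)
    print("degraded:", link_is_degraded(status))
```

Once the GPU has fallen off the bus these files may be unreadable, so the check is mainly useful before the failure occurs or after a reboot.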

tags: added: nvidia
summary: - GPU has fallen off the bus
+ [nvidia] GPU has fallen off the bus
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 2023585

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Daniel van Vugt (vanvugt) wrote :

See also bug 2028199.

Umayr Saghir (nightmayr) wrote :

I disconnected the NVIDIA GPU and switched to a spare AMD GPU, and the problem has not occurred since. You're probably right that it's a hardware issue, but I don't know whether the fault lies with the GPU or the motherboard. When the problem was happening, the NVIDIA GPU was in the first PCIe slot and the AMD GPU was in the second, with no power cables running to it from the PSU, so it was effectively off and not detected by Ubuntu.

I don't know whether the motherboard could be at fault in that scenario, with a device plugged into the second slot but not powered. At first glance the NVIDIA GPU looks like the culprit, but I haven't tested it in other configurations, so I can't say so definitively.
