[NVIDIA] Monitor(s) black out and session freezes. "NVRM: GPU at 0000:01:00.0 has fallen off the bus."

Bug #882710 reported by palewire on 2011-10-27
This bug affects 27 people
Affects: nvidia-graphics-drivers (Ubuntu)
Importance: High
Assigned to: Alberto Milone

Bug Description

I run Ubuntu 11.10 on a Dell desktop with a dual-output video card connected to two monitors. They normally work great and I have no complaints. When I first upgraded to 11.10, both monitors would occasionally black out at random, and I'd have to restart the computer to get them back. That problem seemed to go away, though, and I wrote it off to package upgrades that fixed a bug.

Then, after upgrading Unity and a bunch of other packages today, it started again. It has blacked out three times this morning, and I don't know why. I often use hotkeys to slide between workspaces, though, and one of the blackouts happened during a slide.

[Next Actions]
* [tseliot] Raise issue with NVIDIA
* [Unity Engineering] Evaluate if twinview could be triggering a bug in unity?
* Attempt to reproduce the bug outside Unity to see if it can be proved to not be Unity-specific
* Identify if there's a way to reproduce the bug deliberately

---
.proc.driver.nvidia.gpus.0: Error: [Errno 21] Is a directory: '/proc/driver/nvidia/gpus/0'
.proc.driver.nvidia.registry: Binary: ""
.proc.driver.nvidia.version:
 NVRM version: NVIDIA UNIX x86 Kernel Module 280.13 Wed Jul 27 16:55:43 PDT 2011
 GCC version: gcc version 4.6.1 (Ubuntu/Linaro 4.6.1-9ubuntu3)
.tmp.unity.support.test.0:

ApportVersion: 1.23-0ubuntu3
Architecture: i386
CompizPlugins: [core,bailer,detection,composite,opengl,compiztoolbox,decor,regex,mousepoll,vpswitch,animation,grid,snap,place,resize,session,gnomecompat,move,imgpng,unitymtgrabhandles,wall,fade,workarounds,expo,ezoom,scale,unityshell]
CompositorRunning: compiz
DistUpgraded: Log time: 2011-10-13 15:00:25.754620
DistroCodename: oneiric
DistroRelease: Ubuntu 11.10
DistroVariant: ubuntu
DkmsStatus:
 nvidia-current, 280.13, 2.6.38-11-generic, i686: installed
 nvidia-current, 280.13, 3.0.0-12-generic, i686: installed
 nvidia-current-updates, 280.13, 3.0.0-12-generic, i686: installed
GraphicsCard:
 nVidia Corporation GT218 [GeForce 210] [10de:0a65] (rev a2) (prog-if 00 [VGA controller])
   Subsystem: eVga.com. Corp. Device [3842:1310]
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Release i386 (20110427.1)
JockeyStatus:
 xorg:nvidia_current - NVIDIA accelerated graphics driver (Proprietary, Disabled, Not in use)
 xorg:nvidia_current_updates - NVIDIA accelerated graphics driver (post-release updates) (Proprietary, Enabled, In use)
MachineType: Dell Inc. OptiPlex 745
NonfreeKernelModules: nvidia
Package: unity 4.24.0-0ubuntu2b1
PackageArchitecture: i386
ProcEnviron:
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.0.0-12-generic root=UUID=7130e4f2-1020-48a0-807c-882bee26d3c3 ro quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 3.0.0-12.20-generic 3.0.4
Tags: oneiric running-unity oneiric running-unity oneiric running-unity ubuntu regression-update compiz-0.9
Uname: Linux 3.0.0-12-generic i686
UpgradeStatus: Upgraded to oneiric on 2011-10-14 (13 days ago)
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare
XorgLogOld:

dmi.bios.date: 08/12/2008
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.6.2
dmi.board.name: 0RF705
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 3
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.6.2:bd08/12/2008:svnDellInc.:pnOptiPlex745:pvr:rvnDellInc.:rn0RF705:rvr:cvnDellInc.:ct3:cvr:
dmi.product.name: OptiPlex 745
dmi.sys.vendor: Dell Inc.
version.compiz: compiz 1:0.9.6+bzr20110929-0ubuntu5
version.libdrm2: libdrm2 2.4.26-1ubuntu1
version.libgl1-mesa-dri: libgl1-mesa-dri 7.11-0ubuntu3
version.libgl1-mesa-dri-experimental: libgl1-mesa-dri-experimental N/A
version.libgl1-mesa-glx: libgl1-mesa-glx 7.11-0ubuntu3
version.nvidia-graphics-drivers: nvidia-graphics-drivers N/A
version.xserver-xorg: xserver-xorg 1:7.6+7ubuntu7
version.xserver-xorg-input-evdev: xserver-xorg-input-evdev 1:2.6.0-1ubuntu13
version.xserver-xorg-video-ati: xserver-xorg-video-ati 1:6.14.99~git20110811.g93fc084-0ubuntu1
version.xserver-xorg-video-intel: xserver-xorg-video-intel 2:2.15.901-1ubuntu2
version.xserver-xorg-video-nouveau: xserver-xorg-video-nouveau 1:0.0.16+git20110411+8378443-1

palewire (ben-welsh) wrote :

Here's what I see in my kernel log around the time of the crash. It seems very similar to what's reported in this thread: http://forums.nvidia.com/index.php?showtopic=209151

Oct 27 13:12:49 conrad kernel: [ 557.793689] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Oct 27 13:12:49 conrad kernel: [ 557.839448] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Oct 27 13:12:49 conrad kernel: [ 557.839454] hda-intel: spurious response 0x9000094:0x0, last cmd=0x470e00
Oct 27 13:12:49 conrad kernel: [ 557.839457] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Oct 27 13:12:49 conrad kernel: [ 557.839460] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Oct 27 13:12:49 conrad kernel: [ 557.839463] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Oct 27 13:12:49 conrad kernel: [ 557.839466] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Oct 27 13:12:49 conrad kernel: [ 557.839469] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Oct 27 13:12:49 conrad kernel: [ 557.839473] hda-intel: spurious response 0x9000094:0x1, last cmd=0x10420000
Oct 27 13:12:49 conrad kernel: [ 557.839476] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Oct 27 13:12:49 conrad kernel: [ 557.839479] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Oct 27 13:12:49 conrad kernel: [ 557.839482] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Oct 27 13:12:49 conrad kernel: [ 557.839485] hda-intel: spurious response 0x0:0x2, last cmd=0x20420000
Oct 27 13:12:49 conrad kernel: [ 557.839488] hda-intel: spurious response 0x0:0x2, last cmd=0x20420000
Oct 27 13:12:49 conrad kernel: [ 557.839491] hda-intel: spurious response 0x9000094:0x2, last cmd=0x20420000
Oct 27 13:12:49 conrad kernel: [ 557.839494] hda-intel: spurious response 0x0:0x2, last cmd=0x20420000
Oct 27 13:12:49 conrad kernel: [ 557.839497] hda-intel: spurious response 0x0:0x2, last cmd=0x20420000
Oct 27 13:12:49 conrad kernel: [ 557.839500] hda-intel: spurious response 0x0:0x2, last cmd=0x20420000
Oct 27 13:12:49 conrad kernel: [ 557.839503] hda-intel: spurious response 0x0:0x3, last cmd=0x30420000
Oct 27 13:12:49 conrad kernel: [ 557.839506] hda-intel: spurious response 0x0:0x3, last cmd=0x30420000
Oct 27 13:12:49 conrad kernel: [ 557.839509] hda-intel: spurious response 0x9000094:0x3, last cmd=0x30420000
Oct 27 13:12:49 conrad kernel: [ 557.839512] hda-intel: spurious response 0x0:0x3, last cmd=0x30420000
Oct 27 13:12:49 conrad kernel: [ 557.839515] hda-intel: spurious response 0x0:0x3, last cmd=0x30420000
Oct 27 13:12:49 conrad kernel: [ 557.839518] hda-intel: spurious response 0x0:0x3, last cmd=0x30420000
Oct 27 13:12:49 conrad kernel: [ 557.839521] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00

Da Shroom (9n-georgm-bc) wrote :

Thank you for your bug report. Could you please run

apport-collect 882710

in a terminal to provide the developers with additional information? That said, this looks like an issue with a proprietary driver, so there may be little that can be done.

apport information

tags: added: apport-collected compiz-0.9 oneiric regression-update running-unity ubuntu
description: updated

I added all that, but I should note that before doing so I switched my "Additional Drivers" setting from NVIDIA accelerated graphics driver (version current) to (post-release updates).

palewire (ben-welsh) wrote :

So it was on current at the time of the error.

Da Shroom (9n-georgm-bc) wrote :

OK, thanks for the information. I will now confirm the bug on the basis of the added information.

Changed in jockey:
status: New → Confirmed
Da Shroom (9n-georgm-bc) wrote :

I have added it to Jockey, as that seems the most likely cause of this bug (if it isn't in the proprietary drivers)
Thanks for your help with this bug.

palewire (ben-welsh) wrote :

Thanks. If there's anything I can do, let me know. And FYI: The screen blacked out again, later in the day, after I had made the proprietary driver switch. So I don't think it's fixed. I don't think it's the workspaces either. This time it happened when I was running a long database loading script from my terminal

Da Shroom (9n-georgm-bc) wrote :

Thanks for your cooperation. While this bug is waiting for a developer to pick it up, would you mind keeping a log of when it happens and what you are running at the time?

Thanks for your assistance.

Martin Pitt (pitti) wrote :

Jockey installs graphics drivers, but it's not at all involved in the actual driver code or operation.

Changed in jockey:
status: Confirmed → Invalid
palewire (ben-welsh) wrote :

Just had it black out on me minutes ago. Immediately after I pushed "SEND" on a Gmail email in Firefox, it just blacked out and went down.

palewire (ben-welsh) wrote :

Just happened again. I was writing a harmless IM in Empathy and it crapped out.

palewire (ben-welsh) wrote :

Just happened again, immediately after opening a new tab in Firefox. Both screens blank. Audio still plays briefly, and then it blacks out.

Da Shroom (9n-georgm-bc) wrote :

Thanks for keeping the log going :-)

palewire (ben-welsh) wrote :

Happened again minutes ago. I was listening to a podcast in Banshee and writing some code in gEdit. Byobu was also open. I reloaded a tab in Firefox, one of about five or six open, and as the page loaded the screens blacked out. The sound kept playing but the system was unresponsive until I restarted.

palewire (ben-welsh) wrote :

Just crapped out again. I had four or five tabs open in Firefox and I had just started watching Gloria Allred's press conference on Herman Cain allegations here http://www.washingtonpost.com/blogs/election-2012/post/herman-cain-harassment-accuser-holds-press-conference-with-attorney-gloria-allred/2011/11/07/gIQAAptevM_blog.html?tid=sm_twitter_washingtonpost

After about 2 minutes, it blacked out.

palewire (ben-welsh) wrote :

I rebooted after that crash. Started up Firefox again. Had this page open in one tab, and Rdio.com playing Miles Davis in the other. Blacked out after a minute or two. Starting to make me wonder if the problems are linked to media streaming through the browser. But I don't have any technical evidence. Just what I'm observing as a user at the time of crash.

palewire (ben-welsh) wrote :

This has happened three times in a couple hours this morning, and it's making my computer virtually unusable. Is there anything I can do? I guess I'll just have to go back to one monitor if you're unable to fix this problem.

Da Shroom (9n-georgm-bc) wrote :

Sorry, bugs like this take a lot of reading, and the drivers are proprietary.

Can you give me your graphics card and RAM amount ?

Also, could you try going to the NVIDIA settings, re-saving them to the xorg configuration file, and then uploading that file as well? And do you know if running the second monitor as a separate X session works?

Thanks for your patience with this bug.

palewire (ben-welsh) wrote :

Here's the xorg file after a fresh save.

Section "ServerLayout"

        # Keyboard settings are now read from /etc/default/console-setup
        # InputDevice "Keyboard0" "CoreKeyboard"
        # commented out by update-manager, HAL is now used and auto-detects devices
        # Keyboard settings are now read from /etc/default/console-setup
        # InputDevice "Mouse0" "CorePointer"
    Identifier "Layout0"
    Screen 0 "Screen0" 0 0
    InputDevice "Keyboard0" "CoreKeyboard"
    InputDevice "Mouse0" "CorePointer"
    Option "Xinerama" "0"
        # commented out by update-manager, HAL is now used and auto-detects devices
EndSection

Section "InputDevice"
    # generated from default
    Identifier "Keyboard0"
    Driver "kbd"
EndSection

Section "InputDevice"
    # generated from default
    Identifier "Mouse0"
    Driver "mouse"
    Option "Protocol" "auto"
    Option "Device" "/dev/psaux"
    Option "Emulate3Buttons" "no"
    Option "ZAxisMapping" "4 5"
EndSection

Section "Monitor"
    Identifier "Monitor0"
    VendorName "Unknown"
    ModelName "DELL 1907FP"
    HorizSync 30.0 - 81.0
    VertRefresh 56.0 - 76.0
    Option "DPMS"
        # HorizSync source: edid, VertRefresh source: edid
EndSection

Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA Corporation"
    BoardName "GeForce 210"
    Option "NoLogo" "True"
EndSection

Section "Screen"
    Identifier "Screen0"
    Device "Device0"
    Monitor "Monitor0"
    DefaultDepth 24
    Option "TwinView" "1"
    Option "TwinViewXineramaInfoOrder" "DFP-0"
    Option "metamodes" "CRT: nvidia-auto-select +1680+0, DFP: nvidia-auto-select +0+0"
    SubSection "Display"
        Depth 24
    EndSubSection
EndSection
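
For anyone reading the metamodes line above: a minimal Python sketch, not part of the original report, of how its +X+Y offsets can be pulled out. The function name parse_metamodes is made up for illustration, and only the simple "DEVICE: mode +X+Y" form seen in this file is handled.

```python
import re

# Matches entries such as "CRT: nvidia-auto-select +1680+0".
# Only the simple "DEVICE: mode +X+Y" form from the xorg.conf above is handled.
ENTRY = re.compile(r"^\s*([\w-]+):\s*(\S+)\s*\+(\d+)\+(\d+)\s*$")


def parse_metamodes(value):
    """Return a list of (device, mode, x_offset, y_offset) tuples."""
    layout = []
    for entry in value.split(","):
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError("unrecognised metamode entry: %r" % entry)
        device, mode, x, y = match.groups()
        layout.append((device, mode, int(x), int(y)))
    return layout


# Value copied from the "metamodes" option in the Screen section above.
line = "CRT: nvidia-auto-select +1680+0, DFP: nvidia-auto-select +0+0"
# Print the outputs from left to right by sorting on the x offset.
for device, mode, x, y in sorted(parse_metamodes(line), key=lambda e: e[2]):
    print("%s at x=%d, y=%d (%s)" % (device, x, y, mode))
```

Read this way, DFP (NVIDIA's name for a digital flat panel output such as DVI) starts at the left edge, and CRT (the analog VGA output) starts 1680 pixels to its right.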

palewire (ben-welsh) wrote :

Here's the graphics card info from lspci:

01:00.0 VGA compatible controller: nVidia Corporation GT218 [GeForce 210] (rev a2) (prog-if 00 [VGA controller])
        Subsystem: eVga.com. Corp. Device 1310
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at dc80 [size=128]
        [virtual] Expansion ROM at fea00000 [disabled] [size=512K]
        Capabilities: <access denied>
        Kernel driver in use: nvidia
        Kernel modules: nvidia_current_updates, nvidia_current, nouveau, nvidiafb

01:00.1 Audio device: nVidia Corporation High Definition Audio Controller (rev a1)
        Subsystem: eVga.com. Corp. Device 1310
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at fe9fc000 (32-bit, non-prefetchable) [size=16K]
        Capabilities: <access denied>
        Kernel driver in use: HDA Intel
        Kernel modules: snd-hda-intel

palewire (ben-welsh) wrote :

This has happened to me four or five more times now. It's happened using Firefox, using Chromium, and just using Git in my terminal. The blackout is still the first thing that happens, and sound will continue playing.

How would I run two separate X servers?

Da Shroom (9n-georgm-bc) wrote :

If you go to the Nvidia control panel, find the bit where you control the dual monitor config. There will be a drop down menu with twinview currently selected, there is an option saying "separate X session". Can you try this ?

Also, just another thought, when the screen dies, can you still put the mouse on it/drag windows onto it (like if you had the 2 monitors set up but with one unplugged) or does it behave like you've only ever had one screen ever (can't move mouse off the screen)

Thanks again.

Da Shroom (9n-georgm-bc) wrote :

sorry, clearly I'm not with it, both monitors black out you said (severe lack of sleep :-) )

Can you start a virtual terminal after it's blacked out ?

(ctrl-alt-f1)

Thanks.

palewire (ben-welsh) wrote :

Sadly, separate X server was a disaster. The smaller VGA monitor on the right looked normal. The wider DVI monitor showed only a black background and my cursor when I moved over. But no windows or other activity.

I turned off the computer, unplugged the DVI monitor, and restarted. Now my single monitor is screwed up. I can see the background after logging in, but no Unity bar or top nav. Basically, it's busted. Not sure how to fix it, but clearly something is messed up in the NVIDIA settings and xorg.conf.

FWIW, I worked for several days with only a single VGA monitor and had no problems. I tried a single DVI monitor and had that crash. Not sure what that means. Also, I have two other machines I watch that have dual-screen 11.10 installed. The other two are not crashing randomly the way this one is, but the xorg process in top is often running near 100% CPU. That happened on this busted machine as well, and when I had it down to one monitor xorg was very tame.

So I've got this bug, the new problem introduced by the fix, and the high CPU I've seen on the other machines. IMHO, there is probably something screwed up with how 11.10 and Unity are dealing with dual monitors. It's sad, because I love the new interface. Unfortunately, it's hard to enjoy with my very basic Dell hardware.

palewire (ben-welsh) wrote :

I reverted to a backup xorg.conf from the terminal and got the single monitor working again. So don't worry about me on that.

Da Shroom (9n-georgm-bc) wrote :

Sorry about that :-)

But the DVI thing is interesting (although it won't be a monitor problem if both screens die)
Do you have one of those DVI-to-VGA adapters that come with the graphics card? If so, can you try with just the VGA monitor, but connected through the adapter?

Also, what about unity 2D, have you tried this ?

Sorry for all this,

I'm confirming this in X thanks to the separate X screen problems (and I have the same card)

Changed in xorg (Ubuntu):
status: New → Confirmed
dmbortz (dmbortz-gmail) wrote :

Just wanted to add that I have been having the same problem (same error messages, same frequency of crashing). My setup, though, has a 3500FX for the monitor and a Tesla C2070 GPU, and I only have the one monitor plugged in, to the 3500FX.

David

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in unity (Ubuntu):
status: New → Confirmed
Da Shroom (9n-georgm-bc) wrote :

Thanks dmbortz, that has saved a lot of work, and it is definitely not a prop driver problem

bugbot (bugbot) on 2011-11-17
affects: xorg (Ubuntu) → nvidia-graphics-drivers (Ubuntu)
Da Shroom (9n-georgm-bc) on 2011-11-18
Changed in xorg (Ubuntu):
status: New → Confirmed
Bryce Harrington (bryce) wrote :

[The -nvidia task is sufficient, we don't need this also filed against xorg.]

This sounds a lot like a driver bug, but I suppose it's possible Unity could be doing something odd. Has anyone attempted reproducing it under Gnome Shell or straight compiz (no unity)?

Changed in xorg (Ubuntu):
status: Confirmed → Invalid
Bryce Harrington (bryce) wrote :

@Alberto, it's unclear if this is Unity or the video driver but appears to be -nvidia specific and possibly a regression starting in oneiric. Would you mind raising this with NVIDIA when you get a chance?

Changed in nvidia-graphics-drivers (Ubuntu):
assignee: nobody → Alberto Milone (albertomilone)
importance: Undecided → High
status: Confirmed → Triaged
description: updated
Bryce Harrington (bryce) wrote :

Could someone on the Compiz or Unity side evaluate this from the standpoint of being a Twinview-specific error? I bet there is some special code in Compiz/Nux/Unity for twinview as opposed to xrandr setups, and wonder if the bug could lie there.

description: updated
Bryce Harrington (bryce) wrote :

To all who are seeing this bug occur, please see if you can narrow down the conditions that seem to lead to this condition. I know it seems totally random, but this type of bug typically is triggered by _something_ - although it may not be at all obvious.

When it happens, note what you had been doing the 5-10 minutes previously. And think about the state of the system, loads it might have been running, etc.

Being able to reproduce it at will matters because we can then have you vary different things to see whether the bug goes away (or occurs more often), which helps narrow down where exactly the bug is.

Also, I browsed through all the provided logs but saw no error messages. It's likely no errors are written at all anywhere, but sometimes you may find something in ~/.xsession-errors, the gsd logs, /var/log/*, etc. If anyone spots anything that looks relevant from any of these places please post here.
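
One way to look for such messages, as a minimal Python sketch: it scans syslog-style lines for the NVRM message quoted earlier in this report. The function name find_gpu_drops is hypothetical, the sample lines are copied from the kernel log posted above, and which log file to feed it on a given system is left as an assumption.

```python
import re

# The message text is taken from the kernel log excerpts in this report.
DROP = re.compile(r"NVRM: GPU at (\S+) has fallen off the bus")


def find_gpu_drops(lines):
    """Return (syslog_timestamp, pci_address) for each matching line."""
    hits = []
    for line in lines:
        match = DROP.search(line)
        if match:
            # Traditional syslog lines start with a 15-character timestamp,
            # e.g. "Oct 27 13:12:49".
            hits.append((line[:15], match.group(1)))
    return hits


# Sample lines copied from the October kernel log earlier in this report.
sample = [
    "Oct 27 13:12:49 conrad kernel: [ 557.793689] NVRM: GPU at 0000:01:00.0 has fallen off the bus.",
    "Oct 27 13:12:49 conrad kernel: [ 557.839448] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00",
]
for stamp, address in find_gpu_drops(sample):
    print(stamp, address)
```

To run it over a real log, pass an open file object instead of the sample list, since the function only needs an iterable of lines.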

Martin Pitt (pitti) on 2011-11-21
no longer affects: jockey
Da Shroom (9n-georgm-bc) wrote :

Ah, Bug Control have arrived :-D

I have suspicions about it being related to certain monitor sizes, he says that he doesn't get it with just his vga screen, but with just one DVI screen he does. I doubt it is DVI specific, but it could be to do with screen sizes. A DVI-VGA adaptor would settle this. And I also think it's twinview specific. Despite the failures of the separate X screen setup, it didn't black out, although there are trigger problems with this.

Thanks,
George

Didier Roche (didrocks) on 2011-11-22
Changed in unity:
status: New → Confirmed
palewire (ben-welsh) wrote :

I haven't found a converter, but I did try the larger monitor on VGA for a day at work and it ran without problem. It had been on DVI before. I'm away from the office this week (it's Thanksgiving in the US), but I will try to find the parts and do the converter test next week. Just to be clear, both monitors worked alone on VGA, but the larger one crashed on DVI both alone and in tandem.

Da Shroom (9n-georgm-bc) wrote :

Thanks for clarifying,

I mean problems with screen sizes within DVI.

Thanks for your help.

palewire (ben-welsh) wrote :

Since I got back from vacation, I've been running the wider of the two monitors solo using the VGA input. It worked fine for a little bit, but the last few days I've had about one crash per day where it blacks out in exactly the same fashion as before. Often it happens when I'm listening to a podcast in Banshee (some of which are video podcasts) and developing web applications with Firefox and gEdit.

Omer Akram (om26er) on 2011-12-07
no longer affects: xorg (Ubuntu)
Changed in unity:
importance: Undecided → Medium
Changed in unity (Ubuntu):
importance: Undecided → Medium
Bryce Harrington (bryce) on 2011-12-15
summary: - Dual monitors randomly black out, might be linked to switching
- workspaces
+ Dual monitors black out and session freezes, might be linked to
+ switching workspaces

Just to add: I have the same problem, with the same crippling effects: my displays will randomly go into powersave (turn completely black) and cannot be awakened through the mouse or keyboard input. The only solution is a hard reboot.

I have a System 76 Mad Dog:
1 GB nVidia GeForce GT430
4 GB - 2 x 2 GB - DDR3 - 1333 MHz
2nd Generation Intel Core i5-2400

Because of my graphics card, one of my monitors is connected through DVI; the other is connected by HDMI.

I can also echo Palewire that this often seems to be triggered while playing music (Banshee or Clementine), although at other times it seems just random.

Bryce: I can say that I have never been switching workspaces when this occurred. However, I do seem to be affected by another bug when using Unity, in which clicking the Workspace Switcher makes my screens temporarily go black.

Larry Tate (cathect) wrote :

Two additional points:

1. I can reproduce this under Gnome as well as Unity. Although it happens with more frequency with Unity.

2. I can also echo Palewire's claim that this has happened during streaming media. While watching an embedded video, for example. Also, while playing/streaming music on Clementine or Banshee.

palewire (ben-welsh) wrote :

Since first encountering this bug, I've limited myself to working on a single monitor. When I do this, the bug happens much less frequently, but it still does occur once every couple days. And it seems to happen when I have a video podcast going in Banshee. Last week I was watching the CBS Evening News podcast (cool, I know) when it dropped off.

Larry Tate (cathect) wrote :

Here is my kernel log, if it helps.

Larry Tate (cathect) wrote :

I just experienced the bug. Here were the conditions:

Under Gnome:
1. Libreoffice document open.
2. Firefox open.
3. Clementine open.

Went to the left corner to access "Activities", and as the expose-like effect began to give me access to Windows and Applications, the monitors went black and a hard reboot was required.

This happened around 4:27 (16:27) or so. I'm attaching the kernel log.

palewire (ben-welsh) wrote :

Even with one monitor, this continues to happen to me every couple of days at work. This morning, I wasn't even watching video, just using my web browser, and it blacked out. Here is a snippet from the kernel log that I think is from around the time of the crash.

Jan 17 11:13:37 conrad kernel: [90303.503378] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Jan 17 11:13:37 conrad kernel: [90303.682684] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Jan 17 11:13:37 conrad kernel: [90303.682688] hda-intel: spurious response 0x9000094:0x0, last cmd=0x470e00
Jan 17 11:13:37 conrad kernel: [90303.682691] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Jan 17 11:13:37 conrad kernel: [90303.682693] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Jan 17 11:13:37 conrad kernel: [90303.682695] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Jan 17 11:13:37 conrad kernel: [90303.682697] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Jan 17 11:13:37 conrad kernel: [90303.682700] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Jan 17 11:13:37 conrad kernel: [90303.682702] hda-intel: spurious response 0x9000094:0x1, last cmd=0x10420000
Jan 17 11:13:37 conrad kernel: [90303.682704] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Jan 17 11:13:37 conrad kernel: [90303.682707] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Jan 17 11:13:37 conrad kernel: [90303.682709] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Jan 17 11:13:37 conrad kernel: [90303.682711] hda-intel: spurious response 0x0:0x2, last cmd=0x20420000
Jan 17 11:13:37 conrad kernel: [90303.682713] hda-intel: spurious response 0x0:0x2, last cmd=0x20420000
Jan 17 11:13:37 conrad kernel: [90
Jan 17 11:14:23 conrad kernel: imklog 5.8.1, log source = /proc/kmsg started.

palewire (ben-welsh) wrote :

Happened again a little later in the day. This is a real drag. Is there anything I can do? Here's the kernel log again.

Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x9000094:0x0, last cmd=0x470e00
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Jan 17 12:54:51 conrad kernel: last message repeated 2 times
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x9000094:0x1, last cmd=0x10420000
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Jan 17 12:54:51 conrad kernel: last message repeated 2 times
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x2, last cmd=0x20420000
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x2, last cmd=0x20420000
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x9000094:0x2, last cmd=0x20420000
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x2, last cmd=0x20420000
Jan 17 12:54:51 conrad kernel: last message repeated 2 times
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x3, last cmd=0x30420000
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x3, last cmd=0x30420000
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x9000094:0x3, last cmd=0x30420000
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x3, last cmd=0x30420000
Jan 17 12:54:51 conrad kernel: last message repeated 2 times
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x1, last cmd=0x10420000
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x2, last cmd=0x20420000
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x3, last cmd=0x30420000
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Jan 17 12:54:51 conrad kernel: last message repeated 3 times
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x40:0x0, last cmd=0x470e00
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x11:0x0, last cmd=0x470e00
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x1:0x0, last cmd=0x470e00
Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious...
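
Because syslog compresses repeats into "last message repeated N times" lines, counting the raw lines undercounts. A minimal Python sketch of a tally that adds the repeats back in, grouped by the "last cmd" value; the function name count_spurious is hypothetical and the sample lines are taken from the excerpt above.

```python
import re
from collections import Counter

SPURIOUS = re.compile(
    r"hda-intel: spurious response \S+, last cmd=(0x[0-9a-fA-F]+)"
)
REPEATED = re.compile(r"last message repeated (\d+) times")


def count_spurious(lines):
    """Count spurious responses per 'last cmd', honouring repeat markers."""
    counts = Counter()
    previous_cmd = None
    for line in lines:
        match = SPURIOUS.search(line)
        if match:
            previous_cmd = match.group(1)
            counts[previous_cmd] += 1
            continue
        repeat = REPEATED.search(line)
        if repeat and previous_cmd is not None:
            # The repeat marker refers to the message logged just before it.
            counts[previous_cmd] += int(repeat.group(1))
        else:
            # Any other line breaks the chain, so a later marker is ignored.
            previous_cmd = None
    return counts


# First four lines of the excerpt above.
sample = [
    "Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00",
    "Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x9000094:0x0, last cmd=0x470e00",
    "Jan 17 12:54:51 conrad kernel: [ 6051.296870] hda-intel: spurious response 0x0:0x0, last cmd=0x470e00",
    "Jan 17 12:54:51 conrad kernel: last message repeated 2 times",
]
for cmd, total in count_spurious(sample).items():
    print(cmd, total)
```

For these four sample lines the tally is five responses for 0x470e00, which matches the five consecutive 0x470e00 lines in the uncompressed October log.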


Larry Tate (cathect) wrote :

Palewire:

I'm not getting the same kernel messages, but still have the issue.

I've noticed that moving to the gnome3 shell has significantly reduced my problem. However, if I use Unity, the problem is crippling.

Larry Tate (cathect) wrote :

After today's kernel update, the problem still persists.

Jan 20 06:07:30 alan-desktop kernel: [ 910.375483] Xorg[1310]: segfault at b56b5000 ip b50fd3e1 sp bf864580 error 7 in nvidia_drv.so[b5024000+62a000]

I also noticed a message in the log that there was no NVIDIA graphics adapter found:

Jan 20 06:08:20 alan-desktop kernel: [ 15.210590] nvidia: module license 'NVIDIA' taints kernel.
Jan 20 06:08:20 alan-desktop kernel: [ 15.210593] Disabling lock debugging due to kernel taint
Jan 20 06:08:20 alan-desktop kernel: [ 15.337035] NVRM: No NVIDIA graphics adapter found!

palewire (ben-welsh) wrote :

Booted in GNOME classic, went back to dual monitors. Worked for a day and then crashed again, pretty quickly after I started listening to an audio podcast in Banshee.

Omer Akram (om26er) on 2012-01-23
tags: added: multimonitor
Omer Akram (om26er) wrote :

@alberto, any update on the issue? thx.

Changed in unity:
status: Confirmed → Triaged
Changed in unity (Ubuntu):
status: Confirmed → Triaged
Bryce Harrington (bryce) wrote :

From the testing so far, it sounds like it can be reproduced outside Unity, and the dmesg errors mean that it's the driver failing, not Unity. Closing out the unity tasks.

Also, it appears people see it whether using one or two monitors, which suggests it's not a multi-monitor issue.

Changed in unity (Ubuntu):
status: Triaged → Invalid
summary: - Dual monitors black out and session freezes, might be linked to
- switching workspaces
+ Monitor(s) black out and session freezes. "NVRM: GPU at 0000:01:00.0
+ has fallen off the bus."

I thought I'd experiment with monitor resolution to see if that could be the issue. I've got dual monitors with 1680x1050 resolution on each.

When I tried to back down the resolution to a lower setting I got the exact same problem of the blacked-out monitors the second I applied the new setting.

I had to wait until the timer hit zero and restored my prior resolution. I was unable to get any other resolution to work.

Could this be connected somehow?

Nick Booker (nmbooker) wrote :

Ditto here - both screens go blank in Unity if I try to enable external monitor from the Nvidia settings dialogue box.

I've also noticed that any windows that are minimised at the time I attempt to activate the second monitor won't restore when I click their Unity launcher icons or alt-tab once the display has returned back to normal. The top panel displays the name of the app, but the window itself doesn't appear. The only way around that seems to be to force-quit and re-open the program (or log out and back in).

If I enable twinview the same way under Xfce the twinview feature works perfectly.

Larry Tate (cathect) wrote :

BTW: today's NVIDIA driver update did not resolve the issue.

Larry Tate (cathect) wrote :

Is anyone else having problems with the native display setting tool in ubuntu with this issue?

If I open up display settings, it clearly cannot detect my displays and is presenting me with a huge, single monitor with 3360x1050 resolution (see attached screenshot).

It is unable to detect the display and I am unable to change the resolution.

In past versions of Ubuntu, I was able to use this tool; however, it always asked me if I wanted to use nvidia-settings instead. Could there be some conflict here?

Tye (tye3ow) wrote :

still getting GPU has fallen off the bus errors even with 295.20 (from x-updates). same if I force the driver to 280.13 (from official repos).

I'm using Gnome Shell, and I experience it most often when I have intensive OpenGL apps running (Trine 1 & 2 running under WINE, Marble Arena 2 running natively in Desura, etc). I experience it more often using Unity, but that is likely because of Compiz using OpenGL. it seems like it has something to do with system load or with heat generation.

I have a GeForce 210 and have DynamicTwinView disabled (as it creates issues with the current nvidia driver and XBMC). My monitor is connected using the DVI port

Tye (tye3ow) wrote :

still happens all the time, one of the reasons I was reluctant to upgrade to 11.10 and sadly the more it happens the more I think about going back to 11.04 (as it never happened in 11.04)

Tye (tye3ow) wrote :

it seems to be heat generation as I can load and minimize XBMC 11.0 and then watch my GPU temperature climb to above 100°C. this issue is not present in 11.04 which might be something to do with Gnome3 rather than Gnome2 or the nVidia drivers (but that is just a guess at this point)

Larry Tate (cathect) wrote :

A few days ago there was an NVIDIA driver update. At that time I backed down from the version-current-updates driver to the version-current [recommended] driver.

That has resolved ALL my issues.

Tye (tye3ow) wrote :

@Strange_cathect, what version of the drivers are you using? the 'version-current' in the repos is still 280.13 (unless you're using the x-updates repo)

tye@T:~$ apt-cache policy nvidia-current nvidia-current-updates
nvidia-current:
  Installed: 295.33-0ubuntu1~oneiric~xup1
  Candidate: 295.33-0ubuntu1~oneiric~xup1
  Version table:
 *** 295.33-0ubuntu1~oneiric~xup1 0
        500 http://ppa.launchpad.net/ubuntu-x-swat/x-updates/ubuntu/ oneiric/main i386 Packages
        100 /var/lib/dpkg/status
     280.13-0ubuntu6 0
        500 http://ca.archive.ubuntu.com/ubuntu/ oneiric/restricted i386 Packages
nvidia-current-updates:
  Installed: 280.13-0ubuntu5
  Candidate: 280.13-0ubuntu5
  Version table:
 *** 280.13-0ubuntu5 0
        500 http://ca.archive.ubuntu.com/ubuntu/ oneiric/restricted i386 Packages
        100 /var/lib/dpkg/status

Larry Tate (cathect) wrote :

I have 295.33.

I'm updating on: http://ppa.launchpad.net/ubuntu-x-swat/x-updates/ubuntu

Since I started using this I've had zero issues for several days now.
-------------------

nvidia-current:
  Installed: 295.33-0ubuntu1~oneiric~xup1
  Candidate: 295.33-0ubuntu1~oneiric~xup1
  Version table:
 *** 295.33-0ubuntu1~oneiric~xup1 0
        500 http://ppa.launchpad.net/ubuntu-x-swat/x-updates/ubuntu/ oneiric/main i386 Packages
        100 /var/lib/dpkg/status
     280.13-0ubuntu6 0
        500 http://us.archive.ubuntu.com/ubuntu/ oneiric/restricted i386 Packages
nvidia-current-updates:
  Installed: 280.13-0ubuntu5
  Candidate: 280.13-0ubuntu5
  Version table:
 *** 280.13-0ubuntu5 0
        500 http://us.archive.ubuntu.com/ubuntu/ oneiric/restricted i386 Packages
        100 /var/lib/dpkg/status
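
For anyone comparing driver versions in this thread, the installed version can be pulled out of output like the above with a small filter. This is only a sketch: the function name `installed_versions` is made up for illustration, and it assumes the `apt-cache policy` layout shown above (an unindented package line ending in a colon, followed by an indented `Installed:` line).

```shell
#!/bin/sh
# installed_versions: hypothetical helper. Reads `apt-cache policy` output on
# stdin and prints "package version" for each package with an Installed: line.
installed_versions() {
    awk '
        # Unindented line ending in ":" is a package header, e.g. "nvidia-current:"
        /^[^ \t].*:$/ { pkg = $1; sub(/:$/, "", pkg) }
        # Indented "Installed: <version>" line belongs to the last header seen
        $1 == "Installed:" { print pkg, $2 }
    '
}

# Usage (needs apt, so it is left as a comment here):
#   apt-cache policy nvidia-current nvidia-current-updates | installed_versions
```

Posting the one-line-per-package result makes it easier to see at a glance which driver each reporter is actually running.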

Tye (tye3ow) wrote :

what happens when you use an intensive OpenGL app? XBMC in windowed mode is a good place to start, since you can minimize it and watch your heat, or rather anything using windowed mode+OpenGL at any larger scale. you can watch your heat with this (it's what I have in conky lol):
nvidia-settings -query GPUCoreTemp | grep Attribute | grep -o '[0-9]\{2,3\}'
after a minute or so I'm up to 90+°C and after a few minutes I end up over 100°C
I'm using XBMC in windowed/maximized mode and a desktop resolution of 1920×1080@60Hz with DynamicTwinView disabled (I use grandr to resize; with DynamicTwinView enabled, XBMC fails to get an output list and I end up with massive stutters)

Tye (tye3ow) wrote :

I figured I'd give it a go and see if the problem might be fixed so I fired up some Marble Arena 2 in Desura and played it for a little bit, and then same problem

-------------------
Mar 28 04:42:04 T kernel: [94085.118038] CPU0: Core temperature above threshold, cpu clock throttled (total events = 242827)
Mar 28 04:42:04 T kernel: [94085.118524] CPU0: Core temperature/speed normal
Mar 28 04:47:04 T kernel: [94385.119355] CPU0: Core temperature above threshold, cpu clock throttled (total events = 324658)
Mar 28 04:47:04 T kernel: [94385.119838] CPU0: Core temperature/speed normal
Mar 28 04:50:50 T kernel: [94610.420071] NVRM: GPU at 0000:04:00.0 has fallen off the bus.
Mar 28 04:50:50 T kernel: [94610.420080] NVRM: GPU at 0000:04:00.0 has fallen off the bus.
-------------------
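
A log excerpt like the one above can be summarised by counting the two kinds of message it contains. The following is a rough sketch, not something any commenter here ran: `gpu_events` is a hypothetical name, and the match strings are copied from the kernel messages quoted in this report, so other driver or kernel versions may word them differently.

```shell
#!/bin/sh
# gpu_events: hypothetical helper. Reads syslog/dmesg text on stdin and prints
# how many CPU thermal-throttle and "fallen off the bus" messages it contains.
gpu_events() {
    awk '
        /Core temperature above threshold/       { throttle++ }
        /NVRM: GPU at .* has fallen off the bus/ { fallen++ }
        END {
            printf "cpu_throttle_events=%d\n", throttle
            printf "gpu_fallen_off_bus=%d\n", fallen
        }
    '
}

# Usage (the log path is an assumption; it differs between systems):
#   gpu_events < /var/log/syslog
#   dmesg | gpu_events
```

Seeing throttle events shortly before the NVRM message would support the heat theory; a non-zero NVRM count with no throttle events would point elsewhere.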

Larry Tate (cathect) wrote :

I just had it occur again. I remain "fixed" on the old issues: playing music on Clementine or watching flash videos. However, I just downloaded the Steel Storm game. A few seconds in the problem appears and I have to hard reboot.

dikkjo (dikkjo) wrote :

Hi Everyone,

My GPU is also falling off the bus... The funny thing about this message is that the first time I got it, I opened the case to check that the card was still correctly inserted in the bus ;-) Very disturbing issue anyway :-/ and I also started to get it around October/November last year.

I also have the feeling that streaming videos, like YouTube in Chrome, are kind of triggering this issue.
I was also in a multi-monitor configuration; a few days ago I disconnected the second monitor, but I still just got a black screen.

On a forum I found a proposed solution for this issue: use the command "nvidia-smi -pm 1" to enable persistence mode. I had a few days of peace and just now got it again.

I also tried to disable DPMS as a desperate measure, but it did not help.

I also still have a few options about TwinView and metamodes in my xorg.conf which I may comment out if the GPU still fails.
For now I've just disabled Vsync in CCSM and I am waiting to see if it reappears...

About intensive OpenGL apps: I only play Minecraft, and normally I can play for hours. But on Friday I could not play for 2 minutes without crashing, and I had just booted the PC. I will keep an eye on temperatures though.

Anyway, as you can see in my log below, my PC had been up for only about 800 seconds when it happened; I had a YouTube video going and Minecraft was at the Mojang logo screen. Nothing in the logs, and suddenly the GPU is gone!

-------------------------------
Apr 18 18:39:21 dubu rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="1000" x-info="http://www.rsyslog.com"] rsyslogd was HUPed
Apr 18 18:39:46 dubu anacron[1294]: Job `cron.daily' terminated
Apr 18 18:39:46 dubu anacron[1294]: Normal exit (1 job run)
Apr 18 18:42:38 dubu kernel: [ 837.113335] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Apr 18 18:42:38 dubu kernel: [ 837.113386] NVRM: GPU at 0000:01:00.0 has fallen off the bus.

dikkjo (dikkjo) wrote :

I think for me it was a temperature problem. I made a little script to trace my GPU temp with the nvidia-smi tool, and as soon as it gets to around 110°C the GPU shuts down.
I did an extensive cleanup of my case and also set all the fans to max RPM (I have a fan controller on my case); now I can watch streaming videos full screen at around 60-62°C.
I understood that the GPU was shutting down because when I switched to "Plug&Play OS" in my BIOS, I found some new lines in the logs:
----------------------------------------------------------------------
Apr 23 20:53:33 dubu kernel: [12090.960976] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Apr 23 20:53:33 dubu kernel: [12090.960986] NVRM: os_pci_init_handle: invalid context!
Apr 23 20:53:33 dubu kernel: [12090.960988] NVRM: os_pci_init_handle: invalid context!
Apr 23 20:53:33 dubu kernel: [12090.961017] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
Apr 23 20:53:33 dubu kernel: [12090.961021] NVRM: os_pci_init_handle: invalid context!
Apr 23 20:53:33 dubu kernel: [12090.961023] NVRM: os_pci_init_handle: invalid context!
----------------------------------------------------------------------

Thanks to Tye for pointing me in the right direction ;-)
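
The script mentioned above was not posted, so the following is only a guess at what a temperature trace with nvidia-smi might look like. The function names, the 100°C warning threshold and the 5-second interval are all assumptions, and the parser assumes `nvidia-smi -q -d TEMPERATURE` prints the current temperature as the first line of the form `<label> : <number> C` (the exact label varies between driver versions).

```shell
#!/bin/sh
# parse_temp: reads `nvidia-smi -q -d TEMPERATURE` style text on stdin and
# prints the first value that looks like "<label> : <number> C".
parse_temp() {
    awk -F: '$2 ~ /^ *[0-9]+ C *$/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# temp_status: prints HOT if temperature $1 is at or above threshold $2, else ok.
temp_status() {
    if [ "$1" -ge "$2" ]; then echo HOT; else echo ok; fi
}

# trace_gpu_temp: hypothetical logging loop; call it by hand, it never returns.
#   $1 = warning threshold in °C (assumed default 100)
#   $2 = seconds between samples (assumed default 5)
trace_gpu_temp() {
    limit="${1:-100}"
    interval="${2:-5}"
    while :; do
        t=$(nvidia-smi -q -d TEMPERATURE | parse_temp)
        if [ -n "$t" ]; then
            echo "$(date '+%b %d %H:%M:%S') gpu_temp=${t}C $(temp_status "$t" "$limit")"
        fi
        sleep "$interval"
    done
}

# Usage: trace_gpu_temp 100 5 >> "$HOME/gpu-temp.log"
```

Because the timestamps use the same format as syslog, the last few lines of the trace can be lined up against the "fallen off the bus" entry to see how hot the GPU was just before it disappeared.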

Valeriy (tverdohleb) wrote :

I ran 10 instances of glxgears while monitoring the GPU temperature. At 64-65°C the GPU fell off the bus. That is abnormal, as my GF 8500 can operate at up to 100°C without trouble, as it did before. So overheating is not the cause in my case, and the problem seems to be deeper.

Larry Tate (cathect) wrote :

Persists in 12.04....

krab1k (racek-t) wrote :

Hi, I have same problem using Nvidia Go 7300 with latest drivers in precise. Relevant part of dmesg output follows:

[ 435.908958] NVRM: os_pci_init_handle: invalid context!
[ 435.908969] NVRM: os_pci_init_handle: invalid context!
[ 435.908988] NVRM: os_map_kernel_space: can't map 0xc0000000, invalid context!
[ 435.909017] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[ 435.909031] NVRM: os_pci_init_handle: invalid context!
[ 435.909036] NVRM: os_pci_init_handle: invalid context!
[ 435.909045] NVRM: os_map_kernel_space: can't map 0xc0000000, invalid context!
[ 436.944057] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 436.944065] Pid: 1241, comm: Xorg Tainted: P C O 3.2.0-23-generic #36-Ubuntu
[ 436.944068] Call Trace:
[ 436.944077] [<c1561d5f>] ? printk+0x2d/0x2f
[ 436.944083] [<c10b1289>] __report_bad_irq+0x29/0xd0
[ 436.944087] [<c107ad84>] ? tick_handle_oneshot_broadcast+0xf4/0x100
[ 436.944091] [<c10b14e4>] note_interrupt+0x104/0x150
[ 436.944095] [<c10af3ae>] handle_irq_event_percpu+0x9e/0x200
[ 436.944100] [<c1027378>] ? default_spin_lock_flags+0x8/0x10
[ 436.944104] [<c1576d2d>] ? _raw_spin_lock_irqsave+0x2d/0x40
[ 436.944107] [<c10af54b>] handle_irq_event+0x3b/0x60
[ 436.944111] [<c10b1cf0>] ? unmask_irq+0x30/0x30
[ 436.944115] [<c10b1d3e>] handle_fasteoi_irq+0x4e/0xd0
[ 436.944117] <IRQ> [<c157e432>] ? do_IRQ+0x42/0xc0
[ 436.944124] [<c10b5c0d>] ? rcu_irq_exit+0xd/0x10
[ 436.944128] [<c105218c>] ? irq_exit+0x3c/0xa0
[ 436.944132] [<c157e509>] ? smp_apic_timer_interrupt+0x59/0x88
[ 436.944135] [<c157e370>] ? common_interrupt+0x30/0x38
[ 436.944139] [<c1459caf>] ? acpi_pm_read+0xf/0x20
[ 436.944144] [<c1074204>] ? getnstimeofday+0x54/0x120
[ 436.944148] [<c1074316>] ? do_gettimeofday+0x16/0x40
[ 436.944152] [<c10508e3>] ? sys_gettimeofday+0x23/0x70
[ 436.944156] [<c1576ed4>] ? syscall_call+0x7/0xb
[ 436.944158] handlers:
[ 436.944336] [<f9b4b050>] nv_kern_isr
[ 436.944338] Disabling IRQ #16

MvW (2nv2u) wrote :

Never happened to me before 12.04. This, combined with bug:
https://bugs.launchpad.net/ubuntu/+bug/980519
makes the new LTS unusable!

Tom Robinson (terobin) wrote :

I have this problem.

I'm using Ubuntu 12.04 64bit and NVIDIA Driver Version: 295.40

I open Minecraft and am able to apply for around 5 minutes and then everything goes black. The problem doesn't seem to affect me at any other time, only when playing Minecraft.

Tom Robinson (terobin) wrote :

Sorry, that should say 'play' not 'apply'.

Changed in unity:
status: Triaged → Invalid
summary: - Monitor(s) black out and session freezes. "NVRM: GPU at 0000:01:00.0
- has fallen off the bus."
+ [NVIDIA] Monitor(s) black out and session freezes. "NVRM: GPU at
+ 0000:01:00.0 has fallen off the bus."
no longer affects: unity
no longer affects: unity (Ubuntu)
Larry Tate (cathect) wrote :

Has anyone updated from the x-swat PPA to see if that makes a difference?

Tom Robinson (terobin) wrote :

If that is these instructions:

"sudo add-apt-repository ppa:ubuntu-x-swat/x-updates
sudo apt-get update
sudo apt-get install nvidia-current"

(found at http://mygeekopinions.blogspot.co.uk/2011/06/how-to-install-nvidia-2750907-driver-in.html)

Then I tried that but no luck.

Bernhard (xro) wrote :

I manually downgraded to NVIDIA 275.43 and the problem is gone, so it appears to be a problem with recent Nvidia drivers.

Mikael Karon (mikael-karon) wrote :

@Bernhard exactly how did you do that (have a good ppa with that version around [precise], or did you download/install from nvidia's site)?

Sergio Callegari (callegar) wrote :

GPU falls off the bus also on
Ubuntu 12.04 64 bit + Dell Precision T5400 (Nvidia Quadro FX 570).
Nvidia drivers are the latest 302.17.

All this seems to be very well known on the NV mailing lists and forums, which show similar bug reports also for 295.x drivers and other distros.

In my case, the screen does not get black, but merely freezes.
The machine does not hang and remains reachable via ssh.

In my case it does not seem to be temperature related (machine is in an air conditioned room, with little load, almost no graphical load and monitor reports the gpu at 57 °C).

the same here:
nVidia Corporation NV44 [GeForce 7100 GS] (rev a1)
driver 302.17
on Gentoo Linux / 686 (32 bit) / Kernel 3.4.4

syslog is flooded with "NVRM: os_pci_init_handle: invalid context!"
and after a while I get
"NVRM: GPU at 0000:01:00.0 has fallen off the bus."
with same symptoms as described above...

(This forced me to revert back to 290.10, which works rock solid over months!)

Tye (tye3ow) wrote :

still present in 12.04 (64bit) with the 295.49 (X-Updates) drivers on the GeForce 210. I don't know if it is a heat issue or if heat generation is just a symptom of whatever issue is causing the problems. I sure would like for the problems to go away though, as this is the second release with the same problems using either the default or the X-Updates drivers and both releases use the 3.x series of kernels. 11.04 is the last release that I could even think about using OpenGL in with any sort of reliability which is very disappointing.

Tye (tye3ow) wrote :

oh yes, update: the screen no longer goes black, but instead as Sergio pointed out, it freezes which I suppose might sound like progress, except that the system reboots itself after a few seconds and the problem still renders OpenGL useless.

Tye (tye3ow) wrote :

I'm not sure if this is a duplicate of this bug or not:
https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers/+bug/973096

the same indications of both reproduction and anecdotal fixes are present there. I might give the upgraded kernel a try but I loathe using packages so far ahead of the repos and PPAs.

Tye (tye3ow) wrote :

apparently not a duplicate as it is only for logouts, not system freezes. I guess we'll see if we garner any attention or intent to actually fix this one now that the other one is actually resolved (although, given the history on this one, I doubt it)

lately my system will not always black out and then restart; sometimes the video simply freezes. other times it will black out and after a minute or so it will reset itself. it's all very curious.

Larry Tate (cathect) wrote :

Does anyone know if this bug is resolved in 12.10???

As I am using Gentoo Linux I have no idea which version of the driver is in Ubuntu, but after having the same problems for a long time with many 30x.xx nvidia driver versions, I tried an upgrade from Nvidia driver 290.10 (which still worked fine) to 304.60, which seems to work fine again. I have been using that version for several weeks now without any problems.

nVidia Corporation NV44 [GeForce 7100 GS] (rev a1)
driver 304.60
on Gentoo Linux / i686 (32 bit) / Kernel 3.6.1

thedanyes (thedanyes) wrote :

I encountered a similar problem on my desktop system running 11.04 with the nVidia proprietary driver. Using Intel DH55TC motherboard and nVidia GTS 450 OR nVidia 9800GT (tried both). For me, the issue was resolved through a motherboard firmware update.

https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers/+bug/733374/comments/73

Tye (tye3ow) wrote :

@thedanyes, I'm intrigued, can you play full screen openGL apps for long periods without crashes? some good tests are Marble Arena 2 or XBMC in full screen playing high-res video. if so, do you have links to the two motherboard firmware versions, or at least version numbers of both?

update for me: still experience crashes in high-demand openGL apps

Tom Robinson (terobin) wrote :

Hi, I think the problem may be due to overheating. I've added an additional fan pointing directly at the graphics card and made sure my computer has space to vent heat, and I've not had any problems since. I did this around 7 hours ago and the computer has been running since, letting me play games and stream video at the same time on multiple monitors.

Before doing this I was experiencing the crashing on both Linux and Windows 8 running on this pc.

Luis Alvarado (luisalvarado) wrote :

No overheating on my part. I have tested the 560 ti with the HDMI cable, with a DVI cable and with a DVI-VGA. All cases have same problem.

I have even tried using different versions, starting from the originals that come with 12.10 to the ones in X-Swat, to the ones in Xorg Edgers. Basically from 304.xx to 313.xx. In all cases I used 64 bit. I even tested 13.04.

My hardware specs are:

Intel Core i7 2600
16 GB RAM
Intel DZ68DB Motherboard
Intel 128GB SSD
Nvidia 560 TI

This was working fine until 2 weeks ago. Then this precise problem appeared out of nowhere. I tested all driver versions in 12.10 and 13.04 that came with it or were in the X-Swat or Edgers PPAs. The only one that works is nouveau, but that is just throwing in the towel, since the solution should let me actually use the card's video performance.

gmhawash (gmhawash) wrote :

I have had this issue as well, with Linux Mint 14, Ubuntu 12.10 and every flavor I tried. It is definitely heat related but still a problem with the driver. The fan on my Nvidia GTX 590 had been running high the last few weeks and I've had issues with hard drives lately, so when I reinstalled, I started seeing the problem. I finally took the NVIDIA card out and vacuumed the junk out of it.
Now the fan is not on, thankfully, and the system does not crash.

However, the fact that this crash started with a reinstall tells me that the latest drivers are buggy. The system was working fine before, even when the fan was on and the GPU was getting hot. The install of the new system and drivers exposed this problem, and heat caused the system (Nvidia driver) to fail.

Hope that helps,

thedanyes (thedanyes) wrote :

@Tye Sorry I didn't see your inquiry for so long. I can't say for sure which revision I had when I was seeing the problems, but I first noticed they were fixed with the latest Version 0048. There were many entries in the firmware development changelog that indicated problems related to video and PCIe devices with different revisions.
http://downloadmirror.intel.com/20725/eng/TC_0048_ReleaseNotes.pdf

I have the same issue. Sometimes my system will lock up several times per day. I have driver 304.43 with latest Ubuntu. I never had that issue before the upgrade to the latest Ubuntu.

Nukeador (nukeador) wrote :

Same problem here, Ubuntu 12.10 64b, macbook pro 6,2, nvidia GT330M

I've tested nvidia-current, nvidia-current-updates and the two nvidia-experimental drivers.

Same problem, blank screen+full system freeze.

I've tried both solutions described here with no success: http://askubuntu.com/questions/235760/unity-does-not-appear-after-installing-proprietary-nvidia-drivers-gpu-has-falle

I've increased the fan speed to see if this could be a heat problem.

piotr zimoch (ebytyes) on 2013-05-15
Changed in nvidia-graphics-drivers (Ubuntu):
status: Triaged → New
status: New → Incomplete
status: Incomplete → Opinion
cryptor (cryptor) wrote :

I started having this issue [both monitors black out, no mouse or keyboard input, still able to SSH in over the
network] after upgrading from 11.04 to 12.04. Generally, when I SSH after the freeze/hang, Xorg is taking
100% CPU and I see "EQ overflowing" in the Xorg.0.log.

My setup looks like this.

Ubuntu 12.04
Linux box 3.2.0-51-generic #77-Ubuntu SMP Wed Jul 24 20:18:19 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
NVIDIA Driver Version: 304.88 [Additional Drivers: version current-updates]
Quadro FX 2800M (GPU 0)
Two displays: ViewSonic VX2439 Series (DFP-1), LGD (DFP-0)

I believe that my issue is related to the following in /var/log/syslog (dmesg):

[99101.294734] NVRM: GPU at 0000:01:00.0 has fallen off the bus.
[99101.294742] NVRM: GPU at 0000:01:00.0 has fallen off the bus.

There seem to be lots of threads concerning this error message with NVIDIA hardware, but no universal solution.

http://www.nvnews.net/vbulletin/showthread.php?p=2571522

https://devtalk.nvidia.com/default/topic/567297/linux/linux-3-10-driver-crash/1
https://devtalk.nvidia.com/default/topic/537302/linux/both-screens-black-xorg-at-100-cpu-overflow-errors-in-xorg-0-log-nvidia-driver-310-14-quadro-fx/

http://www.cyberciti.biz/faq/debian-ubuntu-rhel-fedora-linux-nvidia-nvrm-gpu-fallen-off-bus/

http://forums.gentoo.org/viewtopic-t-925156-postdays-0-postorder-asc-start-25.html

I have tried enabling TwinView in xorg.conf ala palewire in #45, but that did not seem to make any difference.
I'm pretty sure the external monitor, or more likely its GPU, is involved because I do not have the problem when
traveling without it.

I have no reliable way to reproduce the issue. However, it does seem to happen often when I am scrolling a
window, such as in a browser or paging in a terminal. Several times the screens have gone blank while I
still have a finger on the mouse scrollwheel. Of course, this will not reliably produce a problem.

WBB

Tye (tye3ow) wrote :

it might be an issue with the DVI/HDMI interface, I don't think I had an issue with the VGA port.

it's running headless right now so I can't test it.

cryptor (cryptor) wrote :

@Tye

> issue with the DVI/HDMI interface

Are you referring to the physical connection? If so, I doubt that is it. Nothing about the physical
port changed when I upgraded from 11.04 to 12.04. However, just like you, I did not see the
problem on 11.04. I was definitely using the external monitor on 11.04.

BTW, I am still getting both screens black. One of your posts said that you were now getting
a frozen image, but no longer switching to black. So, that is a difference now.

cryptor (cryptor) wrote :

I seem to have stumbled on a fairly quick way to generate or reproduce the

"NVRM: GPU at 0000:01:00.0 has fallen off the bus."

error on my system.

I login to Ubuntu (either 3D/compiz or Ubuntu 2D) and then open a "Gnome Terminal".
On my system, this terminal has 50 lines.

$ echo $LINES
50

This terminal can be located on either my X screen primary display (external DFP-1) or on my
non-primary display (internal LGD, DFP-0).

Now, I create some listings usually about 1000 lines or so.

$ ls -alt ~
$ ls -alt /
$ ls -alt /usr/lib

At this point, I have a scrollbar widget that can pan back and forth through the listings.
If I scroll rapidly back and forth through these listings (by dragging the scroller widget vigorously up and down) in
the gnome terminal window for 3 or 4 minutes, I will always get the black out and frozen X session.

BTW, I have enabled persistence as recommended elsewhere and I have been using Ubuntu 2D. Still the problem
persists on my M6500 laptop with Quadro FX 2800M GPU.

Ubuntu 12.04
Linux box 3.2.0-51-generic #77-Ubuntu SMP Wed Jul 24 20:18:19 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
NVIDIA Driver Version: 304.88 [Additional Drivers: version current-updates]
Quadro FX 2800M (GPU 0)
Two displays: ViewSonic VX2439 Series (DFP-1), LGD (DFP-0)
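
For others who want to try the scrolling reproduction described above, the setup and the after-the-fact check can be sketched as two small helpers. This is an illustration only, not the reporter's exact procedure: `make_listing` stands in for the `ls -alt` commands so the line count is predictable, `fell_off_bus` is a made-up name, and the scrolling itself still has to be done by hand.

```shell
#!/bin/sh
# make_listing: hypothetical helper that prints N numbered lines (default 1000)
# so the terminal has enough scrollback to drag through.
make_listing() {
    n="${1:-1000}"
    i=1
    while [ "$i" -le "$n" ]; do
        echo "scroll test line $i of $n"
        i=$((i + 1))
    done
}

# fell_off_bus: reads kernel log text on stdin and succeeds (exit status 0)
# if the NVRM message from this report is present.
fell_off_bus() {
    grep -q 'NVRM: GPU at .* has fallen off the bus'
}

# Usage:
#   1. In the terminal under test:  make_listing 1000
#   2. Drag the scrollbar up and down for a few minutes.
#   3. From an SSH session:         dmesg | fell_off_bus && echo "GPU fell off the bus"
```

Checking from an SSH session matters because, as noted earlier in the thread, the machine often stays reachable over the network even when the display is frozen or black.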
