nvidia-current 195.36.15 + kernel 2.6.32 + dual cards crashes system

Reported by doctordruidphd on 2010-03-25
This bug affects 22 people
Affects: NVIDIA Drivers Ubuntu | Importance: Undecided | Assigned to: Unassigned
Affects: nvidia-graphics-drivers (Ubuntu) | Importance: Undecided | Assigned to: Unassigned

Bug Description

Binary package hint: nvidia-current

I cannot install the NVIDIA 195.36.15 driver. If I try to install the driver downloaded from the NVIDIA website, the first screen about building the module proceeds normally. When the "Install 32-bit opengl drivers" screen appears, the whole system crashes and requires a power down with the power switch to recover (no response to CTRL-ALT-DEL or ALT-SYSRQ-REISUB).
If I try to install the nvidia-current package, the installation appears to work properly, but on reboot, the system freezes during bootup, before the login screen appears, and requires a hard power down to recover. Note that the 190.53 driver, and all previous drivers, downloaded from the website install and function properly.

OS: kubuntu-10.04-current updates, kernel 2.6.32-17 (problem exists on all 2.6.32 versions)

greenman@Wolfenstein:~$ lsb_release -rd
Description: Ubuntu lucid (development branch)
Release: 10.04

I have tried:

Booting and installing with and without nomodeset option (190 driver from website requires nomodeset option to function)
Removing all traces of previous drivers, and nouveau.
Manually removing nvidia.ko from /lib/modules/2.6.32-17/kernel/drivers/video
The nvidia-installer-log file exists, but is empty after a failed 195 install (crashes before cache flushes)
Contacting NVIDIA - they say to contact the distribution-specific packagers (so here I am)
Modifying, removing, and reinstalling with nvidia-xconfig the xorg.conf file
Installing from the console with 'sudo service kdm stop'
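
For anyone repeating the nomodeset test above, it may help to confirm that the option actually reached the kernel on a given boot. The following is a minimal sketch, assuming the kernel command line is available in /proc/cmdline as a space-separated string; the function name has_nomodeset is made up for illustration.

```shell
# Report whether the 'nomodeset' option is present on a kernel command line.
# Reads the command line from stdin so it can be checked against saved text too.
has_nomodeset() {
    # Put each option on its own line, then look for an exact match.
    tr ' ' '\n' | grep -qx 'nomodeset'
}

# Typical use on a running system:
#   if has_nomodeset < /proc/cmdline; then
#       echo "booted with nomodeset"
#   else
#       echo "booted without nomodeset"
#   fi
```

Matching the whole word avoids false positives from options that merely contain the string.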

System:
Dell XPS630i Intel Q9550 (amd_64) 8gb ram
Dual NVIDIA 9800GT cards

Robert Hooker (sarvatt) wrote :

Installing drivers from nvidia.com is not a supported way to install them and will not work like you said at the moment because of the extensive changes done to the packaging in lucid. 195.36.15 is available through the package manager as nvidia-current, and you can activate it through hardware drivers. Please remove the nvidia.com ones by running the installer with the --uninstall option before activating the Ubuntu packages.

Changed in nvidia-graphics-drivers (Ubuntu):
status: New → Invalid
doctordruidphd (doctordruidphd) wrote :

Thank You for your response, but as I said in my original post, Installing using the nvidia-current package causes the system to crash during bootup. This is after removing all traces of the binary nvidia driver from the website, and all traces of nouveau. The crash appears to occur before any log files are written, as there is nothing in dmesg to indicate what has crashed.

Changed in nvidia-graphics-drivers (Ubuntu):
status: Invalid → New
Bryce Harrington (bryce) on 2010-03-30
tags: added: kubuntu
vincefn (vincent-favre-nicolin) wrote :

   Hi,

 I have a similar issue: I am testing Lucid on a PC (amd64) with two nVidia cards (one 9600GT with two screens attached, and one GTX295 for CUDA computing). Kubuntu (beta2) installs fine, and works all right with the nouveau driver.
 Since I need CUDA, I installed nvidia-current (195.36.15), and after reboot the system hard-freezes right around the time the graphics should appear - I assume that's exactly when the nvidia module is launched, though I do not have a log to show immediately.

  If I remove any of the cards, everything works all right (still with the nvidia 195.36.15 driver).

  Note that this is from a Lucid beta2 clean install, after all available updates have been installed (-generic or -server kernel makes no difference). The most annoying part is that when this happens, for some reason I am not even able to boot in recovery mode, the PC also crashes... The only solution is to reboot with only one card.

  There are a few messages related to this in the nVidia forums, although no working solution has been reported. See for example:
http://www.nvnews.net/vbulletin/showthread.php?t=149415

  I tried disabling the VGA Arbiter in the kernel as suggested, but apparently the kernel VGA_ARB option is automagically turned back to 'yes', due to some kernel config dependency I have not yet found...
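
  To see whether a rebuilt kernel really ended up without the arbiter, one option is to look at the config file the kernel was built from. This is only a sketch: it assumes the config is installed as /boot/config-<release> (as Ubuntu kernel packages normally do), and vga_arb_setting is a hypothetical helper name.

```shell
# Print the VGA arbiter line from a kernel config read on stdin.
# Matches "CONFIG_VGA_ARB=y" or "# CONFIG_VGA_ARB is not set", but not
# related symbols such as CONFIG_VGA_ARB_MAX_GPUS.
vga_arb_setting() {
    grep -E '^(# )?CONFIG_VGA_ARB[= ]' | head -n 1
}

# Typical use on a running system:
#   vga_arb_setting < "/boot/config-$(uname -r)"
```

  If the output still shows the option set to y after a rebuild, the setting was overridden during configuration, which would match the behaviour described above.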

vincefn (vincent-favre-nicolin) wrote :

  Further news... I managed to boot fine by installing the older kernel 2.6.31-21-generic (from ftp://ftp.ubuntu.com/ubuntu/pool/main/l/linux/), still with the nvidia-current (195.36.15) package.

  So far, it seems to work all right - I had a graphic message telling me that nvidia-current had failed to install, but glxinfo tells me I am using the proprietary driver, so it seems OK.

 So there's definitely a specific problem with 195.36.15 that appears with kernels 2.6.32 and above, when multiple GPUs are installed...

summary: - Cannot install nvidia 195.36.15 - crashes system
+ nvidia-current 195.36.15 + kernel 2.6.32 + dual cards crashes system
vincefn (vincent-favre-nicolin) wrote :

I took the liberty of renaming the bug as it seems not to be an installation issue, but rather a crash at restart around the time X launches, specifically when using dual cards with a 2.6.32+ kernel.

 I can confirm (after a few hours) the 2.6.31-21 kernel works fine, including CUDA 3.0 computing.

tags: added: lucid
doctordruidphd (doctordruidphd) wrote :

I agree with your renaming. I can confirm that the 195.36.15 driver, whether loaded from "nvidia-glx" or from the nvidia website, works fine with the current karmic 2.6.31 kernel. It crashes all versions of the 2.6.32 kernels on lucid.

Captain Chaos (launchpad-chaos) wrote :

I'm the reporter of that other bug report. I've marked it as a duplicate of this one.

I, too, have two cards (two Geforce 8800 GTX's to be precise), so it seems clear that that's a factor.

geckon (theger) wrote :

My laptop has two cards and I can confirm this problem as well.

02:00.0 VGA compatible controller: nVidia Corporation C77 [GeForce 9100M G] (rev a2)
03:00.0 VGA compatible controller: nVidia Corporation G98 [GeForce 9300M GS] (rev a1)

Confirmation seemed in order with four people experiencing this and the circumstances pretty clear and consistent:

* 2.6.32 kernel
* current binary NVIDIA driver
* two NVIDIA GPU's
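
For readers who want to check whether their own machine matches these conditions, here is a rough sketch. It assumes both GPUs appear in lspci output as "VGA compatible controller" entries, as in the listings posted in this thread (a secondary card may be listed differently on some systems), and the helper names count_nvidia_vga and is_2_6_32 are made up for illustration.

```shell
# Count NVIDIA VGA controllers in lspci output read from stdin.
count_nvidia_vga() {
    grep -i 'VGA compatible controller' | grep -ic 'nvidia'
}

# Succeed if a kernel release string (as printed by `uname -r`) is a 2.6.32 kernel.
is_2_6_32() {
    case "$1" in
        2.6.32|2.6.32-*|2.6.32.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on a running system:
#   gpus=$(lspci | count_nvidia_vga)
#   if [ "$gpus" -ge 2 ] && is_2_6_32 "$(uname -r)"; then
#       echo "two or more NVIDIA GPUs on a 2.6.32 kernel: matches this report"
#   fi
```

The driver version is not checked here; that still has to be confirmed separately.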

Changed in nvidia-graphics-drivers (Ubuntu):
status: New → Confirmed
geckon (theger) wrote :

Looks like all of us use 64-bit version -> it can be another condition to experience this problem. Or am I wrong?

FYI -- The procedure detailed in the link below did not solve the problem:
http://jeffhoogland.blogspot.com/2010/04/installing-nvidia-driver-in-ubuntu-1004.html

wlx (wangliangxu) wrote :

same here.
amd64, and two nvidia cards, could not use the driver from nvidia.com, but nvidia-current package is ok.

This morning I tried the beta drivers 195.36.24 (available from the x-swat ppa), with the same result as before : hard crash with 2.6.32-21+195.36.24, and booting fine using 2.6.31-21+195.36.24...

I also updated my bug report on the nvnews forum (http://www.nvnews.net/vbulletin/showthread.php?p=2241328), but without answer so far. Maybe it will help if other people submit their bug report on nvnews or at <email address hidden>, since it's likely this bug will only be fixed upstream.

schmolch (saschaheid) wrote :

same here.
amd64, nvidia 6150 onboard + nvidia pci-e 7600.
The system freezes before the graphical gui.
Had no problems with Karmic.

Bryce Harrington (bryce) wrote :

Problems related to installation of -nvidia from NVIDIA's website should be directed to NVIDIA.
    http://www.nvnews.net/vbulletin/forumdisplay.php?f=14

This bug tracker is for the -nvidia provided by Ubuntu only.

Changed in nvidia-graphics-drivers (Ubuntu):
status: Confirmed → Invalid
schmolch (saschaheid) wrote :

Neither Ubuntu's nvidia-current nor NVIDIA's own package works for me.
With nvidia-current i was not able to get anything but a black screen and hard freeze.
With nvidia's installer i at least got a frozen panel with some fancy video-distortions.

"Problems related to installation of -nvidia from NVIDIA's website should be directed to NVIDIA.
This bug tracker is for the -nvidia provided by Ubuntu only"

I would not ordinarily reply to something like this, but I deeply resent being told to get lost.
NVIDIA says this is an ubuntu problem, and now ubuntu says it's an NVIDIA problem.
If the person who decided this is "invalid" would have taken the time to read the original post, it would be obvious that the problem is with BOTH versions of the driver.
Now, I guess we're all being told it's our problem, and to just go away.
Well, if this problem doesn't get fixed, I suppose that will eventually happen.

FYI- the new 195.36.24 driver exhibits the same behavior.

Alan (evolutionspeak) wrote :

Changed back to confirmed since this affects Ubuntu's 'nvidia-current' package.

Changed in nvidia-graphics-drivers (Ubuntu):
status: Invalid → Confirmed

> Changed back to confirmed since this affects Ubuntu's 'nvidia-current' package.

 Thanks - indeed this bug is not about the installer but really about a hard crash which affects the nvidia-current package.

Filippo82 (filcorti) wrote :

Ok... same here.
two 9800gt cards
2.6.32-21-generic
Ubuntu Lucid....
same issues.

Has anyone encountered this bug with a 32-bit (rather than amd64) installation?

pawmar (pawmar) wrote :

Same for me (after upgrading to 10.04 from 9.10). Two graphics cards:

    02:00.0 VGA compatible controller: nVidia Corporation C77 [GeForce 8200] (rev a2)
    03:00.0 VGA compatible controller: nVidia Corporation G73 [GeForce 7300 GT] (rev a1)

and everything works ok on:

    Linux ubik 2.6.31-21-generic #59-Ubuntu SMP Wed Mar 24 07:28:56 UTC 2010 i686 GNU/Linux

but not on 2.6.32-22-generic (freeze during boot). I'm using the nvidia drivers from Ubuntu, version 195.36.15.

Marcin Szmidt (mmcmarcin) wrote :

Same for me after upgrading to 10.04: kernel 2.6.31 works fine, but 2.6.32 hangs. Same situation: amd64 and two VGA cards installed,
GF 8300 onboard and GFX 260

sakisds (sakisds) wrote :

Same for me, 2x 8600gt. 64bit

petogal (petogal) wrote :

the same for me
32bit system video is GeForce Go7300

petogal wrote:
> the same for me
> 32bit system video is GeForce Go7300

This bug happens with two nVidia cards - do you also have two ?

Alberto Milone (albertomilone) wrote :

Please try to reproduce the problem and attach your /var/log/Xorg.0.log and /var/log/Xorg.0.log.old

Eddy (krionic) wrote :

Same here. 2 nVidia 8600GT SLI cards. Upgraded my 32-bit from Hardy to Lucid and immediately crashed on reboot. Installed 64-bit from CD on second hdd (ran into the dmraid problem there, lost all data on my primary drive, bleh) and every time I install either the 173 or -current drivers, I get a crash on boot. I believe something is wrong with the way it installs.

Jockey.log:
2010-05-01 00:04:05,619 WARNING: modinfo for module nvidia_current failed: ERROR: modinfo: could not find module nvidia_current

"Please try to reproduce the problem and attach your /var/log/Xorg.0.log and /var/log/Xorg.0.log.old"

OK, here's what I did:
This is kubuntu-10.04, all current updates applied, dual 9800GT video cards, amd64.

1. Booted the system with 'nomodeset' option, running proprietary 190.53 driver.
2. Logged out of kde, chose console.
3. sudo service kdm stop ;; stops x
4. sudo nvidia-installer --uninstall ;;removes proprietary driver
5. sudo apt-get install nvidia-current ;; installs 195.36.15, brings in nvidia-settings. No errors reported.
6. sudo shutdown -h now
7. Rebooted with 'nomodeset' option
8. System hangs after the udevd errors, after an fsck message (saying file system is clean)
9. Requires power switch held down, no response to CTRL-ALT-DEL, ALT-SYSRQ REISUB
10. "logfiles" attachment contains /var/log/Xorg.0.log and Xorg.0.log.old

Additional note to above:
The nvidia-current.ko module is present in /lib/modules/2.6.32-22-generic/updates/dkms
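
A small sketch for checking the same thing on another machine, assuming the DKMS-built module lands in the updates/dkms directory quoted above. The function name nvidia_current_module_path and its optional root-prefix argument are hypothetical and exist only to make the path easy to compute.

```shell
# Print the path where this thread reports the DKMS-built module for a
# given kernel release.
#   $1 = kernel release (e.g. output of `uname -r`)
#   $2 = optional root prefix (hypothetical, handy when inspecting a mounted disk)
nvidia_current_module_path() {
    printf '%s/lib/modules/%s/updates/dkms/nvidia-current.ko\n' "${2:-}" "$1"
}

# Typical use on a running system:
#   p=$(nvidia_current_module_path "$(uname -r)")
#   if [ -f "$p" ]; then echo "module present: $p"; else echo "module missing: $p"; fi
```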

Additional note:

The /var/log/dmesg file is from the previous boot with the 190.53 driver; no information was written to the file from the failed boot.

petogal (petogal) wrote :

32-bit system, video is a GeForce Go7300, laptop is an ASUS F9J (Dual Core Centrino).
One video card only. I lost 3 hours of my life troubleshooting the problems described in this forum; after that I reinstalled and let nouveau handle the graphics, although without any 3D effects, and full-screen video playback is not very smooth.

seanlano (seanlano) wrote :

I had a similar problem, but I only have one graphics card.

On 32bit Lucid everything worked fine, but on 64bit Lucid all of the nvidia drivers cause the system to not load a graphical environment, but I can still use a tty to uninstall the driver and go back to nouveau.

seanlano (seanlano) wrote :

Also, I get the crash using both 2.6.32-20 and 2.6.32-21.

Jon Sharpe (nomad64) wrote :

Confirmed on my setup for both the 32- and 64-bit versions of Lucid. After a fresh install everything is working fine. I use the official Ubuntu drivers via the Restricted Drivers utility. Upon reboot, the system freezes and requires a hard power down to recover. In my testing, the issue didn't happen if I took out one of the cards.

Specs:
MSI P6N SLI Motherboard
Pentium Core 2 Duo E6850 @ 3.00GHz
2xGeForce 9600 GT cards (SLI)

Alberto Milone (albertomilone) wrote :

@doctordruidphd
Installing the driver from the nvidia installer must have left your system in a broken state.

Compare these 2 lines from your log:
(II) Loading /usr/lib/xorg/modules/extensions/libglx.so
(II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so

with these 2 lines from my log (which is how things should be):
(II) Loading /usr/lib/xorg/extra-modules/libglx.so
(II) Loading /usr/lib/xorg/extra-modules/nvidia_drv.so
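
The comparison above can be run against any Xorg log with a short filter. This is a sketch only: it assumes that, with the Ubuntu nvidia-current package active, both modules should load from an extra-modules directory as in the second pair of lines, and the function name stale_nvidia_module_paths is made up for illustration.

```shell
# List libglx.so / nvidia_drv.so "Loading" lines that are NOT under extra-modules.
# Reads an Xorg log from stdin and prints the offending paths, one per line.
stale_nvidia_module_paths() {
    grep -E '\(II\) Loading .*/(libglx|nvidia_drv)\.so$' \
        | sed 's/.*(II) Loading //' \
        | grep -v '/extra-modules/'
}

# Typical use:
#   stale_nvidia_module_paths < /var/log/Xorg.0.log
```

No output means both modules were loaded from extra-modules (or were not loaded at all). Note that on a system running nouveau, libglx.so loading from the stock modules directory is expected.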

Please try a clean installation of Ubuntu and install the driver from Jockey.

I'm closing this bug report.

@all
If you didn't use the Nvidia installer (and installed the driver from Jockey) and you're experiencing a problem with 2 cards then you're facing a different issue and I'd be glad if you could open a separate bug report about it.

Changed in nvidia-graphics-drivers (Ubuntu):
status: Confirmed → Invalid
Changed in nvidia-drivers-ubuntu:
status: New → Invalid

seanlano wrote:
>I had a similar problem, but I only have one graphics card.
>
>On 32bit Lucid everything worked fine, but on 64bit Lucid all of the
>nvidia drivers cause the system to not load a graphical environment,
>but I can still use a tty to uninstall the driver and go back to nouveau.

  That is a different issue then - this bug involves a hard crash only with two cards - when I (or other people) remove one card, then it boots fine.

Alberto Milone wrote
> Please try to reproduce the problem and attach your /var/log/Xorg.0.log and /var/log/Xorg.0.log.old

  Sorry, this is impossible - somehow the crash occurs *before* Xorg writes anything in the log. I have attached the nvidia bug report.
  What I did was:
1) Start from working 2.6.31
2) reboot on 2.6.32 => crash
3) wait 2 minutes
4) hard reboot to 2.6.31
5) nvidia-bug-report.sh

  In the nvidia-bug-report text file you have aggregated (among other files) the Xorg.0.log and Xorg.0.log.old (just search for 'Xorg.0' to find them). However at the beginning of each log you'll see that the kernel is 2.6.31... In other words during the 2.6.32 boot X did not manage to start logging...
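
  A quick way to check which kernel a given Xorg log was written under, sketched on the assumption that the log header contains the usual "Current Operating System: Linux <host> <release> ..." line. The function name xorg_log_kernel is hypothetical.

```shell
# Print the kernel release recorded in the header of an Xorg log read from stdin.
xorg_log_kernel() {
    awk '/Current Operating System:/ {
        # Locate the "System:" token so a leading timestamp does not shift fields;
        # the release is the third field after it (Linux, hostname, release).
        for (i = 1; i <= NF; i++) {
            if ($i == "System:") { print $(i + 3); exit }
        }
    }'
}

# Typical use:
#   xorg_log_kernel < /var/log/Xorg.0.log
#   xorg_log_kernel < /var/log/Xorg.0.log.old
```

  If both logs report a 2.6.31 release right after a failed 2.6.32 boot, X never got far enough to open its log on the failing kernel.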

Alberto Milone wrote:
"If you didn't use the Nvidia installer (and installed the driver from Jockey) and you're experiencing a problem with 2 cards then you're facing a different issue and I'd be glad if you could open a separate bug report about it."

  OK, I just did that. New bug report specific for the "nvidia with dual-cards crash" at:

https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers/+bug/573557

I guess most people here will want to subscribe to that new bug...

Alberto Milone wrote:
"Please try a clean installation of Ubuntu and install the driver from Jockey."

Sorry, unable to do that. ALL of the installation cds -- live and alternate -- crash on my system after 9.04.
A separate bug report has been filed on this (#519123) which is not resolved.

It is possible to get the system to boot by deleting the nvidia-current.ko module. I will do so, reboot, and try to install from Jockey.

Dax Marola (lord.xadar) on 2010-05-03
Changed in nvidia-graphics-drivers (Ubuntu):
status: Invalid → Confirmed
Changed in nvidia-drivers-ubuntu:
status: Invalid → Confirmed
Daniel Dadap (ddadap) wrote :

Thanks for all the reports. We have tracked this down to a problem with the kernel's VGA arbiter trying to move VGA ownership to a GPU that hasn't POSTed. We're working with the developers of the VGA arbiter to get this fixed in the kernel.

In the meantime, we have a patch which works around the issue by preventing the VGA arbiter from moving VGA ownership away from the default device. It is attached as "NVIDIA_kernel-195.36.24-682377.diff". You can apply the patch by downloading the latest installer from http://www.nvidia.com/object/linux-display-ia32-195.36.24.html (32-bit) or http://www.nvidia.com/object/linux-display-amd64-195.36.24.html (64-bit) and running the installer with the "--apply-patch /path/to/patch.diff" option. This will create a patched installer with a name ending in "-custom.run", which will install a driver with this workaround.
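
As a rough sketch of the step described above: the helpers below only print the command to run and the expected output name, so nothing is executed by accident. The installer file name in the usage example is an assumption (use whatever the download is actually called), the output name is derived simply by replacing the trailing ".run" with "-custom.run", and both function names are made up for illustration.

```shell
# Print the command that creates a patched installer.
#   $1 = path to the downloaded NVIDIA .run installer
#   $2 = path to the patch file
apply_patch_command() {
    printf 'sh %s --apply-patch %s\n' "$1" "$2"
}

# Print the expected name of the patched installer, assuming the original
# name ends in ".run" and the result ends in "-custom.run".
custom_installer_name() {
    printf '%s-custom.run\n' "${1%.run}"
}

# Typical use (installer file name is an assumption):
#   apply_patch_command ./NVIDIA-Linux-x86_64-195.36.24-pkg2.run ./NVIDIA_kernel-195.36.24-682377.diff
#   custom_installer_name ./NVIDIA-Linux-x86_64-195.36.24-pkg2.run
```

Paths containing spaces would need quoting before the printed command is pasted into a shell.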

If you're not comfortable with applying the patch yourself, or with working out the conflicts between the driver as installed by the NVIDIA installer and the driver as installed by Jockey (the restricted drivers manager) then please wait until an updated package is available through Jockey.

THANK YOU! The "-custom.run" driver installs and works perfectly.
Dell XPS-630i amd64 dual 9800GT NVIDIA cards.
For some reason, jockey doesn't work on my system.

Alberto Milone (albertomilone) wrote :

I'm closing the bug report again. Let's track the issue in bug #573557.

Changed in nvidia-graphics-drivers (Ubuntu):
status: Confirmed → Invalid
Changed in nvidia-drivers-ubuntu:
status: Confirmed → Invalid