random x server freeze (nvidia)

Bug #365885 reported by Zero on 2009-04-24
This bug affects 2 people
Affects: nvidia-graphics-drivers-180 (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Many people have reported the "random X freeze" with "put your card here". I have been hunting this bug down in forums and running some endurance tests on my PC.

The problem is very random: sometimes it triggers 5 minutes after boot, sometimes it takes up to 3 or 4 days. However, I have found that the Flash player triggers the bug quicker than anything else; a couple of minutes of HD video playing in Flash can be enough. So I suspect a video memory leak (VRAM, not system RAM) somewhere in the X server or the driver. The bug happens with and without Compiz enabled.

Most of the time just the X server freezes while the mouse still moves; corruption and a lot of flickering are visible on screen, and the mouse may freeze afterwards. Sometimes the hang is complete, with no SSH access; sometimes the machine is still accessible via SSH; and rarely it is possible to switch to a console, but returning to the X server then freezes the machine completely.

Extra Info...
Ubuntu 9.04 Jaunty (also happened in Intrepid)
Nvidia driver: nvidia-glx-180 (also happened with 177, but less often)
Video card: Zogis Nvidia 8500GT (the VBIOS reports 256 MB but nvidia-settings reports 512 MB)
Motherboard: Intel DG31PR
Memory: 2 GB dual-channel kit at 800 MHz (there is no way to disable dual channel)

[lspci]
01:00.0 VGA compatible controller: nVidia Corporation GeForce 8500 GT (rev a1)

affects: ubuntu → nvidia-graphics-drivers-180 (Ubuntu)
Zero (pupp3t-mast3r) wrote :

I rolled back to the 173 driver and the machine has been rock solid for 4 days of uptime using Flash, Flash HD video, HD video under mplayer, VLC, Totem, etc., and games; I even tried SVN versions of XBMC. No problems so far.

I can also confirm this bug with a Zogis 7300GT 256 MB card running on an AMD64 X2 (32-bit Jaunty with the stock 32-bit kernel) on a Biostar GeForce 6100 AM2 motherboard. I just updated my family's machine and got the same results with the 180 driver; rolling back to the 173 driver solved the freezes, but alas, no VDPAU.

I forgot to mention that the first system (the Intel one) is also running the 32-bit version of Ubuntu Jaunty with the stock Jaunty kernel.

Zero (pupp3t-mast3r) wrote :

This bug may be related to bug #363135. There I can also see my VRAM reported as doubled on both machines and also on a Dell laptop, an Inspiron 1521 with an nvidia 8400gs mobile.

However, the freeze has never triggered on the laptop, only on the machines with the Zogis cards.

- Intel with Zogis Nvidia 8500GT 256 MB -> Xorg log and nvidia-settings say 512 MB. Freezes.
- AMD64 with Zogis Nvidia 7300GT 256 MB -> Xorg log and nvidia-settings say 512 MB. Freezes.
- Dell Inspiron 1521 with Nvidia 8400gt M 128 MB -> Xorg log and nvidia-settings say 256 MB. No freezes so far.
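
For anyone who wants to check the doubled-VRAM observation on their own machine, here is a minimal Python sketch. It assumes the NVIDIA driver writes a line of the form "NVIDIA(0): Memory: N kBytes" to the Xorg log; that format is not quoted anywhere in this report, so treat it as an assumption and adjust the pattern to what your log actually contains. The function names are made up for illustration.

```python
import re

# Assumed log line format (not quoted in this report), for example:
#   (--) NVIDIA(0): Memory: 524288 kBytes
MEMORY_LINE = re.compile(r"NVIDIA\(\d+\): Memory: (\d+) kBytes")


def reported_vram_mb(log_text):
    """Return the VRAM size in MB from the first matching log line, or None."""
    match = MEMORY_LINE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)) // 1024


def is_doubled(log_text, expected_mb):
    """True when the driver reports exactly twice the expected VRAM."""
    return reported_vram_mb(log_text) == 2 * expected_mb


# Illustrative sample line, not taken from a real log in this thread.
sample = "(--) NVIDIA(0): Memory: 524288 kBytes\n"
print(reported_vram_mb(sample))  # 512
print(is_doubled(sample, 256))   # True
```

On a real system the text would typically come from /var/log/Xorg.0.log, and expected_mb is whatever the card is sold as (256 for the two Zogis cards listed above).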

I suspect a video memory leak because if the machine is just left with the desktop showing, the bug triggers after some days, but when it is used heavily (let's say normally: OpenOffice, Firefox, Flash, etc.) the bug triggers much sooner, anywhere from a couple of hours down to a few minutes.

Lastly, I also suspect the brand of the cards (some kind of overclock in the VBIOS, or memory tweaks), but this should be verified with the same and different models of Zogis cards.

Zugol (franck-jeandinot) wrote :

This happens to me too, with a GeForce 8500GT 512 MB and the 180.22 nvidia driver.
Full system freeze (video/sound/mouse/keyboard/LCD display...).
Most of the time it happens when I am doing nothing special.
I tried 180.53 but the same thing happens.

Luca Antonelli (luca-anto) wrote :

The same for me with GeForce 7300 LE 512 mb and 180 driver.
Whether the bug triggers depends on usage: playing many Flash videos freezes the machine in a few seconds, while moderate use does not seem to trigger the bug (at least for some hours).

Going back to the 173 driver solved the problem.

Bryce Harrington (bryce) on 2009-05-06
description: updated

I tried Zero's solution of rolling back to the 173 version of the Nvidia driver using the "Hardware Drivers" configurator. This fixed my problem. The symptoms I was having were as follows:

    Configuration/Environment:

    1. Compiz
    2. Emerald Theme Decorator
    3. Cube Rotation, Reflection, and many other "Effects" enabled

    Symptom:

    Any time I "Maximized" a window the display completely froze. I was unable to switch terminals and not even Ctrl-Alt-Backspace worked (I had re-enabled Ctrl-Alt-Backspace using the 'dontzap' command). The only way to make anything at all happen was to power off the machine and perform a hard re-boot.

    Of course, re-booting like this tended to corrupt the "Tracker" indexes. What a mess!

    Anyway, rolling back to version 173 of the Nvidia driver has fixed the problem so far. Yay! Thanks Zero!

   THIS IS A HORRIBLE BUG! THIS IS THE LAST NVIDIA VIDEO CARD MACHINE I WILL EVER PURCHASE!

<rant>
    PLEASE, PLEASE, PLEASE, INTEL/AMD PRODUCE GOOD FLOSS DRIVERS ASAP! I'M TIRED OF THIS KIND OF THING BECAUSE OF CLOSED-SOURCE CRAPPY DRIVERS! Nvidia, SERIOUSLY, consider your customers, can't things be better than this? Is this the best you can do?
</rant>

Zero (pupp3t-mast3r) wrote :

The bug triggered again with driver 173; however, I had the longest uptime in many days: 11 days of uptime with everything working well (Flash, Flash HD video, HD video with mplayer, VLC, Compiz, OpenOffice, etc.).

It seems that, if it is a memory leak, it is less evident in the 170-series releases. Also, driver 173 has problems with Compiz shadows (panel and menus) when they are configured beyond 9 pixels: it just shows a white texture in place of the shadow.

I am attaching the nvidia bug report that I managed to capture via SSH after the freeze; it should have more information about the bug.

Zero (pupp3t-mast3r) wrote :

I found the possible cause, but so far I'm not sure about the origin of the problem. A bulging capacitor seems to have been triggering the bug, but I am not sure whether a faulty driver or something else led to the capacitor failure.

Zugol (franck-jeandinot) wrote :

As a workaround I tried kernel 2.6.30 (an early release from the kernel team) and it has worked fine for several days.

The deb packages are here: http://kernel.ubuntu.com/~kernel-ppa/mainline/v2.6.30/
And compatible nvidia drivers are here: https://launchpad.net/~ubuntu-x-swat/+archive/x-updates

If you try this solution you may encounter problems with suspend/hibernate/resume...

Devilkaka (garywong) wrote :

I experience the same problem with both drivers, 173 and 180.
System: AMD 64 X2
RAM: 4 GB

Video card: NVIDIA GeForce 9500GT
RAM: 512 MB
Dual screen

GPU temperature was between 40 and 49 degrees in all tests.

Driver version 180 -> Freezes very often
(Successfully reproduced the problem by opening multiple web pages with YouTube videos playing)

Driver version 173 -> Freezes, but not as often
(Hit-and-miss reproduction of the problem by opening multiple web pages with YouTube videos playing)

Bryce Harrington (bryce) wrote :

I've posted a new version of the -nvidia driver to our xorg-edgers PPA,
would you mind testing it either on Jaunty or Karmic and see if it
resolves this bug?

Get nvidia-graphics-drivers-180 - 185.18.14 here:

  https://edge.launchpad.net/~xorg-edgers/+archive/ppa

Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: New → Incomplete
Zero (pupp3t-mast3r) wrote :

Actually I have been working perfectly with driver version 185 from Brandon Snider:

http://ppa.launchpad.net/brandonsnider/ppa/ubuntu.

However, due to the capacitor failure I replaced my card with a brand new 9600GT from Biostar, which is working perfectly, VDPAU too! Even the RAM is detected correctly and not doubled: it is 512 MB DDR3 and nvidia-settings shows 512, so it is consistent now.

By the way, I also updated the drivers (the same ones from Brandon Snider) on the AMD computer running the Zogis 7300GT, and there have been no freezes so far. Max uptime is 2 days; it is my family's PC and they don't like to leave it powered on all the time, but 2 days is more than enough for them (common usage: Firefox, Flash, Flash HD and non-HD video, HD and non-HD video in mplayer, Skype video and OpenOffice apps).

What still puzzles me is the capacitor thing:

1 Was it faulty from the start? If so, why did drivers 173 and earlier take so long to trigger the freeze, or not trigger it at all, while 177 and 180 triggered it so quickly?

2 If the fault was in the driver, then it might be specific to brand, model, driver, voltages, etc.; only certain combinations might trigger it.

3 Could a driver lead to a component failure under certain circumstances? If so, I suppose the driver wasn't dealing with proper hardware, so I suspect factory tweaks and tuning in the card that somehow exceeded Nvidia's safety standards and recommendations. I mean, if that thing hadn't been solid, it might have leaked over plenty of my computer parts and damaged them.

I still wouldn't call this bug closed, although it seems that the new 185 driver version fixes it. If the others having the same issue confirm that the driver works and fixes the freeze, then go ahead.

How can I best/safely test this? Add the PPA to my software repos then
use the "Hardware Drivers" utility to select it? Should that work OK?

On Fri, 2009-06-26 at 03:55 +0000, Bryce Harrington wrote:

> I've posted a new version of the -nvidia driver to our xorg-edgers PPA,
> would you mind testing it either on Jaunty or Karmic and see if it
> resolves this bug?
>
> Get nvidia-graphics-drivers-180 - 185.18.14 here:
>
> https://edge.launchpad.net/~xorg-edgers/+archive/ppa
>
>
> ** Changed in: nvidia-graphics-drivers-180 (Ubuntu)
> Status: New => Incomplete
>

--
Gerald E Butler <email address hidden>
CSI

Zero (pupp3t-mast3r) wrote :

I did it that way. However, just yesterday Brandon added the nvidia 190 driver (a leaked pre-beta version) to his repo, so it may conflict with some libraries, especially libxine-vdpau, since he is compiling it for VDPAU testing purposes. So I would rather use Synaptic or aptitude to install the 185 drivers and libraries manually instead of the Hardware Drivers manager.

However, you might give the drivers manager a try; if you find any problems with package versions, use a package manager to correct them.
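
With several driver series (173, 180, 185, 190) being swapped in and out across PPAs, it can help to confirm which kernel module is actually loaded after a change. The following is a minimal Python sketch, assuming the NVIDIA kernel module exposes its version in /proc/driver/nvidia/version on a line containing "Kernel Module" followed by the version number; the exact line layout is an assumption, and the function names are hypothetical.

```python
import re

# Assumed layout of the version line (build date omitted here), for example:
#   NVRM version: NVIDIA UNIX x86 Kernel Module  185.18.14  <build date>
VERSION_LINE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def loaded_driver_version(version_text):
    """Return the full driver version string, or None if it is not found."""
    match = VERSION_LINE.search(version_text)
    return match.group(1) if match else None


def loaded_driver_series(version_text):
    """Return the major series (e.g. 173, 180, 185) as an int, or None."""
    version = loaded_driver_version(version_text)
    return int(version.split(".")[0]) if version else None


# Illustrative sample text, not copied from a machine in this thread.
sample = "NVRM version: NVIDIA UNIX x86 Kernel Module  185.18.14  <build date>\n"
print(loaded_driver_version(sample))  # 185.18.14
print(loaded_driver_series(sample))   # 185
```

On a real system, pass in the contents of /proc/driver/nvidia/version. If the series printed is not the one you just installed, the package manager and the loaded module disagree, which is the kind of version conflict mentioned above.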

Zero (pupp3t-mast3r) wrote :

Nobody has complained about this bug again, and it seems that version 185 of the nvidia drivers fixed it for me, whatever the reason was.

Since nobody has reported otherwise, I think that release 185 of the Nvidia drivers has fixed this bug for everyone.
Thanks for your time!

Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: Incomplete → Fix Released
Irenux (irenux) wrote :

Well, on my system, the Nvidia 185 driver doesn't solve anything. I'm running an AMD64 dual core system with an Nvidia 6600 256 MB graphics card. I did a whole bunch of tweaks I found here and there:
- disabled SLI in Xorg.conf
- installed and activated irqbalance
- set irq settings to manual in the BIOS, since my graphics card shared an irq with my 2nd sound card (so this isn't the case anymore)
- uninstalled compiz, as far as possible. I still need to find out how to disable compositing altogether; I guess it's in xorg.conf
- created an /etc/modprobe.d/nvidia file with the following line: options nvidia NVreg_RegistryDwords="PerfLevelSrc=0x2222"
- added boot option clocksource=hpet and made sure hpet is enabled in the BIOS
- tried several screen resolutions, the nv driver, Debian Lenny, Gnome instead of KDE
- same problem with my previous system, which was a Nvidia motherboard with AMD64 single core
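
One of the tweaks above concerns the graphics card sharing an IRQ with another device. A minimal Python sketch for checking this from the text of /proc/interrupts follows. It assumes the usual layout (a header row of CPU names, then per-IRQ rows ending in a comma-separated list of handler names separated from the controller column by at least two spaces); the function name and the sample numbers are made up for illustration.

```python
import re


def irq_sharers(interrupts_text, driver="nvidia"):
    """Map IRQ label -> handler names for IRQs that `driver` shares with others."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    cpu_count = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        # IRQ label, one count per CPU, then the rest of the line untouched.
        fields = line.split(None, cpu_count + 1)
        if len(fields) < cpu_count + 2:
            continue  # rows without a description column
        # Drop the controller column(s); keep the comma-separated handler list.
        tail = re.split(r"\s{2,}", fields[-1].strip())[-1]
        handlers = [name.strip() for name in tail.split(",")]
        if driver in handlers and len(handlers) > 1:
            shared[fields[0].rstrip(":")] = handlers
    return shared


# Illustrative sample with invented counts, not data from this thread.
sample = (
    "           CPU0       CPU1\n"
    "  0:        125          0   IO-APIC-edge      timer\n"
    " 16:      48211      47903   IO-APIC-fasteoi   nvidia, HDA Intel\n"
    " 19:       3021       2988   IO-APIC-fasteoi   ehci_hcd:usb1\n"
)
print(irq_sharers(sample))  # {'16': ['nvidia', 'HDA Intel']}
```

An empty result means the driver has its IRQ to itself (or is not loaded), which is the state described above after the BIOS IRQ settings were changed.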

Maybe it's something with Xorg after all, perhaps the xrender library?

Anyway, as far as I'm concerned, it's not solved at all.

Irenux (irenux) wrote :

@ Zugol: Thanks, I tried the kernel you mention, but I only get a blank screen with a blinking cursor with that one.

This is what I did instead:

- upgraded the kernel to 2.6.31-xx from the Karmic repository
- added the following kernel boot parameters to /boot/grub/menu.lst: clocksource=hpet hpet=force pci=nommconf idle=poll
- installed the binary 190 driver from Nvidia
- installed Xfce4 (lighter desktop)
- and all of the above stuff.
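
Since boot parameters added to menu.lst only take effect after a reboot with the right entry selected, it is worth verifying that they actually reached the kernel. Here is a minimal Python sketch that compares a kernel command line against the four parameters listed above; the function name and the sample command line (including its root device) are made up for illustration.

```python
# The parameters listed in the comment above.
WANTED = ["clocksource=hpet", "hpet=force", "pci=nommconf", "idle=poll"]


def missing_boot_params(cmdline, wanted=WANTED):
    """Return the wanted parameters that do not appear in the kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


# Illustrative command line, not taken from a machine in this thread.
sample = "root=/dev/sda1 ro quiet splash clocksource=hpet hpet=force"
print(missing_boot_params(sample))  # ['pci=nommconf', 'idle=poll']
```

On a running system the command line can be read from /proc/cmdline, and the clocksource actually in use is shown in /sys/devices/system/clocksource/clocksource0/current_clocksource.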

There is much improvement, but the system still hangs from time to time, and the duration differs. My latest Xorg.0.log is in the attachment.

Hopefully it's of some use. I tried to google on the WAIT-stuff, but I haven't found anything useful yet.

Zugol (franck-jeandinot) wrote :

In my case, the things I had to do to fix this bug were:

- install 3 packages from kernel team required for my system
     linux-headers-2.6.30-02063004-generic_2.6.30-02063004_i386.deb
     linux-headers-2.6.30-02063004_2.6.30-02063004_all.deb
     linux-image-2.6.30-02063004-generic_2.6.30-02063004_i386.deb

- add the X updates repo to keep my nvidia driver (185.18.14) working with this kernel (190.18 is still in beta)

But as I said before, I think "it only cures the cough but not the illness"...

I'm running GNOME without any change in grub. I was using restricted modules to get my iMon VFD working, but there are no restricted modules for the 2.6.30 kernel, so maybe the freeze was coming from my VFD (???)...

James Cobban (jamescobban) wrote :

I am encountering input problems with the Nvidia 185 driver on my laptop. The symptoms are not exactly like those described by others in this thread.

The mouse moves the graphics cursor, but pressing a mouse button has no effect.

Most keyboard input still works. For example to shutdown an app I can Alt-F, use the up and down arrows to select the quit item, and then press Enter to select the item. However the Ctl-Q shortcut does NOT work.

Ctl-Alt-F1 switches to the console. But since I am a newbie I don't know what I should do in the console. The only thing I know is "sudo reboot".
