Nvidia binary driver won't build with 2.6.31-rt kernel

Bug #413296 reported by Jami Pekkanen
This bug affects 11 people
Affects                                 Status         Importance   Assigned to
nvidia-graphics-drivers-173 (Ubuntu)    Fix Released   Medium       Alberto Milone
nvidia-graphics-drivers-180 (Ubuntu)    Fix Released   Medium       Iain Buclaw
nvidia-graphics-drivers-96 (Ubuntu)     Fix Released   Medium       Alberto Milone

Bug Description

Binary package hint: nvidia-180-kernel-source

When trying to build the NVIDIA proprietary drivers against 2.6.31-rt using DKMS, the build fails due to an undefined reference to init_MUTEX. The semaphore API seems to have changed since 2.6.29, removing the init_MUTEX macro. Applying the patch from [1] allows the module to build. I haven't yet tested whether the resulting module actually works, though.

Is there a way to also report this bug against the kernel image or headers? This could probably also be fixed by restoring the init_MUTEX macro as a custom patch in the Ubuntu kernel. Patching the module is perhaps cleaner, though.
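
For reference, a minimal sketch of what such a compatibility macro could look like. In kernels that still shipped it, init_MUTEX(sem) expanded to sema_init(sem, 1). The struct semaphore and sema_init() below are user-space stand-ins so the fragment compiles outside the kernel; they are assumptions for illustration, not the kernel's own definitions.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's struct semaphore (illustration only). */
struct semaphore {
    unsigned int count;
};

/* Mock of sema_init(): set the initial count of the semaphore. */
static void sema_init(struct semaphore *sem, int val)
{
    assert(sem != NULL && val >= 0);
    sem->count = (unsigned int)val;
}

/*
 * Compatibility shim: provide init_MUTEX() only where the headers no
 * longer define it. A count of 1 gives a binary semaphore that starts
 * unlocked, which is what the old macro produced.
 */
#ifndef init_MUTEX
#define init_MUTEX(sem) sema_init((sem), 1)
#endif
```

A shim like this only addresses the build error. Later comments in this report find that a module built with just the semaphore change can still lock up at runtime.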

Versions
nvidia-180-kernel-source 185.18.14-0ubuntu3
linux-image-2.6.31-1-rt 2.6.31-1.1
linux-headers-2.6.31-1-rt 2.6.31-1.1

[1] http://lkml.org/lkml/2009/7/30/74

description: updated
prismatic7 (chris-wenn-deactivatedaccount) wrote :

I can confirm a similar issue affecting linux-image-2.6.31-2-rt and this nvidia-180 version, as well as the updated 185.18.36 driver.

Log lines:

/var/lib/dkms/nvidia/185.18.36/build/nv.c: In function nv_alloc_file_private:
/var/lib/dkms/nvidia/185.18.36/build/nv.c:1897: error: implicit declaration of function init_MUTEX

Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: New → Confirmed
Niall Creech (sevenmachines-deactivatedaccount) wrote :

Seems to be a one-line change from init_MUTEX to semaphore_init in nv-linux.h:
-#define NV_INIT_MUTEX(mutex) init_MUTEX(mutex)
+#define NV_INIT_MUTEX(mutex) semaphore_init(mutex)
It builds and works fine here with that change on 2.6.31-3-rt.

mokabar (tim-klingt) wrote :

It is not that easy: according to this posting [1], the spinlock type also has to be changed.

It would be great to have an updated Ubuntu package soon.

[1] http://article.gmane.org/gmane.linux.rt.user/5065

Iain Buclaw (iainb) wrote :

I ran into this issue, wrote a patch, and posted it here: http://www.nvnews.net/vbulletin/showthread.php?p=2087519#post2087519

It is based on the patch in the previous post, but rather than a separate patch for all NV driver versions, this should work across the board. At least, I hope.

Regards
Iain

Iain Buclaw (iainb) wrote :

Oops, forgot to attach.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-180 - 185.18.36-0ubuntu3

---------------
nvidia-graphics-drivers-180 (185.18.36-0ubuntu3) karmic; urgency=low

  * Re-introduce a patch to allow the NVIDIA drivers to build against the RT
    kernel (LP: #413296)

 -- Luke Yelavich <email address hidden> Sat, 26 Sep 2009 22:10:10 +1000

Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: Confirmed → Fix Released
Iain Buclaw (iainb) wrote :

This is the patch in the recent upload:
--- nvidia-graphics-drivers-180-185.18.36.orig/debian.binary/patches/rt_preempt_31.patch
+++ nvidia-graphics-drivers-180-185.18.36/debian.binary/patches/rt_preempt_31.patch
@@ -0,0 +1,12 @@
+diff -urN NVIDIA-Linux-x86-185.18.36-pkg1/usr/src/nv/nv-linux.h NVIDIA-Linux-x86-185.18.36-pkg1.new/usr/src/nv/nv-linux.h
+--- nvidia-185.18.36/nv-linux.h 2009-08-15 10:58:45.000000000 +1000
++++ nvidia-185.18.36.new/nv-linux.h 2009-09-26 21:43:35.000000000 +1000
+@@ -721,7 +721,7 @@
+ #define nv_up(lock) up(&lock)
+
+ #if defined(CONFIG_PREEMPT_RT)
+-#define NV_INIT_MUTEX(mutex) init_MUTEX(mutex)
++#define NV_INIT_MUTEX(mutex) semaphore_init(mutex)
+ #else
+ #if !defined(__SEMAPHORE_INITIALIZER) && defined(__COMPAT_SEMAPHORE_INITIALIZER)
+ #define __SEMAPHORE_INITIALIZER __COMPAT_SEMAPHORE_INITIALIZER

Sorry to be obtuse, but I do not think that this bug has been fixed.

The reason is that this is what I tried first when I encountered it.
The module compiles, but at runtime it fails to start, gives a black screen, or causes a kernel panic (any of those may happen).

I was running a vanilla 2.6.31 kernel with the rt11 patches, though; I haven't looked into the current state of the RT patches in Ubuntu.

Can someone confirm that they can start X all right, please?

prismatic7 (chris-wenn-deactivatedaccount) wrote :

I can confirm the error @Iain Buclaw describes above, with the 2.6.31-5-rt kernel in the repositories. The patch is the same as the initial fix, and it leads to a failure to boot.

The failure can take the form of:

- kernel panic
- computer reset
- blank screen and system halt (occasionally so hard that removing power is the only way to unlock)

At no point do I get a working X or console.

Iain Buclaw (iainb) wrote :

OK, the previous patch was for the 190 drivers, so I removed it.
Here is an updated patchset for the 185 drivers.

I am taking this off Fix Released and assigning it to myself.

Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: Fix Released → In Progress
assignee: nobody → Iain Bucław (tinivole)
Iain Buclaw (iainb) wrote :

And attached is the patch that should now work.

Regards
Iain

Trulan Martin (trulanm) wrote :

I can confirm that the attached patch works. Thank you.

wanthalf (wanthalf) wrote :

Please update the legacy 173 drivers too. Thanks.

Iain Buclaw (iainb) wrote :

Since it's BugJam, I had a more detailed look at the differences between 2.6.29 and 2.6.31, with and without the RT patches, and came up with this shorter, sweeter patch that should resolve things, at least until the NV devs fix things up properly.

I don't think backwards compatibility with previous kernels is going to be an issue here, so I dropped that too.

Regards
Iain

Bryce Harrington (bryce) wrote :

Hey Iain, looks like your updated patch was not attached?

Iain Buclaw (iainb) wrote :

Bryce, sorry, but I was very quick to remove it after it became evident that it affected something else in the driver when going into screensaver.

The idea was:
#ifdef CONFIG_PREEMPT_RT
#define spin_lock_init(lock) atomic_spin_lock_init(lock)
/* etc */
#endif
This would have removed the need for a separate nv_spin_lock_init().

Evidently, this called spin_lock_init() where it wasn't supposed to be called.
So what I posted originally is the most solid workaround we have for the time being until the nv-devs fix it internally.
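
A sketch of the separate-wrapper approach, as opposed to redefining spin_lock_init() for everything that includes the header. The lock types and init functions here are user-space mocks, and nv_spinlock_t and the LOCK_* constants are hypothetical names; only nv_spin_lock_init() and atomic_spin_lock_init() come from the comment above. It assumes the RT tree's split in which spinlock_t may sleep and atomic_spinlock_t always spins.

```c
#include <assert.h>
#include <stddef.h>

/* Pretend we are building against an RT kernel (assumption for the demo). */
#define CONFIG_PREEMPT_RT 1

enum lock_kind { LOCK_UNSET = 0, LOCK_SLEEPING, LOCK_SPINNING };

/* Mock lock types standing in for the kernel's own. */
typedef struct { enum lock_kind kind; } spinlock_t;
typedef struct { enum lock_kind kind; } atomic_spinlock_t;

/* Mock initialisers that record which flavour of lock was set up. */
static void spin_lock_init(spinlock_t *lock)
{
    assert(lock != NULL);
    lock->kind = LOCK_SLEEPING;
}

static void atomic_spin_lock_init(atomic_spinlock_t *lock)
{
    assert(lock != NULL);
    lock->kind = LOCK_SPINNING;
}

/*
 * Driver-private wrapper: only locks declared as nv_spinlock_t change
 * type on RT. Every other caller of spin_lock_init() is left untouched.
 */
#if defined(CONFIG_PREEMPT_RT)
typedef atomic_spinlock_t nv_spinlock_t;
#define nv_spin_lock_init(lock) atomic_spin_lock_init(lock)
#else
typedef spinlock_t nv_spinlock_t;
#define nv_spin_lock_init(lock) spin_lock_init(lock)
#endif
```

With the wrapper, a driver lock initialised through nv_spin_lock_init() gets the spinning flavour, while a plain spinlock_t initialised elsewhere keeps its normal behaviour. A global redefinition of spin_lock_init() gives up that separation.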

Regards
Iain

Motin (motin) wrote :

Has anyone gotten a stable system yet? When _anyone_ with nvidia graphics can, say, use "Mixxx" which requires accelerated graphics and performs best with JACK in realtime mode, progress is made... But is that reality yet?

I am using the latest rt kernel and 185 nvidia drivers from the karmic repos, and it is still a no-go for stable graphics. I am tracking my progress and have collected some valuable information on the subject at http://ubuntuforums.org/showthread.php?p=8127639.

Iain Buclaw (iainb) wrote :

Motin, I will look into it.

I currently use the 190.32 drivers. The most current is 190.40, and I *know* that it doesn't work in the most recent stable (185.18.36).

Will message back with updates.

Trulan Martin (trulanm) wrote :

To get the latest NVidia-185.18.36 drivers to work on 2.6.31-9-rt, I had to disable the recently added 'fall_back_on_mtrr_if_no_pat.patch' in addition to applying Iain's patch posted above. Otherwise X fails to start.

Alberto Milone (albertomilone) wrote :

@Trulan
fall_back_on_mtrr_if_no_pat.patch shouldn't cause problems. Can you attach the output of the following command, please?

cat /proc/cpuinfo

@Iain
Is "rt_preempt_31.patch" (the one in comment #10) the patch that you recommend for this?

Iain Buclaw (iainb) wrote :

Alberto, that is what I have been using when compiling the drivers manually for the past 2 months.

As for the debian build, I noticed there was already an nvidia-rt-compat.patch file in the debian.binary/patches tree.
So I fixed that and added it to the dkms.conf.in file.

Had a try at package building, and it is uploaded here: https://launchpad.net/~tinivole/+archive/ppa
Running on it now, so I can confirm that it works all OK!

Is there a route I must take to submit it to the main repository?
If not, I have attached the _recommended_ patch to this post.

The dkms.conf.in file just has this appended at the bottom:
PATCH[2]="nvidia-rt-compat.patch"
PATCH_MATCH[2]="2.6.31"
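
DKMS documents PATCH_MATCH as a regular expression that selects the kernels a patch is applied to. The sketch below shows which kernel version strings a pattern like the one above would select, assuming an unanchored extended regular expression match. patch_applies() is a hypothetical helper written for this illustration, not part of DKMS.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if kernel_version matches pattern, 0 if it does not,
 * and -1 if the pattern fails to compile.
 */
static int patch_applies(const char *pattern, const char *kernel_version)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && kernel_version != NULL);

    /* Compile as an extended regex; we only need a yes/no answer. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, kernel_version, 0, NULL, 0);
    regfree(&re);

    return rc == 0 ? 1 : 0;
}
```

Under that assumption, patch_applies("2.6.31", "2.6.31-9-rt") returns 1, but so does patch_applies("2.6.31", "2.6.31-14-generic"): the pattern selects every 2.6.31 kernel, not only the -rt flavour. A stricter, purely illustrative pattern such as "^2\.6\.31.*-rt$" would select only the -rt kernels.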

Regards
Iain

Iain Buclaw (iainb)
Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: In Progress → Fix Released
Trulan Martin (trulanm) wrote :

@Alberto
cat /proc/cpuinfo

processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 47
model name : AMD Athlon(tm) 64 Processor 3800+
stepping : 2
cpu MHz : 2400.000
cache size : 512 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow up rep_good pni lahf_lm
bogomips : 4809.96
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
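
The patch name suggests that it falls back to MTRR when the CPU lacks PAT, which is presumably why the cpuinfo output was requested. The flags line above lists both mtrr and pat. A small sketch of checking a flags list for a flag as a whole word follows; has_cpu_flag() is a hypothetical helper, not part of any tool mentioned in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if flag appears as a whole whitespace-separated word in the
 * flags list, 0 otherwise. A substring hit such as "pse" inside "pse36"
 * does not count.
 */
static int has_cpu_flag(const char *flags, const char *flag)
{
    size_t n;
    const char *p;

    assert(flags != NULL && flag != NULL);

    n = strlen(flag);
    if (n == 0)
        return 0;

    p = flags;
    while ((p = strstr(p, flag)) != NULL) {
        /* The hit must start at the beginning or after whitespace... */
        int starts = (p == flags) || p[-1] == ' ' || p[-1] == '\t';
        /* ...and end at the end of the string or before whitespace. */
        int ends = p[n] == '\0' || p[n] == ' ' || p[n] == '\t' || p[n] == '\n';

        if (starts && ends)
            return 1;
        p += 1; /* keep searching after this partial hit */
    }
    return 0;
}
```

Called with the list of flags from the output above (the text after the colon), has_cpu_flag(flags, "pat") and has_cpu_flag(flags, "mtrr") both return 1.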

It is easily possible I was mistaken about the cause of my recent problem. I will test the updated version and report back.

Trulan Martin (trulanm) wrote :

I upgraded the drivers using Iain's ppa. On first reboot, X failed to start (Xsplash never showed up, screen stayed blank for several minutes.) The keyboard was responsive, and I was able to get to a tty. I removed the dkms module, and edited dkms.conf to disable fall_back_on_mtrr_if_no_pat.patch. Built the dkms module, rebooted, and all was well.

Then, I removed the dkms module again, re-enabled fall_back_on_mtrr_if_no_pat.patch, rebuilt the dkms module, and rebooted. It is now working. So I guess fall_back_on_mtrr_if_no_pat.patch can be ruled out as the cause of the problem.

Iain Buclaw (iainb) wrote :

@Trulan, how was NVIDIA installed beforehand?

The first time you tried and failed was probably because you had residual pieces of a previous NVIDIA installation still present (i.e. from a manual installation from the nv site), and so it was more likely an X11 failure than a driver failure. As your keyboard was still responsive, that further rules out an issue with the driver. (The real issue here is that after the change from init_MUTEX to semaphore_init, the system goes into a deadlock, and only a REISUB or a hard shutdown will get you out of it.)

Regards
Iain

Alberto Milone (albertomilone) wrote :

@Iain
I'll see if I can get your patch in Ubuntu early next week.

pablomme (pablomme)
Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: Fix Released → In Progress
Alberto Milone (albertomilone) wrote :

I have made a debdiff for the three drivers. I had to adapt the patch so that it could work with -173 and -96.

Alberto Milone (albertomilone) wrote :
Alberto Milone (albertomilone) wrote :
Changed in nvidia-graphics-drivers-180 (Ubuntu):
importance: Undecided → Medium
Changed in nvidia-graphics-drivers-96 (Ubuntu):
importance: Undecided → Medium
Changed in nvidia-graphics-drivers-173 (Ubuntu):
assignee: nobody → Alberto Milone (albertomilone)
Changed in nvidia-graphics-drivers-96 (Ubuntu):
assignee: nobody → Alberto Milone (albertomilone)
status: New → In Progress
Changed in nvidia-graphics-drivers-173 (Ubuntu):
importance: Undecided → Medium
status: New → In Progress
Trulan Martin (trulanm) wrote :

@Iain,
 I had nvidia-glx-185 installed and working. I had manually patched it. I also had removed the fall_back patch as I suspected it of causing trouble. So I went from a working driver, upgraded using the ppa, and the system locked, with a blank screen. I used REISUB, attempted booting again, and this time had a quickly flashing login screen. Halfway through REISUB, before I got to B, the flickering stopped and I was able to log in to a TTY. That's when I removed, then reinstated, the fall_back patch, as I said above.

Before all that, I had installed the 190 drivers but was unable to boot on the RT kernel. So I guess it is possible that I still had some residual stuff from that.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-173 - 173.14.20-0ubuntu5

---------------
nvidia-graphics-drivers-173 (173.14.20-0ubuntu5) karmic; urgency=low

  * debian.binary/patches, dkms.conf.in:
    - nvidia-rt-compat-legacy.patch: Update patch to allow the driver
      to work with rt kernels again (LP: #413296).
    - rt_preempt_31.patch: Add patch which was already in -185.

 -- Alberto Milone <email address hidden> Sun, 25 Oct 2009 10:45:15 +0100

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-180 - 185.18.36-0ubuntu9

---------------
nvidia-graphics-drivers-180 (185.18.36-0ubuntu9) karmic; urgency=low

  * debian.binary/patches, dkms.conf.in:
    - nvidia-rt-compat.patch: Update patch to allow the driver to
      work with rt kernels again (LP: #413296). Thanks to Iain Bucław
      for the patch.

 -- Alberto Milone <email address hidden> Sun, 25 Oct 2009 10:25:29 +0100

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-96 - 96.43.13-0ubuntu6

---------------
nvidia-graphics-drivers-96 (96.43.13-0ubuntu6) karmic; urgency=low

  * debian.binary/patches, dkms.conf.in:
    - nvidia-rt-compat-legacy.patch: Update patch to allow the driver
      to work with rt kernels again (LP: #413296).
    - rt_preempt_31.patch: Add patch which was already in -185.

 -- Alberto Milone <email address hidden> Sun, 25 Oct 2009 10:37:27 +0100

Changed in nvidia-graphics-drivers-173 (Ubuntu):
status: In Progress → Fix Released
Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: In Progress → Fix Released
Changed in nvidia-graphics-drivers-96 (Ubuntu):
status: In Progress → Fix Released
Miguel Angelo Sepulveda (cattus) wrote :

On my system, though, I'm experiencing complete instability with 185.18.36 on the 2.6.31-9-rt kernel from Ubuntu Studio 9.10. This is on an Nvidia Go 7700 with Compiz turned on with 'Extra' effects.

Changing to 'None' appears to solve some issues, at least as far as I've noticed in two days of running the box.

Also, on the default installation there were rendering artifacts that made the system unusable: a complete mess on screen, with windows shaped like triangles flying around.

I turned off PowerMizer with the PerfLevelSrc=0x2222 fix. What remains is instability such as blinking, Compiz switching automatically to the "None" setting, and random freezes that eventually turn into a system crash once I use a scroll bar for too long.

I'll gladly welcome any advice.

Regards,
Miguel Angelo
