Installing nvidia-current fails if ccache is enabled

Bug #631007 reported by Theodore Ts'o
This bug affects 1 person
Affects                           Status  Importance  Assigned to  Milestone
NVIDIA Drivers Ubuntu             New     Undecided   Unassigned
nvidia-graphics-drivers (Ubuntu)  New     Undecided   Unassigned

Bug Description

Installing nvidia-current (256.53-0ubuntu1) fails for me if ccache is enabled. I'm guessing there may be a race condition in the makefiles, with something getting compiled under "make -j2". I noticed that I managed to get it to compile once by repeatedly retrying the dkms build of the nvidia driver (I kept running apt-get install -f). I also noticed that once I set the environment variable CCACHE_DISABLE, it worked reliably.

Theodore Ts'o (tytso) wrote :

On my T410 system with a quad-core Intel Core i7 processor, I confirmed I was able to replicate the failure if CCACHE_DISABLE is not set:

<tytso.root@tytso-glaptop> {/home/tytso}
2051# unset CCACHE_DISABLE
<tytso.root@tytso-glaptop> {/home/tytso}
2052# dpkg-reconfigure nvidia-current
Removing all DKMS Modules
Done.
dpkg-trigger: dpkg-trigger must be called from a maintainer script (or with a --by-package option)
update-initramfs: Generating /boot/initrd.img-2.6.35-20-generic
update-initramfs: Generating /boot/initrd.img-2.6.34-5-generic
Loading new nvidia-current-256.53 DKMS files...
Building for 2.6.34-5-generic and 2.6.35-20-generic
Building for architecture x86_64
Building initial module for 2.6.34-5-generic

Error! Bad return status for module build on kernel: 2.6.34-5-generic (x86_64)
Consult the make.log in the build directory
/var/lib/dkms/nvidia-current/256.53/build/ for more information.

I've attached the make.log file for your information.

The nvidia modules compile successfully if I run the command "export CCACHE_DISABLE=1" before running the command "dpkg-reconfigure nvidia-current".

Alberto Milone (albertomilone) wrote :

I'm not really sure about this, as I can't reproduce the problem with my Core i7 here (and ccache is installed).

Do you have some environment variable related to ccache in your .bashrc?

I'm also subscribing Nvidia to see if they know what's going on.
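
One way to answer the question about ccache-related variables is to list them from the same shell that runs dpkg-reconfigure. This is a minimal sketch; list_ccache_env is a hypothetical helper name, not part of ccache, and it only sees variables exported in the current session (a root shell may differ from what .bashrc sets for a normal user).

```shell
# List every exported environment variable whose name starts with CCACHE_.
# list_ccache_env is a hypothetical helper name used for illustration only.
list_ccache_env() {
    env | grep '^CCACHE_' || echo "no CCACHE_* variables set"
}

list_ccache_env
```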

Theodore Ts'o (tytso) wrote :

Ah, yes. I also had CCACHE_PREFIX set:

CCACHE_PREFIX=distcc

I don't have any distcc servers enabled, which for everything else causes a fallback to localhost, but apparently this is what was causing the problem; unsetting this environment variable was also enough to cause the nvidia modules to successfully compile.

I'm not sure why it makes a difference to the nvidia module. I've built kernels with this set, and it doesn't seem to make a difference there. (The reason I turn it on even when I have no distcc cluster available is that having this variable set changes how ccache calculates the hash for its cache, so when I *do* have a distcc cluster available, I don't end up invalidating everything in my ccache cache.)

Alberto Milone (albertomilone) wrote :

Are you still able to reproduce the problem in Maverick?

I tried exporting CCACHE_PREFIX=distcc but I couldn't reproduce the problem ($CCACHE_DISABLE was unset):

root@alberto-desktop:/home/alberto# export CCACHE_PREFIX=distcc
root@alberto-desktop:/home/alberto# dpkg-reconfigure nvidia-current
Removing all DKMS Modules
Done.
dpkg-trigger: dpkg-trigger must be called from a maintainer script (or with a --by-package option)
Loading new nvidia-current-256.53 DKMS files...
Building for 2.6.35-15-generic and 2.6.35-22-generic
Building for architecture x86_64
Building initial module for 2.6.35-15-generic
Done.

nvidia-current.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/2.6.35-15-generic/updates/dkms/

depmod....

DKMS: install Completed.
Building initial module for 2.6.35-22-generic
Done.

nvidia-current.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/2.6.35-22-generic/updates/dkms/

depmod....

DKMS: install Completed.
root@alberto-desktop:/home/alberto# echo $CCACHE_DISABLE

root@alberto-desktop:/home/alberto#

Theodore Ts'o (tytso) wrote :

Yes, I'm able to reproduce it, and I even understand why it's failing now.

The problem is that the kernel makefile uses the -MD option to request that the makefile dependency information be written to a file (i.e., /var/lib/dkms/nvidia-current/260.19.12/build/.nv.o.d). But ccache 2.4 doesn't understand the -MD option, so it doesn't emulate it when nv.o is already in ccache's cache. Hence the .nv.o.d file is missing, the fixdep command executed by the kernel makefile fails, and the dkms build fails.
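
To check for this symptom after a failed build, one can look for the dependency file next to the object file. The sketch below assumes the naming shown in this comment (nv.o pairs with .nv.o.d in the same directory); kbuild_depfile and check_depfile are hypothetical helper names.

```shell
# Print the dependency-file path expected for an object file: same
# directory, file name prefixed with "." and suffixed with ".d"
# (for example nv.o -> .nv.o.d, as in the path quoted above).
kbuild_depfile() {
    printf '%s/.%s.d\n' "$(dirname "$1")" "$(basename "$1")"
}

# Report whether the dependency file for the given object file exists.
# Returns non-zero when it is missing.
check_depfile() {
    dep=$(kbuild_depfile "$1")
    if [ -f "$dep" ]; then
        echo "present: $dep"
    else
        echo "missing: $dep"
        return 1
    fi
}

# Example (path taken from the comment above):
#   check_depfile /var/lib/dkms/nvidia-current/260.19.12/build/nv.o
```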

This is fixed in ccache 3.0 (ccache 3.1 is the latest version), which teaches ccache how to deal with the gcc option -MD.
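
A rough way to tell whether the installed ccache predates that fix is to compare its major version against 3. This is a sketch under two assumptions: ccache_handles_md is a hypothetical helper name, and the first line of `ccache -V` is assumed to read "ccache version X.Y"; if the output has a different shape, the script reports that it could not determine the version.

```shell
# Succeed if the given version string (e.g. "2.4" or "3.1") is 3.0 or newer.
# Non-numeric input is treated as "not new enough".
ccache_handles_md() {
    major=${1%%.*}
    [ "$major" -ge 3 ] 2>/dev/null
}

# Assumption: the first line of `ccache -V` looks like "ccache version X.Y".
ver=$(ccache -V 2>/dev/null | sed -n '1s/^ccache version //p')
if [ -z "$ver" ]; then
    echo "could not determine ccache version"
elif ccache_handles_md "$ver"; then
    echo "ccache $ver: 3.0 or newer"
else
    echo "ccache $ver: older than 3.0, -MD is not handled on cache hits"
fi
```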

The easy workaround for now is to make sure the environment variable CCACHE_DISABLE is set before installing a module that uses dkms, such as nvidia-current.
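
The workaround can be scoped to a single command, so that ccache stays enabled for everything else in the session. A minimal sketch, where without_ccache is a hypothetical wrapper name:

```shell
# Run one command with CCACHE_DISABLE set in its environment only;
# the calling shell's environment is left unchanged.
# without_ccache is a hypothetical helper name used for illustration.
without_ccache() {
    env CCACHE_DISABLE=1 "$@"
}

# Example (as root):
#   without_ccache dpkg-reconfigure nvidia-current
```

Using env here, instead of export, avoids leaving CCACHE_DISABLE set for later builds that would benefit from the cache.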
