DKMS build fails, but package upgrade still successful

Bug #438398 reported by Pauli Virtanen on 2009-09-28
This bug affects 4 people
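Affects | Status | Importance | Assigned to | Milestone
dkms (Ubuntu) | Invalid | Undecided | Unassigned |
dkms (Ubuntu Karmic) | Invalid | Undecided | Unassigned |
nvidia-graphics-drivers-180 (Ubuntu) | Fix Released | High | Unassigned |
nvidia-graphics-drivers-180 (Ubuntu Karmic) | Won't Fix | Undecided | Unassigned |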

Bug Description

The nvidia-185-kernel-source (185.18.36-0ubuntu3) package upgrade succeeds, even though the DKMS build fails.

I believe the package should not be marked as configured in this case, especially as otherwise the user will notice the problem only on the next boot.

Example failure case (due to DKMS bug #438393):
-----------------
Configuring: nvidia-185-kernel-source (185.18.36-0ubuntu3) ...
Removing all DKMS Modules
Done.
Adding Module to DKMS build system
driver version= 185.18.36
Doing initial module build

Error! Bad return status for module build on kernel: 2.6.31-11-generic (i686)
Consult the make.log in the build directory
/var/lib/dkms/nvidia/185.18.36/build/ for more information.
Installing initial module

Error! Could not locate nvidia.ko for module nvidia in the DKMS tree.
You must run a dkms build for kernel 2.6.31-11-generic (i686) first.
Done.
-----------------

Extract from make.log
-----------------
DKMS make.log for nvidia-185.18.36 for kernel 2.6.31-11-generic (i686)
ti 29.9.2009 00.03.29 +0300

The C compiler '/home/pauli/bin/gcc-cached' does not appear to be able to
create executables. Please make sure you have
your Linux distribution's libc development package
installed and that '/home/pauli/bin/gcc-cached' is a valid C compiler
name.

*** Failed CC sanity check. Bailing out! ***

make: *** [select_makefile] Virhe 1
-----------------

Bryce Harrington (bryce) on 2009-11-04
Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: New → Triaged
importance: Undecided → High
Bryce Harrington (bryce) wrote :

Interesting, you're right that in such a case the package upgrade should not be marked as succeeding. This may be exacerbating the situation when the package fails to build for other reasons.

Mario Limonciello (superm1) wrote :

So the problem with declaring the package as failed if the DKMS build failed is that it may actually pass or fail depending on how far along into the updates you are.

Say you are updating to a new linux-headers with a new ABI at the same time as installing the NVIDIA package.

Well if the NVIDIA package is processed first, the headers aren't yet installed, so the package will fail during postinst, but as soon as the headers are loaded, the kernel postinst runs and the modules get successfully built.

Pauli Virtanen (pauli-virtanen) wrote :

Ideally, this known and safe cause of build failure could be distinguished from real build failures. Does DKMS check that the headers are actually available (I think it does not)?

In any case this seems to require some special handling, as the nvidia package's control file probably cannot instruct dpkg to unpack the updated header files sufficiently early.
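
As an illustration of the header check asked about above, here is a minimal sketch. It assumes that /lib/modules/<kernel version>/build points at the installed headers for that kernel, which is the usual layout but is not confirmed anywhere in this report. The function name headers_available and the MODULES_ROOT override are hypothetical, introduced only for this example; this is not a description of what DKMS itself does.

```shell
#!/bin/sh
# Sketch only: report whether build headers for a kernel version appear to be installed.
# MODULES_ROOT is a hypothetical override used for testing; it defaults to /lib/modules.
headers_available() {
    kver="$1"
    root="${MODULES_ROOT:-/lib/modules}"
    # "build" is normally a symlink to the headers tree. The -d test follows
    # symlinks, so a dangling link (headers not yet unpacked) counts as missing.
    [ -d "$root/$kver/build" ]
}
```

A postinst could call something like `headers_available "$(uname -r)"` before attempting the build and treat a missing header tree as "defer to the kernel trigger" rather than as a hard failure.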

Bryce Harrington (bryce) wrote :

So let me see if I understand correctly.

You do an update which is going to upgrade nvidia and then linux-headers. If linux-headers is not installed when nvidia is processed, the nvidia build would (and should) fail. In this case nvidia should be set to depend on linux-headers so it only gets processed after linux-headers.

But in the case that you do have an older linux-headers installed, nvidia will be built against *that*. Then linux-headers gets updated to a new (incompatible) version. Yet in this case, you should still have a properly built nvidia module, no? I guess I'm not understanding why the nvidia build would be failing in this case.

Also, what script generates the output shown in the description of this report? I grepped through the nvidia-graphics-drivers-180 source package but didn't see it. Is it from dkms directly?

Bryce Harrington (bryce) wrote :

<cafetiere> bryce the dkms integration gets retriggered by linux-image and by linux-headers
<cafetiere> so that when the package installs itself it generates for the current kernel
 when the kernel or headers completes it triggers a dpkg dkms trigger which rebuilds for that kernel

Bryce Harrington (bryce) wrote :

If perhaps the issue is that during install it builds only against the currently running kernel, then perhaps it needs to have the equivalent of dkms --all, to rebuild against all installed kernels.

<GrueMaster> ok, then our wrapper needs a script that does a "for k in `ls /usr/src/linux-headers-*`; do make SYSSRC=$k module;done "

Or maybe, if the user has a lot of kernels installed, just build for the most recent N kernels, or something.

Bryce Harrington (bryce) wrote :

<GrueMaster> The only other issue is this same problem could come up just from installing a new kernel, unless that triggers dkms.
 Which brings me back to the second solution. Fix dkms and/or upstart to wait until dkms is finished.
<smb> ITYM something like for kernel versions ; do dkms {build|install} -m $module -v $version -k $kernel-version
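
The two loops suggested on IRC above could be combined roughly as follows. This is a sketch under stated assumptions, not a tested fix: it enumerates kernels under /lib/modules that have a build directory instead of globbing /usr/src/linux-headers-* (note that `ls` on a directory glob lists directory contents, not the directory names), and it uses the `dkms build` and `dkms install` subcommands with the -m, -v and -k options as smb describes. The function names and the MODULES_ROOT and DKMS overrides are hypothetical and exist only so the loop can be dry-run.

```shell
#!/bin/sh
# Sketch only: build and install a DKMS module for every kernel that has headers.
# MODULES_ROOT and DKMS are hypothetical overrides for dry runs and testing.

# Print one kernel version per line for each kernel with a usable build directory.
list_buildable_kernels() {
    root="${MODULES_ROOT:-/lib/modules}"
    for dir in "$root"/*; do
        [ -d "$dir/build" ] || continue
        printf '%s\n' "${dir##*/}"
    done
}

# Run "dkms build" then "dkms install" for each kernel; report failures on stderr.
build_for_all_kernels() {
    module="$1"
    version="$2"
    dkms_cmd="${DKMS:-dkms}"
    list_buildable_kernels | while read -r kver; do
        # stdin is redirected so the command cannot consume the kernel list.
        if "$dkms_cmd" build -m "$module" -v "$version" -k "$kver" </dev/null &&
           "$dkms_cmd" install -m "$module" -v "$version" -k "$kver" </dev/null; then
            echo "ok: $kver"
        else
            echo "FAILED: $kver" >&2
        fi
    done
}
```

Setting DKMS=echo prints the commands instead of running them, for example `DKMS=echo build_for_all_kernels nvidia 185.18.36`. Limiting the loop to the most recent N kernels, as suggested above, would need a version-aware sort that this sketch does not attempt.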

Bryce Harrington (bryce) wrote :

<cafetiere> i think the suggestion was that we could make dkms an upstart job and let it retrigger the card appearing
 i think it would have been synchronous before and not now

Bryce Harrington (bryce) wrote :

It's starting to sound like for now we should document the issue and workaround, and focus on getting a fix in for Lucid. If the solution is simple enough and proves effective maybe we can consider an SRU at that point.

One bit I still don't quite understand: if this theory is correct, then after hitting the race condition failure, wouldn't the user power cycle, and by that point nvidia.ko would have finished building, so the system would boot okay after that?

Bryce Harrington (bryce) wrote :

I'll use bug #438398 for the release notes since it seems to be the best canonical description of the problem.

Bryce Harrington (bryce) wrote :

sorry, I should have said I'll use bug #474917 for the release notes

Alberto Milone (albertomilone) wrote :

Honestly I'm not sure what else we can do in the nvidia package other than checking the existence of the kernel module.

In the postinst script of the nvidia package we check the exit status of the script that we use to build the module:

case "$1" in
    configure)
        /usr/lib/dkms/common.postinst $NAME $CVERSION /usr/share/$PACKAGE_NAME $ARCH $2
        exit $?
        ;;

Mario Limonciello (superm1) wrote :

I'm going to invalidate the DKMS tasks. This should be less of a worry in Lucid because DKMS will be more resilient to build failures and provides an upstart task before GDM gets a chance to go at the system.

Alberto's recommendation about checking for the existence of the kernel module is about all I can think of too at this point, and seems like a good solution on a per package basis.
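
A per-package check for the built module, as recommended above, might look like the sketch below. It assumes the DKMS tree lives under /var/lib/dkms and that built modules sit somewhere below <module>/<version>/<kernel version>/, which matches the build path shown in the description but is otherwise an assumption. The function name module_is_built and the DKMS_TREE override are hypothetical and are not part of the nvidia package or of DKMS.

```shell
#!/bin/sh
# Sketch only: succeed if a built <name>.ko exists in the DKMS tree for a kernel.
# DKMS_TREE is a hypothetical override for testing; it defaults to /var/lib/dkms.
module_is_built() {
    name="$1"
    version="$2"
    kver="$3"
    tree="${DKMS_TREE:-/var/lib/dkms}"
    # Search below the kernel directory instead of hard-coding the
    # architecture subdirectory; a missing directory yields no match.
    found=$(find "$tree/$name/$version/$kver" -type f -name "$name.ko" 2>/dev/null | head -n 1)
    [ -n "$found" ]
}

# Possible use in the "configure" branch of a postinst, after the build helper:
#   module_is_built "$NAME" "$CVERSION" "$(uname -r)" || exit 1
```

Whether a missing module should fail the package configuration outright is the open question in this report, since the build can legitimately be deferred until the kernel headers are installed.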

Changed in dkms (Ubuntu):
status: New → Invalid
Changed in dkms (Ubuntu Karmic):
status: New → Invalid
FreeUser (ddwqrbgrbfig) wrote :

This problem drives me crazy :-(
Whenever a new kernel gets installed I have to boot up with the old kernel and remove the nvidia driver, then boot into the new kernel and install the driver again.

PLEASE fix this for Karmic

Bryce Harrington (bryce) wrote :

I believe this issue is a thing of the past. In any case, if it still occurs with Ubuntu 12.04 (Precise) or newer, we'd need an updated bug report. Please file a new report if this still happens.

Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: Triaged → Fix Released
Rolf Leggewie (r0lf) wrote :

Karmic is past End of Life, and is no longer supported. As such, this bug is being marked "Won't Fix" against the Karmic bug task.

Rolf Leggewie (r0lf) wrote :

Karmic has long since stopped receiving updates. Marking the Karmic task for this ticket as "Won't Fix".

Changed in nvidia-graphics-drivers-180 (Ubuntu Karmic):
status: New → Won't Fix