nvidia drivers 185.xx compile into kernel 2.6.28 instead of 2.6.31 on update from jaunty to karmic

Bug #474917 reported by Manjul Apratim
This bug affects 7 people
Affects | Status | Importance | Assigned to | Milestone
Release Notes for Ubuntu | Invalid | Undecided | Unassigned |
dkms (Ubuntu) | Fix Released | Critical | Alberto Milone |
  Declined for Jaunty by Alberto Milone
  Declined for Karmic by Alberto Milone
  Lucid | Fix Released | Critical | Alberto Milone |
nvidia-graphics-drivers-173 (Ubuntu) | Fix Released | Critical | Alberto Milone |
  Declined for Jaunty by Alberto Milone
  Declined for Karmic by Alberto Milone
  Lucid | Fix Released | Critical | Alberto Milone |
nvidia-graphics-drivers-180 (Ubuntu) | Fix Released | Critical | Alberto Milone |
  Declined for Jaunty by Alberto Milone
  Declined for Karmic by Alberto Milone
  Lucid | Fix Released | Critical | Alberto Milone |
nvidia-graphics-drivers-96 (Ubuntu) | Fix Released | Critical | Alberto Milone |
  Declined for Jaunty by Alberto Milone
  Declined for Karmic by Alberto Milone
  Lucid | Fix Released | Critical | Alberto Milone |

Bug Description

I did not find an existing bug report on this issue, even though other reports about the nvidia-185 package exist that affect me, so I am filing a new one. While upgrading from jaunty to the karmic RC (on the 28th of October, a day before the final release), during the installation of nvidia-185 (upgrading from nvidia-180.xx, the latest on jaunty), I got a warning message to the effect of "this driver has no modules in kernel 2.6.31, installing in 2.6.28", and installation continued against the existing 2.6.28 kernel. This sounds like a potential problem to me, since I (and, I am sure, many others) do not like to keep older kernels on the system. I am therefore inclined to believe this is a packaging bug in the nvidia drivers (185.xx) available in the karmic upgrade. I did not save the log, since I had never filed a bug report before and I was all too excited to check out what karmic was like (the upgrade did complete, with a few hiccups), but I believe the problem is completely reproducible by upgrading a jaunty system that has the latest nvidia drivers (180.xx) and is running kernel 2.6.28. I do not know whether this issue has been fixed in the current version of karmic.

After the upgrade I was plagued by the "powermizer" problem in:

https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-180/+bug/456637

which is a different story altogether, and probably unrelated to the issue at hand. I am inclined to think that the nvidia driver v185.xx is itself the cause of this and until replaced by the newer 190.xx it will continue to be so.

(While filing this report, for some reason I was unable to select nvidia-graphics-drivers-185, with the error message that this package does not exist in Ubuntu, even though it shows up in the search)

Philip Muškovac (yofel)
affects: ubuntu → nvidia-graphics-drivers-180 (Ubuntu)
Bryce Harrington (bryce) wrote :

Okay, you've mentioned a few different problems; it is generally best to focus on just one issue per bug report. The powermizer bug obviously already has a bug report. The 180/185 confusion is just that: we left the package in launchpad named 180 even though the driver itself is 185 (we didn't care to have to move all the bug reports from package to package); we'll fix the naming to not show numbers in lucid.

Anyway, so let's focus this just on the unique issue mentioned in the title. My guess is that the driver failed to *build* against the 2.6.31 kernel, rather than that it is failing during install. Could you please find and attach your build log for nvidia? The file will be found someplace like:

 /var/lib/dkms/nvidia/185.18.36/build/make.log

The numbered directory may be different depending on what version you have installed, but if you poke around it should be somewhere like that.
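
If poking around by hand is a pain, something like the following should list any candidate logs. This is only a sketch: the function name `find_dkms_make_logs` and its optional root argument are made up for illustration, and the only thing taken from the comment above is the /var/lib/dkms/nvidia/<version>/build/make.log layout.

```shell
#!/bin/sh
# List every make.log under <root>/nvidia/<version>/build/.
# The root defaults to /var/lib/dkms; the argument exists only so the
# function can be pointed at a different tree.
find_dkms_make_logs() {
    root="${1:-/var/lib/dkms}"
    for log in "$root"/nvidia/*/build/make.log; do
        # If the glob matched nothing, the pattern is left unexpanded; skip it.
        [ -f "$log" ] && printf '%s\n' "$log"
    done
    return 0
}

find_dkms_make_logs
```

No output means no build log was found under that tree.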

Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: New → Incomplete
Daniel Quinlan (daniel-chaosengine) wrote :

I got bitten by this upgrading 9.04->9.10 too

after the upgrade the box booted into 2.6.31 and gdm failed to start and the console just flashed

# uname -a
Linux nyx 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:05:01 UTC 2009 x86_64 GNU/Linux

the doc, /usr/share/doc/nvidia-185-kernel-source/README.Debian, looks to be quite out of date
I tried 'm-a a-i nvidia-kernel-source' but that complained about nvidia-kernel-source being a virtual package with no installation candidate.
I ended up running 'make ; make install' just to get the box working again

a bit of searching and I discovered mention of dkms (not mentioned in the README.Debian)

# dkms status
nvidia, 185.18.36, 2.6.28-11-generic, x86_64: installed

I noticed dkms was in /var/lib/dpkg/info/nvidia-185-kernel-source.postinst so I ran

# dpkg-reconfigure nvidia-185-kernel-source
Removing all DKMS Modules
Done.
Loading new nvidia-185.18.36 DKMS files...
Building for architecture x86_64
Building initial module for 2.6.31-14-generic
Done.

nvidia.ko:
Running module version sanity check.
 - Original module
 - Installation
   - Installing to /lib/modules/2.6.31-14-generic/updates/dkms/

depmod....

DKMS: install Completed.

# dkms status
nvidia, 185.18.36, 2.6.31-14-generic, x86_64: installed

now it works

I don't see how this was going to work during an upgrade that included installing a new kernel
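
For anyone wanting to check for the same mismatch without reading the output by eye, here is a rough sketch. It assumes `dkms status` prints lines in the format shown above (`module, version, kernel, arch: installed`); other dkms versions may format this differently. The helper name `dkms_has_module_for` is hypothetical.

```shell
#!/bin/sh
# Succeeds if the `dkms status` text on stdin lists the named module as
# installed for the given kernel. Assumes the line format shown above:
#   nvidia, 185.18.36, 2.6.31-14-generic, x86_64: installed
dkms_has_module_for() {
    module="$1"
    kernel="$2"
    awk -F', *' -v m="$module" -v k="$kernel" '
        $1 == m && $3 == k && /: installed/ { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# Only query the real system when dkms is present.
if command -v dkms >/dev/null 2>&1; then
    if dkms status | dkms_has_module_for nvidia "$(uname -r)"; then
        echo "nvidia module is installed for $(uname -r)"
    else
        echo "no nvidia module installed for $(uname -r)"
    fi
fi
```

On the box described above, the first `dkms status` output would have failed this check for 2.6.31-14-generic and the second would have passed.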

Manjul Apratim (manzdagratiano) wrote :

Apologies for the delayed reply (I have been tied up all day), and also for reporting multiple issues at once (I was not aware of the 180/185 naming issue). The trouble is that after the powermizer havoc I was unable to work on my system, and I could not afford that at the time, so I reformatted it with jaunty. Hence no log saved (bummer!). The driver indeed did fail to build against the kernel; the install had happened properly since the graphics were up. All I can say is that this bug should be easy to reproduce with an upgrade; I dare not try it right now because I would not have the time to deal with the fallout. I hope the log from Dan Quinlan above is useful.

Bryce Harrington (bryce) wrote :

manjul, ok thanks for getting back to us. We can focus this bug report on Daniel's version of the issue.

Daniel, looking at your make.log it seems to be working successfully and it mentions it's building against 2.6.31, so it doesn't quite indicate where things failed... Was that make.log what was there before, or was that from after when you ran dpkg-reconfigure?

It is starting to sound like perhaps there is a race condition. I.e., at time of install while you're upgrading to 2.6.31, the 2.6.28 kernel is installed so DKMS builds against that. You reboot into 2.6.31 and X can't find an nvidia.ko for this version of the kernel.
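
One way to test that theory on an affected machine is to look for the module file directly. A minimal sketch, assuming the module lands in the updates/dkms/ directory that dpkg-reconfigure reported earlier in this thread; the function name `has_nvidia_ko` and its second argument are invented for illustration.

```shell
#!/bin/sh
# Succeeds if a DKMS-built nvidia.ko exists for the given kernel.
# The second argument only exists so the check can be pointed at another
# tree; it defaults to /lib/modules.
has_nvidia_ko() {
    kernel="$1"
    modules_root="${2:-/lib/modules}"
    [ -f "$modules_root/$kernel/updates/dkms/nvidia.ko" ]
}

if has_nvidia_ko "$(uname -r)"; then
    echo "nvidia.ko present for $(uname -r)"
else
    echo "nvidia.ko missing for $(uname -r)"
fi
```

Running it with 2.6.28-11-generic and 2.6.31-14-generic as the first argument would show which kernel the module was actually built for.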

So, I think this might be a dupe of bug #438398. Does that sound accurate?

Manjul Apratim (manzdagratiano) wrote :

Yup; seems to me to be exactly the case as in bug #438398 :) Indeed, I believe the nvidia build does not fail even when run against the older kernel; the irritant is that it gets built against the older kernel before the new linux-headers are in place, thereby requiring the older kernel to remain installed.

Daniel Quinlan (daniel-chaosengine) wrote : Re: [Bug 474917] Re: nvidia drivers 185.xx compile into kernel 2.6.28 instead of 2.6.31 on update from jaunty to karmic

On Thu, Nov 5, 2009 at 6:10 PM, Bryce Harrington
<email address hidden> wrote:
> manjul, ok thanks for getting back to us.  We can focus this bug report
> on Daniel's version of the issue.
>
> Daniel, looking at your make.log it seems to be working successfully and
> it mentions it's building against 2.6.31, so it doesn't quite indicate
> where things failed...  Was that make.log what was there before, or was
> that from after when you ran dpkg-reconfigure?

it was after the dpkg-reconfigure. looking at the output of
dpkg-reconfigure it looks like it cleaned up whatever was there
previously, i.e. there isn't a directory for the previous kernel

> It is starting to sound like perhaps there is a race condition.  I.e.,
> at time of install while you're upgrading to 2.6.31, the 2.6.28 kernel
> is installed so DKMS builds against that.  You reboot into 2.6.31 and X
> can't find an nvidia.ko for this version of the kernel.

yup, that sounds right

> So, I think this might be a dupe of bug #438398.  Does that sound
> accurate?

no, because in that bug the build failed. in my case the build
succeeded but for the 2.6.28 kernel.

if I'd booted into 2.6.28 things would've been fine. but when 2.6.31
installed it became the default kernel.

is there a way for a new kernel install to trigger a rebuild or a
reconfigure of another package?
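
One possible shape for such a trigger, purely as a sketch: a hook script that rebuilds the module for the kernel that was just installed. The assumptions here are that kernel packages run hooks from /etc/kernel/postinst.d/ with the new kernel version as the first argument, and that the module name and version match the `dkms status` output above. The function name `rebuild_for_kernel` is made up; `dkms build` and `dkms install` with `-m`, `-v` and `-k` are the regular dkms commands.

```shell
#!/bin/sh
# Hypothetical hook; it could live somewhere like /etc/kernel/postinst.d/.
# Assumption: the kernel package passes the new kernel version as $1.
# MODULE and VERSION come from the dkms status output quoted earlier.
MODULE=nvidia
VERSION=185.18.36

# Build and install the module for one specific kernel.
rebuild_for_kernel() {
    kernel="$1"
    dkms build -m "$MODULE" -v "$VERSION" -k "$kernel" &&
    dkms install -m "$MODULE" -v "$VERSION" -k "$kernel"
}

# Do nothing unless a kernel version was passed and dkms is available.
if [ -n "${1:-}" ] && command -v dkms >/dev/null 2>&1; then
    rebuild_for_kernel "$1"
fi
```

A build for a given kernel can only succeed once that kernel's headers are installed, so a hook like this would have to run after the matching linux-headers package is unpacked.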

--
regards,
Daniel Quinlan

Manjul Apratim (manzdagratiano) wrote :

Hmmm yes I don't understand why the build failed in the other bug, since it succeeded in my case too, and for the 2.6.28 kernel. However, if 2.6.31 does become the default kernel when installed, then the infamous gdm-respawn-flickering-problem should invariably show up; in my case gdm started fine and the graphics were up - it is powermizer which plagued me. I did not check which kernel I was booted in though, since the system skips the menu on reboot. But if I were in 2.6.28, that would imply 2.6.31 was not set as the default kernel - also strange!

Bryce Harrington (bryce) wrote :

Alright, we'll keep these bug reports separate, although I have a feeling they share a common root cause.

I've another idea to add to the mix. It is possible that the nvidia module *does* get triggered to rebuild by the new kernel - this is bread and butter work for DKMS and why it exists at all. However, it takes a little time to do this. In the past, Ubuntu booted slowly enough that it could do this in parallel to everything else and by the time X started it'd be there. But now with the boot improvements due to upstart, and especially moving X's startup earlier in the boot cycle, DKMS may not always complete prior to when X tries to start. So X doesn't see nvidia.ko and crashes and burns.

The one bit that doesn't seem to fit this theory is that I assume you guys saw the screen flashing for a little bit and then power cycled. Wouldn't that have been sufficient time for nvidia.ko to complete its build, so that the next boot should have come up okay? Yet it sounds like you needed to do a dpkg-reconfigure in order to trigger a whole new dkms run. Do you have any insights as to this bit?

Meanwhile, it's sounding pretty strongly like this is a dkms race condition with upstart, so I'm going to dig deeper in that direction.

Bryce Harrington (bryce) wrote :

For now, please add something like the following to the release notes:

"""
Due to the improvements in boot speed, in some upgrades Ubuntu may boot into graphics mode before it has finished installing the nvidia kernel module for the 2.6.31 kernel. This race condition can result in the graphics failing to start, leaving the user at a blank or blinking screen, or in the low-graphics failsafe mode. In some cases a simple reboot will restore the system. An alternative workaround is to log into a console and run the command `sudo dpkg-reconfigure nvidia-185-kernel-source`.
"""

Bryce Harrington (bryce)
Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: Incomplete → Triaged
importance: Undecided → Critical
Daniel Quinlan (daniel-chaosengine) wrote :

AFAICT this had nothing to do with boot speed. the new nvidia module got built for 2.6.28 successfully *before* the 2.6.31 kernel/headers were installed.

I didn't get low graphics fail-safe. the console flashed constantly and was completely unusable, not responding to any key press.
I ssh'd into the machine and tinkered for quite a while. if DKMS was supposed to run after the first reboot it would've had ample time.

It would be preferable for DKMS to run after the new kernel/headers are installed, not after the next boot.
The only drawback to that is that it removes all earlier versions so the currently loaded module will disappear (that could, obviously, be changed)

Manjul Apratim (manzdagratiano) wrote :

Hmmm... the only difference from Dan's version is that I did not get a flashing screen on reboot, which is weird since I "saw" the driver being built against the 2.6.28 kernel during the upgrade. The install of the 2.6.31 kernel should then have set that as the default, and if that happened and the graphics started fine (except for the other issue), it probably means nvidia.ko did get built before reboot. So I am a little lost as to what actually happened (darn my lost logs!)

Alberto Milone (albertomilone) wrote :

I have provided upstream with a patch for DKMS so that we can build the kernel module for both the kernel in use and for the most recent kernel in the system.

This solves the problem here.
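
For readers who want the gist without opening the attachment, the idea can be sketched roughly as below. This is not the patch itself: the function names are invented, and picking the newest kernel with `sort -V` (GNU coreutils version sort) over the directories in /lib/modules is an assumption of this sketch.

```shell
#!/bin/sh
# Print the highest kernel version that has a directory under <modules_root>.
# Using `sort -V` here is an assumption of this sketch, not taken from the patch.
newest_kernel() {
    modules_root="${1:-/lib/modules}"
    ls "$modules_root" 2>/dev/null | sort -V | tail -n 1
}

# Print the kernels to build for: the running one, plus the newest one
# when it differs from the running one.
kernels_to_build() {
    current="$1"
    newest="$2"
    printf '%s\n' "$current"
    if [ -n "$newest" ] && [ "$newest" != "$current" ]; then
        printf '%s\n' "$newest"
    fi
}

kernels_to_build "$(uname -r)" "$(newest_kernel)"
```

During a jaunty to karmic upgrade this would list both 2.6.28 (still running) and 2.6.31 (just installed), so the module would be there after the reboot into the new kernel.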

Changed in nvidia-graphics-drivers-180 (Ubuntu):
assignee: nobody → Alberto Milone (albertomilone)
status: Triaged → In Progress
Alberto Milone (albertomilone) wrote :

The attached log shows how the patch works.

Changed in nvidia-graphics-drivers-173 (Ubuntu):
importance: Undecided → Critical
assignee: nobody → Alberto Milone (albertomilone)
Changed in nvidia-graphics-drivers-96 (Ubuntu):
importance: Undecided → Critical
Changed in nvidia-graphics-drivers-173 (Ubuntu):
status: New → In Progress
Changed in nvidia-graphics-drivers-96 (Ubuntu):
status: New → In Progress
assignee: nobody → Alberto Milone (albertomilone)
Alberto Milone (albertomilone) wrote :

While DKMS is definitely not the cause of the problem here, driver -180 relies on a template which dkms provides in /usr/lib/dkms/common.postinst.

Driver -173 and -96 should do the same (i.e. source it from the postinst) so as to minimise efforts here by avoiding code duplication.

Manjul Apratim (manzdagratiano) wrote :

Yippie!

Per Olausson (per-olauzzon) wrote :

I think this is the problem with Nvidia during an upgrade to 9.10 from a previous version:

libxine1-bin:
 Depends: nvidia-185-libvdpau but it is not going to be installed

This is preventing an upgrade or reinstall of Nvidia properly, because nvidia-185-libvdpau won't be removed with all the other dependencies the libxine1-bin will generate (amarok etc...).

Is this an erroneous dependency?

Alberto Milone (albertomilone) wrote :

@TaxAlien
Your problem is a different bug. Please file a new bug report against libxine1-bin by typing the following command from the command line:

ubuntu-bug libxine1-bin

Alberto Milone (albertomilone) wrote :

Here's a better version of the patch which makes sure that the fix works on RHEL-based systems too.

Alberto Milone (albertomilone) wrote :
Changed in dkms (Ubuntu):
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Alberto Milone (albertomilone)
Manjul Apratim (manzdagratiano) wrote :

Way to go! Well I am glad coming forward with this bug helped identify all these issues... Every step we take to weed out bugs is a step to weed out Bug #1; I would have said the Jackalope itself gives Microsoft serious competition; the Koala takes it further - let us hope the Lynx seals the deal! :)

Per Olausson (per-olauzzon) wrote :

@Alberto Milone: well, you can't do that because it is not a genuine Ubuntu package, so I am not sure how to proceed with it. I am not familiar with the Ubuntu defect process, and trying to raise something directly on the website doesn't work.

Anyhow, the issue I posted appears to be the cause of the problems people have getting Nvidia (and probably other graphics cards) up and running again.

Alberto Milone (albertomilone) wrote :

@TaxAlien
I have provided upstream (i.e. DKMS developers) with a patch that I wrote. They will have to review it and then we should introduce the change in Ubuntu and fix the drivers.

piero (petercrue)
Changed in nvidia-graphics-drivers-173 (Ubuntu):
status: In Progress → Confirmed
status: Confirmed → Fix Committed
status: Fix Committed → In Progress
assignee: Alberto Milone (albertomilone) → piero (petercrue)
Alberto Milone (albertomilone) wrote :

@piero
Why did you change the status of this bug? I'll upload my fix as soon as my patch is reviewed by upstream.

Changed in nvidia-graphics-drivers-173 (Ubuntu):
assignee: piero (petercrue) → Alberto Milone (albertomilone)
Mario Limonciello (superm1) wrote :

Alberto's fix has been committed upstream and will be uploaded to lucid in the next DKMS upload.

Changed in dkms (Ubuntu):
status: In Progress → Fix Committed
Manjul Apratim (manzdagratiano) wrote :

Mighty awesome I'd say - thanks to Alberto Milone and the rest of the team!

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dkms - 2.1.1.0-0ubuntu1

---------------
dkms (2.1.1.0-0ubuntu1) lucid; urgency=low

  [ Mario Limonciello ]
  * New upstream version
  * dkms_autoinstall: Minor logic cleanups from submitted patches.
  * dkms_autoinstall: Run under dash since dkms.conf isn't sourced anymore.
  * dkms_autoinstall: Whitespace cleanup.
  * Convert DKMS to an upstart script that starts up before GDM or KDM can
    start. This ensures that drivers are built before X tries to start.
    (LP: #453365)
  * dkms_autoinstall: Rather than having if/else clauses all over the script,
    stub out any functions that aren't provided on Debian/Ubuntu when
    /etc/debian_version isn't present.
  * dkms_autoinstall: Exit immediately if this script is present but DKMS
    isn't anymore rather than sourcing functions and then exiting.
  * kernel_postinst.d_dkms: Launch the upstart script instead. In the process
    all output will be going to /var/log/dkms_autoinstaller (LP: #292606)
  * dkms_autoinstall: Don't ever output to stdout, even with kernel parameters.
  * dkms_autoinstall: Don't log the situation that we already have everything
    installed that needs to be.
  * dkms_autoinstall: Rather than logging to /var/log/dkms_autoinstaller,
    use logger to log to syslog during build and install.
  * dkms_autoinstall: Clean up the method to get arch. These hacks shouldn't
    be necessary. If you have problems with them gone, file a bug and we'll
    fix them more cleanly.
  * dkms_autoinstall: Notate the kernel we are building a module against
    when building it.
  * debian/rules: Don't attempt to stop DKMS on upgrades. It's a task, not
    a daemon, so stop wouldn't do anything.
  * Makefile: Install the old initscript to /usr/lib so that different distros
    can migrate to upstart at their leisure.
  * Makefile: Move any debian specific calls into the Makefile.
  * dkms: Revert the code that runs DKMS as the user "nobody".
    - It's causing problems with people with nonstandard PAM configs because it
      uses "su". (LP: #484725)
    - Also people have reported that nothing should be owned by 'nobody' per
      Debian & Ubuntu policy. This could have been fixed by creating a DKMS
      user, but that still wouldn't solve the problems with using 'su'.
  * dkms: Emit built-module MODULE=foo if initctl is available on the system
    after done building a module.
  * Add a special apport package-hook for when package builds fail to try
    to report them against the package providing that DKMS package.
    (LP: #484871)

  [ Alberto Milone ]
  * dkms_common.postinst: try to build the module for the most recent
    kernel in addition to building it for the current kernel (LP: #474917).

  [ Steve Langasek ]
  * dkms_autoinstall: optimize with a single find call instead of multiple
    loops with ls. (LP: 3484386)
  * dkms_autoinstall: drop localization of the usage message - this is
    inconsistent with all other init scripts on the system.

  [ Pauli Virtanen ]
  * Remove dependence from environment's umask and certain environment
    variables. (LP: #438393, #436039)

  [ Giuseppe Iuculano ]
  * dkms_autoinstall: Correct the prov...


Changed in dkms (Ubuntu):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-173 - 173.14.22-0ubuntu1

---------------
nvidia-graphics-drivers-173 (173.14.22-0ubuntu1) lucid; urgency=low

  * Rework packaging taking Mandriva's as a model:
    - Use alternatives instead of diversions.
    - Reduce the number of binary packages to nvidia-173,
      nvidia-173-dev and nvidia-173-modaliases.
  * debian/rules:
    - Switch to CDBS.
    - Remove libGL.la as no static library is provided.
  * debian/nvidia-current.README.Debian.in:
    - Document the update process.
  * nvidia-current.postinst: try to build the module for the most
    recent kernel in addition to building it for the current kernel
    (LP: #474917).
  * New upstream release (LP: #494166):
   - Add support for xserver 1.7.x.
 -- Alberto Milone <email address hidden> Fri, 08 Jan 2010 23:30:10 +0100

Changed in nvidia-graphics-drivers-173 (Ubuntu):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nvidia-graphics-drivers-96 - 96.43.14-0ubuntu1

---------------
nvidia-graphics-drivers-96 (96.43.14-0ubuntu1) lucid; urgency=low

  * Rework packaging taking Mandriva's as a model:
    - Use alternatives instead of diversions.
    - Reduce the number of binary packages to nvidia-96,
      nvidia-96-dev and nvidia-96-modaliases.
  * debian/rules:
    - Switch to CDBS.
    - Remove libGL.la as no static library is provided.
  * debian/nvidia-current.README.Debian.in:
    - Document the update process.
  * nvidia-current.postinst: try to build the module for the most
    recent kernel in addition to building it for the current kernel
    (LP: #474917).
  * New upstream release (LP: #494166):
   - Add support for xserver 1.7.x.
 -- Alberto Milone <email address hidden> Fri, 08 Jan 2010 23:34:51 +0100

Changed in nvidia-graphics-drivers-96 (Ubuntu):
status: In Progress → Fix Released
ochatard (olivier-chatard) wrote :

My graphics card requires the nvidia-96 driver and seems to have the same trouble, even after upgrading to the latest 2.6.31-17 kernel.
The only workaround I found to run in high resolution is to boot the "old" 2.6.28 kernel, but then the ALSA sound package no longer works. So I have to choose between sound with low-resolution graphics and no sound with high-resolution graphics: is that a choice?

Manjul Apratim (manzdagratiano) wrote :

I am using the 2.6.28 kernel still, since I am on Jaunty, and the ALSA package does work on my machine - what I had to do was to switch to alsa-oss for everything to work properly. I am sure you should be able to get the sound to work in that kernel too, unless the experts hither can point out something different.

Alberto Milone (albertomilone) wrote :

@ochatard:
The fix was released only in Ubuntu Lucid; therefore the bug is still not fixed in Karmic.

Manjul Apratim (manzdagratiano) wrote :

Hmmm... well for myself I am giving the Koala a skip altogether... And am off to try the Lynx on Dustin Kirkland's testdrive

Alberto Milone (albertomilone) wrote :

At this point it's quite clear that we won't get the fix for DKMS in Karmic, therefore I declined the task for Karmic. This is fixed in Lucid though.

Changed in nvidia-graphics-drivers-180 (Ubuntu Lucid):
status: In Progress → Won't Fix
status: Won't Fix → Fix Released
Bryce Harrington (bryce)
tags: added: karmic
tags: added: jaunty
Changed in ubuntu-release-notes:
status: New → Invalid
Adam Guthrie (therigu)
tags: added: patch-accepted-upstream
papukaija (papukaija)
tags: added: jaunty2karmic
Bryce Harrington (bryce) wrote :

@Alberto, is there anything more that can be reasonably done with this bug report? If not, perhaps the remaining bug tasks should be closed out?

Bryce Harrington (bryce) wrote :

Alright, as there have been no responses, I'm going to go ahead and assume the remaining bug tasks should be closed out. If there actually is work still to be done, either reopen the appropriate task or (even better) open a new, cleaner bug report against the current development version, using 'ubuntu-bug xorg'.

Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: In Progress → Fix Released