dkms should start before gdm if needed for video driver

Reported by Anders Kaseorg on 2009-10-16
Affects        Status        Importance  Assigned to        Milestone
dkms (Ubuntu)  Fix Released  Medium      Mario Limonciello
gdm (Ubuntu)   Fix Released  Low         Mario Limonciello
xorg (Ubuntu)  Fix Released  Undecided   Mario Limonciello

Bug Description

Binary package hint: dkms

Right now dkms does not start before gdm, so if you’re using a restricted video driver that hasn’t been built yet for the current kernel, gdm will just sit there helplessly flashing the screen until dkms runs.

tags: added: ubuntu-boot
Changed in gdm (Ubuntu):
importance: Undecided → Low
Bryce Harrington (bryce) wrote :

Can anyone else confirm this is indeed happening? Sounds like it could cause a lot of bad breakage if so.

Anders, any additional data you could provide (log files, error messages, etc.) could help move this issue faster towards becoming actionable.

Anders Kaseorg (anders-kaseorg) wrote :

Well, here are all the logs created in /var/log during a reboot with --verbose after ‘dkms remove -m nvidia -v 185.18.36 -k $(uname -r)’. As you can see, gdm started too early to find the nvidia module that dkms was still in the process of compiling. The nvidia driver could be loaded fine manually after the compile was done, and the next reboot worked fine with the nvidia module already compiled.

Bryce Harrington (bryce) wrote :

Anders, thanks! This confirms what I suspected.

Solving this might be tricky, but I've suggested we add a release note about the situation.

Mario Limonciello (superm1) wrote :

So the solution that comes to mind for me is to introduce an upstart job for DKMS so that GDM doesn't try to start up until DKMS is done building its modules.

Mario Limonciello (superm1) wrote :

Here's a diff that should take care of this. I've only tested it in a virtualized environment. Once I can test it on bare metal hardware, I'll upload this and a few other bits that I'm planning for SRUs to DKMS.

Anders Kaseorg (anders-kaseorg) wrote :

> +start on (starting gdm
> + or starting kdm
> + or starting xdm
> + or starting oem-config
> + or starting ubiquity)

I’m not completely familiar with how upstart jobs work, but won’t that make dkms_autoinstaller never start on a server that needs it for openafs-modules-dkms? Can you add a fallback event such as “… or (starting rc RUNLEVEL=[2345])” so that it will always run at least as early in the boot process as it did before?
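As a minimal sketch of the fallback suggested here (not the stanza that was eventually committed), the start condition from the quoted diff could be extended with the rc event so that the job also runs on machines with no display manager or installer job. The event names come from the diff and the comment above; the layout and the comment lines are illustrative.

```
# Start before any display manager or installer, and in any case no later
# than the point where rc begins running the SysV initscripts.
start on (starting gdm
          or starting kdm
          or starting xdm
          or starting oem-config
          or starting ubiquity
          or starting rc RUNLEVEL=[2345])
```

Under Upstart's semantics, a starting event holds back the job being started until tasks triggered by that event have finished, which is what would let the module build complete first.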

Mario Limonciello (superm1) wrote :

I've committed a solution to this upstream:
http://linux.dell.com/git/?p=dkms.git;a=commit;h=3692368c51692003a56d5c4a174f7d712aace71e

I tested it on both desktop and server installations; the script still runs in both scenarios.

Changed in gdm (Ubuntu):
status: New → Won't Fix
Changed in dkms (Ubuntu):
status: New → Fix Committed
assignee: nobody → Mario Limonciello (superm1)
milestone: none → lucid-alpha-1
Anders Kaseorg (anders-kaseorg) wrote :

You added “or stopping rc RUNLEVEL=[2345]” rather than “or starting rc RUNLEVEL=[2345]”. This will break openafs-modules-dkms because /etc/init.d/openafs-client (which requires the kernel module to have already been built) is started from rc.

Changed in dkms (Ubuntu):
status: Fix Committed → In Progress

Anders:

Then openafs-client should be converted to an upstart script that depends on
DKMS having finished running

--
Mario Limonciello
<email address hidden>

And it will (bug 483506), but I worry that there may be other initscripts with the same problem. In any event, should we really force all of these transitions to happen at exactly the same time?

Mario Limonciello (superm1) wrote :

It's very early in the cycle, so this is the time that these types of things should be switched over and ironed out. I'm going to adjust the DKMS task back to Fix Committed. If you've got some other feedback about how this upstart script should look before I upload it, feel free to leave it here, otherwise open up another bug if you come across any problems with the implementation.

Changed in dkms (Ubuntu):
status: In Progress → Fix Committed
Changed in dkms (Ubuntu):
importance: Undecided → Medium

Even if it is desirable to convert openafs-client to upstart right now, forcing dkms to load after rc would force openafs-client to load after rc. This will be a problem for other services that depend on openafs, such as apache2 installations with docroot in /afs.

dkms provides kernel modules, and userspace services always depend on kernel modules being available, never the other way around, so delaying dkms toward the very end of the boot process seems strange anyway.

In short, I see many advantages to loading dkms before legacy SysV services, and no advantages to loading it after. The upstart transition can happen either way. Is there something I missed?

Perhaps we should try to get feedback from an Upstart developer?

Regardless of the outcome of the rules for when DKMS starts, I'd argue
that this type of problem is begging for the broken pieces to each be
converted to upstart anyway. OpenAFS should be satisfying a network
file systems need, and other services such as apache shouldn't be
starting until the network file systems are ready.

Now, if you chain that with the openafs upstart job waiting for DKMS to
be done and apache/mysql/$favorite_server_service waiting for network
file systems to be ready and you've got a very logical ordering to your
boot process.

I'll have to see how this setup behaves if DKMS loads before the legacy
SysV services, particularly since that might trigger things rather than
gdm as originally intended for the desktop case.

--
Mario Limonciello
<email address hidden>

> Now, if you chain that with the openafs upstart job waiting for DKMS to
> be done and apache/mysql/$favorite_server_service waiting for network
> file systems to be ready and you've got a very logical ordering to your
> boot process.

Yes yes, I understand that. And I want very much to live in that world. But we aren’t going to get there overnight, or even in one release cycle (especially with packages being imported from Debian, which still uses SysV initscripts). So I’m interested in making the transition to that world as smooth as possible, especially when it’s easy.

OpenAFS certainly wants to satisfy a network file systems event, but again, if openafs depends on dkms and dkms runs after rc, then you’ve forced rc to never depend on network file systems.

Someday rc will go away and none of this will be a problem, but until then let’s try to get things right with rc.

> I'll have to see how this setup behaves if DKMS loads before the legacy
> SysV services, particularly since that might trigger things rather than
> gdm as originally intended for the desktop case.

You mean you’re worried about dkms triggering things too early? dkms does not trigger anything yet, and when there are packages that trigger on dkms, they will ‘start on (stopped dkms_autoinstaller and <other conditions…>)’ so that they won’t trigger too early no matter how early dkms finishes.

Better yet, you could register an event (say) built-module, and do ‘initctl emit built-module MODULE=foo’ for each module foo when it is built, so that other services could “start on (built-module MODULE=foo and <other conditions…>)”.

By the way, you can use the starting-dm event instead of ‘starting gdm or starting kdm or starting xdm’.
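The built-module idea can be sketched from the consumer's side. This is a hypothetical job file: the job name, the module name foo, and the daemon path are placeholders, and the filesystem event and stop condition are borrowed from the gdm job quoted later in this report. The thread does not say whether the event is also emitted when a module was already built on a previous boot, so a real job would likely need another start condition for that case.

```
# /etc/init/example-service.conf (hypothetical job name)
description "example service that needs the DKMS-built module foo"

# Wait for both the file systems and the freshly built module.
start on (filesystem and built-module MODULE=foo)
stop on runlevel [016]

# Placeholder daemon path, for illustration only.
exec /usr/sbin/example-daemon
```

The producing side is the single command quoted above, initctl emit built-module MODULE=foo, run once per module after its build finishes.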

On Thu, Dec 10, 2009 at 01:07, Anders Kaseorg <email address hidden> wrote:

> > Now, if you chain that with the openafs upstart job waiting for DKMS to
> > be done and apache/mysql/$favorite_server_service waiting for network
> > file systems to be ready and you've got a very logical ordering to your
> > boot process.
>
> Yes yes, I understand that. And I want very much to live in that world.
> But we aren’t going to get there overnight, or even in one release cycle
> (especially with packages being imported from Debian, which still uses
> SysV initscripts). So I’m interested in making the transition to that
> world as smooth as possible, especially when it’s easy.
>

What scale of packages are you talking about? I mean, adding upstart
scripts isn't very difficult. I just did it for mysql a week or two ago.
If the effort is just to get any packages in main by Lucid, I don't see why
it's not achievable.

>
> OpenAFS certainly wants to satisfy a network file systems event—but
> again, if openafs depends dkms and dkms runs after rc, then now you’ve
> forced rc to never depend on network file systems.
>
> Someday rc will go away and none of this will be a problem, but until
> then let’s try to get things right with rc.
>

So am I mistaken, but stopping rc RUNLEVEL=[2345], that means as soon as
runlevel 2 is exited or 3 or 4 or 5, then it would run DKMS right?

> You mean you’re worried about dkms triggering things too early? dkms
> does not trigger anything yet, and when there are packages that trigger
> on dkms, they will ‘start on (stopped dkms_autoinstaller and <other
> conditions…>)’ so that they won’t trigger too early no matter how early
> dkms finishes.
>
> Better yet, you could register an event (say) built-module, and do
> ‘initctl emit built-module MODULE=foo’ for each module foo when it is
> built, so that other services could “start on (built-module MODULE=foo
> and <other conditions…>)”.
>

This is a spectacular idea. I've just committed it upstream.

http://linux.dell.com/git/?p=dkms.git;a=commit;h=2acd882f0e28249d05d0bdc88f894cb534824f69

>
> By the way, you can use the starting-dm event instead of ‘starting gdm
> or starting kdm or starting xdm’.
>

I heard that this isn't the proper thing to do in Lucid anymore, and
anything that was doing it will be changed.

> If the effort is just to get any packages in main by Lucid, I don't see why
> it's not achievable.

Fixing all of main isn’t good enough for many users (openafs is itself in universe). Whether it’s acceptable to break universe is up to the Ubuntu TB, but there’s absolutely no point in breaking *anything* when it’s totally avoidable by changing “stopping rc” to “starting rc”.

(I’d love to be proven wrong by getting all of Lucid converted to upstart before the release—but that _still_ doesn’t imply we should arbitrarily break things before that happens.)

> So am I mistaken, but stopping rc RUNLEVEL=[2345], that means as soon as
> runlevel 2 is exited or 3 or 4 or 5, then it would run DKMS right?

Perhaps you are mistaken (or I misunderstood you): rc is a task, so it “stops” as soon as all the SysV initscripts have finished starting, not when the runlevel is exited.

But I want dkms_autoinstaller to run before the SysV initscripts start (‘start on (… starting rc RUNLEVEL=[2345])’), rather than running after the SysV initscripts start (‘start on (… stopping rc RUNLEVEL=[2345])’, which is what you currently have).

Are we on the same page here?

> This is a spectacular idea. I've just committed it upstream.

Cool. I think you’re supposed to also declare
  emits built-module
in /etc/init/dkms_autoinstaller.conf; see init(5).
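A sketch of how that declaration might sit in the job file, assuming the job is a task that starts before the SysV initscripts as argued above. The script path and its argument are placeholders, since the thread does not give the installed location, and this is not the file that was actually shipped.

```
# /etc/init/dkms_autoinstaller.conf (illustrative sketch, not the shipped file)
description "build DKMS modules for the running kernel"

start on starting rc RUNLEVEL=[2345]

# A task runs once and exits; jobs blocked on it resume when it finishes.
task

# Declare the event that this job may emit through "initctl emit".
emits built-module

# Placeholder path and argument.
exec /path/to/dkms_autoinstaller start
```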

> I heard that this isn't the proper thing to do in Lucid anymore, and
> anything that was doing it will be changed.

gdm and usplash still emit and use starting-dm in current Lucid (but of course it’s possible that’s still planned to change? *shrug*).

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package dkms - 2.1.1.0-0ubuntu1

---------------
dkms (2.1.1.0-0ubuntu1) lucid; urgency=low

  [ Mario Limonciello ]
  * New upstream version
  * dkms_autoinstall: Minor logic cleanups from submitted patches.
  * dkms_autoinstall: Run under dash since dkms.conf isn't sourced anymore.
  * dkms_autoinstall: Whitespace cleanup.
  * Convert DKMS to an upstart script that starts up before GDM or KDM can
    start. This ensures that drivers are built before X tries to start.
    (LP: #453365)
  * dkms_autoinstall: Rather than having if/else clauses all over the script,
    stub out any functions that aren't provided on Debian/Ubuntu when
    /etc/debian_version isn't present.
  * dkms_autoinstall: Exit immediately if this script is present but DKMS
    isn't anymore rather than sourcing functions and then exiting.
  * kernel_postinst.d_dkms: Launch the upstart script instead. In the process
    all output will be going to /var/log/dkms_autoinstaller (LP: #292606)
  * dkms_autoinstall: Don't ever output to stdout, even with kernel parameters.
  * dkms_autoinstall: Don't log the situation that we already have everything
    installed that needs to be.
  * dkms_autoinstall: Rather than logging to /var/log/dkms_autoinstaller,
    use logger to log to syslog during build and install.
  * dkms_autoinstall: Clean up the method to get arch. These hacks shouldn't
    be necessary. If you have problems with them gone, file a bug and we'll
    fix them more cleanly.
  * dkms_autoinstall: Notate the kernel we are building a module against
    when building it.
  * debian/rules: Don't attempt to stop DKMS on upgrades. It's a task, not
    a daemon, so stop wouldn't do anything.
  * Makefile: Install the old initscript to /usr/lib so that different distros
    can migrate to upstart at their leisure.
  * Makefile: Move any debian specific calls into the Makefile.
  * dkms: Revert the code that runs DKMS as the user "nobody".
    - It's causing problems with people with nonstandard PAM configs because it
      uses "su". (LP: #484725)
    - Also people have reported that nothing should be owned by 'nobody' per
      Debian & Ubuntu policy. This could have been fixed by creating a DKMS
      user, but that still wouldn't solve the problems with using 'su'.
  * dkms: Emit built-module MODULE=foo if initctl is available on the system
    after done building a module.
  * Add a special apport package-hook for when package builds fail to try
    to report them against the package providing that DKMS package.
    (LP: #484871)

  [ Alberto Milone ]
  * dkms_common.postinst: try to build the module for the most recent
    kernel in addition to building it for the current kernel (LP: #474917).

  [ Steve Langasek ]
  * dkms_autoinstall: optimize with a single find call instead of multiple
    loops with ls. (LP: #484386)
  * dkms_autoinstall: drop localization of the usage message - this is
    inconsistent with all other init scripts on the system.

  [ Pauli Virtanen ]
  * Remove dependence from environment's umask and certain environment
    variables. (LP: #438393, #436039)

  [ Giuseppe Iuculano ]
  * dkms_autoinstall: Correct the prov...

Changed in dkms (Ubuntu):
status: Fix Committed → Fix Released
Mario Limonciello (superm1) wrote :

Anders:

In case it's not clear, here's the final upstart task that I settled upon:
http://linux.dell.com/git/?p=dkms.git;a=blob;f=dkms_autoinstaller.upstart;h=27fc487923f1f77d0e075b3570f76e718746b9bb;hb=HEAD

It should start immediately upon startup on all systems. I've got some extra integration work that will go into gdm and failsafe-x to listen for the built-* signals.

Changed in gdm (Ubuntu):
status: Won't Fix → In Progress
assignee: nobody → Mario Limonciello (superm1)
Changed in dkms (Ubuntu):
milestone: lucid-alpha-1 → none
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package gdm - 2.29.1-0ubuntu6

---------------
gdm (2.29.1-0ubuntu6) lucid; urgency=low

  * debian/gdm.upstart (LP: #453365)
    - Start on build-successful signal for fglrx or nvidia modules.
    - If dkms is available on the system, use it to check the status
      of fglrx or nvidia.
    - If they're in the DKMS tree but not built, exit the gdm task
      and wait for the build-successful signal. If they don't build,
      the failsafe-x task will receive a build-failed and can start
      BulletProof-X.
 -- Mario Limonciello <email address hidden> Mon, 14 Dec 2009 14:09:14 -0600

Changed in gdm (Ubuntu):
status: In Progress → Fix Released
affects: openafs (Ubuntu) → xorg (Ubuntu)
Changed in xorg (Ubuntu):
assignee: nobody → Mario Limonciello (superm1)
status: New → In Progress
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xorg - 1:7.5~3ubuntu3

---------------
xorg (1:7.5~3ubuntu3) lucid; urgency=low

  * debian/x11-common.failsafe-x.upstart: (LP: #453365)
    - Start on the build-failed signal that DKMS may emit
      during a failed build for nvidia or fglrx.
 -- Mario Limonciello <email address hidden> Mon, 14 Dec 2009 14:17:21 -0600

Changed in xorg (Ubuntu):
status: In Progress → Fix Released

> --- debian/gdm.upstart 2009-12-11 05:36:55 +0000
> +++ debian/gdm.upstart 2009-12-14 20:12:34 +0000
> @@ -9,7 +9,9 @@
> start on (filesystem
> and (graphics-device-added fb0 PRIMARY_DEVICE_FOR_DISPLAY=1
> or drm-device-added card0 PRIMARY_DEVICE_FOR_DISPLAY=1
> - or stopped udevtrigger))
> + or stopped udevtrigger)
> + or build-successful MODULE=nvidia
> + or build-successful MODULE=fglrx)
> stop on runlevel [016]
>
> emits starting-dm

I’m a little worried that this has race conditions:
• If build-successful fires before the graphics device is ready, gdm startup may be triggered too early (is this a problem?).
• If build-successful fires between the time X fails to load the module and the time gdm finishes failing to start, the event will be lost entirely.

Maybe one way this could be solved is to leave gdm the way it was and add a dummy task to the nvidia and fglrx packages to block gdm from starting up before the module is available. This avoids any race conditions that involve trying to start the gdm job multiple times, and also seems more neatly modular:

  start on starting gdm and (build-successful MODULE=nvidia or build-failed MODULE=nvidia)
  task
  exec /bin/true

(Given that use case, perhaps build-failed should not actually be a separate event? It might be nicer to emit events like
  dkms-build MODULE=nvidia RESULT=successful
  dkms-build MODULE=nvidia RESULT=failed
so that one can wait for dkms-build MODULE=nvidia without needing to indicate which.)

Should I file a separate bug?
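For illustration, the blocking task from this comment could be written against the proposed unified event. The dkms-build event is only a suggestion at this point in the thread, so both the event name and the job file name below are hypothetical.

```
# /etc/init/nvidia-wait.conf (hypothetical; would ship with the nvidia package)
description "hold gdm until the nvidia DKMS build has finished either way"

# RESULT is not tested, so both RESULT=successful and RESULT=failed match.
start on (starting gdm and dkms-build MODULE=nvidia)
task
exec /bin/true
```

A job that cares about the outcome could still match on it, for example failsafe-x with start on dkms-build MODULE=nvidia RESULT=failed. Note that gdm would stay blocked if the event never arrives, so this sketch assumes dkms emits it on every boot where the module is registered.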

Mario Limonciello (superm1) wrote :

I had similar worries about race conditions, but this code path is supposed
to happen so rarely that the race conditions would probably be even
rarer.

That being said, that does seem like a cleaner solution. File a different
bug and we'll track it there. It's just going to need some good testing to
make sure that we get it right.

--
Mario Limonciello
<email address hidden>

Filed as bug 497137.
