Comment 9 for bug 806761

Revision history for this message
Clint Byrum (clint-fewbar) wrote :

Hi Caleb! Thanks for taking a crack at integrating these into Ubuntu. This is really great and even though my review below is long, I have to say that this is a great first attempt at a really difficult problem space. :) I'm going to unsubscribe ubuntu-sponsors for now, but please do re-subscribe sponsors when you've made the necessary adjustments.

My Feedback:

* start on (local-filesystems and net-device-up IFACE!=lo)

This does not do what you think it does. The first time there is a network connection available, this will work fine. But if the network goes away and comes back, this will not be started. That is because the 'and' will be waiting for local-filesystems to be emitted again. If you remove local-filesystems, then you will start too soon on systems that have NFS root. Also for systems with multiple network interfaces, this may not actually be sufficient either, as it doesn't mean the one that is needed for the service is up, just that "one of them" is up.

This also means if a system is brought into single user mode (runlevel 1) that the service will never be started back up again because you haven't listed runlevel [2345] in the start on.

Realistically, what you need is

start on runlevel [2345]

This will wait for any static interfaces configured in /etc/network/interfaces. For network-manager managed interfaces, you'll still be racing with them, so you also need a script in /etc/network/if-up.d to start the job any time a network interface comes up after runlevel 2 is reached:

#!/bin/sh
start nslcd 2> /dev/null

* Upstart logs all job output now (as of upstart 1.5, released w/ Ubuntu 12.04) so there is no need to write to a file in /tmp. Simply echo what you want to say and it will end up in /var/log/upstart/$jobname.log. Also the user is shown upstart's starting events in the plymouth details plugin (available when you hit 'down' in the gui startup, or always shown on servers). So there's really no need to echo 'Starting' anything.

* We are deprecating use of /etc/default files. I recommend replacing that file with a message like this: 'This file has been deprecated, please edit /etc/init/nslcd.conf. For users who do not want to edit packaged files, 'env' lines can be changed in /etc/init/nslcd.override'. Then remove all checking for its existence and loading of it, and move the defaults into the upstart jobs as 'env XXXXX=YYY' lines. This makes a tiny improvement on boot speed for this one job, but it that adds up as we do it for all jobs.

* In your post-start loop, you should check to see what the state of the job is between iterations. An administrator may have decided to give up, and run 'stop nslcd' since the loop started waiting, and respawn also may happen if the binary exitted abnormally. Mysql shows how this works:

post-start script
   for i in `seq 1 30` ; do
        /usr/bin/mysqladmin --defaults-file="${HOME}"/debian.cnf ping && {
            exec "${HOME}"/debian-start
            # should not reach this line
            exit 2
        }
        statusnow=`status`
        if echo $statusnow | grep -q 'stop/' ; then
            exit 0
        elif echo $statusnow | grep -q 'respawn/' ; then
            exit 1
        fi
        sleep 1
    done
    exit 1
end script

You might also want to tweak respawn limit as the defaults don't really take into account a daemon that might take more than 0.5s to fail.

* exit 1 does not really abort the script the way you have it done here. Instead you will want to use 'stop; exit 0'. This means the job intentionally does not want to start. Also a non-existant binary is not an error, so do not print anything or exit 1 for that. If a package has been 'removed', its upstart job will still be present and should handle this gracefully.

* Some of the variables have been blindly copied in as 'env' statements. K5START_DESC for instance is copied in, but is not used in the upstart job.

* Instead of mkdir+chown just use install -d.