Feature Request: Upstart scripts for nslcd

Bug #806761 reported by Caleb Callaway on 2011-07-07
This bug affects 12 people
Affects                   Status   Importance   Assigned to   Milestone
nss-pam-ldapd (Ubuntu)             Wishlist     Unassigned
  Lucid                            Undecided    Unassigned
  Precise                          Undecided    Unassigned

Bug Description

This forum thread contains Upstart scripts that make nslcd (and by extension the whole nss-pam-ldapd package) more resilient in the face of network connectivity outages: http://ubuntuforums.org/showthread.php?t=1335022

Please incorporate these into 12.04!

Changed in nss-pam-ldapd (Ubuntu):
status: New → Confirmed

Here's a debdiff against nss-pam-ldapd 0.8.10 with the required changes.

The attachment "upstart scripts for nslcd" of this bug report has been identified as being a patch in the form of a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. In the event that this is in fact not a patch you can resolve this situation by removing the tag 'patch' from the bug report and editing the attachment so that it is not flagged as a patch. Additionally, if you are a member of the ubuntu-sponsors team please also unsubscribe the team from this bug report.

[This is an automated message performed by a Launchpad user owned by Brian Murray. Please contact him regarding any issues with the action taken in this bug report.]

tags: added: patch
Arthur de Jong (adejong) wrote :

I've been looking into integrating the patch into Debian. The spelling fix was easy so that will be done with the next upload ;)

However, I have a few questions about the upstart scripts:
- Why was the init script dropped? Isn't it better to keep both so that systems without upstart can still start nslcd?
- Why was the script split into two parts?
- Are you sure the upstart script should exit with status 1 if it is not configured to start (sasl_mech isn't set in nslcd.conf)?
- Why are both scripts logging to /tmp with a predictable name?
- A lot of checks are duplicated in the pre-start script and the script. Isn't there a nicer way of avoiding this duplication?

Can you explain what the extra suggestions add (I'm not much of a Kerberos user myself)?

Hi Arthur,

Thanks for taking the time to review the patch. :)

The motive for Upstart scripts instead of an init.d script is simple: with the init.d script, I couldn't find a simple way to guarantee nslcd started _after_ a network connection was available. At the time I logged the feature request (I haven't tested this recently--I probably should), nslcd would simply terminate if no network connections were available, so there was a race condition between nslcd and the networking configuration scripts. The Upstart script is triggered by an event that's emitted when a network interface is fully configured (net-device-up), so that race condition is avoided.

I removed the init.d script for a couple of reasons. First, running `service nslcd restart` was preferring the init.d script to the Upstart script if the init.d script was present, leading to unexpected behavior. Second, it seemed best to have a single method for starting the daemon. Multiple, independent methods seemed like a good way to confuse users. When I built a test package with debuild in Ubuntu, wrappers for the Upstart scripts were placed in /etc/init.d/. I'd assume the same is true for Debian packages, but I haven't tested it.

Point taken about the status 1 exit when sasl_mech or krb5_ccname isn't available--those two checks should definitely exit with status 0.

I set up logging to the /tmp directory because AFAIK there's no good way to leverage existing log facilities with Upstart, and I understand that /tmp is usually available earlier in the boot process than /var. The log files contain diagnostic information about the startup process only--no secrets are shared. So, it seemed safe to have a log file, even though it wouldn't necessarily survive a reboot.

Can you point out the duplicate checks? There's a lot of annoying redundancy in the variable initialization because Upstart's env directive doesn't support expansion (see https://bugs.launchpad.net/upstart/+bug/328366), but I tried to eliminate duplicate checks wherever possible. The one duplicate check I see is the initialization of the state directory, which happens in both nslcd and nslcd-kerberos scripts. That's there because we require a state directory for both daemons, but we can't assume one has run before the other.

Resolving the race condition with networking configuration is the primary motive for moving to the Upstart script. Extracting the Kerberos-related initialization into a separate file doesn't improve the functionality at all; I did it so I could take advantage of Upstart's management features instead of manually killing the daemon. Also, it makes the core nslcd script much more readable, IMO.

Hopefully that answers your questions. Just let me know if anything needs clarification.

Thinking more about the way the sasl_mech and krb5_ccname options were handled, I saw that the logic in the nslcd-k5start file wasn't very good--we should only be looking for the kinit and k5start binaries if the user enabled Kerberos, either through the defaults file override, or with the nslcd.conf settings. Rewriting to account for this cleans up the logic quite a bit.

I'm struggling with some very odd behavior (calling kinit and k5start in the same script stanza fails if the job is started as a pre-condition for some other job, but works fine if the job is started directly), but expect to submit a revised debdiff in the next day or so.

The odd behavior turned out to be a race condition between the nslcd and nslcd-k5start scripts: per the Upstart docs (http://upstart.ubuntu.com/cookbook/#expect), Upstart assumes the job has started after the first PID that is generated in the script, unless an expect stanza is present. So, Upstart believes the nslcd-k5start job has started as soon as the calls to sed/hostname/etc are made and starts the nslcd job, which fails because it can't find the credentials cache it needs.

I can think of three options, none of which look good:

1) Use kinit in a pre-start script for the nslcd-k5start job, and duplicate the K5START_START logic in both the pre-start script and the main script.
2) Re-solve the problem that Upstart's supposed to solve with a signalling system, like letting the nslcd job wait until the nslcd-k5start job had set a flag (or a timeout occurs). Something like a spin lock.
3) Stall the nslcd script for a set period of time and hope the nslcd-k5start script doesn't take very long to start.

At the moment, option 3 seems like the best of the bad options: introducing an extra dependency and duplicating control logic seems like a good way to introduce bugs, and implementing a lock system is robust but complicated. A revised debdiff that uses the third option is attached. I'm quite open to arguments in favor of another option, and I'd be delighted to know of a non-hackish solution.

Okay, after reading the Daemon Behavior section of the Upstart cookbook (http://upstart.ubuntu.com/cookbook/#daemon-behaviour), I've settled on a solution that I'm reasonably happy with: looping in a post-start script until the ticket cache is detected on the file system _or_ a timeout occurs (so we don't stall the job start process indefinitely).
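The post-start polling idea can be sketched as a small helper. This is a minimal illustration under stated assumptions, not the code from the attached patch: `wait_for_file` and its optional give-up command are hypothetical names, and in an actual job the path would be the Kerberos credentials cache that k5start writes.

```sh
#!/bin/sh
# Hypothetical helper (not the patch's actual code): poll until a file
# exists or a timeout expires.
#   wait_for_file PATH TIMEOUT_SECONDS [GIVE_UP_CMD]
# Returns 0 once PATH exists; returns 1 on timeout, or as soon as
# GIVE_UP_CMD succeeds (e.g. because the job was stopped meanwhile).
wait_for_file() {
    path="$1"
    timeout="$2"
    give_up="${3:-false}"   # default command never asks us to give up
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        [ -e "$path" ] && return 0
        # Bail out early if the caller's check says waiting is pointless.
        $give_up && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # One last check after the final sleep.
    [ -e "$path" ]
}
```

In a post-start stanza, the third argument could wrap the job-state check that Clint describes later in the thread, so the loop stops waiting if an administrator stops the job.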

Patch is attached. Contains the updated logic, as well as miscellaneous edits for clarity.

Bryce Harrington (bryce) on 2012-07-19
Changed in nss-pam-ldapd (Ubuntu):
importance: Undecided → Wishlist
status: Confirmed → Triaged
status: Triaged → In Progress
assignee: nobody → Caleb Callaway (enlightened-despot)
Clint Byrum (clint-fewbar) wrote :

Hi Caleb! Thanks for taking a crack at integrating these into Ubuntu. This is really great and even though my review below is long, I have to say that this is a great first attempt at a really difficult problem space. :) I'm going to unsubscribe ubuntu-sponsors for now, but please do re-subscribe sponsors when you've made the necessary adjustments.

My Feedback:

* start on (local-filesystems and net-device-up IFACE!=lo)

This does not do what you think it does. The first time there is a network connection available, this will work fine. But if the network goes away and comes back, this will not be started. That is because the 'and' will be waiting for local-filesystems to be emitted again. If you remove local-filesystems, then you will start too soon on systems that have NFS root. Also for systems with multiple network interfaces, this may not actually be sufficient either, as it doesn't mean the one that is needed for the service is up, just that "one of them" is up.

This also means if a system is brought into single user mode (runlevel 1) that the service will never be started back up again because you haven't listed runlevel [2345] in the start on.

Realistically, what you need is

start on runlevel [2345]

This will wait for any static interfaces configured in /etc/network/interfaces. For network-manager managed interfaces, you'll still be racing with them, so you also need a script in /etc/network/if-up.d to start the job any time a network interface comes up after runlevel 2 is reached:

#!/bin/sh
start nslcd 2> /dev/null

* Upstart logs all job output now (as of upstart 1.5, released w/ Ubuntu 12.04) so there is no need to write to a file in /tmp. Simply echo what you want to say and it will end up in /var/log/upstart/$jobname.log. Also the user is shown upstart's starting events in the plymouth details plugin (available when you hit 'down' in the gui startup, or always shown on servers). So there's really no need to echo 'Starting' anything.

* We are deprecating use of /etc/default files. I recommend replacing that file with a message like this: 'This file has been deprecated, please edit /etc/init/nslcd.conf. For users who do not want to edit packaged files, 'env' lines can be changed in /etc/init/nslcd.override'. Then remove all checking for its existence and loading of it, and move the defaults into the upstart jobs as 'env XXXXX=YYY' lines. This makes a tiny improvement on boot speed for this one job, but that adds up as we do it for all jobs.

* In your post-start loop, you should check to see what the state of the job is between iterations. An administrator may have decided to give up, and run 'stop nslcd' since the loop started waiting, and respawn also may happen if the binary exited abnormally. MySQL shows how this works:

post-start script
   for i in `seq 1 30` ; do
        /usr/bin/mysqladmin --defaults-file="${HOME}"/debian.cnf ping && {
            exec "${HOME}"/debian-start
            # should not reach this line
            exit 2
        }
        statusnow=`status`
        if echo $statusnow | grep -q 'stop/' ; then
            exit 0
        elif echo $statusnow | grep -q 'respawn/' ; then
            exit ...


Hi Clint! Many thanks for taking the time to review the patch! I'm revising the scripts based on your feedback and should have a new debdiff soon.

Hi Clint,

I can see how the first recommended change ("start on runlevel [2345]" and an if-up.d script) is an improvement: the if-up.d script makes sure we try to start nslcd *every* time an interface comes up, not just the *first* one.

However, I'm encountering another race condition with the if-up.d script in testing. Suppose we have two interfaces, a "good" interface that can connect us to the LDAP/Kerberos servers, and a "bad" interface that can't. If the "bad" interface comes up first, the nslcd job will start, and eventually terminate. However, it's possible the "good" interface will come up _while the nslcd job is in the process of failing_. If so, Upstart will see the nslcd job is running and refrain from launching a duplicate job, which means the nslcd service will never be launched without manual intervention.

Any ideas on how to resolve this? It seems like the if-up.d script needs to wait until the nslcd job is either in the "started" or "stopped" state--we can't assume the "starting" state is sufficient evidence that the job has finished starting. This could be done with a loop similar to the one in the nslcd-kerberos post-start script, but hopefully you or someone else knows of a more elegant solution.

Also, in the point about the use of 'exit 1', you say, "Also a non-existant binary is not an error, so do not print anything or exit 1 for that." I can understand using 'stop; exit 0' instead of 'exit 1', but are you sure about the recommendation to not print anything? Many of my most frustrating troubleshooting sessions in Linux are related to services that fail to launch without providing any diagnostic output--I'm sure others have had the same experience. It seems like good practice to let the user/administrator know _why_ we can't start the service instead of failing silently.

Thanks for your time and attention to making this patch as robust and bullet-proof as possible. Every refinement gives me a warm fuzzy feeling. :D

-Caleb

One further point: I don't think we can replace the defaults file with env stanzas, because some of the variables that can be overridden in the defaults file require expansion, which the env stanza doesn't support.

I'm assuming the lack of comments indicates that there are no violent objections to the path I'm taking. The current revision has been working quite well on my system, as well as a testbed VM. So, I'm re-subscribing ubuntu-sponsors and attaching the revised patch.

Overview of changes:

-changed "start on" condition to start once at runlevels [23456], and added an if-up.d script to signal nslcd to start whenever an interface comes up
-took advantage of Upstart's built-in log facilities; removed redundant "starting..." messages.
-check state of job in post-start loop
-use "stop; exit 0" instead of "exit 1"
-removed unused env stanzas
-use install -d instead of mkdir+chown

What hasn't changed:

-The scripts still use an /etc/default file--I know of no way to do expansion with env variables, which is a requirement for several default values.
-Logging of diagnostic information when the job terminates because of an improperly configured environment (e.g. missing binaries)

James Hunt (jamesodhunt) wrote :

Hi Caleb,

Thanks for keeping the momentum going on this. I notice though that you are still using $K5START_LOGFILE which really isn't required now. Have you maybe attached an older version of the patch in #13 by mistake?

Regarding your comment about good and bad interfaces, the Upstart 'instance' stanza may help solve that issue. See:

http://upstart.ubuntu.com/cookbook/#instance

Using the 'instance' stanza, you could have multiple instances of the job.

Hi James,

Well, that's embarrassing--I did upload the wrong file. Thanks for pointing that out. The correct patch file is attached.

I'll take a look at the instance stanza as well--it'd be nice to have a more elegant solution to that interface issue.

Clint Byrum (clint-fewbar) wrote :

Hi Caleb, sorry for not responding sooner; I've been quite busy.

You really just need to serialize your attempts to start nslcd. This is actually doable pretty easily. Do this in your if-up.d script:

flock /etc/init/nslcd.conf start wait-for-state WAIT_FOR=nslcd WAITER=$INTERFACE WAIT_STATE=started

This will ensure that only one of them is ever running at a time, so if there's one about to fail, this will wait for that. Then it uses the 'wait-for-state' upstart job (only available in Ubuntu 11.10 and later) that will just exit gracefully if it is already started, and if it has not started yet, will try to start it.
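To see why the flock wrapper serializes start attempts, here is a small stand-alone sketch. It uses a temporary lock file as a stand-in for /etc/init/nslcd.conf and a simulated, slow "attempt" in place of Upstart's `start wait-for-state ...`, which needs a running Upstart; `attempt`, `LOCK` and `OUT` are hypothetical names.

```sh
#!/bin/sh
# Sketch: serialize concurrent attempts with flock(1) from util-linux.
# LOCK stands in for /etc/init/nslcd.conf; OUT records the order in
# which the simulated attempts ran.
LOCK=$(mktemp)
OUT=$(mktemp)

attempt() {
    # flock FILE COMMAND ARGS... holds an exclusive lock on FILE while
    # COMMAND runs; a concurrent caller blocks until it is released.
    flock "$LOCK" sh -c 'echo "begin $1" >> "$2"; sleep 1; echo "end $1" >> "$2"' sh "$1" "$OUT"
}

# Two "interfaces" coming up at nearly the same time.
attempt eth0 &
attempt wlan0 &
wait

# Each "begin" line is immediately followed by its own "end" line.
cat "$OUT"
```

Without the lock, the two "begin" lines would typically appear before either "end" line.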

Hi Clint,

No worries about the response time. I really appreciate your help in making these scripts as excellent as possible.

The recommended method for serializing start attempts is working well. I'll dogfood the change for a day or so, then post a revised patch.

New revision. Changes:

-much cleaner mechanism for avoiding races in if-up.d script (thanks Clint!)
-Longer timeout waiting for credentials cache from k5start, to make sure timeout errors from k5start get logged.

So, any further recommendations or fixes? If not, what's the next step? Should I be coordinating with Arthur to get this merged upstream?

Dave Walker (davewalker) wrote :

Hi,

As the Debian Maintainer seems to be interested to include this fix in Debian, and there isn't currently a delta between Debian and Ubuntu in Quantal, it would be really good to get this support into Debian first, and merge the package once this has happened.

Therefore, I'd like to request that this is pursued first. If it turns out not to be viable, we can re-evaluate this.

I am unsubscribing ~ubuntu-sponsors, as there isn't currently a direct task. If this needs to be reconsidered, please re-subscribe this team.

Thanks for helping to make Ubuntu (& Debian) better!

Dave Walker (davewalker) wrote :

s/merge/sync/

Arthur de Jong (adejong) wrote :

Hi, I've had a quick look at the patch (Patch rev5) but there are a few problems/questions for inclusion into Debian:

- Debian is currently preparing for the next stable release and as such I don't think I will upload this change to Debian unstable any time soon as it could interfere with getting other changes into wheezy.
- Debian doesn't install upstart by default so I don't want to drop the init script just yet. Do you know how upstart behaves if an init script is also present? For being included into Debian it should support both init systems side-by-side.

A few points regarding the patch:
- In nslcd.if-up flock seems to be missing a -c option (I assume the start command is part of upstart).
- What is the reason for adding the recommendation on libsasl2-modules-gssapi-mit | libsasl2-modules-gssapi-heimdal? What extra functionality does it provide to nslcd?
- The post-start script of nslcd checks /etc/init.d/nscd but runs /usr/sbin/nscd. Invalidating nscd can be a good idea but the script should check /usr/sbin/nscd (unscd ships a different init script but supports the nscd command interface).
- The post-stop script stops nscd which it shouldn't do IMO.
- The post-stop script has a debugging date command left over.
- The nslcd.if-up script doesn't support environments without upstart.
- In nslcd.nslcd-k5start.upstart NSLCD_STATEDIR is created before parsing /etc/default/nslcd.
- In nslcd.nslcd-k5start.upstart there is a section script. Isn't a pre-start or start missing?
- It seems debian/rules tries to install a nslcd-kerberos.upstart script but it is named nslcd-k5start.
- debian/rules calls dh_installinit with the --upstart-only option which isn't supported in Debian.
- Passing --noscripts to dh_installinit makes that nslcd is not restarted on upgrades.
- I'm not sure the post-start script in nslcd-k5start works correctly if k5start shouldn't be started ("$K5START_START" != "yes").
(aesthetic point but the scripts use tabs, please only use them in Makefiles)

I've only done a visual inspection of the patch and ran a build but haven't run any further tests. I also don't have a system with upstart handy at the moment.

(I did fix the typo in the development repository so that will go into the next upload)

Thanks for your work on implementing this.


Hi Arthur,

Thanks for looking at this, and taking the time to give your feedback.

- If Upstart isn't part of the base Debian install, I'd agree that dropping the init script doesn't make sense, although the race conditions that the Upstart scripts aim to address wouldn't be addressed. Do you know if an equivalent to the "wait-for-state" Upstart script is available in Debian?

I don't think Upstart has any direct knowledge of other service management systems, but I think the only affected functionality would be the `service` command. The Ubuntu manpage for `service` indicates that the Upstart job will take precedence over a System V init script: http://manpages.ubuntu.com/manpages/precise/man8/service.8.html. Looks like the Debian version of `service` would simply ignore the Upstart script: http://manpages.debian.net/cgi-bin/man.cgi?query=service

I'm going to have to do some research on how the two service management systems might co-exist. I think the minimum requirement for co-existence would be a high-level lock to make sure Upstart didn't try to start nslcd while the System V script was running, and vice versa.

- I'm not sure I understand flock's -c option well enough to know what failure modes would result from its absence, although the manpage certainly seems to indicate it's necessary. It bothers me that my tests haven't failed with the -c option absent. From what I'm reading in the manpages, I'd expect to get a pretty noisy failure without the -c option present, since `start` is obviously not a valid file descriptor. I think I need to run more tests...

The "start" command is indeed part of Upstart, though.
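For reference, the util-linux flock(1) manual page documents three invocation forms: `flock [options] file|directory command [arguments]`, `flock [options] file|directory -c command`, and `flock [options] number`. The command Clint suggested appears to use the first form, in which `start` is the command to run rather than a file descriptor, so `-c` should only be needed when the whole command is passed as one shell string. A quick sketch contrasting the two command forms, using a throwaway lock file in place of /etc/init/nslcd.conf:

```sh
#!/bin/sh
# Sketch contrasting two flock(1) command forms (util-linux).
LOCK=$(mktemp)   # throwaway lock file, for illustration only

# Form 1: the command and its arguments follow the lock file directly
# (this is the shape of `flock /etc/init/nslcd.conf start ...`).
form1=$(flock "$LOCK" echo hello world)

# Form 2: with -c, a single string is handed to the shell.
form2=$(flock "$LOCK" -c 'echo hello world')

echo "$form1"
echo "$form2"
rm -f "$LOCK"
```

Both forms should print the same line; the choice is mainly whether the command needs shell features such as redirection or `&&`.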

- the libsasl2-modules-gssapi-mit | libsasl2-modules-gssapi-heimdal recommendation is present because one of those modules is required to perform GSS-API (in our case, Kerberos) authentication to an LDAP server. So, it's actually a requirement (not just a recommendation) for Kerberos authentication to the LDAP server. I think the ideal solution would be a separate `nslcd-kerberos` package with Kerberos-related scripts and those recommendations as requirements. That change seemed outside the scope of the original bug report, though, so I didn't include it in the patch. I'm not sure my initial assessment is correct, though--thoughts?

- good point regarding the checks for /etc/init.d/nscd: fixed

- good point regarding the stopping of nscd in the post-stop script: it's been dropped.

- leftover date command: fixed

- if-up.d script's lack of non-Upstart environments noted. I think the fix will depend on how the service management system co-existence is implemented.

- NSLCD_STATEDIR created before parsing: fixed

- I'm not sure what you mean by "prestart or start missing". Are you thinking a pre-start script stanza is mandatory?

- debian/rules tries to install nslcd-kerberos.upstart: fixed

- Debian's lack of --upstart-only noted. (see comments regarding co-existence)

- Passing --noscripts preventing nslcd restart noted. (see comments regarding co-existence)

- the post-start script in nslcd-k5start is invoked immediately after the main script starts, because we aren't using the `expect` stanza and daemonizing k5start (k5start d...


Arthur de Jong (adejong) wrote :

It may be useful to know that Debian just added some information to policy regarding init systems other than SysV init and even some notes specific to upstart:
http://www.debian.org/doc/debian-policy/ch-opersys.html#s-alternateinit

Ah, good to know. Thanks for the link.

Okay, I've attached revised scripts that should be compatible with Debian Wheezy (to the extent that I can readily test, which is a Wheezy build using pbuilder). The lack of a separate System V init script for nslcd-k5start does not seem to cause any issues.

All the technical issues Arthur raised have been addressed as well, in addition to log() functions that prefix log messages with date and time information.
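A timestamped logging helper of the kind mentioned above could look like this. It is a hypothetical sketch rather than the function in the attached patch; it relies on Clint's earlier point that, with Upstart 1.5 and later, a job's stdout ends up in /var/log/upstart/<job>.log, so printing is enough.

```sh
#!/bin/sh
# Hypothetical log() helper (not the patch's actual code).
# Under Upstart 1.5+, job output is captured in /var/log/upstart/<job>.log,
# so the helper only needs to write to stdout.
log() {
    # Prefix the message with the local date and time.
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
}

# Example call with an illustrative message.
log "k5start not found; not starting nslcd-k5start"
```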

Arthur de Jong (adejong) wrote :

I have been looking at trying to integrate the patch but I still don't have a really good feeling about this whole upstart thing and I don't really have a proper way to test this.

For example I still don't really understand why the whole thing with the if-up file is required. It seems like a very ugly hack and slows down boot-up by enforcing serial initialisation of network interfaces. Wouldn't something like this work:

start on runlevel [2345] and net-device-added INTERFACE!=lo
(or some other condition which just means that networking is available)

I still can't seem to wrap my mind around how upstart is supposed to work given the examples I've seen though. For example, in Debian there is a file /etc/init/networking.conf which seems to automatically bring down networking if all remote filesystems are unmounted.

For the relation between the nslcd and the nslcd-k5start services, wouldn't it be a nicer solution to only emit an event (for example from the nslcd service configuration) when the nslcd-k5start service is really needed? That way upstart wouldn't try to start it if it isn't needed.

Do you know how the dependency information that is available in the init script can be modelled in upstart? For example nslcd should be running before most mail servers because otherwise mail could bounce.

Also, a nicer solution to the wait until the cache is actually established loop is a trick I've seen in some other upstart script: only define a pre-start script that starts the service and no bare script or exec.

The nslcd upstart job clears the nscd cache. Why is this needed exactly?

It is probably better to avoid /etc/default/nslcd altogether for the upstart config and put everything in the upstart config file. It should probably also be OK to hard-code the nslcd user and group names instead of getting it from the configuration.

The call to dh_installinit --name=nslcd-k5start in debian/rules causes a lintian error and a warning. Just installing the file in debian/nslcd.conffile (nslcd.nslcd-k5start.upstart /etc/init/nslcd-k5start.conf) works better. An alternative would be to either also split the init scripts or to combine the upstart configurations.

All in all, I think it is better to have a change like this first uploaded and tested in Ubuntu before I add it to the Debian packages.

I noticed an error in the k5start Upstart script: the settings in the /etc/default/nslcd file were being overridden. I've attached a revised patch.

Still working on my reply to Arthur's latest.

Another revision, this time using the correct target state when calling the wait-for-state script, per http://upstart.ubuntu.com/cookbook/#job-states

Okay, another patch rev and a big-huge post addressing Arthur's last concerns (thanks again to him for taking the time to review the patch).

-Regarding serialization of network interface bring-ups, I've addressed this by running the flock command asynchronously. As far as I can tell, this doesn't affect the serialization of nslcd startup attempts, which is all we really care about. Testing on my laptop and desktop has gone well.
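Backgrounding the serialized attempt can be illustrated with another self-contained sketch. `start_attempt` and `on_interface_up` are hypothetical stand-ins: the real if-up.d hook would run Upstart's `start wait-for-state ...` under `flock`, which cannot be exercised outside a running Upstart, so a one-second sleep simulates a slow start.

```sh
#!/bin/sh
# Sketch: run the locked start attempt in the background so the calling
# hook (and thus interface bring-up) is not held up.
LOCK=$(mktemp)   # stand-in for /etc/init/nslcd.conf
OUT=$(mktemp)    # records which attempts ran

start_attempt() {
    # Simulated slow start that holds the lock for one second.
    flock "$LOCK" sh -c 'sleep 1; echo "attempt for $1" >> "$2"' sh "$1" "$OUT"
}

on_interface_up() {
    # The trailing & detaches the attempt; flock still serializes it.
    start_attempt "$1" &
}

before=$(date +%s)
on_interface_up eth0
on_interface_up wlan0
after=$(date +%s)
echo "hook returned after $((after - before))s"

wait   # only so this demo can show the results
cat "$OUT"
```

Because the lock is taken inside the background job, the hook returns at once while the attempts still run one after another; the trade-off is the delay Caleb mentions before nslcd is ready.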

This change does mean that there is the possibility of a noticeable delay between the time a network interface comes up and the time nslcd is ready to service requests, but that strikes me as more of a login management concern.

-I thought about only emitting an event when the nslcd-k5start script was actually needed. However, it would have made the nslcd script much more complicated (checking for the appropriate configuration options, etc). Having these checks in the nslcd-k5start script seemed like a much better separation of concerns, and also makes a future migration to a nslcd-kerberos package easier.

-I do not know of a way to model the mail server dependency info in Upstart. Perhaps someone on the ubuntu-sponsors list has an idea about this.

-I think Arthur's recommendation about waiting for a cache to be established is quite valid, but using the post-start stanza appears to be more idiomatically correct. See http://upstart.ubuntu.com/cookbook/#post-start

-I added the nscd cache invalidation so the nscd cache was refreshed whenever a connection was established to the authoritative source (i.e. the LDAP directory). However, invalidating the cache has led to problems over unstable (e.g. WiFi) connections: if the connection comes up long enough to invalidate the cache and then drops, the cache contents are lost. On the other hand, relying on the nscd cache means that current, up-to-date user/group information isn't available to the user until the nscd cache expires, times out, or is manually invalidated.

Computing the stability of a network connection is definitely outside the scope of this patch, so I've removed the cache invalidation for now. I think a good short-term solution to the issue of cache invalidation for stable connections would be a flag to control invalidation of the cache. Ideally, this flag would be per-connection.
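The suggested flag could look roughly like the following. This is a sketch under stated assumptions: `INVALIDATE_NSCD` is a hypothetical setting (neither nslcd nor the patch defines it), `NSCD_BIN` is overridable only to keep the function testable, and the tables invalidated are illustrative. Following Arthur's earlier review, it checks /usr/sbin/nscd itself rather than /etc/init.d/nscd, since unscd supports the same `-i` interface.

```sh
#!/bin/sh
# Sketch: optionally invalidate nscd caches once nslcd is up.
# INVALIDATE_NSCD is a hypothetical opt-in flag, not an existing option.
NSCD_BIN="${NSCD_BIN:-/usr/sbin/nscd}"

maybe_invalidate_nscd() {
    # Do nothing unless the administrator opted in.
    [ "${INVALIDATE_NSCD:-no}" = "yes" ] || return 0
    # Check the binary itself (nscd or unscd), not an init script.
    [ -x "$NSCD_BIN" ] || return 0
    for table in passwd group; do
        "$NSCD_BIN" -i "$table"
    done
}

# Example: a no-op unless INVALIDATE_NSCD=yes is set.
maybe_invalidate_nscd
```

Keeping the default at "no" preserves the revised patch's behaviour of leaving the cache alone, which avoids losing cached entries on flaky connections.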

-I agree that the use of /etc/default/nslcd is a bit awkward (and apparently is being deprecated), but it's not only the nslcd user and group that use expansion: several of the k5start-related variables use it as well. I think we're sort of stuck with the defaults file until expansion is available in Upstart's "env" stanza.

-I'm not seeing a lintian warning/error on Ubuntu. Since Arthur's wanting to test this out in Ubuntu first, I'd prefer to leave it as-is until such a time it seems expedient to merge the changes into Debian.

ubuntu-sponsors: it would seem Arthur prefers to have this patch tested in Ubuntu first. What's the next step forward in that process? Also, any thoughts on modeling the mailserver dependency information in Upstart?

Dimitri John Ledkov (xnox) wrote :

* Most mail servers start on event runlevel 2-5 or are started via init scripts. The runlevel event is not emitted until the static-network-up event (network configured), and since nslcd is kicked off from network configuration, nslcd will come up before the static-network-up/runlevel events. Thus there are sufficient provisions to get nslcd running as early as possible.

I am tempted to upload this into saucy and start testing it there, such that there is enough time to stabilize these jobs before saucy release.

I recommend giving in to this temptation. ;) I'm happy to help with testing in any way I can.

I think this bug is affecting me at boot. Booting is very slow (approximately 2 minutes), with some nslcd timeouts. After that, the computer works OK. My distribution is Ubuntu 12.04. Can you suggest how to work around it? Thanks.

Juan,

If nslcd timeouts are causing slow boot times (which may or may not be the
case in the situation you describe--I'd recommend disabling nslcd to see if
the boot time improves), I don't know of a clean solution except the init
scripts that I've written and attached to this bug report.

HTH

On Mon, Aug 5, 2013 at 8:00 AM, Juan Andrés Ghigliazza <email address hidden> wrote:

> I think that this bug is affecting me at boot. I am having a very slow
> boot (2 minutes approximately), with some nslcd timeouts. After that,
> the computer works ok. My distribution is Ubuntu 12.04. Can you suggest
> me how to workaround it?. Thanks.

Thanks Caleb. I have not tested your upstart script yet, but I could speed up the boot by calling sleep for 10 seconds in nslcd init script when starting, before doing anything else.

Arthur de Jong (adejong) wrote :

Juan,

Can you provide some more information on your boot sequence? nslcd should only hang if it has been started before networking is available (which shouldn't happen because of the init scripts dependencies).

If your connection to the LDAP server is otherwise reliable you could also reduce the bind_timelimit and reconnect_retrytime options to reduce the delay.

Another patch rev: after upgrading to Raring (nss-pam-ldapd 0.8.10), the expect fork stanza no longer behaves correctly. Running script to determine fork count (http://upstart.ubuntu.com/cookbook/#how-to-establish-fork-count) indicated 6 calls to fork or clone when nslcd starts.

To address this issue, the latest patch rev runs nslcd in debug/foreground mode (-d). This has the added benefit of logging debug output to the nslcd job's log file, although it's a bit verbose for normal operation.

Arthur de Jong (adejong) wrote :

It is not recommended to run nslcd in debug mode in production.

Anyway, on start-up nslcd will call daemon() to daemonise. I thought that daemon() called fork() twice, but according to the manual page it only forks once. After that, it starts a number of threads (configured by the threads option in nslcd.conf) and optionally starts another sub-process to do cache invalidation. This last process is only started in 0.9.0 and later, if configured, and it is started before dropping privileges, so it runs as root (while the other processes normally run as user nslcd).
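The process layout described above (one fork from daemon(), then worker threads, plus a root-owned cache-invalidation process in 0.9.0 and later) can be checked on a running system. This Linux-only sketch reads /proc instead of parsing ps output; proc_layout is a hypothetical helper name.

```sh
# Print "PID THREADS OWNER" for one running process (Linux /proc only).
# Each thread has an entry under /proc/PID/task; the owner of /proc/PID is
# normally the process's effective user. PID must refer to a live process.
proc_layout() {
    pid=$1
    set -- "/proc/$pid/task"/*
    echo "$pid $# $(stat -c %U "/proc/$pid")"
}

# Usage: for p in $(pgrep -x nslcd); do proc_layout "$p"; done
```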

Hi Arthur,

Good to know. I think the behavior I'm seeing is most likely a result of
some change to Upstart, because I'm using the same modified 0.8.10-1
package that I've used in the past. Even so, the process ID that Upstart
decides to track is very definitely incorrect whether I track the first or
second call to fork(). I'm running `ps auxw | grep nslcd` to verify the PID.

Running in the foreground is the only way Upstart will reliably select the
correct process ID. This is true for both my main workstation and a
virtualized testbed. Is there some way to run nslcd in the foreground
without enabling debug mode?

On Wed, Aug 14, 2013 at 3:58 AM, Arthur de Jong <email address hidden> wrote:

> It is not recommended to run nslcd in debug mode in production.
>
> Anyway, on start-up nslcd will call daemon() to daemonise. I thought
> that daemon() called fork() twice but according to the manual page it
> only forks once. After that, it starts a number of threads (configured
> by the threads option in nslcd.conf) and optionally starts another sub-
> process to do cache invalidation. This last process is only started in
> 0.9.0 and later if configured and is started before dropping privileges
> so runs as root (while other processes commonly run as user nslcd).

Arthur de Jong (adejong) wrote :

Currently nslcd does not support running in the foreground (i.e. without forking into the background) outside of debug mode.

The pid of nslcd can be reliably determined by looking at /var/run/nslcd/nslcd.pid.
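With the PID file available, one way to check whether Upstart is tracking the right process (instead of reading `ps auxw | grep nslcd` by eye) is to compare the PID in Upstart's status output with /var/run/nslcd/nslcd.pid. The sketch assumes status prints a line of the form "nslcd start/running, process 1234"; both function names are hypothetical.

```sh
# Extract the PID from an Upstart status line such as
# "nslcd start/running, process 1234" (format assumed); prints nothing
# if no PID is present.
tracked_pid() {
    printf '%s\n' "$1" | sed -n 's/.*process \([0-9][0-9]*\).*/\1/p'
}

# Succeed if the PID Upstart reports equals the one in nslcd's pid file.
pid_matches_pidfile() {
    tracked=$(tracked_pid "$1")
    actual=$(cat "${2:-/var/run/nslcd/nslcd.pid}" 2>/dev/null) || true
    [ -n "$tracked" ] && [ "$tracked" = "$actual" ]
}

# Usage on an Upstart system:
#   pid_matches_pidfile "$(status nslcd)" && echo "Upstart tracks the right PID"
```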

Good to know about the PID file, but as far as I know, there is no
mechanism in Upstart for utilizing PID files. This is an intentional design
decision; see
https://lists.ubuntu.com/archives/upstart-devel/2008-August/000735.html (the
section on the "pid file" stanza being removed).

Is there any technical reason nslcd can't have a foreground mode? I'm happy
to dive into the code and see about making the necessary modifications, but
I wouldn't want to invest the time unless it's a feasible option.

On Thu, Aug 15, 2013 at 1:04 AM, Arthur de Jong <email address hidden> wrote:

> Currently nslcd does not support not forking into the background outside
> of debug mode.
>
> The pid of nslcd can be reliably determined by looking at
> /var/run/nslcd/nslcd.pid.

Arthur de Jong (adejong) wrote :

According to the mailing list post you would expect that "expect fork" should be the right thing to do.

If you really want to implement a command-line switch for this (I think it is a bit silly to have to do this for upstart), please name it -n (this seems to be used by a few daemons that provide such an option). The change itself shouldn't be too complicated.

I'd agree that "expect fork" looks like the correct approach, and until the
latest release, using that stanza seemed to work without issue.

It's true that adding functionality to accommodate a service management
system is a poor separation of concerns, but nslcd's generation of a PID
file does establish a precedent for doing so, as does other daemons' use of
the -n switch. It seems making this sort of compromise is a fairly common
practice.

Ultimately, though, I'm just trying to follow the path of least resistance
to my goal of making it painless to have mobile access to LDAP account
information. No custom scripts, no PPAs, no service restarts when I boot up
my laptop and have to manually establish a wireless connection in a coffee
shop. Right now, adding a -n switch to nslcd seems like the next step
forward on that path.

On Sat, Aug 17, 2013 at 2:33 PM, Arthur de Jong <email address hidden> wrote:

> According to the mailing list post you would expect that "expect fork"
> should be the right thing to do.
>
> If you really want to implement a command-line switch for this (I think
> it is a bit silly to have to do this for upstart), please name it -n
> (this seems to be used by a few daemons that provide such an option).
> The change itself shouldn't be too complicated.

Arthur de Jong (adejong) wrote :

I've merged your change upstream in both the 0.8 and 0.9 branches. Attached is a patch that should be suitable for dropping in debian/patches for version 0.8.13-2.

Excellent, many thanks!

On Sun, Aug 18, 2013 at 6:25 AM, Arthur de Jong <email address hidden> wrote:

> I've merged your change upstream in both the 0.8 and 0.9 branches.
> Attached is a patch that should be suitable for dropping in
> debian/patches for version 0.8.13-2.
>
> ** Patch added: "implement-nofork.patch"
>
> https://bugs.launchpad.net/ubuntu/+source/nss-pam-ldapd/+bug/806761/+attachment/3776774/+files/implement-nofork.patch

Arthur,

My problem seems to be a race condition between nslcd and nslcd-k5start, as Caleb pointed out in an old post. I put a sleep near the end of the k5start_start function in my Ubuntu 12.04 /etc/init.d/nslcd script, and it works much better. I no longer see "ldap_result() timed out" at boot time; now I only see some of them when I log out of the system (but not when I shut it down).

Thanks very much.

Errata: my problem seems to be unrelated to this bug. It was caused by a huge growth in our LDAP directory over the last few weeks. The new data is generated by an application that has nothing to do with OS users and groups, so I could solve it by writing some LDAP filters in nslcd.conf.

My apologies, and thanks very much.

Luke Faraone (lfaraone) wrote :

0.8.13-3 is in trusty, so the patch contributed here has been included in Ubuntu. As such, I'm marking the bug "fix released" and unsubscribing ~ubuntu-sponsors.

Thank you for working through this process to get your contribution merged!

Changed in nss-pam-ldapd (Ubuntu):
status: In Progress → Fix Released

Hi Luke, thanks for the good news!

Please note that rev 10 of the patch uses debug mode ("-d"), which Arthur (the upstream maintainer) does not recommend. I've attached another patch rev that follows the upstream recommendation by using the foreground ("-n") command-line switch. It is otherwise identical to patch rev 10.
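For reference, a job that uses the -n switch needs no "expect" stanza: Upstart tracks the process it executed directly, and respawn restarts nslcd if it exits. The sketch below writes such a job file. It is not the job from the attached patch; the start/stop conditions are common Upstart idioms chosen for illustration, any pre-start setup the packaged job may need is omitted, and write_nslcd_job is a hypothetical helper that defaults to /etc/init/nslcd.conf.

```sh
# Hypothetical helper: write a minimal Upstart job that runs nslcd with -n.
# Not the job from the attached patch; the start/stop events are common
# Upstart idioms and any pre-start setup is omitted.
write_nslcd_job() {
    dest=${1:-/etc/init/nslcd.conf}
    cat > "$dest" <<'EOF'
description "nslcd LDAP name service daemon"

start on (local-filesystems and net-device-up IFACE!=lo)
stop on runlevel [!2345]

# -n keeps nslcd in the foreground, so no "expect" stanza is used and
# Upstart tracks the PID it started itself.
respawn
exec /usr/sbin/nslcd -n
EOF
}
```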

Luke Faraone (lfaraone) wrote :

Ah, I misunderstood the current state of the package. Sure, I'll take a look at the init script and see whether we can't get it included in an Ubuntu revision, as I see upstream would prefer to have it validated downstream first.

Changed in nss-pam-ldapd (Ubuntu):
assignee: Caleb Callaway (enlightened-despot) → Luke Faraone (lfaraone)
status: Fix Released → In Progress
Luke Faraone (lfaraone) wrote :

You have trailing whitespace in debian/nslcd.nslcd-k5start.upstart on lines 4 and 43.
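Trailing whitespace like this can be located and stripped with standard tools. A minimal sketch using grep and GNU sed; the function names are hypothetical.

```sh
# Print the lines of FILE that end in whitespace, as "LINENO:text".
show_trailing_ws() {
    grep -n '[[:space:]]$' "$1"
}

# Remove trailing whitespace from every line of FILE in place.
# "sed -i" without a backup suffix is a GNU sed extension.
strip_trailing_ws() {
    sed -i 's/[[:space:]]*$//' "$1"
}

# Usage:
#   show_trailing_ws debian/nslcd.nslcd-k5start.upstart
#   strip_trailing_ws debian/nslcd.nslcd-k5start.upstart
```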

In debian/nslcd.init , please use the mechanism described in https://wiki.ubuntu.com/UpstartCompatibleInitScripts to check if upstart is the init system, and check in targets besides the "start" target.
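The requested check might look roughly like the sketch below, which guards every action rather than only "start". It is hedged: the fallback init_is_upstart is an approximation of the lsb-base helper, used only if that helper is missing, and the exit codes chosen per action are assumptions to be checked against the UpstartCompatibleInitScripts page. upstart_guard is a hypothetical name.

```sh
# A real init script sources /lib/lsb/init-functions, which in
# lsb-base >= 4.1+Debian3 provides init_is_upstart. This fallback, an
# approximation of that helper, is only defined if it is missing.
if ! command -v init_is_upstart >/dev/null 2>&1; then
    init_is_upstart() {
        [ -x /sbin/initctl ] && /sbin/initctl version 2>/dev/null | grep -q upstart
    }
fi

# Call before the main 'case "$1" in ... esac' of debian/nslcd.init.
# Under Upstart, leave the service to the Upstart job. The exit codes here
# are assumptions; check them against the UpstartCompatibleInitScripts page.
upstart_guard() {
    if init_is_upstart; then
        case "$1" in
            start|restart|force-reload) exit 1 ;;
            stop) exit 0 ;;
        esac
    fi
    return 0
}

# Usage inside the init script:
#   upstart_guard "$1"
```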

Please run "update-maintainer" to change the maintainer to the appropriate target for Ubuntu, and test your changes on top of 0.8.13.

The patch looks good otherwise. Please resubscribe ~ubuntu-sponsors when the above is fixed.

Changed in nss-pam-ldapd (Ubuntu):
assignee: Luke Faraone (lfaraone) → nobody
status: In Progress → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for nss-pam-ldapd (Ubuntu) because there has been no activity for 60 days.]

Changed in nss-pam-ldapd (Ubuntu):
status: Incomplete → Expired
Ricky C (it-omegadiamond) wrote :

Ok, so I took it upon myself to make some of the requested changes, as far as my current ability and work time allow:
* Cleared the trailing whitespace.
* Updated the nslcd.init patch to use init_is_upstart

However, while I can update a patch, I don't yet know how or where to specify the Depends: lsb-base (>= 4.1+Debian3), nor how to "handle stopping the service in their preinst when upgrading from a pre-upstart-capable version" as per the final section of https://wiki.ubuntu.com/UpstartCompatibleInitScripts

Likewise the maintainer update was not done for similar reasons...

Sorry I can't yet be of more help.

Hi Ricky, thanks for taking a look at this! Sorry I didn't get a chance to follow up sooner.

Unfortunately it looks like further efforts on this patch aren't worthwhile: Ubuntu will soon be moving to the systemd init system. See http://www.markshuttleworth.com/archives/1316. In the meantime, it should be pretty straightforward to manually apply the Upstart scripts supplied in the patch.

Ricky C (it-omegadiamond) wrote :

That I did - though I forgot to set the if-up.d script executable. :P That
caused a pile of headaches until I checked my assumptions!
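Because run-parts, which runs the /etc/network/if-up.d hooks, silently ignores files that are not executable, a quick check for that mistake can save some head-scratching. A minimal sketch; the function name is hypothetical and it only tests the owner's execute bit.

```sh
# List regular files directly inside DIR that lack the owner execute bit.
# run-parts, which ifupdown uses for /etc/network/if-up.d, silently
# ignores such files.
non_executable_hooks() {
    find "$1" -maxdepth 1 -type f ! -perm -u+x -print
}

# Usage:
#   non_executable_hooks /etc/network/if-up.d
#   chmod +x /etc/network/if-up.d/SCRIPT   # SCRIPT: the hook you installed
```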

Interesting. Despite my longtime use of Ubuntu flavors, I'd barely gotten
used to the switch to Upstart! Ah, well. :)

Thanks,
Ricky

On Tue, May 6, 2014 at 6:06 PM, Caleb Callaway <email address hidden> wrote:

> Hi Ricky, thanks for taking a look at this! Sorry I didn't get a chance
> to follow up sooner.
>
> Unfortunately it looks like further efforts on this patch aren't
> worthwhile: Ubuntu will soon be moving to the systemd init system. See
> http://www.markshuttleworth.com/archives/1316. In the meantime, it
> should be pretty straight-forward to manually apply the Upstart scripts
> supplied in the patch.

GB (godmar) wrote :

Our nslcd (0.8.4) keeps crashing on Ubuntu 12.04, like so:

[2221821.130260] nslcd[17641]: segfault at 0 ip 00000000004159c8 sp 00007f11fd78b6a0 error 4 in nslcd[400000+21000]
[4448958.894131] nslcd[31759]: segfault at 0 ip 00000000004159c8 sp 00007fa87d3826a0 error 4 in nslcd[400000+21000]

and then it does not restart.

Would having a /etc/init/nslcd.conf (with an appropriate respawn) fix this issue, or is there a way to fix it within the old 'service' framework?

Any help would be appreciated - whenever it crashes (which happens every few weeks) some user with a local account has to log on and restart it.
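Within the old sysvinit framework there is no built-in respawn, so one generic stop-gap (not something nslcd or Ubuntu provides) is a periodic check, for example from cron, that restarts nslcd when no process of that name is running. The sketch relies on pgrep from procps; ensure_running is a hypothetical helper, and it does nothing about the crash itself.

```sh
# Hypothetical watchdog helper: if no process is named exactly NAME, run the
# remaining arguments as a restart command. pgrep comes from procps.
ensure_running() {
    name=$1
    shift
    if ! pgrep -x "$name" >/dev/null 2>&1; then
        "$@"
    fi
}

# Usage (as root, e.g. every few minutes from cron):
#   ensure_running nslcd service nslcd start
```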

@GB you might be able to work around the problem with an Upstart script, but you're probably better off getting the latest version of nslcd installed. 0.8.4 is fairly out-of-date, so it's likely the issue has been resolved in a newer release.

Adam Thompson (athompso) wrote :

Per #54... yes, but for one thing: 14.04 LTS is still in the support cycle for several more years, and we're just now starting to deploy 14.04 across the board for developer desktops at work.

Because I have more than one AD server, I'm using "uri DNS" (see https://bugs.launchpad.net/ubuntu/+source/nss-pam-ldapd/+bug/1449168) and that fails because of the lack of dependency tracking between nslcd and network-manager.

All the workstations I've integrated are unusable after reboot because nslcd starts up before network-manager finishes setting up the DHCP interface, and the DNS lookup fails.
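One way to work around this ordering problem, in the spirit of the if-up.d script included in the patch, is a hook that restarts nslcd once a real interface is up, so the DNS lookup for "uri DNS" is retried with working networking. This is a hedged sketch: ifupdown passes the interface name to /etc/network/if-up.d hooks in IFACE, whether NetworkManager runs these hooks for its connections on your release is worth verifying, and the NSLCD_RESTART override is hypothetical, added only so the logic can be tested.

```sh
# Body of a hypothetical /etc/network/if-up.d/nslcd hook (must be executable).
# ifupdown sets IFACE to the interface that just came up; loopback and an
# unset IFACE are ignored. NSLCD_RESTART (hypothetical) overrides the
# restart command and is deliberately left unquoted so it splits into words.
nslcd_ifup_hook() {
    case "${IFACE:-}" in
        ""|lo) return 0 ;;
    esac
    ${NSLCD_RESTART:-service nslcd restart}
}

# The installed hook script would end with:
#   nslcd_ifup_hook
```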

Since the problem is known AND a solution is known, I would expect a fix to come out in the 14.04 LTS support cycle.
Or is Canonical also subject to CADT (http://www.jwz.org/doc/cadt.html) ?

Adam Thompson (athompso) wrote :

CADT, anyone? As we're waiting for a fix to come out, the bug expires because we're all waiting. I guess that's one way to "solve" the problem...

Changed in nss-pam-ldapd (Ubuntu):
status: Expired → Incomplete
Luke Faraone (lfaraone) wrote :

Adam Thompson: That attitude really isn't productive, nor is calling out Canonical even correct. This package isn't in "main", which means it is supported by the community, not Canonical.

The bug expired because it stayed in an "incomplete" state for 60 days, after feedback was given to the submitted patch. I see the patch was updated on 2014-05-06, but the bug status was not changed to re-request review.

Changed in nss-pam-ldapd (Ubuntu):
status: Incomplete → Confirmed
Adam Thompson (athompso) wrote :

Ah... I had failed to notice that the component was universe. I believed, incorrectly, that it was a Canonical-supported component.
I'll switch to SSSD.

Rolf Leggewie (r0lf) wrote :

lucid has seen the end of its life and is no longer receiving any updates. Marking the lucid task for this ticket as "Won't Fix".

Changed in nss-pam-ldapd (Ubuntu Lucid):
status: New → Won't Fix
Rolf Leggewie (r0lf) wrote :

Given the amount of work that went into this ticket it is indeed sad to see that upstart is now no longer the init system of Ubuntu. I guess it would be nice to fix Precise and Trusty via an SRU.

Should the main task be set to wontfix? Is upstart still developed at all?

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nss-pam-ldapd (Ubuntu Precise):
status: New → Confirmed