libnss-ldap: calls to initgroups() cause boot to hang when using 'bind_policy hard'

Bug #155947 reported by cropr on 2007-10-22
Affects: libnss-ldap (Debian) | Status: Fix Released | Importance: Unknown
Affects: libnss-ldap (Ubuntu) | Importance: Undecided | Assigned to: Dustin Kirkland

Bug Description

Binary package hint: libnss-ldap

When, during LDAP configuration, the passwd, group and shadow fields of the nsswitch.conf file are changed to "files ldap", the system initially behaves correctly: "getent passwd" shows the added LDAP users. After this change, however, Ubuntu Gutsy hangs on reboot; the last message shown is "Starting kernel log daemon...".

After booting with the recovery kernel, I set the passwd, group and shadow fields in nsswitch.conf back to "files" and got a bootable system again.

I did not see this buggy behaviour on edgy or feisty.

I am away until Monday 5 November. If necessary, you can contact Mr. M. Papenhove, <email address hidden>, or Mr. E. Papenhove, <email address hidden>, tel. 0172-491416.

Kind regards,
Sebastiaan Veldhuisen

I have reported the duplicate of this at #156562:

I have installed libnss-ldap and related packages in order to authenticate against LDAP server.
During installation, I have been asked for server information, LDAP version etc.
However, this information has been placed NOWHERE in /etc. It seems that Ubuntu simply drops the information! It doesn't set up ANY of the configuration needed for LDAP auth!
Strangely, there is both /etc/ldap.conf and /etc/ldap/ldap.conf (the latter has no effect; all of its lines are commented out).

Then I managed all the needed files manually: /etc/ldap/ldap.conf, /etc/ldap/ldapserver, /etc/pam.d/login, /etc/pam.d/common-account, /etc/nsswitch.conf and so on. I did it the way that has WORKED with Feisty Fawn and Debian Etch before.

After reboot, system fails to boot. It stops on "Starting kernel log daemon" forever.

When I set passwd, group and shadow parameters in nsswitch.conf back to "compat", it starts normally, however when I set them to "compat ldap files", fails to boot again.

tuharsky (tuharsky) wrote :

I'd like to get emails about this. I'm going to go back to Feisty.

After Gutsy Gibbon installed I was getting
    fsck.ext3: Device or resource busy while trying to open /dev/sda1
This is my /boot partition.

Then the hang at:
    Starting kernel log daemon...

Thanks for posting your messages and "limp home work arounds".

I can confirm I have the same issue, and have solved it by booting in rescue mode and then removing ldap from nsswitch.conf.
Although this isn't a long-term solution, at least my machine is bootable.
My main motivation for upgrading to Gutsy was to use the AuthClientConfig model

There is discussion of this issue here http://ubuntuforums.org/showthread.php?t=583726
With some promising solutions:-
    "I had the same problem, I have just resolved it few hours ago by changing bind_policy from hard to soft."

Jamie Strandboge (jdstrand) wrote :

Thank you for reporting this bug and helping to make Ubuntu even better. I haven't looked at this very closely yet, but think this may be network related, which is why 'bind_policy soft' would help (ldap will fail immediately, rather than waiting for a *long* time with 'bind_policy hard'). If the network is starting after LDAP lookups are being made, you will have this problem.

Until this issue is properly fixed, workarounds are the same as for sometimes disconnected users (eg laptops):

1. use 'bind_policy soft' in /etc/ldap.conf
2. use libnss-db
3. use libpam-ccreds

A desktop system may be able to get away with just the first, whereas laptops will need all three. Though bug 51315 is unrelated, the workarounds people used there may also be helpful.

Note that, assuming it is indeed just a networking issue, the machine will *eventually* come up with 'bind_policy hard', but it will take several minutes (it would be helpful if you confirmed this).

BTW: the fsck error should be unrelated and is probably just that /dev/sda1 is mounted read/write when fsck is trying to run.
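
Whether the first workaround is actually in effect can be checked mechanically. The following is a minimal Python sketch, not part of libnss-ldap: it assumes the simple "keyword value" line format of the ldap.conf excerpts posted in this report, and it treats a missing bind_policy as "hard", which is an assumption about the nss_ldap default that should be verified against the package documentation. The helper names read_ldap_conf and bind_policy are hypothetical.

```python
def read_ldap_conf(text):
    """Parse 'keyword value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        value = parts[1].strip() if len(parts) > 1 else ""
        # Keywords may repeat (e.g. nss_base_passwd), so keep a list per keyword.
        settings.setdefault(keyword, []).append(value)
    return settings


def bind_policy(text, default="hard"):
    """Return the last bind_policy value, or the ASSUMED default if unset."""
    values = read_ldap_conf(text).get("bind_policy")
    return values[-1] if values else default


if __name__ == "__main__":
    try:
        with open("/etc/ldap.conf") as handle:
            print("bind_policy:", bind_policy(handle.read()))
    except OSError as exc:
        print("could not read /etc/ldap.conf:", exc)
```

Taking the last occurrence of a repeated keyword is also an assumption; if the file sets bind_policy more than once, it is safer to remove the duplicates than to rely on ordering.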

No, it doesn't come up after several minutes for me. I left it for some 15 hours and it didn't move on.

I'll try the soft mode.

tuharsky (tuharsky) wrote :

The soft mode works for me.

tuharsky: I am glad the workaround is working for you. It was discussed that this may be a bug 38203 (or a variant). Do you have the nvram group in /etc/group? What does 'getent group nvram' return on and off the network?
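
For anyone who wants to repeat this check on and off the network and compare timings, here is a minimal Python sketch. It assumes that Python's grp module goes through the same C library name-service lookup that getent uses, so a slow or unreachable LDAP source should show up as a long elapsed time. The helper name timed_group_lookup is hypothetical.

```python
import grp
import time


def timed_group_lookup(name):
    """Look up a group by name through the system's name service and time the call."""
    started = time.monotonic()
    try:
        entry = grp.getgrnam(name)
    except KeyError:
        entry = None  # no configured source returned this group
    elapsed = time.monotonic() - started
    return entry, elapsed


if __name__ == "__main__":
    entry, elapsed = timed_group_lookup("nvram")
    print(entry, f"{elapsed:.2f}s")
```

Note that with 'bind_policy hard' and no reachable server the call may block for a long time, which is the behaviour under investigation here.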

Note: the nvram bug should be fixed, but it could still be another group or user, in which case will have to pinpoint it.

Changed in libnss-ldap:
assignee: nobody → jamie-strandboge
status: New → Incomplete

Changed in libnss-ldap:
importance: Undecided → Medium
status: Incomplete → Confirmed

I have had some time to look at this more and do not believe it is a problem with non-existent groups. When ldap is supplied in ldap.conf, then glibc will check ldap when glibc's initgroups() is used. initgroups() is used by start-stop-daemon (which starts most daemons on boot). By definition, initgroups() tries to find all the groups a particular user is in, so glibc checks nsswitch.conf for all available nameservices (eg both 'file' and 'ldap'), and uses them.
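
The difference between an ordinary lookup and an initgroups()-style lookup can be illustrated with a toy model. This is a deliberately simplified Python sketch of the behaviour described in the paragraph above, not glibc's code: it ignores action items and error statuses, and the tables are made-up sample data (the klog entry mirrors the /etc/passwd line quoted later in this report; "alice" and group 5000 are hypothetical).

```python
import re


def service_names(spec):
    """Return the service names of an nsswitch.conf value, dropping [..] action items."""
    return re.sub(r"\[[^\]]*\]", " ", spec).split()


def passwd_style_lookup(spec, tables, user):
    """Ordinary lookup: the first source that knows the user ends the search."""
    consulted = []
    for service in service_names(spec):
        consulted.append(service)
        if user in tables[service]:
            break
    return consulted


def initgroups_style_lookup(spec, memberships, user):
    """Group-list lookup: every source is asked and the answers are merged."""
    consulted, groups = [], set()
    for service in service_names(spec):
        consulted.append(service)
        groups |= memberships[service].get(user, set())
    return consulted, sorted(groups)


# Made-up sample data for illustration only.
TABLES = {"files": {"klog", "syslog"}, "ldap": {"alice"}}
MEMBERSHIPS = {"files": {"klog": {103}}, "ldap": {"alice": {5000}}}

print(passwd_style_lookup("files ldap", TABLES, "klog"))
print(initgroups_style_lookup("files ldap", MEMBERSHIPS, "klog"))
```

In this model the ordinary lookup for klog stops at files, while the group-list lookup still consults ldap even though klog is a purely local account.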

With this in mind, I configured a feisty and gutsy machine to have identical /etc/nsswitch.conf files, and equivalent /etc/ldap.conf files (feisty uses /etc/libnss-ldap.conf instead of /etc/ldap.conf). Can you post your /etc/ldap.conf and /etc/libnss-ldap.conf-dpkg.old files (the second should be an automatic backup that was created with debconf if you decided to not manually configure). If you do not have /etc/libnss-ldap.conf-dpkg.old, please post your previous version of /etc/libnss-ldap.conf.

Changed in libnss-ldap:
importance: Medium → Undecided
status: Confirmed → Incomplete
Jamie Strandboge (jdstrand) wrote :

That should have said:
'When ldap is supplied in /etc/nsswitch.conf'

Jamie Strandboge (jdstrand) wrote :

I also forgot to mention that with identical feisty and gutsy configurations I was unable to reproduce the error.

tuharsky (tuharsky) wrote :

I do have nvram group there. I got nvram:x:105: when on network. This line is also in /etc/group

As for my ldap.conf, nothing particularly interesting is in there. I just set up the lines: host, base, uri, nss_base_passwd, nss_base_group, nss_base_shadow. I commented out rootbinddn.

Greek Ordono (grexk) wrote :

I also encounter this problem when the LDAP server is down. My system hangs after "Starting system message bus dbus".

Can you try to use your old libnss-ldap.conf file and reboot, then post your results? It should be found in /etc/libnss-ldap.conf-dpkg.old.

After configuring my system to use libpam-ccreds(https://help.ubuntu.com/community/PamCcredsHowto) the system hangs up at "Starting system message bus dbus" for almost 5 minutes.

/etc/ldap.conf:
base dc=domain,dc=com
uri ldap://remote.domain.com
ldap_version 3
binddn cn=nssldap,ou=DSA,dc=domain,dc=com
bindpw nssldap1234
bind_policy soft
nss_reconnect_tries 1
nss_reconnect_sleeptime 1
nss_reconnect_maxsleeptime 8
nss_reconnect_maxconntries 2
nss_base_passwd ou=Users,dc=domain,dc=com?one
nss_base_passwd ou=Computers,dc=domain,dc=com?one
nss_base_shadow ou=Users,dc=domain,dc=com?one
nss_base_group ou=Groups,dc=domain,dc=com?one
ssl off
pam_password md5

/etc/nsswitch.conf
passwd: compat db [NOTFOUND=return] ldap
group: compat db [NOTFOUND=return] ldap

Toby Collett (thjc) wrote :

I am also seeing these symptoms, leaving for a number of minutes does not result in a login. It may be worth noting that booting in recovery/single user mode and then performing a telinit 5 results in a system that works fine.

Toby Collett (thjc) wrote :

Sorry, correction to my last comment: boot to single user, then bring up the network, then telinit 5. The "bring up the network" step is not needed if the network interface is set to auto in /etc/network/interfaces.

Cory Albrecht (bytor) wrote :

I have this hang on boot after "Starting kernel logging daemon" in Gutsy server x86, too. Fortunately the "bind_policy soft" workaround works. Any idea when this status will move from "Incomplete" to "Confirmed" and get an importance level attached?

Toby Collett (thjc) wrote :

After some more testing.
1) bind_policy soft reduces the delay to reasonable (I was initially using a set of config files from feisty on a fresh gutsy install so the bind_policy was incorrectly set in libnss-ldap.conf)
2) booting to single user and then starting services one at a time klogd is the first service to hang starting, dbus appears to as well. Both of these start a long time before the dbusdhcp service which I assume is what starts up the network in a fresh gutsy install

Bart (marc-lecrosnier-enensys) wrote :

Same problem on the first two computers of the company that we have migrated to gutsy.
I will do some investigations on a fresh install.

Lars Kneschke (lkneschke) wrote :

From my point of view the problem is not located in the nss_ldap but in nss_compat/nss_files.

I have 2 servers. Both have a local openldap server running.

I have following lines in nsswitch.conf

passwd: compat ldap
group: compat ldap
shadow: compat ldap

And I have configured following lines in ldap.conf on the affected server

host 127.0.0.1 172.17.7.15

And I have following lines in /etc/passwd

syslog:x:101:102::/home/syslog:/bin/false
klog:x:102:103::/home/klog:/bin/false

I should never see any ldap queries on server 172.17.7.15 during the boot process of the affected server, because all needed information should be in /etc/passwd.

But when I enabled logging on the ldap server 172.17.7.15(which is already up and running) I can see following lines:
conn=16 fd=21 ACCEPT from IP=172.17.7.201:52540 (IP=0.0.0.0:389)
conn=17 fd=22 ACCEPT from IP=172.17.7.201:52542 (IP=0.0.0.0:389)
conn=16 op=0 BIND dn="" method=128
conn=16 op=0 RESULT tag=97 err=0 text=
conn=17 op=0 BIND dn="" method=128
conn=17 op=0 RESULT tag=97 err=0 text=
conn=16 op=1 SRCH base="dc=schule,dc=loc" scope=2 deref=0 filter="(&(objectClass=posixAccount)(uid=syslog))"
conn=17 op=1 SRCH base="dc=schule,dc=loc" scope=2 deref=0 filter="(&(objectClass=posixAccount)(uid=klog))"
conn=17 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text=
conn=17 op=2 SRCH base="dc=schule,dc=loc" scope=2 deref=0 filter="(&(objectClass=posixGroup)(memberUid=klog))"
conn=17 op=2 SRCH attr=gidNumber
conn=17 op=2 SEARCH RESULT tag=101 err=0 nentries=0 text=
conn=17 fd=22 closed (connection lost)
conn=16 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text=
conn=16 op=2 SRCH base="dc=schule,dc=loc" scope=2 deref=0 filter="(&(objectClass=posixGroup)(memberUid=syslog))"
conn=16 op=2 SRCH attr=gidNumber

As you can see, the affected server is trying to look up the UIDs of the accounts syslog and klog in the LDAP directory. This should never happen, as this information is stored in /etc/passwd and nss_compat/nss_files should already be able to look it up. nss_ldap should never get a request for these accounts.
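
The memberUid searches in the log above have the shape of a group-membership query. One way to trigger a group-list lookup for a local account from user space, outside the boot sequence, while watching the LDAP server log, is sketched below in Python. It assumes that os.getgrouplist, which wraps the C library's getgrouplist(3), exercises the same name-service path as the daemons started at boot; that equivalence is an assumption, and the helper name groups_for is hypothetical.

```python
import os
import pwd


def groups_for(username):
    """Return the group IDs the C library reports for a user (primary group included)."""
    entry = pwd.getpwnam(username)
    return os.getgrouplist(username, entry.pw_gid)


if __name__ == "__main__":
    for account in ("syslog", "klog"):
        try:
            print(account, groups_for(account))
        except KeyError:
            print(account, "is not a known account on this machine")
```

If the LDAP log shows new searches for these accounts when the script runs, the lookups seen at boot can be reproduced without rebooting.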

Kevin Slater (kevin-slater) wrote :

I think I agree with Lars's assessment. Here's my data point to support that:

In the ldap client authentication configuration wiki page they discuss using the nss_updatedb utility to build a local cache of the passwords/groups that are stored in ldap. This would typically be used with a laptop configuration where you could find yourself away from the network with your ldap authentication server. The configuration instructions call for the following changes to nsswitch.conf:

passwd: files ldap [NOTFOUND=return] db
group: files ldap [NOTFOUND=return] db

The expected behavior is that first the local files will be checked for id/group information, then the ldap server will be tried, if the ldap server is unable to be reached, then the cached database information will be checked. The page on the wiki has notes saying that although this *should* work it doesn't.

Could it be that the libraries aren't paying proper attention to the order of the methods specified in the configuration?

Jamie Strandboge (jdstrand) wrote :

marking confirmed as many people are experiencing the problem.

Changed in libnss-ldap:
status: Incomplete → Confirmed
Sean (svk-sweng) wrote :

Happens for me on:

Gutsy desktop
Using ldap client
I tried to follow the directions for working with nss-updatedb. That results in putting ldap in the nsswitch.conf. That seems to do it for me. I have to boot into single user mode and edit that file back to compat.

Sean

alberto (sp-3fnbvbnsai43) wrote :

I can also confirm the problem, but I think there are actually two issues going on.

The first is the fact that systems need to boot when ldap is not available. The soft option does fix this problem (and the local database cache should work for mobile users).

The second problem is the order of when the network starts. In the current debian based distros it is too late. If the ldap server(s) are online there is no reason why we should have to fail on the query. The problem is that the network doesn't seem to be fully online. This is also seen by the fact that I don't always get my NFS mounts to work. They are in /etc/fstab but most of the time I have to log in as root after the system boots and do a: mount -a -t nfs

Alberto

I got the problem on updating a machine yesterday.

I used my backup of ldap.conf and applied 'bind_policy soft' in /etc/ldap.conf
This didn't work for me.

I got it running by editing /etc/nsswitch.conf:
 passwd: files [UNAVAIL=return] ldap
 group: files [UNAVAIL=return] ldap
 shadow: files [UNAVAIL=return] ldap

I just added the [UNAVAIL=return] on these three lines. This was not necessary on prior Ubuntu versions.
I hope this helps in finding the bug. If you want me to send some conf files or logs, just tell me what you want to have.

I think that you meant to add the following to /etc/nsswitch.conf

passwd: files ldap [UNAVAIL=return]
group: files ldap [UNAVAIL=return]
shadow: files ldap [UNAVAIL=return]

You want it to give up trying ldap when it can't reach the ldap server.
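
Where the bracketed item sits matters because, in nsswitch.conf syntax, an action item applies to the service immediately to its left. The following Python sketch is a simplified model of ordinary lookups under that rule, assuming the documented defaults (SUCCESS=return, every other status continues). It does not handle negated items such as [!STATUS=...], and it assumes the LDAP module reports UNAVAIL promptly, which this thread associates with 'bind_policy soft'. The function names and the sample outcome table are hypothetical.

```python
import re

DEFAULT_ACTIONS = {
    "success": "return",
    "notfound": "continue",
    "unavail": "continue",
    "tryagain": "continue",
}


def parse_line(spec):
    """Split 'files [UNAVAIL=return] ldap' into [(service, actions), ...]."""
    entries = []
    for token in re.findall(r"\[[^\]]*\]|\S+", spec):
        if token.startswith("["):
            if not entries:
                raise ValueError("action item without a preceding service")
            for pair in token[1:-1].split():
                status, action = pair.lower().split("=")
                entries[-1][1][status] = action  # binds to the service on its left
        else:
            entries.append((token, dict(DEFAULT_ACTIONS)))
    return entries


def lookup(spec, outcome):
    """Walk the services in order; outcome maps service name -> status string."""
    consulted = []
    status = "unavail"
    for service, actions in parse_line(spec):
        consulted.append(service)
        status = outcome[service]
        if actions[status] == "return":
            break
    return status, consulted


# Hypothetical situation: user not in local files, LDAP server unreachable.
OUTCOME = {"files": "notfound", "ldap": "unavail", "db": "success"}

print(lookup("files [UNAVAIL=return] ldap", OUTCOME))
print(lookup("files ldap [UNAVAIL=return] db", OUTCOME))
print(lookup("files ldap [NOTFOUND=return] db", OUTCOME))
```

In the first form the item is attached to files, so the unreachable LDAP source is still consulted. In the last form the model falls through to db when LDAP is unavailable, which is the behaviour expected of the cached-database setup discussed earlier.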

But, this did not work for me with the problem that I was seeing. I ended up adding a host line of

host 127.0.0.1 123.123.123.123

in /etc/ldap.conf. Fortunately, we have multiple ldap servers running and the master will stay at feisty until I see that this problem has been resolved. If this is your only ldap server and you are running gutsy I don't see how to work around this problem.

Bill

Greek Ordono (grexk) wrote :

Would anyone like to try this script, which I'm using to configure my gutsy workstations? The issues I have encountered so far are the following:

1. With the network unplugged, cached credentials work; with it plugged in, they don't.
2. With the LDAP server down, clients can boot but gdm doesn't work.

Thanks Bill MacAllister, you're right.
This didn't work.

I will reinstall the machine because I will need it up and running again.
I hope someone will fix this soon.

Thanks for all commenters!

Jamie Strandboge (jdstrand) wrote :

This may be related to https://bugs.launchpad.net/ubuntu/+source/libnss-ldap/+bug/51315. Can people test the solutions there (look towards the bottom)?

The symptom that is reported here is not the problem that I was seeing.

Here is a quick summary of what I saw.

  1. Successfully running libpam_ldap and libnss_ldap on gutsy
  2. Aptitude upgrade merged two config files into /etc/ldap.conf
  3. Reboot of the system runs to login prompt, BUT no scripts in
     /etc/init.d are run after the kernel log.

The solution was to change the host line in the /etc/ldap.conf file to:

  host 127.0.0.1 123.123.123.123

where 123.123.123.123 is a working ldap server in the network. This works
okay, but it is really unclear how you bring up the first server with ldap
and libnss_ldap. This has been a problem for long enough I am quickly
coming to the opinion that you should _never_ run the ldap server on a
system that is using libnss_ldap. This is a big pain in the neck since two
sites that I manage are configured to have a local ldap replica to improve
reliability. Currently, running a local ldap server is fine if you can
ever get the system booted.

Bill

+---------------------------------------------------------------------
| Bill MacAllister <email address hidden>
| Systems Programmer, ITS Unix Systems, Stanford University

a workaround (working in my setup) to stop calling ldap for local system users and groups:

add this line to /etc/ldap.conf (adapt it to your setup):

nss_initgroups_ignoreusers root,root.slocate,daemon,bin,sys,sync,games,man,lp,mail,news,uucp,proxy,www-data,backup,list,irc,gnats,nobody,dhcp,syslog,klog,avahi-autoipd,messagebus,avahi,cupsys,haldaemon,hplip,statd,ntp,sshd,beagleindex,clamav

with a bind policy soft.
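
Rather than typing the list by hand, it could be generated from the local account database. The Python sketch below is one way to do that, assuming that local system accounts are those with a UID of 999 or lower (a Debian/Ubuntu convention, adjustable through max_uid). The helper name ignoreusers_line and the "alice" entry in the sample are hypothetical; the syslog and klog lines are the ones quoted earlier in this report.

```python
def ignoreusers_line(passwd_text, max_uid=999):
    """Build an nss_initgroups_ignoreusers line from /etc/passwd-style text."""
    names = []
    for line in passwd_text.splitlines():
        if line.startswith(("#", "+", "-")):
            continue  # skip comments and compat-style +/- entries
        fields = line.split(":")
        if len(fields) < 7:
            continue  # skip malformed lines
        name, uid = fields[0], fields[2]
        if uid.isdigit() and int(uid) <= max_uid and name not in names:
            names.append(name)
    return "nss_initgroups_ignoreusers " + ",".join(names)


SAMPLE = """root:x:0:0:root:/root:/bin/bash
syslog:x:101:102::/home/syslog:/bin/false
klog:x:102:103::/home/klog:/bin/false
alice:x:5000:5000::/home/alice:/bin/bash
"""

print(ignoreusers_line(SAMPLE))

if __name__ == "__main__":
    with open("/etc/passwd") as handle:
        print(ignoreusers_line(handle.read()))
```

Accounts with a high UID that are nevertheless local, such as nobody, fall outside the threshold and would have to be added manually.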

Hope this helps

Sebastiaan

Is there any reason that this shouldn't be used on any system that uses
libnss_ldap? Indeed, it seems that it would make sense to just include all
the groups in the /etc/group file in this list. Am I missing something?

Bill

This workaround is only relevant if you use nss with ldap. It prevents
group lookups for the users that are provided with
nss_initgroups_ignoreusers. The list excludes users for group lookups in
LDAP, not the other way around. I think the bug is relevant to
libnss_ldap, because my system boots ok with this fix.

Sebastiaan

An added twist to this bug...
I've been running LDAP servers on Feisty and Breezy and Hoary for a long time, without this problem.
The latest couple I've set up have shown me this problem. This problem is very significant in this case because the LDAP servers themselves are unable to boot. Because slapd is started late in the boot-process by default (S41 in rc2.d), we find ourselves caught in the middle of a race condition, or rather deadlock. klogd can't boot because it is waiting for the LDAP server which will not initialize because it's waiting on klogd.

My initial work-around was simply moving SLAPD to start as S13 in rcS.d, right after networking. However, reading how much of a problem this is for Gutsy, I will be considering other tactics. I recently built a significant fraction of an infrastructure on Gutsy, and have been incorporating each machine into LDAP. Since they are all static-addressed and the network is available and fast, I've not seen this problem on non-LDAP servers to date... but each LDAP server I have built lately has been a headache (since it's been just enough time since the last one that I forget to modify the boot order).
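
To see where slapd falls relative to other services in a runlevel directory, the start links can be sorted the way they are run. This Python sketch assumes SysV-style "SNNname" link names with two-digit sequence numbers; the helper name start_order is hypothetical, and in the usage example only S41slapd comes from this report (S11klogd and K20example are illustrative values).

```python
import re


def start_order(link_names):
    """Order SysV start links by sequence number, then name; ignore everything else."""
    parsed = []
    for link in link_names:
        match = re.fullmatch(r"S(\d\d)(.+)", link)
        if match:  # K (kill) links and unrelated files are skipped
            parsed.append((int(match.group(1)), match.group(2)))
    return [name for _, name in sorted(parsed)]


# Illustrative input; on a real system use os.listdir("/etc/rc2.d").
print(start_order(["S41slapd", "S11klogd", "K20example"]))
```

Any service that sorts ahead of slapd and triggers a group lookup will wait on a directory server that has not started yet, which is the deadlock described above.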

Best of luck to you guys. Keep up the great work! I'm looking forward to Hardy.
Matt

Given that Hardy defaults to roaming network mode and that any use of libnss-ldap will result in a non-bootable machine, shouldn't the importance of this bug be made critical?

Sorry, I'm not trying to rant - I just want to make sure appropriate resources are made available to fix this for an LTS release :)

Guy Van Sanden (gvs) wrote :

I agree with Darren.

This bug has been biting people since Dapper. And it's not only laptops. If you have an Ubuntu server in a network where the LDAP server is down or unreachable for some reason, it no longer boots.

gheeke (gerhard-heeke) wrote :

I also think that a fix should be made available for an LTS release. libnss-ldap is critical for an enterprise environment.

Lars Kneschke (lkneschke) wrote :

I'm pretty sure that this is not a problem in nss-ldap but in nss-files or the NSS subsystem as a whole!

nss-ldap should never be asked about any account information (root, daemon, ...) that nss-files has already found.

And yes, it is a real pain that no one has taken care of this major bug yet.

Dustin Kirkland  (kirkland) wrote :

Hi all,

I've been trying for several days to reproduce this problem in Hardy, without any luck.

Can anyone confirm that this problem is endemic in Hardy?

If so, can you please provide very detailed instructions to reproduce? Also, please note your udev version.

Thanks,
:-Dustin

Brian Zachary (zachary-eecs) wrote :

I only got the problem when I was using "groups: ldap files" in nsswitch.conf, and without the "bind_policy soft" in ldap.conf. Changing to groups: files ldap and bind_policy soft resolved the problem for me, though I'm not sure of the udev version as the machine has been reinstalled several times in the past few days as we continue testing.

@Brian:

Please confirm, is this Hardy? If so, which alpha/beta release?

:-Dustin

Dustin,
Sorry, this was Hardy Beta... I'm not sure as to the particular release, since I'm doing test preseed installs over a network from a local mirror that is updated daily. I do recall that the version of libnss-ldap was 259 at the time, though 258 is the latest I see in the repos now.

Dustin Kirkland  (kirkland) wrote :

@Brian-

I'm still trying, albeit unsuccessfully, to reproduce this problem. It seems that the complexity of the configuration is a bit deeper than in my test cases.

If possible, would you (or anyone else) who is seeing this problem in Hardy Beta please ping me in IRC? My handle is 'kirkland' and you should be able to find me in #ubuntu-server on irc.freenode.net.

In the mean time, could you please post (or email me) the following configuration files:
* /etc/nsswitch.conf
* /etc/ldap.conf
* /etc/services
* /etc/pam.d/common-password
* /etc/pam.d/common-auth
* /etc/pam.d/common-account
* /etc/pam.d/common-session
* /etc/ldap/ldap.conf

Also, please send a dump of all packages installed:
* dpkg -l
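
Gathering the requested files into a single report can be scripted. The Python sketch below is one possible approach, not something requested in this thread: it concatenates the listed files and, as an added precaution, masks the value of any bindpw line so that a bind password is not posted publicly. The helper name bundle and the report layout are hypothetical; the output of dpkg -l would still need to be captured separately.

```python
from pathlib import Path

REQUESTED = [
    "/etc/nsswitch.conf",
    "/etc/ldap.conf",
    "/etc/services",
    "/etc/pam.d/common-password",
    "/etc/pam.d/common-auth",
    "/etc/pam.d/common-account",
    "/etc/pam.d/common-session",
    "/etc/ldap/ldap.conf",
]


def bundle(paths, redact=("bindpw",)):
    """Concatenate files into one text report, masking password-like settings."""
    parts = []
    for path in paths:
        parts.append(f"==> {path} <==")
        try:
            text = Path(path).read_text(errors="replace")
        except OSError as exc:
            parts.append(f"(not readable: {exc.strerror})")
            continue
        for line in text.splitlines():
            words = line.split()
            if words and words[0].lower() in redact:
                line = f"{words[0]} ********"  # keep the keyword, hide the value
            parts.append(line)
    return "\n".join(parts)


if __name__ == "__main__":
    print(bundle(REQUESTED))
```

Files that do not exist on a given release are reported as not readable rather than aborting the run, since the set of configuration files differs between Feisty and later releases.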

Thanks,
:-Dustin

Same happened to me. I think the problem lies in the reference to ldap
server.
With "pam_ldap.conf" my ref is a uri and works, but when I make it a
straight host name ref, then it hangs.
Dunno if this helps.

Dustin Kirkland  (kirkland) wrote :

On Mon, 2008-04-07 at 20:10 +0000, Hilton Gibson wrote:
> With "pam_ldap.conf" my ref is a uri and works, but when I make it a
> straight host name ref, then it hangs.

Ahh... Okay, so this tells us something very interesting.

pam_ldap.conf hasn't existed since Feisty. There, it was part of the
libpam-ldap package.

What distro version are you running on the system where you've seen this
problem? And was it upgraded from a feisty system?

Actually, those are general questions to anyone on this thread. As
we're trying to narrow this down, please, please, please let us know
what distro version you're seeing this on, and whether it was a fresh
install or an upgrade, and if an upgrade, what the original distro
version was.

:-Dustin

Hilton Gibson (hgibson) wrote :

Ok. It's a production system that started with Debian stable, then Ubuntu
Hoary, then Ubuntu Dapper. Hopefully Ubuntu Hardy soon.
No re-installs, could not afford downtime. All the time very problematic
upgrading.

Brian Zachary (zachary-eecs) wrote :

Sorry I can't provide more detailed config info, but as I explained before,
the machine has been wiped and reinstalled and reconfigured many times since
the original problem. But mine was a clean Hardy install, and all our
configs lived in ldap.conf (no pam_ldap.conf or anything like that), and we
were using a hostname for the ldap server, not an IP.

Jamie hasn't had time to look at this and has transferred it to me.

Changed in libnss-ldap:
assignee: jamie-strandboge → kirkland
Dustin Kirkland  (kirkland) wrote :

@Brian-

A few more questions, then...

So you can reproduce this problem reliably on a Hardy Beta fresh install? And by reproduce, the machine hangs on boot (and not just on login)?

Your process for reproducing this problem involves a fresh install of Hardy Beta, and then customizing the config files as follows:
* /etc/ldap.conf: a hostname (not an IP or URI)
* /etc/ldap.conf: "bind_policy hard"
* /etc/nsswitch.conf: "group ldap files"

Then you remove connectivity to the ldap server, and you reboot the Hardy Beta machine. At that point, your machine hangs on reboot? Is it responsive to anything? Ping, etc?

On a subsequent reboot, can you check /var/log/syslog and paste any relevant log messages from the previous boot?

Dustin Kirkland  (kirkland) wrote :

@Hilton-

Would it be possible for you to switch it back to hostname and reproduce the problem and report any meaningful error messages in /var/log/syslog for the failed boot? (Note, you might have to look in one of the log rotations in /var/log/syslog.*)

:-Dustin


Err.. Ok. For a while only. Am at home now.
Tomorrow OK?
Will get logs etc...


Kevin Slater (kevin-slater) wrote :

Dustin,

I had the problem on two different desktop machines running Feisty, with
LDAP client authentication working, when I upgraded them to Gutsy. The fix
was to change the bind policy to soft and to manually configure the proper
files using various sources on the web as a guide. The machines would hang to
the point that they were only accessible by starting with a recovery kernel
config.

...Kevin

Dustin Kirkland  (kirkland) wrote :

On Tue, 2008-04-08 at 12:07 +0000, Kevin Slater wrote:
> I had the problem in two different desktop machines running Feisty,
> with LDAP client authentication working, when I upgraded them to
> Gutsy.

@Kevin,

Thanks for confirming our suspicions, that this problem arises in
systems upgraded from an original Feisty installation.

> The fix was to change bind policy to soft and to manually configure
> the proper files using various sources on the web as a guide.

Interesting, okay. So perhaps the 'fix' to this bug is to ensure that
the upgrade process injects this 'soft bind' policy into the
configuration files. Let me think about that.

> The machines would hang to the point that they were only accessible
> by starting with a recovery kernel config.

Please clarify... Would you be prompted with a login? Were you able to
enter a username? And what about a password? Is that, then, the point
at which your system 'hung'? If so, I am able to reproduce this
behavior. But I'd call that a 'hang on login', which is different than
a 'hang on boot'.

:-Dustin

Kevin Slater (kevin-slater) wrote :

On Tue, Apr 8, 2008 at 2:45 PM, Dustin Kirkland <email address hidden>
wrote:

>
> Please clarify... Would you be prompted with a login? Were you able to
> enter a username? And what about a password? Is that, then, the point
> at which your system 'hung'? If so, I am able to reproduce this
> behavior. But I'd call that a 'hang on login', which is different than
> a 'hang on boot'.
>

It's been some time, so it's hard to recall. I do remember getting to the
login prompt and then having it fail to login, that's for sure. I seem to
recall also getting into a state where the boot process would never get to
the login prompt as well, but that could have been after attempting to
manually correct the issue.

...Kevin

ldap-auth-client combined /etc/libnss-ldap.conf and /etc/pam-ldap.conf. If either or both of these files exist on upgrade, then the user is prompted with a message to manually migrate the files. From debian/ldap-auth-config.templates:

Template: ldap-auth-config/move-to-debconf
Type: boolean
Default: true
_Description: Reconfigure LDAP with debconf?
 The LDAP authentication libraries now use the new unified configuration
 file ${newfn}, and no longer use ${pamfn} or ${nssfn}. One or both of
 these old configuration files were found. These files cannot be
 automatically migrated to the new ${newfn}. You MUST either reconfigure
 your settings with debconf, or manually migrate your settings into ${newfn}
 and verify your configuration before logging out.

Dustin Kirkland  (kirkland) wrote :

Okay, snapshot of conclusions at this point...

(1) Any Feisty (or earlier) system upgraded to Hardy (or later) requires a manual migration of /etc/libnss-ldap.conf and /etc/pam-ldap.conf if either or both of those files exist.

(2) None of the 5+ Ubuntu developers who have looked at this bug has successfully reproduced the "boot hang" aspect of this bug. A boot hang involves a system which is not responsive to a network ping, not responsive to banging keys, and toggling caps-lock/num-lock does not affect the associated LEDs. (That's a crude definition, of course, but some decent guidelines.) ANYONE who is able to reproduce such a boot hang, please respond and attach (a cleansed copy) of:
 * /var/log/syslog (as retrieved from a subsequent rescue boot)
 * /etc/ldap.conf
 * /etc/nsswitch.conf
 * /etc/libnss-ldap.conf
 * /etc/pam-ldap.conf

(3) We have been able to reproduce a "hang on login". I'd argue that this is a "functions as designed" scenario. If you require an LDAP server to login, and it's not available, logins should not succeed until the target LDAP server becomes available. In the case where you want to relax that requirement, a system can be configured to use a soft bind policy.

:-Dustin

Changed in libnss-ldap:
status: Confirmed → Incomplete

Good criteria. But please also consider the PAM rules for logins. Some allow
a graceful fall-through to pam_unix.so as a backup. This should be the default
no matter what other auth system is used. There are many other PAM auth
systems, e.g. fingerprint, USB key etc. LDAP is only one of many. So when
configuring ldap-auth-client, take very careful note of the PAM config files
and the order of the auth mechanisms.


I have the same problem on a Hardy server installation. I upgraded it from Gutsy, but I count it as a Hardy install because I only used the Gutsy server CD to install the base system and then upgraded it to Hardy straight away. In my case it was not enough to put "bind_policy soft" in ldap.conf; I also had to change the init start order of slapd from S19 to S10.
Without these changes my system hangs at login: I can ping it but cannot log in. The boot process stops at "Starting kernel log daemon...".

I have attached my config files. I don't have pam_ldap.conf and libnss-ldap.conf.

Todvard (todvard) wrote :
description: updated

I modified the title of this bug from "hang at reboot" to "hang at login".

Hanging at boot is a little misleading, as a boot hang generally points to a kernel bug, which we are sure this is not.

Rather, this bug manifests itself by prompting the user for login credentials, and then hanging because an LDAP server is not available. In all reproductions of this bug, the console is responsive to pressing enter, is pingable, and toggling numlock/capslock flashes the LEDs.

Furthermore, the fact that the last message shown was "Starting kernel log daemon" is entirely a timing issue with Upstart (I was eventually able to reproduce this, see the attached screenshot). This was misleading, as the login prompt was displayed as soon as possible, with Upstart continuing through its startup measures. Scott has fixed this as reported in another bug, see: Bug #65230. Now, Upstart will delay printing the login prompt on tty1 until all init scripts have completed.

:-Dustin

Martin Emrich (emme) wrote :

Hi!

This bug just bit me while upgrading from dapper to hardy. As I expected some rough things I took a full image of my production server (still running dapper now) and exercised the upgrade to hardy in a virtual machine.
After upgrading to hardy, I had the problems described here. Setting bind_policy to soft fixed it for me. If you want me to do any tests, I still have a snapshot of the VM before starting the upgrade, I can torture it if you like.

Ciao

Martin

Andreas Hasenack (ahasenack) wrote :

I believe that "soft" is a better default for nss_ldap. FWIW, Mandriva has been using this patch for quite a while now:
http://svn.mandriva.com/cgi-bin//viewvc.cgi/packages/cooker/nss_ldap/current/SOURCES/nss_ldap-250-bind_policy_default_soft.patch?revision=5975&view=markup

However, I note that the Debian changelog of the libnss-ldap package has this entry:
libnss-ldap (251-4) unstable; urgency=low

  * Added system which implicitly sets bind_policy to 'soft'
    during system boot/shutdown. This is implemented by an
    init script run at end of system boot and start of system
    shutdown which creates/removes a file in /var/lib/libnss-ldap
    called 'bind_policy_soft'. When this file exists the policy
    is treated as 'soft' regardless of the configuration in
    /etc/nss-ldap.conf. Note that soft doesn't mean 'always
    fail' but rather only try to connect to each URI listed in
    the configuration file once, with no sleeping.
    Closes: #375077, #375215
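
To make the distinction in that changelog entry concrete, here is a minimal Python sketch of the two behaviours. It is not nss_ldap's code: `try_connect`, `max_rounds` and `first_delay` are hypothetical names and values, and the exponential backoff for the hard policy is my reading of the nss_ldap documentation rather than something stated in this thread.

```python
import time
from typing import Callable, Iterable, Optional


def connect_soft(uris: Iterable[str],
                 try_connect: Callable[[str], bool]) -> Optional[str]:
    """'soft' as described in the changelog: one attempt per URI, no sleeping."""
    for uri in uris:
        if try_connect(uri):
            return uri
    return None  # every server failed; return to the caller immediately


def connect_hard(uris: Iterable[str],
                 try_connect: Callable[[str], bool],
                 max_rounds: int = 5,
                 sleep: Callable[[float], None] = time.sleep,
                 first_delay: float = 1.0) -> Optional[str]:
    """'hard'-style sketch: retry all URIs, sleeping longer after each round.

    max_rounds and first_delay are illustrative values only.
    """
    uri_list = list(uris)
    delay = first_delay
    for round_no in range(max_rounds):
        for uri in uri_list:
            if try_connect(uri):
                return uri
        if round_no < max_rounds - 1:
            sleep(delay)  # the calling process blocks here
            delay *= 2    # back off exponentially between rounds
    return None
```

With the server unreachable, every process that goes through the hard path sits in those sleeps, which would be consistent with init scripts appearing to stall one after another.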

I don't know what to make of that yet. I will try to get a system running this afternoon with ldap, nss_ldap, etc and a notebook client and see what happens.

Jamie Strandboge (jdstrand) wrote :

After seeing two machines experience these hangs, I was able to reproduce it on a clean hardy install:

1. install hardy (of course)
2. apt-get install ldap-auth-client
3. configure ldap to use (via debconf):
ldap://127.0.0.1/
root requires a password: 'no'
everything else defaults
4. configure /etc/nsswitch.conf to have:
passwd: compat ldap
group: compat ldap
5. configure /boot/grub/menu.lst to have:
# defoptions=
6. update-grub
7. reboot

This is not a 'kernel hang' as it will respond to 'ctrl+alt+delete'. It does stop after kernel logger.

Jamie Strandboge (jdstrand) wrote :

Thanks to Andreas for finding that changelog entry. I have linked this bug to the Debian bug. Please note that although the Debian bug is marked Fixed, there were several iterations of the fix until finally in 259-1 the initscript 'kluge' is removed.

Changed in libnss-ldap:
status: Incomplete → Confirmed
Andreas Hasenack (ahasenack) wrote :

We really need to set this bind_policy to soft, or else the machine just won't boot. It stops at klogd. I just tried with hardy.

So, after setting it to soft using a rescue disk, boot proceeded as normal. I didn't even touch the timeouts. Now, the ldap user obviously can't login yet in my scenario because it's a notebook with wireless, so there is no network setup yet at the gdm prompt. But this is another issue. The local user login works.

You also have upgrade issues to consider. Now we seem to have reverted to upstream's default of using just /etc/ldap.conf for both pam_ldap and nss_ldap. FWIW, I agree with this change. Debian was the only kid on the block with renamed config files for these two libraries, and they both share a lot of config options. So, these settings need to be migrated to /etc/ldap.conf, or at least a warning should be issued. (If this is already done, then please ignore what I just wrote).

Andreas Hasenack (ahasenack) wrote :

As expected, using a wired connection instead of wireless allows any user to login at the gdm prompt, local or ldap, because the network is up by then in this case.

Jamie Strandboge (jdstrand) wrote :

STATUS UPDATE:
The Ubuntu package does include 00boot_delays_h.patch, which is the Debian patch for this issue. This seems to work for udev, but not for programs that call libc's initgroups(), which by definition tries to get all the groups a user belongs to; if /etc/nsswitch.conf contains 'ldap' for the group database, libc queries LDAP. start-stop-daemon is one such application.

libnss-ldap does provide 'nss_initgroups_ignoreusers', which lets NSS skip LDAP in initgroups() for the specified users. So, to confirm that this was the problem, I added to /etc/ldap.conf:
nss_initgroups_ignoreusers klog

I then rebooted and the boot proceeded past klogd, and then hung at crond. Currently thinking about a solution.
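
For anyone checking whether a given daemon user is exposed to this, the condition described above can be sketched in Python. This is only an illustration under simplifying assumptions: the helper names are made up, the parsing is case-sensitive, and it treats nss_initgroups_ignoreusers as a comma-separated user list.

```python
from typing import List, Set


def group_sources(nsswitch_text: str) -> List[str]:
    """Return the sources listed for the 'group' database in nsswitch.conf."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("group:"):
            return line[len("group:"):].split()
    return []


def ignored_users(ldap_conf_text: str) -> Set[str]:
    """Collect users named by nss_initgroups_ignoreusers in ldap.conf."""
    users: Set[str] = set()
    for raw in ldap_conf_text.splitlines():
        parts = raw.split("#", 1)[0].strip().split(None, 1)
        if len(parts) == 2 and parts[0] == "nss_initgroups_ignoreusers":
            users.update(u.strip() for u in parts[1].split(",") if u.strip())
    return users


def initgroups_hits_ldap(user: str, nsswitch_text: str,
                         ldap_conf_text: str) -> bool:
    """True if initgroups() for this user would be expected to consult LDAP."""
    return ("ldap" in group_sources(nsswitch_text)
            and user not in ignored_users(ldap_conf_text))
```

Reading /etc/nsswitch.conf and /etc/ldap.conf and calling `initgroups_hits_ldap("klog", ...)` would return False once the line above is in place, while users not listed (for example whichever user crond is started as) would still return True.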

Jamie Strandboge (jdstrand) wrote :

Updated the description and marked as triaged.

Changed in libnss-ldap:
status: Confirmed → Triaged
Martin Emrich (emme) wrote :

How about a patch to libnss-ldap to respect an environment variable to force soft binding? Something like 'if (getenv("NSS_LDAP_BINDPOLICY_SOFT") != NULL && getuid() < 100) { /* force soft bind regardless of ldap.conf */ }'

Jamie Strandboge (jdstrand) wrote :

Discussion on irc and phone resulted in the following solution:

Add a new configuration option 'nss_initgroups_ignoreusers_below_uid' (or similar) and have it default to '1000'. This option will be configurable in /etc/ldap.conf. Admins can adjust this to be any valid uid.

Jamie Strandboge (jdstrand) wrote :

Upstream started to consider this, and have a preliminary patch here:
http://bugzilla.padl.com/show_bug.cgi?id=341

Jamie Strandboge (jdstrand) wrote :

Turns out the above doesn't work out too well, as the patch depends on getpwnam_r(), a glibc function which then ends up using libnss-ldap. Tried a few things, but it didn't help the hang.

A less intrusive patch will be to have an initscript run on shutdown which edits /etc/ldap.conf based on the value of nss_initgroups_minimum_uid.
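
As a rough illustration of that idea (not the script that was actually shipped), the following Python sketch derives an nss_initgroups_ignoreusers line from passwd-format text and a minimum uid. The function name and the sample data in the usage note are hypothetical.

```python
def ignoreusers_line(passwd_text: str, minimum_uid: int) -> str:
    """Build an nss_initgroups_ignoreusers line for local users below minimum_uid."""
    names = []
    for raw in passwd_text.splitlines():
        if raw.startswith("#"):
            continue
        fields = raw.split(":")
        if len(fields) < 3:
            continue
        try:
            uid = int(fields[2])
        except ValueError:
            continue  # e.g. compat entries such as "+::::::"
        if uid < minimum_uid:
            names.append(fields[0])
    return "nss_initgroups_ignoreusers " + ",".join(names)
```

Fed the contents of /etc/passwd with a minimum uid of 1000, this would list system accounts such as root and klog, so that initgroups() for those users never waits on LDAP. Running it at shutdown means the list is current before the next boot.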

Changed in libnss-ldap:
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libnss-ldap - 258-1ubuntu3

---------------
libnss-ldap (258-1ubuntu3) hardy; urgency=low

  * add nssldap-update-ignoreusers that updates nss_initgroups_ignoreusers in
    /etc/ldap.conf based on nss_initgroups_minimum_uid. Added initscript to
    call nssldap-update-ignoreusers on shutdown. Based on changes by
    Dustin Kirkland. Fix for LP: #155947
  * References
    https://bugs.edge.launchpad.net/ubuntu/+source/libnss-ldap/+bug/155947

 -- Jamie Strandboge <email address hidden> Tue, 22 Apr 2008 14:07:14 -0400

Changed in libnss-ldap:
status: Fix Committed → Fix Released
Dustin Kirkland  (kirkland) wrote :

Howdy all-

An updated libnss-ldap package is available in the Hardy repos as of yesterday, containing a fix for this problem.

We've tested it in our environments and it seems to solve the issue. I'm curious if anyone else out there has tried it, and if problems persist.

:-Dustin

Changed in libnss-ldap:
status: Unknown → Fix Released
Kevin Waddell (onatawahtaw) wrote :

Is there a fix for this for 7.10? When the LDAP server is down, it hangs on klog etc. I have bind_policy soft set and it still hangs. Adding nss_initgroups_ignoreusers to my ldap.conf seems to be a workaround.

Kevin Waddell (onatawahtaw) wrote :

PS. This is on an Ubuntu 7.10 Client authenticating with a Debian Etch 4.0r4a Server.

On Friday 10 October 2008, Hilton Gibson wrote:

This sounds like a political subject ;)
