Comment 11 for bug 253937

Etienne Goyer (etienne-goyer-outlands) wrote:

On 2008-08-06, Steve had this tidbit of wisdom:
> If you are only using libnss-ldap without nscd, there is nowhere in the
> model for this reachability information to be stored. If you use nscd,
> results will be cached in the event the server is down.

Well, yes and no. Enumerating an NSS database, as happens when you invoke initgroups(), would still block. As a result, GDM would still take forever to start a desktop session, even if you are running nscd. In fact, nscd is of practically no help if the network directory server goes down.
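To see where that stall lands, you can time the group-membership lookup on its own. Below is a minimal Python sketch, assuming a glibc-based Linux system. It uses os.getgrouplist(), which wraps getgrouplist(3). In glibc that function resolves supplementary groups through the same NSS group sources as initgroups(), but it does not need root. The helper name time_group_lookup is hypothetical.

```python
import os
import time


def time_group_lookup(username: str, base_gid: int) -> tuple[list[int], float]:
    """Return (supplementary groups, seconds spent) for one NSS group lookup.

    Hypothetical helper for illustration; assumes a Unix system with NSS.
    """
    start = time.monotonic()
    # getgrouplist(3) consults every source listed for "group" in
    # /etc/nsswitch.conf (e.g. files, ldap), so with an unreachable
    # LDAP server this call is where the login delay shows up.
    groups = os.getgrouplist(username, base_gid)
    elapsed = time.monotonic() - start
    return groups, elapsed
```

For example, calling time_group_lookup("alice", 1000) (a hypothetical user) once with the directory server reachable and once with it unplugged shows how much of the session-start delay comes from group resolution alone.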

> But adjusting the timeout limits should also have an effect - were you
> changing the 'timelimit' or the 'bind_timelimit' setting? In normal
> circumstances, I would expect the 'bind_timelimit' to be the one that
> applies for such failures; 'timelimit' only matters if your server *is*
> alive but is taking a pathologically long time to reply to queries.

Even setting bind_timelimit (with or without "bind_policy soft") will not help much, because every NSS query still has to wait out the timeout, and all these timeouts add up quickly (we measured 45 minutes to open a GNOME session with "bind_timelimit 5" on hardy).
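The arithmetic behind that figure is worth spelling out. Assume, for illustration only, that each blocked lookup costs exactly one full bind_timelimit and nothing else (no retries, no reconnect backoff). Under that assumption, a 45-minute delay at a 5-second limit corresponds to roughly 540 timed-out lookups. The helper names below are hypothetical.

```python
def implied_timeouts(observed_delay_s: float, bind_timelimit_s: float) -> float:
    """Number of full timeouts needed to explain an observed delay.

    Simplified model (assumption): each blocked NSS lookup waits exactly
    bind_timelimit_s seconds and is not retried.
    """
    return observed_delay_s / bind_timelimit_s


def estimated_stall(lookups: int, bind_timelimit_s: float) -> float:
    """Total wait, in seconds, if every one of `lookups` NSS queries times out."""
    return lookups * bind_timelimit_s


if __name__ == "__main__":
    # Figures from the comment: about 45 minutes with "bind_timelimit 5".
    n = implied_timeouts(45 * 60, 5)
    print(f"~{n:.0f} timed-out lookups")
    # Lowering the timeout scales the stall down linearly but never removes it.
    print(f"same lookups at 2s: {estimated_stall(int(n), 2) / 60:.0f} min")
```

Because the total is linear in the timeout, shrinking bind_timelimit only shortens the wait proportionally. The number of lookups that block stays the same, which is why tuning the timeout alone is not a fix.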

It is a pretty complex problem. I have pushed a blueprint to resolve it, reliable-nss-caching, and mathiaz packaged the sssd client from the FreeIPA project in karmic to address this issue. We need to test it and make sure it actually resolves the issue in a resilient and scalable fashion.