slow group indexing when using huge ldap

Bug #616719 reported by Klaus Vink Slott on 2010-08-12
This bug affects 2 people
Affects: libnss-ldap (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

Binary package hint: libnss-ldap

We have an OpenLDAP server with more than 50,000 user accounts and almost 5,000 groups. Some of these groups may contain more than 20,000 users. When a user who is a member of one of the big groups tries to log on from an LDAP client host, it takes several minutes before the prompt appears.

Executing "id [uid]" has a similar effect.

During the wait, CPU load on the LDAP client machine is high and the OpenLDAP server is bombarded with LDAP searches from the Ubuntu client machine.

Judging from the LDAP log on the server, it seems that the Ubuntu LDAP client cycles through all group memberships for the requested uid and verifies that all other members of the same group are present in the LDAP people tree.
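
One way to check this impression against the server log is to count search operations per client connection. The following is only a sketch: it assumes slapd is logging at the "stats" level, so that each search appears as a line containing "conn=<n> ... SRCH ...", and the helper name count_searches and the log path are hypothetical, since the report does not show the log itself.

```shell
# Hypothetical helper: count SRCH operations per connection in an OpenLDAP log.
# Assumes stats-level slapd logging; pass the log file as the first argument.
count_searches() {
    grep ' SRCH ' "$1" |        # keep only search requests
        grep -o 'conn=[0-9]*' | # extract the connection id
        sort | uniq -c |        # count searches per connection
        sort -rn                # busiest connection first
}

# Example (path is an assumption): count_searches /var/log/slapd.log | head
```

A single "id [uid]" call that produces thousands of SRCH lines on one connection would be consistent with the per-member lookups described above.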

> gqv604@nms:~$ cat /etc/issue
> Ubuntu 10.04 LTS \n \l
> gqv604@nms:~$ apt-cache policy libnss-ldap
> libnss-ldap:
> Installed: 264-2ubuntu2
> Candidate: 264-2ubuntu2
> Version table:
> *** 264-2ubuntu2 0
> 500 http://dk.archive.ubuntu.com/ubuntu/ lucid/main Packages
> 100 /var/lib/dpkg/status

This makes it impossible to use an Ubuntu host in a large scale environment.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: libnss-ldap 264-2ubuntu2
ProcVersionSignature: Ubuntu 2.6.32-21.32-server 2.6.32.11+drm33.2
Uname: Linux 2.6.32-21-server x86_64
Architecture: amd64
Date: Thu Aug 12 12:25:53 2010
InstallationMedia: Ubuntu-Server 10.04 LTS "Lucid Lynx" - Release amd64 (20100427)
ProcEnviron:
 LANG=da_DK.UTF-8
 SHELL=/bin/bash
SourcePackage: libnss-ldap

Klaus Vink Slott (k-slott) wrote :
tags: added: ldap
removed: amd64 apport-bug lucid
Scott Moser (smoser) wrote :

Klaus,
  Thank you for taking the time to make a good bug report.
  Do you know if this behaviour is a regression from a previous Ubuntu release?
  Do you know if this behaviour is present in the upstream nss_ldap code?

Changed in libnss-ldap (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
status: Triaged → Incomplete
Klaus Vink Slott (k-slott) wrote :

Hi Scott, thanks for looking into this.

I have never used Ubuntu in an environment like this before (actually I've never really used Ubuntu) so I can't say for sure. But this guy http://ubuntuforums.org/showthread.php?t=1238322 might have been hit by the same issue in August last year.

I do not know if the problem is present upstream; actually, I can't say for sure that the problem is in nss_ldap itself. But we do have a lot of OpenSUSE machines running in the same setup and have never seen this problem before. Our OpenSUSE is now at:
Version : 264 Vendor: openSUSE
Release : 3.1 Build Date: Mon 19 Oct 2009 18:45:47 CEST
Source RPM: nss_ldap-264-3.1.src.rpm

which seems to be pretty much the same version.

Excerpts from Klaus Vink Slott's message of Fri Aug 13 07:02:36 UTC 2010:
>
> I do not know if the problem is present upstream; actually, I can't say for sure that the problem is in nss_ldap itself. But we do have a lot of OpenSUSE machines running in the same setup and have never seen this problem before. Our OpenSUSE is now at:
> Version : 264 Vendor: openSUSE
> Release : 3.1 Build Date: Mon 19 Oct 2009 18:45:47 CEST
> Source RPM: nss_ldap-264-3.1.src.rpm
>
> which seems to be pretty much the same version.
>

Is nscd running on the opensuse systems?

--
Mathias Gug
Ubuntu Developer http://www.ubuntu.com

Klaus Vink Slott (k-slott) wrote :

Yes. Based on a few tests done by hand on OpenSUSE, nscd speeds up the process a lot:
When nscd is running I get a response within 100 ms on average, sometimes as low as 8 ms and at most 2.2 seconds. If I stop nscd, response times range between 400 ms and 2 seconds, with the average around 800 ms.
Requesting a new (uncached) uid with each request does not seem to add much to these figures.

I only did 4 tests on Ubuntu: 2 with nscd running, and the same 2 tests without nscd.
With nscd: 2 minutes 51 seconds, and 16 minutes;
and the same two tests without nscd: 3 minutes, and 14 minutes.
The difference is negligible and, I think, most likely due to other load on the LDAP server.
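
For anyone wanting to repeat these hand measurements, a small timing loop along the following lines could be used. This is a sketch, not what was run above: the helper name time_lookups is made up for illustration, and it assumes GNU date (for the %N nanosecond field), as shipped on Ubuntu and OpenSUSE.

```shell
# Hypothetical helper: run a command N times and report min/avg/max wall time in ms.
# Assumes GNU date, which supports %N (nanoseconds).
time_lookups() (
    runs=$1; shift
    [ "$runs" -gt 0 ] || { echo "usage: time_lookups RUNS COMMAND..." >&2; exit 1; }
    i=0 total=0 min= max=0
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1             # the lookup being timed; output discarded
        end=$(date +%s%N)
        ms=$(( (end - start) / 1000000 ))  # nanoseconds to milliseconds
        total=$(( total + ms ))
        if [ -z "$min" ] || [ "$ms" -lt "$min" ]; then min=$ms; fi
        if [ "$ms" -gt "$max" ]; then max=$ms; fi
        i=$(( i + 1 ))
    done
    echo "runs=$runs min=${min}ms avg=$(( total / runs ))ms max=${max}ms"
)

# Example: time_lookups 10 id tfp696
```

Running it once with nscd started and once with nscd stopped gives figures comparable to the ones quoted above; using a different uid per run avoids measuring only cached answers.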

Klaus Vink Slott (k-slott) wrote :

By reducing the number of groups in our setup we have managed to improve logon time a little. But logging in and using the id command are still terribly slow. This is a showstopper for us in offering Ubuntu as a choice in our university virtual hosting service.

Please let me know if I can be of further help to debug this problem.

Philipp Kaluza (pixelpapst) wrote :

Klaus: in an environment of this size, I strongly recommend against using libnss-ldap, because it just doesn't scale well enough. Please try installing nslcd and libnss-ldapd (notice the d), get it running, after that add nscd again, and evaluate if this better fits your needs. If it doesn't, you might also have a look at sssd and libnss-sss, but AFAICR that's only really available starting from maverick.
BTW both these daemons come with their own pam packages, which replace libpam-ldap.
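
As a rough sketch of that switch: the install line below uses the package names mentioned above, plus libpam-ldapd, which is assumed here to be the matching PAM package. The helper ldap_databases is hypothetical; it only lists which nsswitch.conf databases consult the "ldap" service, so you can confirm that passwd and group lookups still go through LDAP after the change.

```shell
# Install step (libpam-ldapd is an assumed package name for the PAM side):
#   sudo apt-get install nslcd libnss-ldapd libpam-ldapd

# Hypothetical check: print the nsswitch.conf databases whose sources include "ldap".
# Defaults to /etc/nsswitch.conf when no file is given.
ldap_databases() {
    awk -F: '{ sub(/#.*/, "") }                      # drop comments
             $2 ~ /(^|[ \t])ldap([ \t]|$)/ {         # "ldap" as a whole word
                 gsub(/[ \t]/, "", $1); print $1
             }' "${1:-/etc/nsswitch.conf}"
}

# Example: ldap_databases      # typically prints passwd, group, shadow
# Then re-test a lookup:  time id tfp696
```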

Klaus Vink Slott (k-slott) wrote :

Philipp: Thanks for pointing out the other LDAP option. While I am not sure I agree that libnss-ldap is the source of the problem (see the timings on OpenSUSE above), replacing it with nslcd and libnss-ldapd certainly improves login time to an acceptable level:

me@myserver:~$ time id tfp696
/...id output removed.../
real 0m7.034s
user 0m0.050s
sys 0m0.020s

This is still 3 times longer than OpenSUSE/nss_ldap, but fully usable, so you can consider this issue resolved.

Launchpad Janitor (janitor) wrote :

[Expired for libnss-ldap (Ubuntu) because there has been no activity for 60 days.]

Changed in libnss-ldap (Ubuntu):
status: Incomplete → Expired

Bug still exists
time id
real 0m44.414s
user 0m4.152s
sys 0m0.292s

Changed in libnss-ldap (Ubuntu):
status: Expired → In Progress

I've been noticing a similar issue. As our environment grows, it's becoming increasingly crippling. I filed a similar bug a while ago that might shed a small amount of light on the situation, but probably not actually get us anywhere.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=730053

44sec? That's nice. It takes me >12min since we have >30,000 users. :(
