nscd crashes

Bug #302724 reported by Christian Schlittchen
Affects: glibc (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Distribution: ubuntu 8.10 x86_64 with current updates
Package: nscd 2.8~20080505-0ubuntu7

nscd crashes within a few minutes of starting. Running nscd -d under strace shows the following:

16332: GETFDGR
16332: provide access to FD 6, for group
) = 1
epoll_wait(8, 16332: remove GETGRBYGID entry "680"
nscd: mem.c:417: gc: Assertion `off_alloc == off_allocend' failed.
 <unfinished ...>
+++ killed by SIGABRT +++

This happens on several different machines.

Christian Schlittchen (schlittchen) wrote :

The following is a valgrind trace of a complete run from start to crash. Maybe it helps in tracking down this problem:

==20449== Memcheck, a memory error detector.
==20449== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==20449== Using LibVEX rev 1854, a library for dynamic binary translation.
==20449== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==20449== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation framework.
==20449== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==20449== For more details, rerun with: -v
==20449==
==20449== My PID = 20449, parent PID = 13380. Prog and args are:
==20449== /usr/sbin/nscd
==20449== -d
==20449==
==20449== Thread 3:
==20449== Conditional jump or move depends on uninitialised value(s)
==20449== at 0x14EDB: (within /usr/sbin/nscd)
==20449== by 0x15282: (within /usr/sbin/nscd)
==20449== by 0x114A2: (within /usr/sbin/nscd)
==20449== by 0x7726: (within /usr/sbin/nscd)
==20449== by 0x503B3E9: start_thread (in /lib/libpthread-2.8.90.so)
==20449== by 0x554FC6C: clone (in /lib/libc-2.8.90.so)
==20449==
==20449== Conditional jump or move depends on uninitialised value(s)
==20449== at 0x14EDD: (within /usr/sbin/nscd)
==20449== by 0x15282: (within /usr/sbin/nscd)
==20449== by 0x114A2: (within /usr/sbin/nscd)
==20449== by 0x7726: (within /usr/sbin/nscd)
==20449== by 0x503B3E9: start_thread (in /lib/libpthread-2.8.90.so)
==20449== by 0x554FC6C: clone (in /lib/libc-2.8.90.so)
==20449==
==20449== Syscall param msync(start) points to uninitialised byte(s)
==20449== at 0x50427DB: (within /lib/libpthread-2.8.90.so)
==20449== by 0x1521D: (within /usr/sbin/nscd)
==20449== by 0x15282: (within /usr/sbin/nscd)
==20449== by 0x114A2: (within /usr/sbin/nscd)
==20449== by 0x7726: (within /usr/sbin/nscd)
==20449== by 0x503B3E9: start_thread (in /lib/libpthread-2.8.90.so)
==20449== by 0x554FC6C: clone (in /lib/libc-2.8.90.so)
==20449== Address 0x7be01db is not stack'd, malloc'd or (recently) free'd
==20449==
==20449== Syscall param msync(start) points to uninitialised byte(s)
==20449== at 0x50427DB: (within /lib/libpthread-2.8.90.so)
==20449== by 0x119FF: (within /usr/sbin/nscd)
==20449== by 0x7726: (within /usr/sbin/nscd)
==20449== by 0x503B3E9: start_thread (in /lib/libpthread-2.8.90.so)
==20449== by 0x554FC6C: clone (in /lib/libc-2.8.90.so)
==20449== Address 0x7be01db is not stack'd, malloc'd or (recently) free'd
==20449==
==20449== Thread 2:
==20449== Conditional jump or move depends on uninitialised value(s)
==20449== at 0x9F4D: (within /usr/sbin/nscd)
==20449== by 0xA7EB: (within /usr/sbin/nscd)
==20449== by 0xA973: (within /usr/sbin/nscd)
==20449== by 0x114A2: (within /usr/sbin/nscd)
==20449== by 0x7726: (within /usr/sbin/nscd)
==20449== by 0x503B3E9: start_thread (in /lib/libpthread-2.8.90.so)
==20449== by 0x554FC6C: clone (in /lib/libc-2.8.90.so)
==20449==
==20449== Conditional jump or move depends on uninitialised value(s)
==20449== at 0x9F4F: (within /usr/sbin/nscd)
==20449== by 0xA7EB: (within /usr/sbin/nscd)
==20449== by 0xA973: (within /usr/sbin/n...


mjmac (h-launchpad-macdonald-cx) wrote :

Just adding confirmation of this bug.

mjmac@ganymede:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 8.10
Release: 8.10
Codename: intrepid

mjmac@ganymede:~$ uname -a
Linux ganymede 2.6.27-7-generic #1 SMP Tue Nov 4 19:33:06 UTC 2008 x86_64 GNU/Linux

Ran nscd -d; it lasted for about 15 minutes, then crashed:

...
3767: provide access to FD 6, for group
3767: Reloading "0" in password cache!
3767: Reloading "mjmac" in password cache!
3767: Reloading "polkituser" in password cache!
3767: remove GETPWBYUID entry "1000"
3767: remove GETPWBYNAME entry "mjmac"
nscd: mem.c:417: gc: Assertion `off_alloc == off_allocend' failed.
root@ganymede:~#

I use nscd to propagate uid/gid info into build chroots. This worked peachy-keen on 7.10 and 8.04, but now it's unusable in production for 8.10 (good thing I didn't roll this out without testing...).

Please let me know if I can provide any more information. I see that the original reporter has already provided strace/valgrind info, so I don't know what else I can provide (other than a patch :) ).

dmohr (ubuntu-m0hr) wrote :

Any updates on this issue?
I am also affected: I use nscd to cache LDAP passwd/group information on an LTSP installation in production, and without nscd, passwd/group lookups take a serious amount of time. My temporary workaround is to enable paranoia mode and set the restart interval to 600 seconds in /etc/nscd.conf. That seems to work for me at the moment, but it is not a perfect solution.
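
For reference, the workaround described above corresponds to two settings in /etc/nscd.conf. This is a minimal sketch, assuming the stock option names paranoia and restart-interval documented in nscd.conf(5); the value 600 is the one mentioned above, and the rest of the file is assumed to stay as shipped.

```
# /etc/nscd.conf (fragment): periodic self-restart as a stopgap
# Make nscd restart itself periodically.
paranoia            yes
# Seconds between restarts.
restart-interval    600
```

nscd needs to be restarted once for the new settings to be read. This only limits how long a damaged cache stays in use; it does not address the assertion failure itself.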

Todd Eddy (vrillusions) wrote :

I can confirm this happens to me as well. Right now I just have a cron job that restarts nscd every half hour.
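
A cron-based restart like this could be set up roughly as follows. This is only a sketch: the file name /etc/cron.d/nscd-restart and the use of the /etc/init.d/nscd init script are assumptions, not details from the setup described above.

```
# /etc/cron.d/nscd-restart (hypothetical file name)
# Restart nscd at minutes 0 and 30 of every hour, as root.
# Assumes the init script lives at /etc/init.d/nscd.
*/30 * * * * root /etc/init.d/nscd restart >/dev/null 2>&1
```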

I have tried playing with nsswitch settings as well as nscd.conf, and have been trying to figure this out for a while. Let me know if any other info could be useful.

The log output:
31636: handle_request: request received (Version = 2) from PID 652
31636: GETHOSTBYNAME (ns1.pchdns.com)
31636: Reloading "teddy" in group cache!
31636: remove INITGROUPS entry "root"
nscd: mem.c:417: gc: Assertion `off_alloc == off_allocend' failed.
Aborted

# grep '^[^#]' /etc/ldap.conf
base dc=XXXXXX,dc=com
uri ldaps://ldap1.XXXXXX.com:636/ ldaps://ldap2.XXXXXX.com:636/
ldap_version 3
rootbinddn cn=admin,dc=XXXXXX,dc=com
bind_policy soft
pam_groupdn cn=XXXXXXX,ou=Group,dc=XXXXXX,dc=com
pam_member_attribute memberUid
pam_password md5
tls_checkpeer no
nss_reconnect_tries 1
nss_reconnect_sleeptime 1
nss_reconnect_maxsleeptime 8
nss_reconnect_maxconntries 2

Petra Humann (humann) wrote :

nscd crashes on my hosts also.

lsb_release -rd:
Description: Ubuntu 8.10
Release: 8.10

apt-cache policy nscd:
nscd:
  Installed: 2.8~20080505-0ubuntu7

apt-cache policy libnss-ldap
libnss-ldap:
  Installed: 260-1ubuntu2

I'm running nscd in an LDAP environment. After some time (maybe hours) it stops working.
The last entries in the file nscd.log include:

31387: remove GETGRBYGID entry "0"
31387: remove GETGRBYNAME entry "root"

19614: remove INITGROUPS entry "root"

13527: remove GETPWBYNAME entry "username"
13527: remove GETPWBYUID entry "410"

5466: remove INITGROUPS entry "username"
"username" is a user who is currently logged in.

Running "nscd -d" ends with:

nscd: mem.c:417: gc: Assertion `off_alloc == off_allocend' failed.
or (one time) with:
nscd: mem.c:275: gc: Assertion `off_free <= db->head->first_free' failed.

Regards.
Petra Humann

Christian Schlittchen (schlittchen) wrote :

As a workaround I switched to unscd (http://busybox.net/~vda/unscd/), which is not as feature-rich as the standard nscd and is also somewhat slower (at least in my setup), but has been very stable so far.
