Username completion crashes with libnss-ldap

Bug #219527 reported by Martin Emrich
Affects                   Status        Importance  Assigned to  Milestone
bash (Ubuntu)             Fix Released  Undecided   Unassigned
bash-completion (Debian)  New           Unknown
openldap (Ubuntu)         Fix Released  Undecided   Unassigned

Bug Description

Binary package hint: bash-completion

After upgrading to Hardy, bash-completion crashes when trying to complete a username while libnss-ldap is enabled.

Steps to reproduce:
1. Set up LDAP authentication (slapd, a few users, libnss-ldap, libpam-ldap, ...)
2. enable bash completion
3. type e.g. "id <tab>".

Instead of showing available users, it prints:

 malloc: unknown:0: assertion botched
 free: start and end chunk sizes differ
 Aborting...

Attached is an strace of the bash process on tty1, attached just before hitting TAB and detached afterwards.

Ciao

Martin

Martin Emrich (emme) wrote :
Mika Fischer (zoop) wrote :

Hi Martin,

Can you try the following and see if it also crashes?

$ compgen -u

Martin Emrich (emme) wrote :

Yes, "compgen -u" gives me the same error message. When I remove the "ldap" entries in nsswitch.conf, the error does not occur.

Mika Fischer (zoop) wrote :

Martin, can you try to compile and run the attached test program? It should do pretty much what bash does when completing usernames.

gcc -o pwent_test pwent_test.c
./pwent_test
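The attached pwent_test.c is not reproduced in this report. As a hypothetical sketch of what such a test does, the C function below enumerates the passwd database through NSS with setpwent(), getpwent() and endpwent(), which is roughly what username completion has to do. The name list_all_users is made up for illustration. When ldap is listed for passwd in /etc/nsswitch.conf, these same calls also go through libnss-ldap.

```c
#define _DEFAULT_SOURCE /* expose setpwent/getpwent/endpwent on glibc */
#include <pwd.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: walk the whole passwd database through NSS
   (files, ldap, ... as configured in /etc/nsswitch.conf), optionally
   printing each user name. Returns the number of entries seen. */
size_t list_all_users(FILE *out)
{
    size_t count = 0;
    struct passwd *pw;

    setpwent();                      /* rewind to the first entry */
    while ((pw = getpwent()) != NULL) {
        if (out != NULL)
            fprintf(out, "%s\n", pw->pw_name);
        count++;
    }
    endpwent();                      /* close the passwd database in every NSS module */
    return count;
}
```

A `main` that simply calls `list_all_users(stdout)` turns this into a standalone program that builds with the gcc command above. Such a program uses libc's own malloc, so a crash that only shows up under a different allocator would not necessarily reproduce with it.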

Martin Emrich (emme) wrote :

Hi!

Your program works fine and lists all available users. Right afterwards, in the same shell, hitting "su - po<TAB>" crashes the bash instance (it drops me back to the box I ssh'd into the server from).
BTW: I upgraded my "real" server over the weekend, and it shows the same behaviour as the VM where I first encountered this bug.

I just tried it with gdb, but there are no debug symbols in bash:

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /bin/bash
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
root@sauron:~# su - po(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
[Thread debugging using libthread_db enabled]
Error while reading shared library symbols:
Cannot find new threads: generic error
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
---Type <return> to continue, or q <return> to quit---
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
Cannot find new threads: generic error
(gdb)

Where could I get a bash package with debug symbols?

Ciao

Martin

Mika Fischer (zoop) wrote :

Hm, strange...

You can get debug symbols for the Ubuntu packages, see:
https://wiki.ubuntu.com/DebuggingProgramCrash

gdb should pick them up automatically, IIRC.

Martin Emrich (emme) wrote :

Hmm, it gets more bizarre:

After installing libc6-dbgsym, libnss-ldap-dbgsym, libc6-i686-dbgsym and bash-dbgsym, I get just this:

root@sauron:~# gdb bash
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) run
Starting program: /bin/bash
root@sauron:~# su - po[Thread debugging using libthread_db enabled]
Error while reading shared library symbols:
Cannot find new threads: generic error
Cannot find new threads: generic error
(gdb)

Ciao

Martin

Mika Fischer (zoop) wrote :

Maybe: gdb --args bash -c 'compgen -u' ?

Martin Emrich (emme) wrote :

Sorry, same useless result:

root@sauron:~# gdb --args bash -c 'compgen -u'
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) run
Starting program: /bin/bash -c compgen\ -u
[Thread debugging using libthread_db enabled]
Error while reading shared library symbols:
Cannot find new threads: generic error
Cannot find new threads: generic error
(gdb)

Attached is the strace of the gdb session. When gdb itself runs under strace, the output changes:

root@sauron:~# strace -o gdbsession.txt -f gdb --args bash -c 'compgen -u'
GNU gdb 6.8-debian
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...
(gdb) run
Starting program: /bin/bash -c compgen\ -u

malloc: unknown:0: assertion botched
free: start and end chunk sizes differ
Aborting...
Program terminated with signal SIGABRT, Aborted.
The program no longer exists.
You can't do that without a process to debug.

Mika Fischer (zoop) wrote :

This gets stranger and stranger. :)

Maybe you could try this:
sudo apt-get install ltrace
ltrace -o ltrace.txt -S -n 4 -f bash -c 'compgen -u'

and post the log (maybe gzip it first :))

That should give a lot of information. At least it does in my case...

Martin Emrich (emme) wrote :

Ouch, how could I have overlooked such a good debugging tool for all these years ;-/

Here is the ltrace, I just removed the plain usernames to protect the innocent :)

Mika Fischer (zoop) wrote :

Yes, ltrace is quite handy at times.

It seems that the crash occurs in endpwent(). I added a call to it to the test program; could you check whether it crashes now?

Martin Emrich (emme) wrote :

No, the test program did not crash; it behaves like the first one. I'm sure many programs (like getent, ...) use this API, but so far bash completion is the only thing exhibiting this bug.

Attached is an ltrace of the pwent-test2.c program. I noticed that there are none of the dcgettext() calls which are present in the ltrace of bash.

Mika Fischer (zoop) wrote :

I think the dcgettext is just libc finding the right translation for its error message.

OK, final try for the test program. This is now almost a line-by-line copy of the function in bash...

If this does not crash, the only idea I have left is to rebuild bash so that it can be debugged...
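The final test program is described as almost a line-by-line copy of bash's completion code, which is not shown here. Below is a minimal sketch of the usual readline-style pattern. It assumes completion works as a generator: the function is called with state 0 first, then repeatedly until it returns NULL. On the first call it rewinds with setpwent(). Each call then returns the next user name matching the prefix. Once the database is exhausted, it calls endpwent(). The name complete_username is hypothetical and is not bash's actual function.

```c
#define _DEFAULT_SOURCE /* expose setpwent/getpwent/endpwent/strdup on glibc */
#include <pwd.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical readline-style generator. Call with state == 0 to start
   a new completion, then with state != 0 (same prefix) until it returns
   NULL. Each non-NULL result is a malloc'd user name; the caller frees it. */
char *complete_username(const char *prefix, int state)
{
    static size_t prefix_len;
    struct passwd *pw;

    if (state == 0) {
        prefix_len = strlen(prefix);
        setpwent();                  /* start a fresh enumeration */
    }

    while ((pw = getpwent()) != NULL) {
        /* An empty prefix matches every user, as with "compgen -u". */
        if (strncmp(prefix, pw->pw_name, prefix_len) == 0) {
            char *copy = strdup(pw->pw_name);
            if (copy == NULL)
                endpwent();          /* out of memory: end the enumeration */
            return copy;
        }
    }

    endpwent();                      /* database exhausted: close it */
    return NULL;
}
```

A caller such as `compgen -u` would pass an empty prefix and keep calling with a nonzero state until NULL comes back, freeing each returned string. In this pattern endpwent() runs only after the last entry has been read. A fault in an NSS module's cleanup path would therefore surface at the very end of a completion.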

Martin Emrich (emme) wrote :

Test program 3 runs fine, too. But I found out why all the test programs run fine: They use the libc6 malloc(), while bash brings its own:

I ran "apt-get build-dep bash && apt-get source bash", then ran "debian/rules" in the source tree. After that:

root@sauron:~/pwent-test/bash-3.2# grep "start and end chunk sizes differ" * -R
bash/lib/malloc/malloc.c: xbotch (mem, ERR_ASSERT_FAILED, _("free: start and end chunk sizes differ"), file, line);
bash/lib/malloc/malloc.c: xbotch (mem, ERR_ASSERT_FAILED, _("realloc: start and end chunk sizes differ"), file, line);

So I'd think this is a bug in bash itself. I'll play around with it a little bit, but I don't have very much free time today.

Ciao

Martin
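Martin's grep points at bash's bundled allocator. Judging from the message text, that allocator records a chunk's size both at the start and at the end of the chunk and compares the two on free(). A write past the end of an allocation, or other heap corruption, therefore trips the check. The toy allocator below illustrates this kind of guard. It is a simplified illustration under that assumption, not the code in bash's lib/malloc/malloc.c; guarded_malloc and guarded_free are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Header placed before the user bytes; the union keeps the returned
   pointer suitably aligned for any type. */
typedef union {
    size_t nbytes;
    max_align_t align;
} guard_header;

/* Allocate nbytes, recording the size both in the header ("start")
   and in a trailer stored right after the last user byte ("end"). */
void *guarded_malloc(size_t nbytes)
{
    if (nbytes > SIZE_MAX - sizeof(guard_header) - sizeof(size_t))
        return NULL;                 /* avoid size overflow */
    unsigned char *raw = malloc(sizeof(guard_header) + nbytes + sizeof(size_t));
    if (raw == NULL)
        return NULL;
    ((guard_header *)raw)->nbytes = nbytes;
    memcpy(raw + sizeof(guard_header) + nbytes, &nbytes, sizeof nbytes);
    return raw + sizeof(guard_header);
}

/* Free a block from guarded_malloc. Returns false when the start and
   end sizes differ, i.e. something wrote past the end of the block. */
bool guarded_free(void *mem)
{
    if (mem == NULL)
        return true;
    unsigned char *user = mem;
    unsigned char *raw = user - sizeof(guard_header);
    size_t start = ((guard_header *)raw)->nbytes;
    size_t end;
    memcpy(&end, user + start, sizeof end);  /* trailer may be unaligned */
    free(raw);
    return start == end;
}
```

Where bash reports the mismatch and aborts, this sketch just returns false and leaves the decision to the caller. A check like this only shows that the trailer was overwritten, not which code overwrote it. Test programs linked against libc's malloc never run this particular check, which matches Martin's observation.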

Mika Fischer (zoop) wrote :

Yes, that's good. I was also wondering about the missing malloc calls :)

I think it would be best if you could take this to the bug-bash mailing list. They should know how best to debug this. You can do this via the bashbug script.

You could also try the bash-static package, which is built with --without-bash-malloc, just to see whether this really makes the problem disappear.

Martin Emrich (emme) wrote :

I rebuilt the bash package with --without-bash-malloc and then tried the bash executable from the bash-static/ directory. Same problem when run from gdb. If I just start it with ./bash and then enter "compgen -u", bash enters an endless loop, eats 100% CPU and won't react to SIGTERM; I had to kill it with SIGKILL.

Jakob Østergaard (joe-evalesco) wrote :

Actually, tcsh shows this problem as well!

$ tcsh
puffin:~> ls ~
...[snipped]...
puffin:~> ls ~
free(5b3608) bad block. (memtop = 6a5000 membot = 5b3000)
...[snipped]...
puffin:~> ls ~
...[snipped]...

It doesn't crash, but the free error is definitely not to be expected...

The way I see it, either bash and tcsh both misuse NSS in the same way or, more likely, there is a problem with libnss-ldap.

Changed in bash-completion:
status: Unknown → New
Olivier Bornet (olivier-bornet) wrote :

According to http://www.debian-administration.org/articles/585, it seems it's a bug in libnss-ldap:

    "This is a problem with libnss-ldap. try libnss-ldapd.
    This came about when they changed from the openssl libraries to the gnutls libraries"

I have had the same problem on a Debian system. Switching to libnss-ldapd has corrected the problem. I don't know if libnss-ldapd exists in Ubuntu.

HTH.

Jakob Østergaard (joe-evalesco) wrote :

I can confirm that changing to libnss-ldapd seems to solve the problem here.

One configuration quirk: To control authorization per-host via LDAP, I had the following in my old libnss-ldap /etc/ldap.conf:

 nss_base_passwd ou=People,dc=evalesco,dc=com?sub?|(host=eagle.rd)(host=\*.rd)(host=\*)

With libnss-ldapd this is different; I had to insert:

 filter passwd (|(host=eagle.rd)(host=\*.rd)(host=\*))

in my /etc/nss-ldapd.conf

Both bash and tcsh can now tab-complete on user names again :)

Mika Fischer (zoop) wrote :

Maybe one of you can tell me how to reproduce this.

I.e. what packages do I need, how should I configure slapd and PAM? And how do I get users into LDAP?

If I can reproduce it I'll try to debug the problem.

Jakob Østergaard (joe-evalesco) wrote :

Follow one of the HOWTOs on setting up LDAP distribution of user accounts.

In my setup I use kerberos for the authentication whereas most online articles describe how to shovel the authentication into LDAP too, but this doesn't seem to matter. You'll see the problems either way.

Anderson (amg1127) wrote :

I am also getting crashes in my shell when completing LDAP usernames.

$ cd ~amg<TAB>
malloc: unknown:0: assertion botched
free: start and end chunk sizes differ
Aborting...

If I try to install libnss-ldapd, APT tries to remove libnss-ldap. My question is: is the libnss-ldap configuration file compatible with libnss-ldapd? Can I safely replace that NSS library?

Jakob Østergaard (joe-evalesco) wrote : Re: [Bug 219527] Re: Username completion crashes with libnss-ldap

Anderson wrote:
> I am also getting crashes in my shell when completing LDAP usernames.
>
> $ cd ~amg<TAB>
> malloc: unknown:0: assertion botched
> free: start and end chunk sizes differ
> Aborting...
>
> If I try to install libnss-ldapd, APT tries to remove libnss-ldap. My
> question is: libnss-ldap configuration file is compatible with
> libnss-ldapd? Can I replace that NSS library safety?
>

The answers are right above your post...

However, I suggest you consider #227675 before switching.

--
Best regards,
    Jakob Østergaard Hegelund
    Evalesco A/S

William Lynch (wlynch)
Changed in bash:
status: New → Confirmed
Arnd (arnd-arndnet) wrote :

Just want to say that I can confirm the crashes with username completion
with libnss-ldap_258-1ubuntu3_amd64.deb

Any progress with this?
Because of #227675 I don't want to switch to libnss-ldapd.

Anderson (amg1127) wrote :

Humm... It seems the bug was fixed...

$ cd ~luc
~lucasvanini ~luci ~luciana/ ~luciane ~luciano/ ~lucimeire ~lucmei
~lucena ~lucia ~lucianafreitas ~lucianes/ ~lucianotl/ ~lucio/

I am using the same libnss-ldap_258-1ubuntu3_amd64.deb and don't see crashes in my 3 ldap-enabled Ubuntu Hardy systems...

Martin Emrich (emme) wrote :

Right, I just tried it too, and it works now... I did not do anything other than regular updates (hardy-updates/hardy-security).

Andreas Moog (ampelbein) wrote :

This bug report is being closed due to your last comment regarding this being fixed with an update. For future reference you can manage the status of your own bugs by clicking on the current status in the yellow line and then choosing a new status in the revealed drop down box. You can learn more about bug statuses at https://wiki.ubuntu.com/Bugs/Status . Thank you again for taking the time to report this bug and helping to make Ubuntu better. Feel free to submit any future bugs you may find.

Changed in bash:
status: Confirmed → Fix Released
Hark (ubuntu-komkommerkom) wrote :

I also have this problem. Updating my Hardy systems did fix the problem on one server, but all other servers still have this problem after updating (and rebooting).

Jakob Østergaard (joe-evalesco) wrote :

On Fri, Oct 17, 2008 at 09:38:06AM -0000, Hark wrote:
> I also have this problem. Updating my Hardy systems did fix the problem
> on one server, but all other servers still have this problem after
> updating (and rebooting).

No wonder... Since no one fixed the problem, it "works by accident" on
some updated systems (including some of mine).

Apparently networking is not a priority for Ubuntu.

Anderson (amg1127) wrote :

I don't think this bug was resolved by "accident"...

I found several bugs and saw several unexplained crashes when Ubuntu Hardy shipped with OpenLDAP 2.4.7.
When Ubuntu switched to 2.4.9, my OpenLDAP server and some clients became stable. I can't be certain, but I believe this bug was fixed by the transition to OpenLDAP 2.4.9.

A question: if the bug was not fixed on some systems, why not just say "Hey! I still have this problem! Please reopen the bug!"?

Michael Zoet (michazoet-deactivatedaccount) wrote :

I tested all of the above on an Ubuntu Hardy system and have no problems with bash completion and LDAP Unix authentication. I think this bug can be closed.

Changed in openldap:
status: New → Fix Released
Hark (ubuntu-komkommerkom) wrote :

Correct, I also don't experience this problem anymore.

Jakob Østergaard (joe-evalesco) wrote :

Hark wrote:
> Correct, I also don't experience this problem anymore.
>

I find the precise description of the resolution very reassuring...

Some day when I have time to toy with this again, I'll try the Ubuntu
packages. For now I'm staying with the Debian source packages, compiling
them on Ubuntu.
