Comment 11 for bug 1883614

Rafael David Tinoco (rafaeldtinoco) wrote :

Unfortunately, it looks like the crash file does not contain a valid dump...

(gdb) bt
#0 __GI_raise (sig=7551) at ../sysdeps/unix/sysv/linux/raise.c:37
#1 0x00007f63fb16d02a in __GI_abort () at abort.c:87
#2 0x00000000023953f0 in ?? ()
#3 0x000000000239bd80 in ?? ()
#4 0x0000000000000001 in ?? ()
#5 0x0000000000000000 in ?? ()

Frame unwinding is corrupted and

#0 __GI_raise (sig=7551) at ../sysdeps/unix/sysv/linux/raise.c:37
        pid = 0
        selftid = 0

values are scrambled (frame #1 seems to have a correctly mapped page address, according to ProcMaps).

judging by the message:

---

(Fri Aug 7 10:06:12 2020) [sssd] [talloc_log_fn] (0x0010): Bad talloc magic value - unknown value

it is very likely that a buffer previously allocated by talloc was being realloc'ed or freed when the talloc code noticed that the buffer's magic value was corrupted, which indicates a memory issue in the consumer (sssd in this case).

----

Ideas for debugging it:

----

Attempt #1
==========

sysctl -w kernel.core_pattern=core

would cause the core dump to be generated without apport (to see if it helps). I tried to find other dumps using our upstream crash repository at:

https://errors.ubuntu.com/?release=Ubuntu 16.04&package=sssd&period=year

but there weren't any other failures we could use.
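
Before changing kernel.core_pattern, it may help to look at the current value, so it can be restored later. The sketch below is only an illustration: the helper name core_pattern_is_piped is made up here, and it relies on the kernel convention that a pattern starting with '|' pipes the dump to a program (which is how apport receives dumps).

```shell
# Returns success when a core_pattern value pipes dumps to a helper program
# (a leading '|', as with apport), failure when it names a plain file.
core_pattern_is_piped() {
    case "$1" in
        '|'*) return 0 ;;
        *)    return 1 ;;
    esac
}

# Read the current setting; stays empty if /proc is not available.
current=$(cat /proc/sys/kernel/core_pattern 2>/dev/null || true)

if core_pattern_is_piped "$current"; then
    echo "core dumps are piped to: ${current#|}"
else
    echo "core dumps are written to files named: ${current}"
fi
```

With the pattern set to a plain "core", the kernel writes the file into the crashing process's working directory. Whether a dump is written at all also depends on the core size limit of the sssd process (for a systemd service that is the unit's LimitCORE setting, not the shell's ulimit -c), so that is worth checking too.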

----

Attempt #2
==========

Another possibility would be to run sssd under valgrind to find the issue:

Put this in your bashrc:
--
# debug symbols

getsymbols() {

    binfile=$1

    for pkg in $(for file in $(ldd $binfile | awk '{print $1}' | xargs); do dpkg -S $file 2>/dev/null ; done | awk '{print $1}' | sed 's:\:.*\:::g' | sort -u); do apt-cache pkgnames | grep -E "($pkg-dbg$|$pkg-dbgsym$)" ; done | xargs sudo apt-get install -y

}
--
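
If you would rather see which packages the function above would pick before installing anything, a dry-run variant could look like the sketch below. The names owner_pkgs and listsymbols are made up for this example; it follows the same ldd / dpkg -S / apt-cache pkgnames steps but only prints the matches.

```shell
# Reduce `dpkg -S` output lines ("pkg:arch: /path" or "pkg: /path")
# to a sorted, unique list of bare package names.
owner_pkgs() {
    awk '{print $1}' | sed 's/:.*//' | sort -u
}

# Print the -dbg / -dbgsym packages that exist for the libraries a binary
# links against. Nothing is installed; this only queries dpkg and apt-cache.
listsymbols() {
    binfile=$1
    available=$(apt-cache pkgnames)
    ldd "$binfile" | awk '{print $1}' | while read -r lib; do
        dpkg -S "$lib" 2>/dev/null
    done | owner_pkgs | while read -r pkg; do
        # -F -x: match the whole name literally, so "+" or "." in a
        # package name is not treated as a regular expression.
        printf '%s\n' "$available" | grep -F -x -e "${pkg}-dbg" -e "${pkg}-dbgsym"
    done
}
```

Usage would be "listsymbols /usr/sbin/sssd". The debug packages only show up once the ddebs entries are in your apt sources and apt-get update has been run.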

Then enable the debug symbol repositories (ddebs) by adding:

deb http://ddebs.ubuntu.com xenial main restricted universe multiverse
deb http://ddebs.ubuntu.com xenial-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com xenial-proposed main restricted universe multiverse

in your /etc/apt/sources.list, then run apt-get update.
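
As an alternative to typing those three entries by hand, they could be generated for a given release. This is just a sketch: the function name ddebs_sources and the idea of keeping the entries in a separate file are assumptions of this example, not something required by the steps here.

```shell
# Print the ddebs entries for one release codename: the release itself
# plus its -updates and -proposed pockets.
ddebs_sources() {
    release=$1
    for pocket in "" -updates -proposed; do
        echo "deb http://ddebs.ubuntu.com ${release}${pocket} main restricted universe multiverse"
    done
}

# Show the entries for xenial (same three lines as listed above).
ddebs_sources xenial
```

To use it for the running release, something like "ddebs_sources "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/ddebs.list" should work (the file name ddebs.list is arbitrary). apt may also complain about the ddebs archive signing key until that key is trusted on the machine.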

Then you do:

getsymbols /usr/sbin/sssd

It will install a bunch of "dbg" and "dbgsym" packages (for the sssd dependencies). At the end, run:

apt-get install sssd-common-dbgsym sssd-ad-common-dbgsym sssd-ad-dbgsym sssd-dbus-dbgsym sssd-ipa-dbgsym sssd-krb5-common-dbgsym sssd-ldap-dbgsym sssd-proxy-dbgsym

and you will have all debugging symbols installed for sssd and its dependencies. Then, instead of starting sssd from systemd, you run it with valgrind:

"""
$ sudo valgrind --tool=memcheck --trace-children=yes --leak-check=yes --leak-resolution=med --show-leak-kinds=definite --track-origins=yes /usr/sbin/sssd -i -f
"""
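
The command above prints the memcheck report to the terminal. Since --trace-children=yes makes valgrind follow every process that sssd starts, it may be easier to have each process write its own file with valgrind's --log-file option, where %p expands to the process ID. A sketch, assuming /var/tmp as the log directory and a made-up helper name valgrind_sssd_cmd:

```shell
# Build the same memcheck command line as above, plus --log-file so each
# traced process writes its own report. The log directory is an assumption;
# pick any location root can write to.
valgrind_sssd_cmd() {
    logdir=${1:-/var/tmp}
    echo valgrind --tool=memcheck --trace-children=yes \
        --leak-check=yes --leak-resolution=med --show-leak-kinds=definite \
        --track-origins=yes "--log-file=${logdir}/sssd-valgrind.%p.log" \
        /usr/sbin/sssd -i -f
}

# Print the command line for review; nothing is executed here.
valgrind_sssd_cmd
```

Once the printed command looks right, it can be run with "sudo $(valgrind_sssd_cmd)", which should leave one sssd-valgrind.<pid>.log file per process in the chosen directory.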

This will generate output that you can save and attach here. Hopefully, with all debug symbols in place, valgrind's memcheck will tell us where the memory corruption happened (including the stack traces).
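
For a corrupted talloc buffer, the most interesting memcheck reports are usually the "Invalid read", "Invalid write" and "Invalid free" errors, so a quick first pass over the saved output could be a grep like the one below. The helper name memcheck_invalid is made up for this sketch, and the full output is still what should be attached.

```shell
# List memcheck "Invalid read/write/free" errors from saved valgrind output,
# prefixed with the file name and line number of each match.
memcheck_invalid() {
    grep -H -n -E 'Invalid (read|write|free)' "$@"
}
```

For example, "memcheck_invalid valgrind-output.log" (a hypothetical file name) prints one line per error; the stack trace for each error follows right after the matching line in the file.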

Feel free to attach the output file from valgrind to this case.