Upgrade sssd in 20.04 to the version in 20.10 as the 20.04 version crashes

Bug #1902808 reported by Adam Pfeiffer
This bug affects 3 people
Affects               Status        Importance  Assigned to  Milestone
OEM Priority Project  New           Undecided   Unassigned
sssd (Ubuntu)         Fix Released  High        Unassigned
Focal                 Confirmed     High        Unassigned

Bug Description

The sssd daemon is crashing under heavy load on Ubuntu 20.04.

I have created a Python script that logs into an Ubuntu 20.04 machine every second to test logging in via sssd. In my configuration, sssd is configured to authenticate LDAP requests.

When I run this script against 20.04, it will randomly fail to log in and my script will exit. This can happen anywhere from the first login attempt to the 100th login attempt.
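
The reporter's script is not attached to this report. A minimal sketch of such a login loop might look like the following. It assumes a hypothetical `attempt_login` callable (for example, a wrapper around an SSH client) that returns True on success. The names, the one-second interval default, and the 100-attempt cap are illustrative and not taken from the original script.

```python
import time
from typing import Callable, Optional


def run_login_loop(
    attempt_login: Callable[[], bool],
    max_attempts: int = 100,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Call attempt_login repeatedly, pausing interval_s between attempts.

    Returns the 1-based number of the first failed attempt, or None if
    every attempt succeeded. attempt_login is a hypothetical callable
    supplied by the caller; it is not part of sssd or of this report.
    """
    for attempt in range(1, max_attempts + 1):
        if not attempt_login():
            # Stop at the first failure, as the script in the report does.
            return attempt
        sleep(interval_s)
    return None
```

Recording the attempt number at which the loop stops makes it easier to compare runs across 18.04, 20.04, and 20.10.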

When I run this same script against Ubuntu 18.04 and 20.10, it runs without any issues. I took my server running 20.04 and did a release upgrade to 20.10, and the script then ran without issues.

Because of this, I believe the version of sssd included with 20.04 at the time of this bug:
sssd/focal,now 2.2.3-3 amd64 [installed]

is unstable and will crash causing some logins to fail.

When I upgraded to 20.10 and the version became:
sssd/groovy,now 2.3.1-3 amd64 [installed]

I no longer see any issues. Due to policies at my company, I am only able to run LTS versions of Ubuntu, so I am requesting that the newer version of sssd in 20.10 be ported to 20.04.

Thanks

Revision history for this message
Sebastien Bacher (seb128) wrote :

Thank you for your bug report. Do you have any errors recorded in /var/crash? Could you check the 'journalctl -f' log and see whether any sssd error is printed when you get the failure? We usually don't do version updates unless they are bugfix-only, so understanding the issue and applying a patch might be a better solution.
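
A quick way to check /var/crash for sssd-related reports is sketched below. It assumes that a relevant crash file would contain "sssd" somewhere in its file name; that matching rule and the function name are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def find_crash_reports(crash_dir: str = "/var/crash", needle: str = "sssd") -> List[str]:
    """Return the sorted names of files in crash_dir whose name contains needle.

    Matching on the file name is an assumption; an empty list means either
    that no matching report exists or that the directory is missing.
    """
    directory = Path(crash_dir)
    if not directory.is_dir():
        return []
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and needle in entry.name
    )


if __name__ == "__main__":
    for name in find_crash_reports():
        print(name)
```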

Changed in sssd (Ubuntu):
importance: Undecided → High
Revision history for this message
Adam Pfeiffer (adampfeiffer) wrote :

Hello,
Thanks for the quick response. I don't see any errors recorded in /var/crash so the word 'crash' might not be the correct wording.

I want to give a brief description of what I am seeing in the logs that I believe indicates a crash is happening, and then I will attach the logs as well as our configuration file.

I start up sssd and once it is done starting up I see the following line in the domain service log (sssd_BSNSERVICE.log):
(Wed Nov 4 08:20:37 2020) [sssd[be[BSNSERVICE]]] [be_run_online_cb] (0x0080): Going online. Running callbacks.

I then start the Python script, which continually logs into the device. After a short time a login from the script hangs, and then I see another line that says 'Going online':
(Wed Nov 4 08:24:59 2020) [sssd[be[BSNSERVICE]]] [be_run_online_cb] (0x0080): Going online. Running callbacks.

The 'Going online' message is what makes me think this is crashing and restarting. As noted in the original bug report, using the same configuration file on 18.04 and on 20.10 works and I don't see the above issue.
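
To check how often the backend goes online again, the 'Going online' lines can be pulled out of the domain log and their timestamps compared. The sketch below assumes the line format shown in the two log excerpts above and an English locale for the day and month names; the function name is illustrative.

```python
import re
from datetime import datetime
from typing import List

# Matches lines such as:
# (Wed Nov 4 08:20:37 2020) [sssd[be[BSNSERVICE]]] [be_run_online_cb] (0x0080): Going online. ...
ONLINE_RE = re.compile(r"^\((?P<stamp>[^)]+)\) .*\[be_run_online_cb\].*Going online")


def going_online_times(lines: List[str]) -> List[datetime]:
    """Return the timestamps of all 'Going online' lines, in file order."""
    times = []
    for line in lines:
        match = ONLINE_RE.match(line)
        if match:
            # Whitespace in the format matches one or more spaces in the input.
            times.append(datetime.strptime(match.group("stamp"), "%a %b %d %H:%M:%S %Y"))
    return times
```

More than one timestamp per sssd start, as in the two lines above, suggests the backend went offline or restarted in between.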

Please let me know if there is anything else I can provide or any other tests I can run to assist with debugging.

Thanks

Revision history for this message
Adam Pfeiffer (adampfeiffer) wrote :

Also, I would like to note that in the sssd.conf file, the areas that have **** in them are where we have taken out sensitive data.

Thanks

Revision history for this message
Adam Pfeiffer (adampfeiffer) wrote :

I had to attach a new set of logs that had sensitive information removed.

Thanks

Revision history for this message
Adam Pfeiffer (adampfeiffer) wrote :

This is the file that shows the issue. My security team kept updating the files without telling me they were not yet done. Sorry for the shuffle on the attachments, but this should be the final set of logs.

Revision history for this message
Adam Pfeiffer (adampfeiffer) wrote :

I ran journalctl -f while I was running my login script. The output is below:

Nov 05 11:57:02 pyuniti-brm-4 audit[1626664]: AVC apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_pam" name="/proc/1666736/cmdline" pid=1626664 comm="sssd_pam" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Nov 05 11:57:03 pyuniti-brm-4 sshd[1666755]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=10.38.255.25 user=fvt-user
Nov 05 11:57:03 pyuniti-brm-4 audit[1626664]: AVC apparmor="ALLOWED" operation="capable" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_pam" pid=1626664 comm="sssd_pam" capability=2 capname="dac_read_search"
Nov 05 11:57:03 pyuniti-brm-4 audit[1626664]: AVC apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_pam" name="/proc/1666755/cmdline" pid=1626664 comm="sssd_pam" requested_mask="r" denied_mask="r" fsuid=0 ouid=0

<this is where the script login hangs and then fails, you can see there is a 30 second pause in the logging>

Nov 05 11:57:33 pyuniti-brm-4 audit[1666877]: AVC apparmor="ALLOWED" operation="exec" profile="/usr/sbin/sssd" name="/usr/libexec/sssd/sssd_be" pid=1666877 comm="sssd" requested_mask="x" denied_mask="x" fsuid=0 ouid=0 target="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be"
Nov 05 11:57:33 pyuniti-brm-4 audit[1666877]: AVC apparmor="ALLOWED" operation="file_mmap" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/usr/libexec/sssd/sssd_be" pid=1666877 comm="sssd_be" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Nov 05 11:57:33 pyuniti-brm-4 audit[1666877]: AVC apparmor="ALLOWED" operation="file_mmap" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/usr/lib/x86_64-linux-gnu/ld-2.31.so" pid=1666877 comm="sssd_be" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Nov 05 11:57:33 pyuniti-brm-4 audit[1666877]: AVC apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/etc/ld.so.cache" pid=1666877 comm="sssd_be" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Nov 05 11:57:33 pyuniti-brm-4 audit[1666877]: AVC apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/usr/lib/x86_64-linux-gnu/libdl-2.31.so" pid=1666877 comm="sssd_be" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Nov 05 11:57:33 pyuniti-brm-4 audit[1666877]: AVC apparmor="ALLOWED" operation="file_mmap" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/usr/lib/x86_64-linux-gnu/libdl-2.31.so" pid=1666877 comm="sssd_be" requested_mask="rm" denied_mask="rm" fsuid=0 ouid=0
Nov 05 11:57:33 pyuniti-brm-4 audit[1666877]: AVC apparmor="ALLOWED" operation="open" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/usr/lib/x86_64-linux-gnu/libtevent.so.0.10.1" pid=1666877 comm="sssd_be" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Nov 05 11:57:33 pyuniti-brm-4 audit[1666877]: AVC apparmor="ALLOWED" operation="file_mmap" profile="/usr/sbin/sssd//null-/usr/libexec/sssd/sssd_be" name="/usr/lib/x86_64-linux-gnu/libtevent.so.0.10.1" pid=1666877 comm="sssd_be" requested_mask="rm" denied_mask="rm" fsuid=0...
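
The 30-second pause noted in the output above can also be located automatically in a saved journal. The sketch below assumes the default short timestamp format shown here ("Nov 05 11:57:03"), which carries no year, so the year is passed in by the caller. The function name and the 10-second threshold are illustrative choices.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Leading timestamp of a journalctl line in the short format, e.g. "Nov 05 11:57:03".
STAMP_RE = re.compile(r"^([A-Z][a-z]{2} \d{1,2} \d{2}:\d{2}:\d{2}) ")


def find_gaps(lines: List[str], year: int, min_gap_s: float = 10.0) -> List[Tuple[datetime, datetime]]:
    """Return (before, after) timestamp pairs separated by at least min_gap_s seconds.

    Lines without a leading timestamp (such as inline notes) are skipped.
    """
    gaps = []
    previous = None
    for line in lines:
        match = STAMP_RE.match(line)
        if not match:
            continue
        current = datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
        if previous is not None and (current - previous).total_seconds() >= min_gap_s:
            gaps.append((previous, current))
        previous = current
    return gaps
```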

Revision history for this message
Adam Pfeiffer (adampfeiffer) wrote :

I believe I have supplied all of the information requested. If you need any further data to fix this issue, please let me know.

Thanks

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in sssd (Ubuntu):
status: New → Confirmed
Revision history for this message
Paride Legovini (paride) wrote :

Hello Adam, and thanks for the additional information. It would be very useful to gather more information from sssd at the moment it crashes (or hangs and restarts). Could you try stopping the sssd service and running it manually in the foreground with a higher debug level? Something like:

  sudo sssd --interactive --logger=stderr --debug-level=X

with X >= 3, I'd say; bump it until it prints something useful. See sssd(8) for a description of the levels. I hope we can identify a fingerprint of the crash that leads us to the upstream change that fixed it in the newer versions.

Changed in sssd (Ubuntu Focal):
importance: Undecided → High
status: New → Confirmed
Changed in sssd (Ubuntu):
status: Confirmed → Invalid
Changed in sssd (Ubuntu):
status: Invalid → Fix Released
Revision history for this message
Adam Pfeiffer (adampfeiffer) wrote :

Thanks for the updates on this ticket. I see that a fix has been released, but I am not seeing what version contains the fix. This is my first bug report, so I am hoping you can help guide me on where I can find the version with the fix so that I can verify this in my environment.

Thanks

Revision history for this message
Marco Trevisan (Treviño) (3v1n0) wrote :

Adam,

I've marked this as fixed for Ubuntu in the latest released version (so 20.10); however, it's not yet clear which upstream commit fixed it.

If sssd crashes completely, you should install the debug symbols [1] and try to attach to the process with gdb, or see whether you get a crash report in /var/crash.

[1] https://wiki.ubuntu.com/Debug%20Symbol%20Packages

Revision history for this message
David Chen (david.chen) wrote :

Hi, is there a plan to fix it for Focal? Thanks

Rex Tsai (chihchun)
tags: added: oem-priority originate-from-1927191 somerville