autofs doesn't close sockets and stops working when the max open files limit is reached

Bug #1996869 reported by Terje Røsten
This bug affects 2 people
Affects          Status         Importance   Assigned to   Milestone
sssd (Ubuntu)    Fix Released   Undecided    Lena Voytek
  Jammy          Incomplete     Undecided    Lena Voytek
  Kinetic        Fix Released   Undecided    Lena Voytek

Bug Description

[Impact]

Due to a regression in sssd in Kinetic, sssd client sockets used by autofs are not closed properly on exit. The leaked sockets accumulate until the open file limit of 1024 is reached. Once this happens, autofs fails with the following errors:

lookup_nss_mount: can't to read name service switch config.
nsswitch_parse:172: couldn't open /etc/nsswitch.conf

This should be backported to Kinetic since the issue causes autofs to fail consistently.

This bug is fixed through a patch from upstream that closes sockets properly on exit in sssd. Locking and cleanup mechanisms are updated in the client library.

[Test Plan]

With an available nfs/openldap server, create a client with the following:

autofs client setup
# sudo apt update && sudo apt dist-upgrade -y
# sudo apt install autofs autofs-ldap sssd-ldap ldap-utils -y
# sudo mkdir -p /etc/sssd

Add sssd config with server domain - /etc/sssd/sssd.conf

[domain/default]
{snip}
ldap_autofs_search_base = ou=Autofs,dc=example,dc=com
ldap_autofs_map_object_class = automountMap
ldap_autofs_entry_object_class = automount
ldap_autofs_map_name = automountMapName
ldap_autofs_entry_key = automountKey
ldap_autofs_entry_value = automountInformation

[sssd]
services = autofs

Append to /etc/nsswitch.conf:
automount: files sss
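
When repeating the test setup, the append step can be made safe to run more than once. A minimal sketch, assuming a POSIX shell; `ensure_automount_sss` is a hypothetical helper name, and it only appends the line when no `automount:` entry exists yet:

```shell
# Append "automount: files sss" to an nsswitch-style file unless an
# automount entry is already present (hypothetical helper, not part of autofs).
ensure_automount_sss() {
    conf="$1"
    if grep -q '^automount:' "$conf"; then
        # Do not touch an existing entry; it may need manual review.
        echo "automount entry already present in $conf" >&2
    else
        printf 'automount: files sss\n' >> "$conf"
    fi
}
```

For the real file this would be run as root, e.g. `ensure_automount_sss /etc/nsswitch.conf`.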

Set contents of /etc/default/autofs with domain information:
MASTER_MAP_NAME="auto.master.foo"
TIMEOUT=300
BROWSE_MODE="no"
MOUNT_NFS_DEFAULT_PROTOCOL=3
LOGGING="none"
LDAP_URI="ldap://example.com"
SEARCH_BASE="dc=example,dc=com"
MAP_OBJECT_CLASS="automountMap"
ENTRY_OBJECT_CLASS="automount"
MAP_ATTRIBUTE="automountMapName"
ENTRY_ATTRIBUTE="automountKey"
VALUE_ATTRIBUTE="automountInformation"
AUTH_CONF_FILE="/etc/autofs_ldap_auth.conf"
USE_MISC_DEVICE="yes"

Set usetls to "yes" in /etc/autofs_ldap_auth.conf

# sudo systemctl restart sssd
# sudo systemctl restart autofs

Use automount to mount and unmount an NFS share a few times. Without the fix, multiple sockets will show up for automount when running:
# ls -l /proc/$(pidof automount)/fd/ | grep socket

With the fix, only one socket will show up while connected, and none afterward.
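
The socket check can also be scripted so the count is easy to compare between mount cycles. A minimal sketch, assuming Linux /proc and root access to the automount process; `count_socket_fds` is a hypothetical helper name, not part of autofs or sssd:

```shell
# Print the number of socket file descriptors held by a process.
count_socket_fds() {
    pid="$1"
    count=0
    for fd in /proc/"$pid"/fd/*; do
        # Each entry is a symlink; sockets resolve to "socket:[inode]".
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            socket:*) count=$((count + 1)) ;;
        esac
    done
    echo "$count"
}
```

Usage: `count_socket_fds "$(pidof automount)"`. A count that keeps growing across mount and unmount cycles would match the leak described in this report.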

[Where problems could occur]

Regressions from this patch would most likely occur in the management of sssd client sockets. This could include prematurely closing a socket, or mishandling locks on a client process.

[Other Info]

This was fixed through upstream updates in Lunar and is not an issue in Jammy.

[Original Description]

autofs in u22.10:

autofs 5.1.8-1ubuntu3

seems to introduce a regression.

Doing

$ ls -l /proc/$autofs-pid/fd

I see

lrwx------ 1 root root 64 Nov 17 09:45 0 -> /dev/null
lrwx------ 1 root root 64 Nov 17 09:45 1 -> /dev/null
lrwx------ 1 root root 64 Nov 17 09:45 10 -> /run/autofs.fifo-home
lrwx------ 1 root root 64 Nov 17 09:45 100 -> socket:[4012989]
lrwx------ 1 root root 64 Nov 17 09:45 1000 -> socket:[4277146]
lrwx------ 1 root root 64 Nov 17 09:45 1001 -> socket:[4283734]
lrwx------ 1 root root 64 Nov 17 09:45 1002 -> socket:[4281542]
[snip]
lrwx------ 1 root root 64 Nov 17 09:45 102 -> socket:[4016178]
lrwx------ 1 root root 64 Nov 17 09:45 1020 -> socket:[4286945]
lrwx------ 1 root root 64 Nov 17 09:45 1021 -> socket:[4293742]
lrwx------ 1 root root 64 Nov 17 09:45 1022 -> socket:[4294783]
lrwx------ 1 root root 64 Nov 17 09:45 1023 -> socket:[4260750]
[snip]

and:

$ ls -l /proc/$autofs-pid/fd | wc -l
1025

so autofs keeps opening sockets until the limit of 1024 is reached.
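
Note that `ls -l` prints a leading "total" line, so 1025 lines of output correspond to 1024 descriptors. To compare the descriptor count with the limit directly, a minimal sketch, assuming Linux /proc (`fd_usage` is a hypothetical helper name):

```shell
# Print "<open fds> <soft limit>" for a process.
fd_usage() {
    pid="$1"
    # Plain ls has no "total" line, so every output line is one descriptor.
    used=$(ls /proc/"$pid"/fd | wc -l)
    # The soft limit is the 4th whitespace-separated field of this row.
    limit=$(awk '/^Max open files/ {print $4}' /proc/"$pid"/limits)
    echo "$used $limit"
}
```

Usage: `fd_usage "$(pidof automount)"`, run as root. When the first number reaches the second, further open() and socket() calls in that process fail, which is consistent with the "couldn't open /etc/nsswitch.conf" error.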

I then get

automount[1640142]: lookup_nss_mount: can't to read name service switch config.
automount[1640142]: nsswitch_parse:172: couldn't open /etc/nsswitch.conf

so things break completely.

U22.04 with autofs 5.1.8-1ubuntu1.2 works fine in the same environment.

Maps are in LDAP, with /usr/libexec/sssd/sssd_autofs and automount: files sss in
/etc/nsswitch.conf

Terje Røsten (terjeros) wrote :

Root cause seems to be in sssd, most likely the problem is this:

  https://github.com/SSSD/sssd/commit/1b2e4760c52b9abd0d9b9f35b47ed72e79922ccc
  CLIENT: fix client fd leak

Can sssd please be updated to 2.7.4 to have this fixed?

Thanks in advance.

affects: autofs (Ubuntu) → sssd (Ubuntu)
Lucas Kanashiro (lucaskanashiro) wrote :

Hi Terje,

Thanks for taking the time to report this bug and trying to make Ubuntu better. Also for the investigation and providing a possible fix for the issue you are facing.

Just to clarify, have you tried the modifications in the upstream commit you linked in your local environment? Does it really fix your issue? Unfortunately, we do not update packages in stable releases to a new upstream version, so what we need to do is find the right patch and apply it to the version we already have in 22.10. But to do that we also need some reproducible steps so we can claim the bug is fixed. Could you help us with the information above?

I am setting the bug status to Incomplete but once you provide more information please set it back to New and we will take a look again.

Changed in sssd (Ubuntu):
status: New → Incomplete
Terje Røsten (terjeros) wrote :

I reverted all sssd (related) packages back to 2.6.3-1ubuntu3.2 (from jammy-updates) and then the problem went away.

We also have Fedora 36/37 systems with sssd 2.7.4, where the problem is not present.

Ubuntu 22.10 with sssd 2.7.3 is the only platform where we have seen this issue.

I don't understand this:

 "we do not update packages in stable releases to a new upstream version".

I was thinking that is exactly what a Linux distro should be, providing users with stable versions of bug fix releases like sssd 2.7.4 (over 2.7.3)?

Going to sssd 2.8.1 is a completely different matter.

Anyway, if you add the patch from upstream:

  https://github.com/SSSD/sssd/commit/1b2e4760c52b9abd0d9b9f35b47ed72e79922ccc

and do a test build I can test that for you.

Athos Ribeiro (athos-ribeiro) wrote :

Hi Terje,

What Lucas meant here is that we usually do not change versions of packages in stable Ubuntu releases unless there is a specific exception for that (see https://wiki.ubuntu.com/StableReleaseUpdates#Documentation_for_Special_Cases).

Still, it is possible to provide fixes as per the process described at https://wiki.ubuntu.com/StableReleaseUpdates.

To proceed here, would you mind providing a minimal reproducer for your issue and re-setting this bug status to new?

Thanks!

Terje Røsten (terjeros) wrote :

Hi!

I agree that updating packages is, in general, a balancing act.

Now to the problem at hand. I am fairly sure there is a problem with SSSD 2.7.3, as we have deployed lots of Linux platforms: Ubuntu 18.04, 20.04, 22.04 and 22.10; Debian 10 and 11; RHEL6, RHEL7, RHEL8 and RHEL9; SLES 12 and openSUSE 15; and Fedora 35, 36 and 37.

Ubuntu 22.10 is the only platform with issues. Also, when the sssd packages in Ubuntu 22.10 are reverted to those found in Ubuntu 22.04, it works fine.
Fedora 36 and 37 with sssd 2.7.4 also work fine.

To reproduce the issue some infrastructure is needed:
 - OpenLDAP server
 - nfs server

On the OpenLDAP server, automount maps must be configured.

The client must then use sssd_autofs by something like this:

sssd.conf:

[domain/default]
{snip}
ldap_autofs_search_base = ou=Autofs,dc=example,dc=com
ldap_autofs_map_object_class = automountMap
ldap_autofs_entry_object_class = automount
ldap_autofs_map_name = automountMapName
ldap_autofs_entry_key = automountKey
ldap_autofs_entry_value = automountInformation

[sssd]
services = autofs

/etc/nsswitch.conf:
automount: files sss

/etc/default/autofs:
MASTER_MAP_NAME="auto.master.foo"
TIMEOUT=300
BROWSE_MODE="no"
MOUNT_NFS_DEFAULT_PROTOCOL=3
LOGGING="none"
LDAP_URI="ldap://example.com"
SEARCH_BASE="dc=example,dc=com"
MAP_OBJECT_CLASS="automountMap"
ENTRY_OBJECT_CLASS="automount"
MAP_ATTRIBUTE="automountMapName"
ENTRY_ATTRIBUTE="automountKey"
VALUE_ATTRIBUTE="automountInformation"
AUTH_CONF_FILE="/etc/autofs_ldap_auth.conf"
USE_MISC_DEVICE="yes"

/etc/autofs_ldap_auth.conf:
<?xml version="1.0" ?>
<autofs_ldap_sasl_conf
    usetls="yes"
    tlsrequired="no"
    authrequired="no"
/>

When everything is up and working, the client fd leak can be seen by mounting and unmounting an NFS share several times. Then list the open sockets for automount with, e.g.:

$ ls -l /proc/$(pidof automount)/fd/ | grep socket
lrwx------ 1 root root 64 Jan 16 13:18 11 -> socket:[1449694324]
lrwx------ 1 root root 64 Jan 16 13:18 32 -> socket:[1437834978]
lrwx------ 1 root root 64 Jan 16 13:18 4 -> socket:[1449583336]

As shown, the symptom appears on the client side of SSSD, not in SSSD itself. However, the bug is indeed in SSSD, as shown by my testing of older SSSD packages on U22.10 and by the very existence of the bug fix in SSSD upstream:

  https://github.com/SSSD/sssd/commit/1b2e4760c52b9abd0d9b9f35b47ed72e79922ccc

Changed in sssd (Ubuntu):
status: Incomplete → New
Lena Voytek (lvoytek) wrote :

Hi Terje,

I created a PPA for 22.10 that incorporates the commit you provided. It is located at https://launchpad.net/~lvoytek/+archive/ubuntu/sssd-fix-client-fd-leak. If you would like to test it, you can run the following commands:

sudo add-apt-repository ppa:lvoytek/sssd-fix-client-fd-leak
sudo apt update
sudo apt upgrade

Let us know if that fixes it, thanks!

Terje Røsten (terjeros) wrote :

Hi again!

With this package set:

$ dpkg -l|grep sssd | awk '{print $2 " "$3 }'
sssd 2.7.3-2ubuntu3~ppa1
sssd-ad 2.7.3-2ubuntu3~ppa1
sssd-ad-common 2.7.3-2ubuntu3~ppa1
sssd-common 2.7.3-2ubuntu3~ppa1
sssd-ipa 2.7.3-2ubuntu3~ppa1
sssd-krb5 2.7.3-2ubuntu3~ppa1
sssd-krb5-common 2.7.3-2ubuntu3~ppa1
sssd-ldap 2.7.3-2ubuntu3~ppa1
sssd-proxy 2.7.3-2ubuntu3~ppa1

I am not able to trigger "lost sockets" in autofs, I see only one socket open:

ls -l /proc/22803/fd/|grep socket
lrwx------ 1 root root 64 Jan 23 10:45 11 -> socket:[60844]

On vanilla Ubuntu 22.10 without this fix, I can get something like this fairly quickly:

$ ls -l /proc/20848/fd/|grep socket
lrwx------ 1 root root 64 Jan 23 10:47 59 -> socket:[5448077]
lrwx------ 1 root root 64 Jan 23 10:47 60 -> socket:[5467634]
lrwx------ 1 root root 64 Jan 23 10:47 61 -> socket:[5581150]
lrwx------ 1 root root 64 Jan 23 10:47 62 -> socket:[5586434]
lrwx------ 1 root root 64 Jan 23 10:47 63 -> socket:[6084093]
lrwx------ 1 root root 64 Jan 23 10:47 64 -> socket:[6084094]
lrwx------ 1 root root 64 Jan 23 10:47 65 -> socket:[6715708]
lrwx------ 1 root root 64 Jan 23 10:47 66 -> socket:[6714747]
lrwx------ 1 root root 64 Jan 23 10:49 67 -> socket:[7392102]
lrwx------ 1 root root 64 Jan 23 10:49 68 -> socket:[7395074]
lrwx------ 1 root root 64 Jan 23 10:49 69 -> socket:[7392106]
lrwx------ 1 root root 64 Jan 23 10:49 70 -> socket:[7395096]
lrwx------ 1 root root 64 Jan 23 10:54 71 -> socket:[7398490]
lrwx------ 1 root root 64 Jan 23 10:54 72 -> socket:[7398492]
lrwx------ 1 root root 64 Jan 23 10:54 73 -> socket:[7392132]
lrwx------ 1 root root 64 Jan 23 10:54 74 -> socket:[7395122]
lrwx------ 1 root root 64 Jan 23 10:54 75 -> socket:[7386989]
lrwx------ 1 root root 64 Jan 23 10:54 76 -> socket:[7235345]
lrwx------ 1 root root 64 Jan 23 10:54 77 -> socket:[7392134]
lrwx------ 1 root root 64 Jan 23 10:54 78 -> socket:[7395140]

Thanks for the fixed packages!

tags: added: server-todo
Changed in sssd (Ubuntu Kinetic):
assignee: nobody → Lena Voytek (lvoytek)
Lena Voytek (lvoytek)
Changed in sssd (Ubuntu):
status: New → Fix Released
Changed in sssd (Ubuntu Kinetic):
status: New → In Progress
Lena Voytek (lvoytek)
description: updated
Lena Voytek (lvoytek)
description: updated
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Terje, or anyone else affected,

Accepted sssd into kinetic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/sssd/2.7.3-2ubuntu2.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-kinetic to verification-done-kinetic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-kinetic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in sssd (Ubuntu Kinetic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-kinetic
Lena Voytek (lvoytek) wrote :

Verified with a Kinetic client from the proposed pocket, set up as described in the Test Plan section, along with a Jammy NFS + OpenLDAP server. Result while connected:

$ ls -l /proc/$(pidof automount)/fd/ | grep socket
lrwx------ 1 root root 64 Feb 21 21:28 11 -> socket:[62237]

tags: added: verification-done verification-done-kinetic
removed: verification-needed verification-needed-kinetic
Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for sssd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sssd - 2.7.3-2ubuntu2.1

---------------
sssd (2.7.3-2ubuntu2.1) kinetic; urgency=medium

  * d/p/fix-client-fd-leak.patch (LP: #1996869):
    - close client socket at thread exit
    - only build lock-free client support if libc has required
      functionality for a proper cleanup
    - use proper mechanisms to init lock_mode only once

 -- Lena Voytek <email address hidden> Fri, 20 Jan 2023 11:03:59 -0700

Changed in sssd (Ubuntu Kinetic):
status: Fix Committed → Fix Released
Stefan Staeglich (staeglis) wrote :

It seems that the bug also affects Ubuntu 22.04. A backport of the fix would be nice.

Lena Voytek (lvoytek) wrote :

Hi Stefan, I have not been able to reproduce this issue with Jammy personally. Have you been experiencing this issue at all or do you have a special test case for it?
Thanks

Stefan Staeglich (staeglis) wrote :

We've observed a lot of these error messages:

automount[1248]: set_tsd_user_vars: failed to get passwd info from getpwuid_r
automount[1248]: nsswitch_parse:172: couldn't open /etc/nsswitch.conf
automount[1248]: lookup_nss_mount: can't to read name service switch config

Autofs is dysfunctional afterwards.

Can this be related? Yesterday evening we checked the sockets and they looked normal.

Lena Voytek (lvoytek) wrote :

Hm, the issue does seem to be related. I'll see if I can patch 22.04 in a similar way to 22.10. Unfortunately it doesn't apply cleanly and will need some changes. Thanks for letting me know!

Lena Voytek (lvoytek)
Changed in sssd (Ubuntu Jammy):
assignee: nobody → Lena Voytek (lvoytek)
Changed in sssd (Ubuntu):
assignee: nobody → Lena Voytek (lvoytek)
Lena Voytek (lvoytek) wrote :

Hi Stefan, I modified the patch to apply to sssd 2.6.3 in Jammy and added it to the PPA. If you find the sockets having issues again and would like to try it you can run:

sudo add-apt-repository ppa:lvoytek/sssd-fix-client-fd-leak
sudo apt update
sudo apt upgrade

Lena Voytek (lvoytek)
Changed in sssd (Ubuntu Jammy):
status: New → Incomplete
Stefan Staeglich (staeglis) wrote :

Hi Lena, thank you for the fast update. We've tested the fix: The user/group part seems to be broken. There seems to be a crash in the authentication process:

[be[DOMAIN.NAME]] [sbus_issue_request_done] (0x0040): sssd.dataprovider.getAccountInfo: Error [1432158312]: Unknown service

What kind of logs do you need?

Lena Voytek (lvoytek) wrote :

Hi Stefan,

Sorry about that. Looking further into it, there seem to be additional incompatibilities in 2.6.3 that make the additions in this patch fail. I'll see if I can get the patch to work with the older version's lack of full libc pthread support, but it might not succeed.

Lena Voytek (lvoytek) wrote :

Hi Stefan,

I updated the patch to better integrate the libc changes. You can try the fix with the same commands:

sudo add-apt-repository ppa:lvoytek/sssd-fix-client-fd-leak
sudo apt update
sudo apt upgrade

Thanks

Lena Voytek (lvoytek)
tags: removed: server-todo
Stefan Staeglich (staeglis) wrote :

Unfortunately, the PAM stack is still broken:

[sbus_issue_request_done] (0x0040): sssd.dataprovider.getAccountInfo: Error [1432158312]: Unknown service
[dp_get_account_info_done] (0x0040): [RID#10] Error sending sbus message [1432158312]: Unknown service
   * ... skipping repetitive backtrace ...
[sbus_issue_request_done] (0x0040): sssd.dataprovider.getAccountInfo: Error [1432158312]: Unknown service

Lena Voytek (lvoytek) wrote :

Thanks for letting me know. I found the issue with PAM in the patch and cleaned it up. I did some testing and don't see any issues on my end. Hopefully this update fixes your issue.

Stefan Staeglich (staeglis) wrote :

Hi Lena, thank you very much :)

I haven't observed any regressions in the latest update. We will test whether it also fixes the original issue and let you know.

Stefan Staeglich (staeglis) wrote :

Hi Lena, we haven't observed the issue again since updating to the ppa version. Thank you very much :)

Lena Voytek (lvoytek) wrote :

That's great to hear! I'll see if we can get this into 22.04
