after upgrading to bionic, my session forgets who I am frequently

Bug #1807246 reported by Luke Schierer
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
sssd (Ubuntu) | Fix Released | Medium | Andreas Hasenack |
Bionic | Fix Released | Undecided | Unassigned |

Bug Description

[Impact]
When the max_id parameter is set in an sssd [domain] section and a lookup is performed for a user whose ID is higher than that limit after the cache has expired, sssd fails to query that user in the other defined domains.

The fix explicitly checks for the max_id case, letting the search continue on to other domains, and was provided by upstream.

The upstream patch was taken as is, including the whitespace changes and the unit test, since those applied cleanly.

I'm additionally adding the existing Disco DEP8 tests for sssd to this SRU, to facilitate testing of this update and subsequent ones. They don't exercise this specific case, but they give more confidence in the package, since they test authentication (LDAP and Kerberos), SSL, and user and group lookups.
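
The behaviour described in this section can be illustrated with a toy model in shell. This is only a sketch and not sssd code: the `DOMAINS` table, the `lookup_uid` function and the `buggy`/`fixed` modes are hypothetical names made up for illustration, and the cache-expiry condition is left out. It shows how aborting on an out-of-range ID hides a user who lives in a later domain, while skipping to the next domain finds them.

```shell
# Toy model only: none of these names exist in sssd itself.
# Each entry is "domain:max_id:uids"; max_id 0 means "no limit".
DOMAINS="local:1000: example:0:10001"

# lookup_uid MODE UID
# MODE=buggy gives up at the first domain whose max_id excludes the uid;
# MODE=fixed treats that case as "not in this domain" and tries the next one.
lookup_uid() {
    local mode="$1" uid="$2" entry name rest max_id uids
    for entry in $DOMAINS; do
        name=${entry%%:*}
        rest=${entry#*:}
        max_id=${rest%%:*}
        uids=${rest#*:}
        if [ "$max_id" -gt 0 ] && [ "$uid" -gt "$max_id" ]; then
            if [ "$mode" = buggy ]; then
                echo "not found"
                return 1
            fi
            continue
        fi
        case ",$uids," in
            *",$uid,"*) echo "found in $name"; return 0 ;;
        esac
    done
    echo "not found"
    return 1
}
```

With the domain order used in the test case below (local first, then example), `lookup_uid buggy 10001` stops at the local domain, whereas `lookup_uid fixed 10001` continues and reaches the example domain.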

[Test Case]

* Install sssd, slapd and ldap-utils on a bionic VM or LXD container (if you get weird errors, use a VM, because the uid mapping in LXD might conflict with the uids chosen for this test):
sudo apt update
sudo apt install sssd slapd ldap-utils

* Reconfigure slapd. Enter "example.com" for the domain, "example" for the organization, and "secret" for the admin password. For the rest, accept defaults:
sudo dpkg-reconfigure slapd

* Populate the ldap directory:
ldapadd -x -D cn=admin,dc=example,dc=com -w secret -c <<EOF
dn: ou=People,dc=example,dc=com
ou: People
objectClass: organizationalUnit

dn: ou=Group,dc=example,dc=com
ou: Group
objectClass: organizationalUnit

dn: uid=testuser1,ou=People,dc=example,dc=com
uid: testuser1
objectClass: inetOrgPerson
objectClass: posixAccount
cn: testuser1
sn: testuser1
givenName: testuser1
mail: <email address hidden>
userPassword: testuser1secret
uidNumber: 10001
gidNumber: 10001
loginShell: /bin/bash
homeDirectory: /home/testuser1

dn: cn=testuser1,ou=Group,dc=example,dc=com
cn: testuser1
objectClass: posixGroup
gidNumber: 10001
memberUid: testuser1

dn: cn=ldapusers,ou=Group,dc=example,dc=com
cn: ldapusers
objectClass: posixGroup
gidNumber: 10100
memberUid: testuser1

EOF

* Create /etc/sssd/sssd.conf with the following contents:
[sssd]
services = nss
domains = local,example

[nss]
debug_level = 6
memcache_timeout = 30

[domain/local]
id_provider = local
enumerate = true
max_id = 1000

[domain/example]
id_provider = ldap
enumerate = true
auth_provider = ldap
ldap_uri = ldap://localhost
ldap_search_base = dc=example,dc=com
ldap_tls_reqcert = allow
cache_credentials = true
use_fully_qualified_names = false

* Adjust permissions and restart:
sudo chmod 0600 /etc/sssd/sssd.conf
sudo systemctl restart sssd

* Test:
id testuser1

Should return:
uid=10001(testuser1) gid=10001 groups=10001,10100

* Create a home directory:
sudo mkdir /home/testuser1 -m 0700
sudo chown testuser1:testuser1 /home/testuser1

* Become testuser1 and run this loop. Depending on how long ago sssd was restarted above, it should fail soon, within 40s at most:
sudo -u testuser1 -i
while /bin/true; do date; whoami || break; echo; sleep 10; done

Wed Jan 16 19:12:02 UTC 2019
testuser1
...

Wed Jan 16 19:12:22 UTC 2019
whoami: cannot find name for user ID 10001: Unknown error 1432158300

With the fixed packages installed, that while loop won't be exited.
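
The "Adjust permissions" step above matters because sssd.conf(5) requires the file to be a regular file that only root can read or write. A small pre-flight check, sketched here under the assumption that GNU stat is available (as on Ubuntu), can catch a wrong mode before restarting the service. The `check_conf_mode` name is made up for this example, and ownership is deliberately not checked.

```shell
# check_conf_mode FILE: succeed only if FILE is a regular file with mode 600.
# Hypothetical helper; assumes GNU stat for the -c format option.
# Ownership (root:root) is not verified here.
check_conf_mode() {
    local file="$1" mode
    if [ ! -f "$file" ]; then
        echo "$file: not a regular file" >&2
        return 1
    fi
    mode=$(stat -c '%a' "$file") || return 1
    if [ "$mode" != 600 ]; then
        echo "$file: mode is $mode, expected 600" >&2
        return 1
    fi
}
```

Run as root, it could gate the restart: `check_conf_mode /etc/sssd/sssd.conf && systemctl restart sssd`.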

[Regression Potential]
sssd can be complicated to set up and test, not because of sssd itself, but because of the additional services that need to be set up (LDAP server, Kerberos server, etc.). I believe the inclusion of the current DEP8 tests together with this SRU helps detect regressions due to this update, and future updates after this one.
The real fix in this SRU is a one-liner: merely the handling of the max_id return code, which wasn't being handled before, so the lookup would stop too early. This fix is already in cosmic and disco, and no regressions have been spotted.

[Other Info]
The real fix is a one liner. If the SRU team prefers, I can change the patch to do just that, in the spirit of minimal changes necessary.

[Original Description]
I configured sssd on an Ubuntu 16.04 LTS system, and it worked just fine. In fact, using the same sssd.conf file (which is managed by puppet) on an un-upgraded system continues to work fine.

However, after upgrading to 18.04.1 LTS, I find that the system is continuously forgetting who I am. After a few commands, or a few minutes (I'm not sure exactly how many, but around 3-5 minutes), if I try to run sudo or whoami, it says that I am an unknown user. For example:

```
whoami
whoami: cannot find name for user ID 2000: Unknown error 1432158300
```

If I run the id command on my username, it returns the correct results, and whoami/sudo/other restricted commands will work again for a short time before forgetting who I am again.

In the sssd_nss.log file, I see the lookup against the @local domain, but I do not see a related lookup in the ldap domain either in that log file or in the log file specific to the ldap domain.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: sssd 1.16.1-1ubuntu1
ProcVersionSignature: Ubuntu 4.15.0-42.45-generic 4.15.18
Uname: Linux 4.15.0-42-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
Date: Thu Dec 6 12:30:43 2018
Ec2AMI: ami-ea677d80
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: us-east-1c
Ec2InstanceType: t2.small
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
SourcePackage: sssd
UpgradeStatus: Upgraded to bionic on 2018-10-04 (63 days ago)


Luke Schierer (lschierer) wrote :

Christian Ehrhardt  (paelzer) wrote :

Hi, the only thing that comes to my mind would be the default values of the enumeration cache timeouts; those are in the 2-5 minute range.

Check [1] for enum_cache_timeout and related entries.
Maybe create a script that does "while true; sleep 10s; date; check UID; done"
Then you can check how long it takes to forget in your case.
Bump all kind of these timeouts and repeat.
If that helps, restore them one by one until you have found which timeout matters in your case.
Then we would at least already know which sub-cache it is that forgets your user.
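
One way to turn the suggested loop into a measurement is sketched below. The `time_until_failure` function is a hypothetical helper written for this example, not an existing tool; it repeats a command at a fixed interval and reports how many seconds passed before the command first failed.

```shell
# time_until_failure INTERVAL MAX_TRIES CMD...
# Hypothetical helper: runs CMD every INTERVAL seconds and prints how many
# seconds passed before it first failed. Returns 1 if CMD never failed
# within MAX_TRIES attempts.
time_until_failure() {
    local interval="$1" max_tries="$2" start now i=0
    shift 2
    start=$(date +%s)
    while [ "$i" -lt "$max_tries" ]; do
        if ! "$@" >/dev/null 2>&1; then
            now=$(date +%s)
            echo "failed after $((now - start))s"
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "still working after $max_tries tries"
    return 1
}
```

For example, `time_until_failure 10 60 whoami` checks every 10 seconds for up to 10 minutes. Running it again after changing a single timeout shows whether that timeout moves the failure point.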

You could also play with the "enumerate" option in general.
What have you set at the moment, and how does it behave when you switch it to the other value?
Something like [3] could be related to that.

Also, could you check your logs to see whether it could be something like [2], as it reads very similar?

I also asked a friend who actually knows sssd better than I do; maybe he will have some hints later on.

[1]: http://manpages.ubuntu.com/manpages/bionic/man5/sssd.conf.5.html
[2]: https://www.linuxquestions.org/questions/linux-server-73/sssd-forgets-group-name-4175577727/
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1359208

Changed in sssd (Ubuntu):
status: New → Incomplete
Luke Schierer (lschierer) wrote :

I have verified there is no overlap in UIDs, so I don't think the linuxquestions.org problem applies.

It apparently forgets a lot faster than I realized; I just don't run commands that matter often enough to notice:

```
luke@schierer@talemludum001:~$ !id
id luke@schierer
uid=2000(luke@schierer) gid=100(users) groups=100(users),2(bin),200,3(sys),10(uucp),60(games),4(adm),50(staff),27(sudo),40(src),37(operator),6(disk),110(uuidd),1(daemon),102(systemd-network),24(cdrom),29(audio)
luke@schierer@talemludum001:~$ for i in `seq 1 1000`; do date; whoami; sleep 10s; done
Fri Dec 7 07:52:19 EST 2018
luke@schierer
Fri Dec 7 07:52:29 EST 2018
luke@schierer
Fri Dec 7 07:52:39 EST 2018
luke@schierer
Fri Dec 7 07:52:49 EST 2018
luke@schierer
Fri Dec 7 07:52:59 EST 2018
luke@schierer
Fri Dec 7 07:53:09 EST 2018
luke@schierer
Fri Dec 7 07:53:19 EST 2018
luke@schierer
Fri Dec 7 07:53:29 EST 2018
luke@schierer
Fri Dec 7 07:53:39 EST 2018
luke@schierer
Fri Dec 7 07:53:49 EST 2018
luke@schierer
Fri Dec 7 07:53:59 EST 2018
luke@schierer
Fri Dec 7 07:54:09 EST 2018
luke@schierer
Fri Dec 7 07:54:19 EST 2018
luke@schierer
Fri Dec 7 07:54:29 EST 2018
luke@schierer
Fri Dec 7 07:54:39 EST 2018
luke@schierer
Fri Dec 7 07:54:49 EST 2018
luke@schierer
Fri Dec 7 07:54:59 EST 2018
luke@schierer
Fri Dec 7 07:55:09 EST 2018
luke@schierer
Fri Dec 7 07:55:19 EST 2018
luke@schierer
Fri Dec 7 07:55:29 EST 2018
luke@schierer
Fri Dec 7 07:55:39 EST 2018
luke@schierer
Fri Dec 7 07:55:49 EST 2018
luke@schierer
Fri Dec 7 07:55:59 EST 2018
luke@schierer
Fri Dec 7 07:56:09 EST 2018
luke@schierer
Fri Dec 7 07:56:19 EST 2018
luke@schierer
Fri Dec 7 07:56:29 EST 2018
luke@schierer
Fri Dec 7 07:56:39 EST 2018
luke@schierer
Fri Dec 7 07:56:49 EST 2018
luke@schierer
Fri Dec 7 07:56:59 EST 2018
luke@schierer
Fri Dec 7 07:57:09 EST 2018
whoami: cannot find name for user ID 2000: Unknown error 1432158300
Fri Dec 7 07:57:19 EST 2018
whoami: cannot find name for user ID 2000: Unknown error 1432158300
^C
luke@schierer@talemludum001:~$
```

a redacted sssd.conf (for domain names and such)

```
luke@schierer@talemludum001:~$ sudo cat /etc/sssd/sssd.conf
# Managed by Puppet.

[sssd]
services = nss, pam, sudo
domains = local, bramlet, ciziunas, schierer

[nss]
debug_level = 6
enum_cache_timeout = 300

[domain/local]
id_provider = local
enumerate = true
max_id = 1000

[domain/bramlet]
id_provider = ldap
enumerate = true
auth_provider = ldap
ldap_schema = rfc2307bis
ldap_uri = ldap://censor001.<domain>
ldap_search_base = ou=bramlet,dc=....
ldap_tls_reqcert = allow
cache_credentials = true
use_fully_qualified_names = true

[domain/ciziunas]
id_provider = ldap
enumerate = true
auth_provider = ldap
ldap_schema = rfc2307bis
ldap_uri = ldap://censor001.<domain>
ldap_search_base = ou=ciziunas,....
ldap_tls_reqcert = allow
cache_credentials = true
use_fully_qualified_names = true

[domain/schierer]
debug_level = 6
id_provider = ldap
enumerate = true
auth_provider = ldap
ldap_schema = rfc2307bis
ldap_uri = ldap://censor001.<domain>
ldap_search_base = ou=schierer,dc=....
ldap_tls_reqcert = allow
cache_credentials = true
use_fully_qualified_names = true

luke@schierer@talemlud...
```

Luke Schierer (lschierer) wrote :

Actually, that is right on 300, isn't it? Anyway, as I said, while I will manipulate the cache values, it shouldn't be necessary. This is a regression.

Luke Schierer (lschierer) wrote :

I changed enum_cache_timeout to 600 and set entry_cache_timeout = 200 in the [domain/schierer] section of the above sssd.conf file. Despite that, it still starts stating I am unknown at around 320 seconds (I have 316 seconds' worth of successful "whoami; sleep 10s" loop iterations, and I know it took a couple of seconds to type up that loop and hit enter after I initially ran the id command).

Christian Ehrhardt  (paelzer) wrote :

That is odd, AFAIK Ubuntu didn't make any changes on top of upstream in that regard.
So the only thing that comes to my mind would be an intentional upstream change between the sssd versions in 16.04 (1.13.4-1ubuntu1.12) and 18.04 (1.16.1-1ubuntu1).

@Luke
Would you mind filing an upstream bug about this, asking whether this is known/expected in any way between those versions? If you do so, please link it here so we can track it.
The bug I found before, 1359208, isn't quite the same (if anything, it should be present in 16.04 but fixed in 18.04), so updates there would be wrong IMHO.
[1] describes how to create one at [2].

@Andreas - I subscribed you for sssd experience in case this rings any bell

[1]: https://docs.pagure.org/SSSD.sssd/users/reporting_bugs.html
[2]: https://pagure.io/SSSD/sssd/issues

Luke Schierer (lschierer) wrote :

Christian Ehrhardt  (paelzer) wrote :

Thank you Luke!
Let's see if there is any new update that might shed some light here.

Luke Schierer (lschierer) wrote :

Upstream says there is a known problem with having a domain with max_id set in the version that bionic ships. This is apparently covered in https://pagure.io/SSSD/sssd/issue/3728

Christian Ehrhardt  (paelzer) wrote :

FYI: fix is [1] in PR [2]

@Andreas will you take a look at that when you are back?

[1]: https://pagure.io/SSSD/sssd/c/2952de7
[2]: https://github.com/SSSD/sssd/pull/565

Luke Schierer (lschierer) wrote :

Confirmed that not having a max_id line in the local domain does work around the issue.
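
For anyone who needs this workaround until a fixed package is available, the edit can be scripted. The sketch below assumes the section is named [domain/local] as in the sssd.conf posted earlier; `disable_local_max_id` is a hypothetical helper name. It only prints the modified configuration, so the result can be reviewed before it replaces the real file.

```shell
# disable_local_max_id FILE
# Hypothetical helper: prints FILE with any max_id line inside the
# [domain/local] section commented out. Other sections are left untouched
# and FILE itself is not modified.
disable_local_max_id() {
    sed '/^\[domain\/local\]/,/^\[/ s/^[[:space:]]*max_id[[:space:]]*=/#&/' "$1"
}
```

The sed range starts at the [domain/local] header and ends at the next line that begins with "[", so a max_id setting in any later domain is kept as is. After installing the reviewed output as /etc/sssd/sssd.conf (mode 0600), sssd has to be restarted for the change to take effect.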

Luke Schierer (lschierer) wrote :

Any word on getting the fix merged into Ubuntu?

Robie Basak (racb)
Changed in sssd (Ubuntu):
status: Incomplete → Triaged
tags: added: server-next
Changed in sssd (Ubuntu):
assignee: nobody → Andreas Hasenack (ahasenack)
importance: Undecided → Medium
status: Triaged → In Progress
Andreas Hasenack (ahasenack) wrote :

Building test packages.

Andreas Hasenack (ahasenack) wrote :

Problem reproduced, and also verified that the test packages fix it.

description: updated
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Luke, or anyone else affected,

Accepted sssd into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/sssd/1.16.1-1ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in sssd (Ubuntu Bionic):
status: New → Fix Committed
tags: added: verification-needed verification-needed-bionic
Luke Schierer (lschierer) wrote :

luke@schierer@talemludum001:~$ dpkg -l | grep sss
ii libnss-sss:amd64 1.16.1-1ubuntu1 amd64 Nss library for the System Security Services Daemon
ii libpam-sss:amd64 1.16.1-1ubuntu1 amd64 Pam module for the System Security Services Daemon
ii libsss-certmap0 1.16.1-1ubuntu1 amd64 Certificate mapping library for SSSD
ii libsss-idmap0 1.16.1-1ubuntu1.1 amd64 ID mapping library for SSSD
ii libsss-nss-idmap0 1.16.1-1ubuntu1 amd64 SID based lookups library for SSSD
ii libsss-simpleifp0 1.16.1-1ubuntu1.1 amd64 SSSD D-Bus responder helper library
ii libsss-sudo 1.16.1-1ubuntu1 amd64 Communicator library for sudo
ii python3-sss 1.16.1-1ubuntu1.1 amd64 Python3 module for the System Security Services Daemon
ii sssd 1.16.1-1ubuntu1.1 amd64 System Security Services Daemon -- metapackage
ii sssd-ad 1.16.1-1ubuntu1.1 amd64 System Security Services Daemon -- Active Directory back end
ii sssd-ad-common 1.16.1-1ubuntu1.1 amd64 System Security Services Daemon -- PAC responder
ii sssd-common 1.16.1-1ubuntu1.1 amd64 System Security Services Daemon -- common files
ii sssd-dbus 1.16.1-1ubuntu1.1 amd64 System Security Services Daemon -- D-Bus responder
ii sssd-ipa 1.16.1-1ubuntu1.1 amd64 System Security Services Daemon -- IPA back end
ii sssd-krb5 1.16.1-1ubuntu1.1 amd64 System Security Services Daemon -- Kerberos back end
ii sssd-krb5-common 1.16.1-1ubuntu1.1 amd64 System Security Services Daemon -- Kerberos helpers
ii sssd-ldap 1.16.1-1ubuntu1.1 amd64 System Security Services Daemon -- LDAP back end
ii sssd-proxy 1.16.1-1ubuntu1.1 amd64 System Security Services Daemon -- proxy back end
ii sssd-tools 1.16.1-1ubuntu1.1 amd64 System Security Services Daemon -- tools
luke@schierer@talemludum001:~$

fixed the bug
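
The listing above mixes two versions: several libraries are still at 1.16.1-1ubuntu1 while the daemon packages are at 1.16.1-1ubuntu1.1. A quick way to spot such leftovers is sketched below; `stale_packages` is a hypothetical helper that filters `dpkg -l` style output. Whether the older libraries matter for this particular fix is not stated in this report.

```shell
# stale_packages VERSION
# Hypothetical helper: reads `dpkg -l` style lines on stdin and prints the
# name of every installed ("ii") package whose version is not VERSION.
stale_packages() {
    awk -v want="$1" '$1 == "ii" && $3 != want { print $2 }'
}
```

Usage would be along the lines of `dpkg -l | grep sss | stale_packages 1.16.1-1ubuntu1.1`.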

tags: added: verification-done-bionic
removed: verification-needed-bionic
Andreas Hasenack (ahasenack) wrote :

Bionic verification

First, confirming the bug:

# checking ldap user exists, and is not defined in /etc/passwd
root@bionic-sssd-1807246:~# id testuser1
uid=10001(testuser1) gid=10001 groups=10001,10100
root@bionic-sssd-1807246:~# grep testuser1 /etc/passwd
root@bionic-sssd-1807246:~#

# looping over whoami calls fails eventually as expected by this bug:
testuser1@bionic-sssd-1807246:~$ while /bin/true; do date; whoami || break; echo; sleep 10; done
Wed Jan 30 18:35:20 UTC 2019
testuser1

Wed Jan 30 18:35:30 UTC 2019
testuser1

Wed Jan 30 18:35:40 UTC 2019
whoami: cannot find name for user ID 10001: Unknown error 1432158300
$

# Installing the packages from proposed:
root@bionic-sssd-1807246:~# apt-cache policy sssd
sssd:
  Installed: 1.16.1-1ubuntu1.1
  Candidate: 1.16.1-1ubuntu1.1
  Version table:
 *** 1.16.1-1ubuntu1.1 500
        500 http://br.archive.ubuntu.com/ubuntu bionic-proposed/main amd64 Packages
        100 /var/lib/dpkg/status
     1.16.1-1ubuntu1 500
        500 http://br.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

Retrying the whoami loop, which this time doesn't fail and has to be aborted:
testuser1@bionic-sssd-1807246:~$ while /bin/true; do date; whoami || break; echo; sleep 10; done
Wed Jan 30 18:37:25 UTC 2019
testuser1

Wed Jan 30 18:37:35 UTC 2019
testuser1

Wed Jan 30 18:37:45 UTC 2019
testuser1

Wed Jan 30 18:37:55 UTC 2019
testuser1

Wed Jan 30 18:38:05 UTC 2019
testuser1

Wed Jan 30 18:38:15 UTC 2019
testuser1

^C
testuser1@bionic-sssd-1807246:~$

Bionic verification succeeded.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package sssd - 1.16.1-1ubuntu1.1

---------------
sssd (1.16.1-1ubuntu1.1) bionic; urgency=medium

  * d/p/fix-id-out-of-range-lookup.patch: CACHE_REQ: Do not fail the domain
    locator plugin if ID outside the domain range is looked up. Thanks to
    Jakub Hrozek <email address hidden>. (LP: #1807246)
  * d/t/common-tests, d/t/control, d/t/ldap-user-group-krb5-auth,
    d/t/ldap-user-group-ldap-auth, d/t/login.exp, d/t/util: add DEP8
    tests for kerberos and LDAP (LP: #1793882)

 -- Andreas Hasenack <email address hidden> Wed, 16 Jan 2019 13:58:03 -0200

Changed in sssd (Ubuntu Bionic):
status: Fix Committed → Fix Released
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for sssd has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Andreas Hasenack (ahasenack) wrote :

Disco has this fixed already, updating main task.

Changed in sssd (Ubuntu):
status: In Progress → Fix Released