after upgrading to bionic, my session forgets who I am frequently
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
sssd (Ubuntu) | Fix Released | Medium | Andreas Hasenack |
Bionic | Fix Released | Undecided | Unassigned |
Bug Description
[Impact]
When the max_id parameter is set in an sssd [domain] section, and a lookup is performed for a user whose id is higher than that limit after the cache has expired, sssd fails to query that user in the other defined domains.
The fix explicitly checks for the max_id case, letting the search continue on to other domains, and was provided by upstream.
The upstream patch was taken as is, including the whitespace changes and the unit test, since those applied cleanly.
I'm additionally adding the existing Disco DEP8 tests for sssd to this SRU, to facilitate testing for this update and subsequent ones. They don't exercise this specific case, but they give more confidence in the package since they test authentication (LDAP and Kerberos), SSL, and user and group lookups.
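To illustrate the max_id case described above, here is a toy model in POSIX shell. It is not sssd's code: the `DOMAINS` table, the `lookup_uid` function and the `buggy`/`fixed` modes are all hypothetical names made up for this sketch, and it ignores caching entirely. It only shows the difference between stopping the search when an id is out of range for one domain and continuing on to the next domain.

```shell
# Toy model only; this is NOT sssd's code.
# One hypothetical domain per line: name:min_id:max_id:comma-separated known uids.
# A max_id of 0 means "no upper limit".
DOMAINS='local:1:1000:
example:1:0:10001'

# lookup_uid UID MODE
#   MODE=buggy: an out-of-range id ends the whole search.
#   MODE=fixed: an out-of-range id just moves on to the next domain.
lookup_uid() {
  uid=$1
  mode=$2
  while IFS=: read -r name min max uids; do
    if [ "$uid" -lt "$min" ] || { [ "$max" -ne 0 ] && [ "$uid" -gt "$max" ]; }; then
      if [ "$mode" = buggy ]; then
        echo "not found (search stopped at domain '$name')"
        return 1
      fi
      continue
    fi
    # The id is within this domain's range; check whether the domain knows it.
    case ",$uids," in
      *",$uid,"*)
        echo "found in domain '$name'"
        return 0
        ;;
    esac
  done <<EOF
$DOMAINS
EOF
  echo "not found"
  return 1
}
```

In this model, `lookup_uid 10001 buggy` gives up at the `local` domain, while `lookup_uid 10001 fixed` goes on to find the user in `example`.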
[Test Case]
* Install sssd, slapd and ldap-utils, on a bionic VM or LXD (if you get weird errors, use a VM, because the uid mapping in lxd might be conflicting with the uids chosen for this test):
sudo apt update
sudo apt install sssd slapd ldap-utils
* Reconfigure slapd. Enter "example.com" for the domain, "example" for the organization, and "secret" for the admin password. For the rest, accept defaults:
sudo dpkg-reconfigure slapd
* Populate the ldap directory:
ldapadd -x -D cn=admin,
dn: ou=People,
ou: People
objectClass: organizationalUnit
dn: ou=Group,
ou: Group
objectClass: organizationalUnit
dn: uid=testuser1,
uid: testuser1
objectClass: inetOrgPerson
objectClass: posixAccount
cn: testuser1
sn: testuser1
givenName: testuser1
mail: <email address hidden>
userPassword: testuser1secret
uidNumber: 10001
gidNumber: 10001
loginShell: /bin/bash
homeDirectory: /home/testuser1
dn: cn=testuser1,
cn: testuser1
objectClass: posixGroup
gidNumber: 10001
memberUid: testuser1
dn: cn=ldapusers,
cn: ldapusers
objectClass: posixGroup
gidNumber: 10100
memberUid: testuser1
EOF
* Create /etc/sssd/sssd.conf with the following contents:
[sssd]
services = nss
domains = local,example
[nss]
debug_level = 6
memcache_timeout = 30
[domain/local]
id_provider = local
enumerate = true
max_id = 1000
[domain/example]
id_provider = ldap
enumerate = true
auth_provider = ldap
ldap_uri = ldap://localhost
ldap_search_base = dc=example,dc=com
ldap_tls_reqcert = allow
cache_credentials = true
use_fully_
* Adjust permissions and restart:
sudo chmod 0600 /etc/sssd/sssd.conf
sudo systemctl restart sssd
* Test:
id testuser1
Should return:
uid=10001(
* Create a home directory:
sudo mkdir /home/testuser1 -m 0700
sudo chown testuser1:testuser1 /home/testuser1
* Become testuser1 and run this script. Depending on how long ago sssd was restarted above, it should fail soon, within 40s at most:
sudo -u testuser1 -i
while /bin/true; do date; whoami || break; echo; sleep 10; done
Wed Jan 16 19:12:02 UTC 2019
testuser1
...
Wed Jan 16 19:12:22 UTC 2019
whoami: cannot find name for user ID 10001: Unknown error 1432158300
With the fixed packages installed, that while loop won't be exited.
[Regression Potential]
sssd can be complicated to set up and test, not because of sssd itself, but because of the additional services that need to be set up (LDAP server, Kerberos server, etc.). I believe including the current DEP8 tests in this SRU helps detect regressions caused by this update and by future updates.
The real fix in this SRU is a one liner, merely the treatment of the max_id return code, which wasn't being handled before and meant the lookup would stop too early. This exists in cosmic and disco already, and no regressions have been spotted.
[Other Info]
The real fix is a one liner. If the SRU team prefers, I can change the patch to do just that, in the spirit of minimal changes necessary.
[Original Description]
I configured sssd on an Ubuntu 16.04 LTS system, and it worked just fine. In fact, the same sssd.conf file (which is managed by puppet) continues to work fine on an un-upgraded system.
However, after upgrading to 18.04.1 LTS, I find that the system is continuously forgetting who I am. After a few commands, or a few minutes (I'm not sure exactly how many, but around 3-5 minutes), if I try to run sudo or whoami, it says that I am an unknown user. For example:
```
whoami
whoami: cannot find name for user ID 2000: Unknown error 1432158300
```
If I run the id command on my username, it returns the correct results, and whoami/sudo/other restricted commands will work again for a short time before the system forgets who I am again.
In the sssd_nss.log file, I see the lookup against the @local domain, but I do not see a related lookup in the ldap domain either in that log file or in the log file specific to the ldap domain.
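A check like the one described above could be sketched as follows. The default directory `/var/log/sssd` and the `sssd_*.log` file pattern are assumptions about a typical install, and `count_lookups` is a hypothetical helper name. It only counts the lines that mention a given name in each log, which is enough to see whether one domain's log has no matching entries at all.

```shell
# count_lookups NAME [LOG_DIR]
# Prints "<file>: <n>" for each sssd_*.log in LOG_DIR, where n is the
# number of lines that mention NAME as a fixed string.
# LOG_DIR defaults to /var/log/sssd (an assumption; adjust as needed).
count_lookups() {
  name=$1
  log_dir=${2:-/var/log/sssd}
  for f in "$log_dir"/sssd_*.log; do
    [ -f "$f" ] || continue
    printf '%s: %s\n' "${f##*/}" "$(grep -cF -- "$name" "$f")"
  done
}
```

For example, `sudo sh -c '. ./count_lookups.sh; count_lookups testuser1'` (with the function saved to a file of that name) would list one count per log file.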
ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: sssd 1.16.1-1ubuntu1
ProcVersionSign
Uname: Linux 4.15.0-42-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
Date: Thu Dec 6 12:30:43 2018
Ec2AMI: ami-ea677d80
Ec2AMIManifest: (unknown)
Ec2Availability
Ec2InstanceType: t2.small
Ec2Kernel: unavailable
Ec2Ramdisk: unavailable
SourcePackage: sssd
UpgradeStatus: Upgraded to bionic on 2018-10-04 (63 days ago)
Related branches
- Robie Basak: Approve
- Canonical Server: Pending requested
Diff: 617 lines (+561/-0), 9 files modified
debian/changelog (+11/-0)
debian/patches/fix-id-out-of-range-lookup.patch (+117/-0)
debian/patches/series (+1/-0)
debian/tests/common-tests (+28/-0)
debian/tests/control (+7/-0)
debian/tests/ldap-user-group-krb5-auth (+35/-0)
debian/tests/ldap-user-group-ldap-auth (+29/-0)
debian/tests/login.exp (+74/-0)
debian/tests/util (+259/-0)
Changed in sssd (Ubuntu): | |
status: | Incomplete → Triaged |
tags: | added: server-next |
Changed in sssd (Ubuntu): | |
assignee: | nobody → Andreas Hasenack (ahasenack) |
importance: | Undecided → Medium |
status: | Triaged → In Progress |
description: | updated |
Hi, the only thing that comes to mind is the default values of the enumeration cache timeouts, which are in the 2-5 minute range.
Check [1] for enum_cache_timeout and related entries.
Maybe create a script that does "while true; sleep 10s; date; check UID; done".
Then you can check how long it takes to forget in your case.
Bump all of these timeouts and repeat.
If that helps, take them back one by one until you have found which timeout it is in your case.
Then we would at least know which sub-cache it is that forgets your user.
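The polling script suggested above could look roughly like this. It is a sketch, not a tested diagnostic: `lookup_uid` and `time_until_forgotten` are hypothetical helper names, and it assumes that `getent passwd <uid>` is a fair stand-in for the by-id lookup that whoami performs.

```shell
# lookup_uid is the check being timed; getent resolves the id through NSS.
lookup_uid() {
  getent passwd "$1" >/dev/null
}

# time_until_forgotten UID [INTERVAL] [MAX_CHECKS]
# Polls every INTERVAL seconds (default 10) and prints the elapsed time
# when the lookup first fails. Returns 0 if a failure was observed,
# 1 if MAX_CHECKS (default 60) lookups all succeeded.
time_until_forgotten() {
  uid=$1
  interval=${2:-10}
  max_checks=${3:-60}
  start=$(date +%s)
  checks=0
  while [ "$checks" -lt "$max_checks" ]; do
    if ! lookup_uid "$uid"; then
      echo "uid $uid forgotten after $(( $(date +%s) - start ))s"
      return 0
    fi
    checks=$((checks + 1))
    sleep "$interval"
  done
  echo "uid $uid still resolved after $checks checks"
  return 1
}
```

Running `time_until_forgotten 2000` after each timeout change gives one number per run to compare. Note that a successful lookup may itself refresh a cache, so the measured time is only a rough indication.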
You could also play with the "enumerate" option in general.
What do you have set at the moment, and how does it behave when you switch it to the other value?
Something like [3] could be related to that.
Also, could you check your logs for anything like [2], as it reads very similar?
I also asked a friend who knows sssd better than I do; maybe he will have some hints later on.
[1]: http://manpages.ubuntu.com/manpages/bionic/man5/sssd.conf.5.html
[2]: https://www.linuxquestions.org/questions/linux-server-73/sssd-forgets-group-name-4175577727/
[3]: https://bugzilla.redhat.com/show_bug.cgi?id=1359208