Comment 14 for bug 1921494

Sergio Durigan Junior (sergiodj) wrote :

Hello there,

Matthew and I spent a non-trivial amount of time trying to reproduce this bug. Let me tell you what we did, what we found, and then maybe you can help us progress.

First of all, it is really important that we find a reproducer so that we can: (1) figure out what exactly is going on, (2) possibly find a patch or write a fix for the problem, and (3) drive the SRU process to completion and get the fix released for all affected Ubuntu versions.

Having said that, here is what I did (and, to the best of my knowledge, what Matthew also tried):

1) I installed Windows Server 2019 in a VM. I configured Active Directory, DHCP and DNS. I also configured the Certificate Authority in it.

Bear in mind that this is a test environment, so the configuration I did was basic. For AD, I chose to "Add a new forest" using "Windows Server 2016" as the functional level, and with DNS and Global Catalog capabilities enabled. For the Certificate Authority, I chose to generate a root certificate (I don't remember the other options exactly, but the root certificate is important).

2) I created an LXD container running Ubuntu Focal that shared the same network bridge as the Windows Server 2019 VM. The container promptly acquired an IP address from the DHCP server I had configured in the Windows VM.

3) I joined the AD realm using "realm join win-ad-example.adtest.local". sssd and other dependencies were automatically installed, and the process finished successfully.

4) I then went to the Windows VM, opened certsrv, found the certificate for the machine and exported it. I copied the certificate into the LXD container, put it into the right place (/usr/local/share/ca-certificates/) and ran update-ca-certificates. I noticed that the command added 1 certificate to the chain.

5) I edited /etc/sssd/sssd.conf and added the following options to the domain section:

ldap_tls_cacert = /etc/ssl/certs/ca-certificates.crt
ad_use_ldaps = True
debug_level = 4

6) I then restarted sssd. And I noticed the error manifesting! At that point, I thought I had reproduced the bug (keep reading, though). I started to investigate what could be happening.

7) After several minutes thinking I was on the right track, I decided to try and run the "ldapsearch" command provided above:

$ ldapsearch -x -Z -v -H ldaps://win-ad-example.adtest.local:636

And to my surprise, I noticed that the command could not connect to the Windows server. I then started debugging things, and quickly found that the problem was that the TLS certificate from the server could not be validated. Something was wrong...

8) I went back to the Windows VM and poked around certsrv. I noticed that I had exported the certificate for the machine, but not the root certificate. I decided to give it a try.

9) After having imported the root certificate into the LXD container, I restarted sssd and, much to my surprise, everything worked out of the box (using the sssd package from the archive). Back to square one...
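
A quick way to check whether the client can validate the server certificate, independently of sssd, is "openssl s_client". The following is only a minimal sketch: the helper name verify_result is hypothetical, and the host name and CA bundle path are simply the ones from the test setup described above. The helper just picks the verification result out of the s_client output.

```shell
# Hypothetical helper: extract the certificate verification result
# from `openssl s_client` output supplied on stdin.
verify_result() {
    grep -m1 'Verify return code:' | sed 's/^[[:space:]]*//'
}

# Example invocation (needs network access to the DC, so it is left
# commented out here):
#   openssl s_client -connect win-ad-example.adtest.local:636 \
#       -CAfile /etc/ssl/certs/ca-certificates.crt </dev/null 2>/dev/null \
#       | verify_result
```

A result of "Verify return code: 0 (ok)" means the chain validated against the bundle. A non-zero code would be consistent with a missing root certificate, as in step 7.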

As you can see, I have a working AD DC and I can successfully connect to it using a regular Focal container (I also tried with a Focal VM, with the same results). I spent a few more hours trying to tweak some things here and there to see if I could make the bug manifest, to no avail.

For this reason, I decided to come and ask for more information from you. It would be great if you could tell me if there's anything you can think of that might trigger this problem. Something related to the way your AD DC is configured, perhaps?

Here is the sssd.conf I'm using:

# cat /etc/sssd/sssd.conf
[sssd]
domains = adtest.local
config_file_version = 2
services = nss, pam

[domain/adtest.local]
default_shell = /bin/bash
krb5_store_password_if_offline = True
cache_credentials = True
krb5_realm = ADTEST.LOCAL
realmd_tags = manages-system joined-with-adcli
id_provider = ad
ldap_sasl_authid = SSSD-BUG1921494$
fallback_homedir = /home/%u@%d
ad_domain = adtest.local
use_fully_qualified_names = True
ldap_id_mapping = True
access_provider = ad
ldap_tls_cacert = /etc/ssl/certs/ca-certificates.crt
ad_use_ldaps = True
debug_level = 4
ldap_library_debug_level = -1

Note that I haven't configured Kerberos authentication in my example, but I don't think that should matter much.

It would be great if you could do a few things:

a) As I mentioned above, let us know if there is any peculiarity in your AD DC configuration that might impact this.

b) If possible, set up an Ubuntu Impish system (which has just been released) and try to reproduce the problem there. Impish ships newer versions of sssd and OpenLDAP, which might make a difference here.

c) Can you confirm whether you can always trigger this problem, or whether it only happens sporadically?

d) Can you confirm whether you have imported the root certificate for your AD DC server into the client as well (assuming this applies to your scenario, of course)?
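
For (d), one rough way to tell a root certificate from a machine certificate is to compare its subject and issuer: a root CA certificate is normally self-issued, so the two match. The sketch below assumes the certificates are PEM-encoded files ending in .crt under /usr/local/share/ca-certificates/ (a DER export from Windows would need converting first). The function names are hypothetical.

```shell
# Hypothetical helper: succeed if the PEM certificate in $1 is self-issued
# (subject equals issuer), which is the usual shape of a root CA certificate.
is_self_issued() {
    subject=$(openssl x509 -in "$1" -noout -subject | sed 's/^subject=//')
    issuer=$(openssl x509 -in "$1" -noout -issuer | sed 's/^issuer=//')
    [ -n "$subject" ] && [ "$subject" = "$issuer" ]
}

# Report on every .crt file in the local CA directory
# (defaults to /usr/local/share/ca-certificates).
report_local_cas() {
    dir=${1:-/usr/local/share/ca-certificates}
    for cert in "$dir"/*.crt; do
        [ -e "$cert" ] || continue
        if is_self_issued "$cert"; then
            echo "$cert: self-issued (looks like a root CA)"
        else
            echo "$cert: issued by another CA (not a root)"
        fi
    done
}
```

Running report_local_cas with no arguments lists what was dropped into the local CA directory. If nothing is reported as self-issued, the root certificate is probably missing on the client.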

I think this is all I have for now. Matthew, feel free to add to the info in this comment and to expand on the questions if you have any.

Thanks in advance.