Timing problems with FreeIPA installation

Bug #1627371 reported by Philippe Clérié
46
This bug affects 8 people
Affects Status Importance Assigned to Milestone
dogtag-pki (Ubuntu)
Fix Released
Undecided
Unassigned
freeipa (Ubuntu)
Fix Released
Undecided
Unassigned

Bug Description

While installing FreeIPA I came accross two situations that turned out to be timing problems. In both cases, the installation procedure was attempting to access the certificate server immediately after a restart, and the server was not listening.

The first one is at step 10 of "Configuring certificate server (pki_tomcatd)":

  [10/28]: importing CA chain to RA certificate database
  [error] RuntimeError: Unable to retrieve CA chain: [Errno 111] Connection refused
ipa.ipapython.install.cli.install_tool(Server): ERROR Unable to retrieve CA chain: [Errno 111] Connection refused

The second is at step 25:

  [25/28]: migrating certificate profiles to LDAP
  [error] NetworkError: cannot connect to 'https://server.name:8443/ca/rest/account/login': Could not connect to server.name using any address: (PR_ADDRESS_NOT_SUPPORTED_ERROR) Network address type not supported.

My solution was to add a delay at the top of the functions for those steps.

def __import_ca_chain(self):
    + ##======================
    + # Add wait time to allow certificate server to start up
    + #
    + time.sleep(10)

    chain = self.__get_ca_chain()

...

def migrate_profiles_to_ldap():
    """Migrate profiles from filesystem to LDAP.

    This must be run *after* switching to the LDAPProfileSubsystem
    and restarting the CA.

    The profile might already exist, e.g. if a replica was already
    upgraded, so this case is ignored.

    """
    + ##======================
    + # Add wait time to allow certificate server to start up
    + #
    + time.sleep(20)

    ensure_ldap_profiles_container()

It might be necessary to adjust the sleep time.

These bugs are intermittent and they may not appear at all. In my case, one KVM machine had no problems whatsoever while another had problems at the "migrate profiles ..." step. Both problems showed up on one Raspberry Pi. There were also time differences between runs. So, one needs to be _very_ patient.

This is all on Ubuntu Xenial. freeipa-server 4.3.1-0ubuntu1.
The RaspberryPi is a pi 2B

Tags: patch
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in freeipa (Ubuntu):
status: New → Confirmed
Revision history for this message
Timo Aaltonen (tjaalton) wrote :

this is likely caused by tomcat instances using initd rather than systemd

I'm not sure about adding timeouts to freeipa for this, because RPi doesn't have much RAM either.. the VM's that I've used for testing have all had at least 1.5GB

Revision history for this message
Lars Bahner (bahner) wrote :

I am working with this atm. There isn't much RAM being used. I typically see 600M used as cache. And only about 150M used for the services running. This is supposed to be a dedicated freeipa-server. I will try and run the installation again until it fails, and looks at the memory usage.

Revision history for this message
Lars Bahner (bahner) wrote :

root@ipa:/usr/lib/python2.7/dist-packages/ipaserver/install# free -m
              total used free shared buff/cache available
Mem: 920 282 223 7 414 597
Swap: 0 0 0

Every time the installation fails, adding a time.sleep seems to help, but of course it not pretty. This is the diff that got my system working at last. Taken from submitters patch, but tweaked to for work me.

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Not pretty, but works. Incorp" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in dogtag-pki (Ubuntu):
status: New → Confirmed
Revision history for this message
Andrew Bork (anbork) wrote :

my armv7h OrangePi+ 2E with 2Gb of ram running armbian nightly mainline (upgraded to zesty) also experiences this problem, and the attached patch fixes it.

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

note that 389 upstream will be dropping all support for 32bit architectures soon.. so 18.04 won't have freeipa-server available for them

Revision history for this message
gianluca (amato) wrote :

I was also affected by this problem, which I solved by adding extra RAM and cores. However, in some cases, when working with virtual machines, this problem might be exacerbated by the fact that VMs are much slower in collecting entropy with respect to real hardware. The low entropy slows down /dev/random, which may cause a sensible delay for cryptographic software (https://developers.redhat.com/blog/2017/10/05/entropy-rhel-based-cloud-instances/).

In one of my old installations of FreeIPA, the kadmin server used to take several minutes to start. I solved this problem installing a daemon for generating entropy ("haveged", in my case). With haveged, the same daemon which used to take minutes, was starting in a fraction of a second.

Revision history for this message
Harry Coin (hcoin) wrote :
Download full text (9.5 KiB)

Same issue here. Adding haveged reduced the error count, but still failed with 2 processors and 2gb. 3 processors and 3gb failed with a network error

[24/28]: migrating certificate profiles to LDAP
  [error] NetworkError: cannot connect to 'https://directory1.ri.mamabosso.com:8443/ca/rest/account/logout': [Errno 104] Connection reset by peer
ipapython.admintool: ERROR cannot connect to 'https://XXXXXXXXXXXXXXXXXX.com:8443/ca/rest/account/logout': [Errno 104] Connection reset by peer
ipapython.admintool: ERROR The ipa-server-install command failed. See /var/log/ipaserver-install.log for more information

4gb and 4 processors: CPU usage 100$ after 10/28 was printed. (requesting RA cert..)

Usage at 100% through step 21 (restarting cert server), and.. 24 migrating cert profiles...(where it failed before)... 55% cpu usage.. 37%... 43%... 64%... 87%... 73%... and failed again:

  [24/28]: migrating certificate profiles to LDAP
  [error] NetworkError: cannot connect to 'https://directory1.ri.mamabosso.com:8443/ca/rest/account/logout': [Errno 111] Connection refused
ipapython.admintool: ERROR cannot connect to 'https://XXXXXXXXXXXXXXX.com:8443/ca/rest/account/logout': [Errno 111] Connection refused
ipapython.admintool: ERROR The ipa-server-install command failed. See /var/log/ipaserver-install.log for more information

from the log:

2018-06-21T15:49:02Z DEBUG request POST https://directory1.ri.mamabosso.com:8443/ca/rest/profiles/raw
2018-06-21T15:49:02Z DEBUG request body "desc=This certificate profile is for ...

2018-06-21T15:49:02Z DEBUG httplib request failed:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ipapython/dogtag.py", line 220, in _httplib_request
    conn.request(method, uri, body=request_body, headers=headers)
  File "/usr/lib/python2.7/httplib.py", line 1042, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1082, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1038, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 882, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 844, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1263, in connect
    server_hostname=server_hostname)
  File "/usr/lib/python2.7/ssl.py", line 369, in wrap_socket
    _context=self)
  File "/usr/lib/python2.7/ssl.py", line 617, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 846, in do_handshake
    self._sslobj.do_handshake()
error: [Errno 104] Connection reset by peer
2018-06-21T15:49:02Z DEBUG request GET https://directory1.ri.mamabosso.com:8443/ca/rest/account/logout
2018-06-21T15:49:02Z DEBUG request body ''
2018-06-21T15:49:02Z DEBUG httplib request failed:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ipapython/dogtag.py", line 220, in _httplib_request
    conn.request(method, uri, body=request_body, headers=headers)
  File "/usr/lib/python2.7/httplib.py", line 1042, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httpl...

Read more...

Revision history for this message
Harry Coin (hcoin) wrote :

Added:
apt install python-psutil

in cainstance.py after import tempfile added
import psutil

In function "migrate_profiles_to_ldap"

before
for profile_id in profile_ids:
changed time.sleep(80) to time.sleep(30)
then added just after, inside the loop:

while psutil.cpu_percent() > 5: time.sleep(2)

and... success.

Revision history for this message
Harry Coin (hcoin) wrote :

Spoke too soon, though the routine reported success, in the log we have:

Updating DNS system records
ipapython.dnsutil: ERROR DNS query for directory1.ri.mamabosso.com. 1 failed: The DNS operation timed out after 30.0014941692 seconds
ipaserver.dns_data_management: ERROR unable to resolve host name directory1.ri.XXX.com. to IP address, ipa-ca DNS record will be incomplete
Configuring client side components
...
Adding SSH public key from /etc/ssh/ssh_host_ed25519_key.pub
[try 1]: Forwarding 'host_mod' to json server 'https://directory1.ri.XXX.com/ipa/session/json'
Could not update DNS SSHFP records.

and then, what is in fact an error though the text is otherwise:

The ipa-client-install command was successful.

So, in bindinstance.py, after import time, added
import psutil
and just before
system_records = IPASystemRecords(self.api)
added
while psutil.cpu_percent() > 5: time.sleep(2)

and .. that didn't work. Same error.

Done configuring DNS (named).
Restarting the web server to pick up resolv.conf changes
Configuring DNS key synchronization service (ipa-dnskeysyncd)
  [1/7]: checking status
  [2/7]: setting up bind-dyndb-ldap working directory
  [3/7]: setting up kerberos principal
  [4/7]: setting up SoftHSM
  [5/7]: adding DNSSEC containers
  [6/7]: creating replica keys
  [7/7]: configuring ipa-dnskeysyncd to start on boot
Done configuring DNS key synchronization service (ipa-dnskeysyncd).
Restarting ipa-dnskeysyncd
Restarting named
Updating DNS system records
ipapython.dnsutil: ERROR DNS query for directory1.ri.xxxx.com. 1 failed: The DNS operation timed out after 30.000576973 seconds
ipaserver.dns_data_management: ERROR unable to resolve host name directory1.ri.xxx.com. to IP address, ipa-ca DNS record will be incomplete
Configuring client side components
Using existing certificate '/etc/ipa/ca.crt'.
Client hostname: directory1.ri.xxx.com
Realm: RI.XXXX.COM
DNS Domain: ri.xxxx.com
IPA Server: directory1.ri.xxxx.com
BaseDN: dc=ri,dc=xxxxxxx,dc=com

Skipping attempt to configure and synchronize time with chrony server as it has been already done on master.
New SSSD config will be created
Configured sudoers in /etc/nsswitch.conf
Configured /etc/sssd/sssd.conf
trying https://directory1.ri.xxx.com/ipa/json
[try 1]: Forwarding 'ping' to json server 'https://directory1.ri.xxxx.com/ipa/json'
[try 1]: Forwarding 'ca_is_enabled' to json server 'https://directory1.ri.xxxx.com/ipa/json'
Systemwide CA database updated.
Adding SSH public key from /etc/ssh/ssh_host_ecdsa_key.pub
Adding SSH public key from /etc/ssh/ssh_host_rsa_key.pub
Adding SSH public key from /etc/ssh/ssh_host_ed25519_key.pub
[try 1]: Forwarding 'host_mod' to json server 'https://directory1.ri.xxxx.com/ipa/json'
Could not update DNS SSHFP records.
SSSD enabled

Revision history for this message
Harry Coin (hcoin) wrote :

At appears my problem arises as I asked for an install with DNS. On ubuntu bionic, apt install freeipa-server-dns

ipa-server-install

then the bug appears. It is discussed here.
https://bugs.launchpad.net/ubuntu/+source/freeipa/+bug/1769440

Revision history for this message
Harry Coin (hcoin) wrote :

At https://bugs.launchpad.net/ubuntu/+source/freeipa/+bug/1769440/comments/34

You will find a recipe to work around all bugs stopping the installation of freeipa with integrated DNS on 18.04/bionic as of this date. Hopefully folks who like to fix things can work these into the install scripts and programs.

At the heart of the timing problems (there are many others) is the install script's habit of (re) starting a service then nearly immediately placing calls to it (which fail as the service isn't up yet). 4 cores and 4GB of memory was enough to avoid all but two of the problems. The above note includes four lines of code in two files that inject waits until the system load is under 9% before processing further after starting a service.

HTH

Revision history for this message
Brad Johnson (bradjohnsonkink) wrote :

I was able to get this to install finally with the workaround provided above of installing the haveged package and increasing my core count to 4.

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

I'm just going to assume things are all fixed with 19.10 and up, which have freeipa 4.8.x and dogtag 10.7.3, and they both depend on systemd features now which should resolve all race conditions

Changed in freeipa (Ubuntu):
status: Confirmed → Fix Released
Changed in dogtag-pki (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Harry Coin (hcoin) wrote :

Timo,

You might take a look at /etc/sssd/sssd.conf

Consider changing

services =  ifp
#services = nss, pam, ifp, ssh, sudo

The reason is that unless you change this, systemctl is-system-running
reports degraded instead of running, with messages akin to

Dec  9 17:59:25 registry1 sssd_check_socket_activated_responders[1672]:
(Mon Dec  9 17:59:25:697972 2019) [sssd] [main] (0x0010):
Misconfiguration found for the pam responder.
Dec  9 17:59:25 registry1 sssd_check_socket_activated_responders[1672]:
The pam responder has been configured to be socket-activated but it's
still mentioned in the services' line in /etc/sssd/sssd.conf.
Dec  9 17:59:25 registry1 sssd_check_socket_activated_responders[1672]:
Please, consider either adjusting your services' line in
/etc/sssd/sssd.conf or disabling the pam's socket by calling:
Dec  9 17:59:25 registry1 sssd_check_socket_activated_responders[1672]:
"systemctl disable sssd-pam.socket"

On 11/28/19 8:17 AM, Timo Aaltonen wrote:
> I'm just going to assume things are all fixed with 19.10 and up, which
> have freeipa 4.8.x and dogtag 10.7.3, and they both depend on systemd
> features now which should resolve all race conditions
>
> ** Changed in: freeipa (Ubuntu)
> Status: Confirmed => Fix Released
>
> ** Changed in: dogtag-pki (Ubuntu)
> Status: Confirmed => Fix Released
>

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

why is this relevant here? open a new bug

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.