tls-everywhere deployment fails with:
fatal: [overcloud-controller-2]: FAILED! => {"ansible_job_id": "2169753008.25540", "attempts": 43, "changed": true, "cmd": "set -o pipefail; puppet apply --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules --detailed-exitcodes --summarize --color=false /var/lib/tripleo-config/puppet_step_config.pp 2>&1 | logger -s -t puppet-user", "delta": "0:02:17.047808", "end": "2021-05-01 16:49:43.656355", "failed_when_result": true, "finished": 1, "msg": "non-zero return code", "rc": 6, "start": "2021-05-01 16:47:26.608547", "stderr": "<13>May 1 16:47:26 puppet-user: Warning: The function 'hiera' is deprecated in favor of using 'lookup'. See https://puppet.com/docs/puppet/5.5/deprecated_language.html\\n (file & line not available)\n<13>May 1 16:47:32 puppet-user: Warning: /etc/puppet/hiera.yaml: Use of 'hiera.yaml' version 3 is deprecated. It should be converted to version 5\n<13>May 1 16:47:32 puppet-user: (file: /etc/puppet/hiera.yaml)\n<13>May 1 16:47:32 puppet-user: Warning: Undefined variable '::deploy_config_name'; \\n (file & line not available)\n<13>May 1 16:47:32 puppet-user: Warning: ModuleLoader: module 'tripleo' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules\\n (file & line not available)\n<13>May 1 16:47:32 puppet-user: Warning: Undefined variable '::nova::params::vncproxy_service_name'; class nova::params has not been evaluated\\n (file & line not available)\n<13>May 1 16:47:32 puppet-user: Warning: ModuleLoader: module 'nova' has unresolved dependencies - it will only see those that are resolved.
1. The error is that the CA cert is not being written by certmonger in time.
2. We already have a patch to verify/wait for the CA cert in https://github.com/openstack/puppet-tripleo/commit/2c241e393481d73161b8534bbeba388731112cc7, and in fact we can see that the puppet call to test for the existence of the file fails after about 1 minute, i.e. 60 attempts one second apart.
<13>May 1 16:47:39 puppet-user: Notice: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-server-cert]/Certmonger_certificate[libvirt-vnc-server-cert]/ensure: created
<13>May 1 16:48:40 puppet-user: Error: 'test -f /etc/pki/CA/certs/vnc.crt' returned 1 instead of one of [0]
<13>May 1 16:48:40 puppet-user: Error: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-server-cert]/Exec[/etc/pki/CA/certs/vnc.crt]/returns: change from 'notrun' to ['0'] failed: 'test -f /etc/pki/CA/certs/vnc.crt' returned 1 instead of one of [0]
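For anyone who wants to reproduce the check by hand on an affected node, here is a rough Python sketch of the same idea: poll for the file a fixed number of times with a pause between attempts. This is not the Puppet code from the patch. The function name wait_for_file and its parameters are made up for illustration, and the defaults only mirror the "60 attempts one second apart" described above.

```python
import os
import time


def wait_for_file(path, tries=60, try_sleep=1.0, sleep=time.sleep):
    """Return True as soon as `path` exists, checking at most `tries` times.

    `sleep` is injectable so the loop can be exercised without real delays.
    All names here are hypothetical, not taken from puppet-tripleo.
    """
    for attempt in range(1, tries + 1):
        if os.path.isfile(path):
            return True
        # Pause between attempts only; no point sleeping after the last one.
        if attempt < tries:
            sleep(try_sleep)
    return False
```

Calling wait_for_file('/etc/pki/CA/certs/vnc.crt') on the controller would return False after roughly a minute in the failing case shown in the log, since certmonger has not written the CA cert by then.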
3. The certmonger request is correct and has all the right values.
The problem here is that certmonger doesn't behave the way we expect it to. When we make the cert request and ask for the CA cert to be retrieved, it issues the cert and schedules the CA cert to be returned asynchronously, even if you specify -w to wait for the cert. -w will block until the cert has been retrieved, but it does not wait for the CA cert.
You can always force the retrieval to happen by restarting certmonger, and this has helped in some cases in the past, but is a less than ideal solution. This is a bug in certmonger IMHO, in that we should expect the CA cert to be returned synchronously along with the cert if we specify -w.
The behavior for certmonger is unlikely to be fixed anytime soon though, so we need to look at other options.
For now we can use the workaround of setting the template parameters to use /etc/ipa/ca.crt as the CA cert:
LibvirtVncCACert: '/etc/ipa/ca.crt'
LibvirtNbdCACert: '/etc/ipa/ca.crt'
QemuCACert: '/etc/ipa/ca.crt'
In master we already default those parameters to '/etc/ipa/ca.crt'.