libvirt related CA files not created in time by certmonger

Bug #1927201 reported by Martin Schuppert
This bug affects 1 person
Affects   Status         Importance   Assigned to        Milestone
tripleo   Fix Released   Undecided    Martin Schuppert

Bug Description

tls-everywhere deployment fails with:

fatal: [overcloud-controller-2]: FAILED! => {"ansible_job_id": "2169753008.25540", "attempts": 43, "changed": true, "cmd": "set -o pipefail; puppet apply --modulepath=/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules --detailed-exitcodes --summarize --color=false /var/lib/tripleo-config/puppet_step_config.pp 2>&1 | logger -s -t puppet-user", "delta": "0:02:17.047808", "end": "2021-05-01 16:49:43.656355", "failed_when_result": true, "finished": 1, "msg": "non-zero return code", "rc": 6, "start": "2021-05-01 16:47:26.608547", "stderr": "<13>May 1 16:47:26 puppet-user: Warning: The function 'hiera' is deprecated in favor of using 'lookup'. See https://puppet.com/docs/puppet/5.5/deprecated_language.html\\n (file & line not available)\n<13>May 1 16:47:32 puppet-user: Warning: /etc/puppet/hiera.yaml: Use of 'hiera.yaml' version 3 is deprecated. It should be converted to version 5\n<13>May 1 16:47:32 puppet-user: (file: /etc/puppet/hiera.yaml)\n<13>May 1 16:47:32 puppet-user: Warning: Undefined variable '::deploy_config_name'; \\n (file & line not available)\n<13>May 1 16:47:32 puppet-user: Warning: ModuleLoader: module 'tripleo' has unresolved dependencies - it will only see those that are resolved. Use 'puppet module list --tree' to see information about modules\\n (file & line not available)\n<13>May 1 16:47:32 puppet-user: Warning: Undefined variable '::nova::params::vncproxy_service_name'; class nova::params has not been evaluated\\n (file & line not available)\n<13>May 1 16:47:32 puppet-user: Warning: ModuleLoader: module 'nova' has unresolved dependencies - it will only see those that are resolved.

1. The error is that the CA cert is not being written by certmonger in time.
2. We already have a patch to verify/wait for the CA cert (https://github.com/openstack/puppet-tripleo/commit/2c241e393481d73161b8534bbeba388731112cc7), and in fact we can see that the puppet call testing for the existence of the file fails after 1 minute, i.e. 60 attempts one second apart.

<13>May 1 16:47:39 puppet-user: Notice: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-server-cert]/Certmonger_certificate[libvirt-vnc-server-cert]/ensure: created
<13>May 1 16:48:40 puppet-user: Error: 'test -f /etc/pki/CA/certs/vnc.crt' returned 1 instead of one of [0]
<13>May 1 16:48:40 puppet-user: Error: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-server-cert]/Exec[/etc/pki/CA/certs/vnc.crt]/returns: change from 'notrun' to ['0'] failed: 'test -f /etc/pki/CA/certs/vnc.crt' returned 1 instead of one of [0]

3. The certmonger request is correct and has all the right values.
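
For illustration, the check that fails in the log above behaves roughly like the following polling loop. This is a minimal Python sketch, not the puppet-tripleo code (which runs `test -f` from a Puppet exec); the function name `wait_for_file` and its parameters are hypothetical, with defaults chosen to match the 60 attempts one second apart seen here.

```python
import os
import time


def wait_for_file(path, tries=60, try_sleep=1.0):
    """Return True as soon as `path` exists, False after `tries` failed checks."""
    for attempt in range(tries):
        if os.path.isfile(path):
            return True
        # Sleep between attempts, but not after the last one.
        if attempt < tries - 1:
            time.sleep(try_sleep)
    return False
```

Calling `wait_for_file("/etc/pki/CA/certs/vnc.crt")` returns False after about a minute if certmonger has not written the file by then, which corresponds to the failure reported above.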

The problem here is that certmonger doesn't behave the way we expect. When we make the cert request and ask for the CA cert to be retrieved, it issues the cert and schedules the CA cert to be retrieved asynchronously, even if you specify -w to wait for the cert. -w blocks until the cert has been retrieved, but does not wait for the CA cert.

You can always force the retrieval to happen by restarting certmonger, and this has helped in some cases in the past, but is a less than ideal solution. This is a bug in certmonger IMHO, in that we should expect the CA cert to be returned synchronously along with the cert if we specify -w.
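
The restart fallback mentioned above could be sketched as follows. This is only an illustration in Python, assuming certmonger runs as the systemd unit `certmonger`; the helper names (`restart_certmonger`, `poll`, `ensure_ca_cert`) are hypothetical and none of this is what puppet-tripleo does.

```python
import os
import subprocess
import time


def restart_certmonger():
    # Assumption: certmonger is managed by systemd under the unit name "certmonger".
    subprocess.run(["systemctl", "restart", "certmonger"], check=True)


def poll(path, tries, try_sleep):
    """Check for `path` up to `tries` times, sleeping `try_sleep` seconds in between."""
    for attempt in range(tries):
        if os.path.isfile(path):
            return True
        if attempt < tries - 1:
            time.sleep(try_sleep)
    return False


def ensure_ca_cert(path, tries=60, try_sleep=1.0, restart=restart_certmonger):
    """Poll for the CA file; if it never shows up, restart once and poll again."""
    if poll(path, tries, try_sleep):
        return True
    # Restarting certmonger forces the pending CA cert retrieval to happen.
    restart()
    return poll(path, tries, try_sleep)
```

The `restart` hook is a parameter so the logic can be exercised without touching the real service. Even with the fallback, this only papers over the race.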

The behavior for certmonger is unlikely to be fixed anytime soon though, so we need to look at other options.

For now we can work around this by setting the following template parameters to use /etc/ipa/ca.crt as the CA cert:

    LibvirtVncCACert: '/etc/ipa/ca.crt'
    LibvirtNbdCACert: '/etc/ipa/ca.crt'
    QemuCACert: '/etc/ipa/ca.crt'
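
Since the workaround relies on the IPA CA file already being present on the node, a quick pre-flight check might look like the sketch below. It is a Python illustration under stated assumptions: the three parameter names and the path come from this report, while `check_ca_params`, `CA_PARAMS`, and the simple PEM marker test are hypothetical and do not validate the certificate itself.

```python
import os

# Parameter names and path taken from the workaround above.
CA_PARAMS = {
    "LibvirtVncCACert": "/etc/ipa/ca.crt",
    "LibvirtNbdCACert": "/etc/ipa/ca.crt",
    "QemuCACert": "/etc/ipa/ca.crt",
}

# Assumption: the CA file is PEM encoded, so it contains this marker.
PEM_MARKER = "-----BEGIN CERTIFICATE-----"


def check_ca_params(params):
    """Return one problem string per parameter whose CA file looks unusable."""
    problems = []
    for name, path in sorted(params.items()):
        if not os.path.isfile(path):
            problems.append(f"{name}: {path} does not exist")
            continue
        with open(path, "r", errors="replace") as handle:
            if PEM_MARKER not in handle.read():
                problems.append(f"{name}: {path} contains no PEM certificate")
    return problems
```

Running `check_ca_params(CA_PARAMS)` on a node returns an empty list when every referenced file exists and contains a PEM certificate block.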

Martin Schuppert (mschuppert) wrote :

In master we already default those parameters to '/etc/ipa/ca.crt'

Changed in tripleo:
assignee: nobody → Martin Schuppert (mschuppert)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/793940

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/793940
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/d54d63285db71cdca4da943094b219bc560286ab
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit d54d63285db71cdca4da943094b219bc560286ab
Author: Martin Schuppert <email address hidden>
Date: Tue Jun 1 12:14:13 2021 +0200

    [victoria/ussuri/train] Change nbd, vnc and qemu default cacert file

    InternalTLSNbdCAFile, InternalTLSVncCAFile and InternalTLSQemuCAFile
    do not point to the default IPA ca.crt file and instead are requested
    to be loaded to component specific CA files (even if they are the same).
    This can lead to a race where the CA cert is not being written by
    certmonger in time and the following issue is seen after the 60s timeout:

    May 1 16:47:39 puppet-user: Notice: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-server-cert]/Certmonger_certificate[libvirt-vnc-server-cert]/ensure: created
    May 1 16:48:40 puppet-user: Error: 'test -f /etc/pki/CA/certs/vnc.crt' returned 1 instead of one of [0]
    May 1 16:48:40 puppet-user: Error: /Stage[main]/Tripleo::Profile::Base::Certmonger_user/Tripleo::Certmonger::Libvirt_vnc[libvirt-vnc-server-cert]/Exec[/etc/pki/CA/certs/vnc.crt]/returns: change from 'notrun' to ['0'] failed: 'test -f /etc/pki/CA/certs/vnc.crt' returned 1 instead of one of [0]

    The problem here is that certmonger doesn't behave in the way that we
    expect it to do. When we make the cert request and ask for the ca cert to
    be retrieved, it issues the cert and schedules the cert to be returned
    asynchronously, even if you specify -w to wait for the cert. -w will block
    pending the cert being retrieved, but not for the CA cert.

    You can always force the retrieval to happen by restarting certmonger, and
    this has helped in some cases in the past, but is a less than ideal
    solution.

    This is a bug in certmonger IMHO, in that we should expect the CA cert to
    be returned synchronously along with the cert if we specify -w.

    The BZ for certmonger is unlikely to be fixed anytime soon though, so we
    need to look at other options.

    Ib868465c20d97c62cbcb214bfc62d949bd6efc62 already changed the default to
    use the IPA system cacert file '/etc/ipa/ca.crt' per default starting with
    the wallaby release using the ansible role. This change backports to also
    use the IPA system cacert file '/etc/ipa/ca.crt' to previous release when
    managing the certs via puppet-tripleo.

    Change-Id: I8a00ab81c16b21c9b1f703015a2a2eaa66fd556f
    Closes-Bug: #1927201

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/ussuri)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/796651
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/58e6913751c88595d997c99cb6d218f07939c7c6
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 58e6913751c88595d997c99cb6d218f07939c7c6
Author: Martin Schuppert <email address hidden>
Date: Tue Jun 1 12:14:13 2021 +0200

    [victoria/ussuri/train] Change nbd, vnc and qemu default cacert file

    Conflicts:
    deployment/nova/nova-vnc-proxy-container-puppet.yaml

    Change-Id: I8a00ab81c16b21c9b1f703015a2a2eaa66fd556f
    Closes-Bug: #1927201
    (cherry picked from commit d54d63285db71cdca4da943094b219bc560286ab)

tags: added: in-stable-ussuri
tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/796673
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/a43c41deab78bc9a0252e9e0548b7b911236c91c
Submitter: "Zuul (22348)"
Branch: stable/train

commit a43c41deab78bc9a0252e9e0548b7b911236c91c
Author: Martin Schuppert <email address hidden>
Date: Tue Jun 1 12:14:13 2021 +0200

    [victoria/ussuri/train] Change nbd, vnc and qemu default cacert file

    Conflicts:
    deployment/nova/nova-vnc-proxy-container-puppet.yaml

    Change-Id: I8a00ab81c16b21c9b1f703015a2a2eaa66fd556f
    Closes-Bug: #1927201
    (cherry picked from commit d54d63285db71cdca4da943094b219bc560286ab)
    (cherry picked from commit 58e6913751c88595d997c99cb6d218f07939c7c6)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 13.4.0

This issue was fixed in the openstack/tripleo-heat-templates 13.4.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 12.4.5

This issue was fixed in the openstack/tripleo-heat-templates 12.4.5 release.

Changed in tripleo:
status: New → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates train-eol

This issue was fixed in the openstack/tripleo-heat-templates train-eol release.
