ansible-playbook replay failed at "Timeout waiting for ssl_ca certificate install"

Bug #1868585 reported by Peng Peng
This bug affects 1 person
Affects     Status         Importance   Assigned to   Milestone
StarlingX   Fix Released   Medium       Andy

Bug Description

Brief Description
-----------------
The ansible-playbook bootstrap succeeded on the initial play.

The playbook was then replayed, and the replay failed at the certificate install step of bootstrap with:
"Timeout waiting for ssl_ca certificate install"

Severity
--------
Major

Steps to Reproduce
------------------
Run the ansible bootstrap; it passes on the initial run.
Rerun (replay) ansible-playbook.

Expected Behavior
------------------
The rerun of ansible-playbook passes.

Actual Behavior
----------------
The rerun of ansible-playbook fails at the certificate install step.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Two-node system

Lab-name: WCP_78-79

Branch/Pull Time/Commit
-----------------------
2020-03-22_16-04-38

Last Pass
---------
Load: 2020-03-20_04-10-00
Job: StarlingX_Upstream_build

Timestamp/Logs
--------------
First run:
[2020-03-23 03:00:02,337] 140 INFO MainThread telnet.send :: Send: ansible-playbook lab-install-playbook.yaml -e "@local-install-overrides.yaml"
[2020-03-23 03:17:58,626] 3821 ERROR MainThread install_helper.controller_system_config:: ansible-playbook lab-install-playbook.yaml -e "@local-install-overrides.yaml" execution failed: 2 tall-overrides.yaml"

Rerun:
[2020-03-23 13:33:01,089] 140 INFO MainThread telnet.send :: Send: ansible-playbook lab-install-playbook.yaml -e "@local-install-overrides.yaml"
[2020-03-23 13:40:16,480] 3821 ERROR MainThread install_helper.controller_system_config:: ansible-playbook lab-install-playbook.yaml -e "@local-install-overrides.yaml" execution failed: 2 tall-overrides.yaml"

TASK [bootstrap/persist-config : Wait for certificate install] *****************
E fatal: [localhost]: FAILED! => {"changed": false, "elapsed": 360, "msg": "Timeout waiting for ssl_ca certificate install"}
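The failing task reports "elapsed": 360 together with a custom message. That is consistent with, though this report does not confirm it, a task that polls for a completion flag file with a 360-second timeout, as Ansible's `wait_for` module does with its `path` and `timeout` parameters. The Python sketch below reproduces that polling behaviour so the failure mode is easy to see: if nothing ever re-creates the flag, the loop can only end in a timeout. `wait_for_flag`, `WaitTimeout` and the flag path are hypothetical names chosen for illustration.

```python
import os
import time


class WaitTimeout(Exception):
    """Raised when the flag file does not appear before the deadline."""


def wait_for_flag(path, timeout=360.0, interval=1.0,
                  msg="Timeout waiting for ssl_ca certificate install"):
    """Poll until `path` exists and return the seconds waited.

    Raises WaitTimeout once `timeout` seconds pass without the file
    appearing. The 360 s default mirrors "elapsed": 360 in the log above.
    """
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        if os.path.exists(path):
            return elapsed
        if elapsed >= timeout:
            raise WaitTimeout(f"{msg} (elapsed {int(elapsed)} s)")
        time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical flag path; the real location is not given in this report.
    try:
        wait_for_flag("/tmp/hypothetical_ssl_ca_complete_flag",
                      timeout=3, interval=0.5)
        print("certificate install flag found")
    except WaitTimeout as err:
        print(err)
```

On a replay where the flag has been deleted and is never regenerated, this loop can only exit through `WaitTimeout`. That matches the reported failure.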

Test Activity
-------------

Peng Peng (ppeng)
tags: added: stx.retestneeded
Peng Peng (ppeng) wrote :
Peng Peng (ppeng)
description: updated
Yang Liu (yliu12)
description: updated
description: updated
Ghada Khalil (gkhalil) wrote :

stx.4.0: issue with Ansible replay; needs further investigation.

tags: added: stx.4.0 stx.config
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
importance: High → Medium
assignee: nobody → Andy (andy.wrs)
Ghada Khalil (gkhalil) wrote :

Unsure if this is related to the recent changes to support multiple CA certificates. Assigning to Andy Ning to investigate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/714531

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/714531
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=d119336b3a3b24d924e000277a37ab0b5f93aae1
Submitter: Zuul
Branch: master

commit d119336b3a3b24d924e000277a37ab0b5f93aae1
Author: Andy Ning <email address hidden>
Date: Mon Mar 23 16:26:21 2020 -0400

    Fix timeout waiting for CA cert install during ansible replay

    During ansible bootstrap replay, the ssl_ca_complete_flag file is
    removed. The replay expects the puppet platform::config::runtime
    manifest, applied during system CA certificate install, to regenerate
    it. This commit therefore updates the conductor manager to run that
    puppet manifest even if the CA cert is already installed, so that the
    ssl_ca_complete_flag file is created and the ansible replay can
    continue.

    Change-Id: Ic9051fba9afe5d5a189e2be8c8c2960bdb0d20a4
    Closes-Bug: 1868585
    Signed-off-by: Andy Ning <email address hidden>
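The commit message describes the fix only in prose. Before the change, an already-installed CA certificate meant the runtime manifest was skipped, so a flag file deleted by the replay was never regenerated. After the change, the manifest runs regardless. The following is a minimal Python sketch of that control flow, not the actual sysinv conductor code. `install_ca_cert`, `fake_manifest`, the set-based certificate store and the temporary flag path are all hypothetical stand-ins; only the flag's role and the manifest name come from the commit message.

```python
import os
import tempfile


def install_ca_cert(installed_certs, cert_pem, apply_runtime_manifest):
    """Hypothetical sketch of the conductor-side CA install after the fix.

    installed_certs        -- set standing in for the real certificate store
    cert_pem               -- the CA certificate being installed
    apply_runtime_manifest -- callable standing in for applying the puppet
                              platform::config::runtime manifest, which (per
                              the commit message) re-creates the flag file
    Returns True if the certificate was newly installed.
    """
    newly_installed = cert_pem not in installed_certs
    if newly_installed:
        installed_certs.add(cert_pem)
    # Pre-fix behaviour (as described): skip the manifest when the cert was
    # already installed, so a flag removed by the replay was never restored.
    # Post-fix behaviour: run the manifest unconditionally.
    apply_runtime_manifest()
    return newly_installed


def fake_manifest(flag_path):
    """Return a stand-in manifest apply that just creates the flag file."""
    def apply():
        with open(flag_path, "w"):
            pass
    return apply


if __name__ == "__main__":
    flag = os.path.join(tempfile.mkdtemp(), "ssl_ca_complete_flag")
    apply = fake_manifest(flag)
    certs = set()
    install_ca_cert(certs, "PEM-A", apply)   # initial play: cert installed
    os.remove(flag)                          # replay removes the flag file
    install_ca_cert(certs, "PEM-A", apply)   # cert already present
    print(os.path.exists(flag))              # True: the wait task can proceed
```

This design keeps the certificate install idempotent. It also ties the flag's existence to the manifest apply rather than to whether a new certificate was added.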

Changed in starlingx:
status: In Progress → Fix Released
Yosief Gebremariam (ygebrema) wrote :

verification passed

Ghada Khalil (gkhalil)
tags: removed: stx.retestneeded
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/716137

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (f/centos8)

Reviewed: https://review.opendev.org/716137
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=cb4cf4299c2ec10fb2eb03cdee3f6d78a6413089
Submitter: Zuul
Branch: f/centos8

commit 16477935845e1c27b4c9d31743e359b0aa94a948
Author: Steven Webster <email address hidden>
Date: Sat Mar 28 17:19:30 2020 -0400

    Fix SR-IOV runtime manifest apply

    When an SR-IOV interface is configured, the platform's
    network runtime manifest is applied in order to apply the virtual
    function (VF) config and restart the interface. This results in
    sysinv being able to determine and populate the puppet hieradata
    with the virtual function PCI addresses.

    A side effect of the network manifest apply is that potentially
    all platform interfaces may be brought down/up if it is determined
    that their configuration has changed. This will likely be the case
    for a system which configures SR-IOV interfaces before initial
    unlock.

    A few issues have been encountered because of this, with some
    services not behaving well when the interface they are communicating
    over suddenly goes down.

    This commit makes the SR-IOV VF configuration much more targeted
    so that only the operation of setting the desired number of VFs
    is performed.

    Closes-Bug: #1868584
    Depends-On: https://review.opendev.org/715669
    Change-Id: Ie162380d3732eb1b6e9c553362fe68cbc313ae2b
    Signed-off-by: Steven Webster <email address hidden>

commit 45c9fe2d3571574b9e0503af108fe7c1567007db
Author: Zhipeng Liu <email address hidden>
Date: Thu Mar 26 01:58:34 2020 +0800

    Add ipv6 support for novncproxy_base_url.

    For an IPv6 address, the URL must use the format
    [ip]:port

    Partial-Bug: 1859641

    Change-Id: I01a5cd92deb9e88c2d31bd1e16e5bce1e849fcc7
    Signed-off-by: Zhipeng Liu <email address hidden>

commit d119336b3a3b24d924e000277a37ab0b5f93aae1
Author: Andy Ning <email address hidden>
Date: Mon Mar 23 16:26:21 2020 -0400

    Fix timeout waiting for CA cert install during ansible replay

    During ansible bootstrap replay, the ssl_ca_complete_flag file is
    removed. The replay expects the puppet platform::config::runtime
    manifest, applied during system CA certificate install, to regenerate
    it. This commit therefore updates the conductor manager to run that
    puppet manifest even if the CA cert is already installed, so that the
    ssl_ca_complete_flag file is created and the ansible replay can
    continue.

    Change-Id: Ic9051fba9afe5d5a189e2be8c8c2960bdb0d20a4
    Closes-Bug: 1868585
    Signed-off-by: Andy Ning <email address hidden>

commit 24a533d800b2c57b84f1086593fe5f04f95fe906
Author: Zhipeng Liu <email address hidden>
Date: Fri Mar 20 23:10:31 2020 +0800

    Fix rabbitmq could not bind port to ipv6 address issue

    When Armada is used to deploy OpenStack services over IPv6, the
    rabbitmq pod cannot start listening on [::]:5672 and [::]:15672.
    For IPv6, the configuration file needs an override.

    Upstream patch link is:
    https://review.opendev.org/#/c/714027/

    Test pass for deploying rabbitmq service on both ipv...
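Two of the commits merged to f/centos8 above handle the same IPv6 detail. The novncproxy_base_url change requires the [ip]:port form, and the rabbitmq listeners are written as [::]:5672 and [::]:15672. An IPv6 literal must be wrapped in square brackets before a port is appended, because the address itself contains colons. The Python sketch below shows that rule using the standard `ipaddress` module. `host_port`, `base_url` and the example addresses and path are illustrative assumptions, not code from those commits.

```python
import ipaddress


def host_port(host, port):
    """Join host and port, wrapping IPv6 literals as "[ip]:port"."""
    try:
        is_ipv6 = ipaddress.ip_address(host).version == 6
    except ValueError:
        # Hostnames (or already-bracketed literals) are left untouched.
        is_ipv6 = False
    return f"[{host}]:{port}" if is_ipv6 else f"{host}:{port}"


def base_url(host, port, path, scheme="http"):
    """Build a URL such as a novncproxy_base_url from its parts."""
    return f"{scheme}://{host_port(host, port)}{path}"


if __name__ == "__main__":
    print(host_port("::", 5672))                             # [::]:5672
    print(base_url("fd00::10", 6080, "/vnc_auto.html"))       # http://[fd00::10]:6080/vnc_auto.html
    print(base_url("192.168.204.2", 6080, "/vnc_auto.html"))  # http://192.168.204.2:6080/vnc_auto.html
```

Parsing the host with `ipaddress` avoids the fragile "contains a colon" check. It also leaves hostnames and IPv4 addresses in the plain host:port form.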

tags: added: in-f-centos8