Bootstrap ansible fails in creation of barbican credentials on replay

Bug #1859726 reported by Ghada Khalil
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      Andy

Bug Description

Brief Description
-----------------
** Opening this LP on behalf of Eddy Raineri
The ansible script intermittently fails during Barbican credential creation on ansible re-play. Even on re-play the failure is random: whether it occurs depends on the order of the endpoints in the service catalog.

During the initial play, the Barbican secret was created before the initial system configuration was populated, so the SystemController endpoints did not exist yet. With only the RegionOne keystone endpoint in the catalog, the unconfigured region name in the Barbican config file does not cause a problem.

Sysinv applies the keystone endpoint runtime manifest when populating the initial system config, which creates the SystemController keystone endpoints. After the first play, the endpoints are populated and re-play does not re-configure the endpoints since the distributed_cloud_role is not changed.
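
The order dependence described above can be illustrated with a small sketch. It is not Barbican or keystone code: the catalog layout, the function name pick_identity_endpoint, and the example URLs are all assumptions made for illustration. It only shows why an unset region name is harmless with one keystone endpoint but becomes order-dependent once a second region is present.

```python
# Hypothetical, simplified catalog entries. Real keystone catalogs carry
# more fields; the URLs below are placeholders, not values from this bug.
REGION_ONE = {
    "service": "identity",
    "region": "RegionOne",
    "url": "http://regionone.example:5000/v3",
}
SYSTEM_CONTROLLER = {
    "service": "identity",
    "region": "SystemController",
    "url": "http://systemcontroller.example:25000/v3",
}


def pick_identity_endpoint(catalog, region_name=None):
    """Return the URL of the first identity endpoint in the catalog.

    When region_name is None, no region filter is applied, so the result
    depends purely on the order of the catalog entries.
    """
    for endpoint in catalog:
        if endpoint["service"] != "identity":
            continue
        if region_name is None or endpoint["region"] == region_name:
            return endpoint["url"]
    return None


# Initial play: only RegionOne exists, so the unset region is harmless.
first_play = pick_identity_endpoint([REGION_ONE])

# Re-play: two regions exist, and the unfiltered pick follows catalog order.
replay_ok = pick_identity_endpoint([REGION_ONE, SYSTEM_CONTROLLER])
replay_bad = pick_identity_endpoint([SYSTEM_CONTROLLER, REGION_ONE])

# With an explicit region the order no longer matters.
replay_pinned = pick_identity_endpoint(
    [SYSTEM_CONTROLLER, REGION_ONE], region_name="RegionOne"
)
```

In this sketch replay_bad resolves to the SystemController URL, which corresponds to the failing case, while replay_pinned always resolves to RegionOne regardless of ordering.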

Severity
--------
Minor - intermittent issue

Steps to Reproduce
------------------
On DC, run ansible re-play after the initial bootstrap

Expected Behavior
------------------
Ansible re-play is successful on DC system

Actual Behavior
----------------
Ansible re-play intermittently fails

Reproducibility
---------------
Intermittent; frequency is unknown

System Configuration
--------------------
Distributed Cloud

Branch/Pull Time/Commit
-----------------------
Designer load equivalent to stx.3.0

Last Pass
---------
N/A

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Lab setup

Workaround
----------

Revision history for this message
Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority - intermittent issue on ansible replay

tags: added: stx.4.0 stx.config stx.distcloud
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Andy (andy.wrs)
Andy (andy.wrs)
Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (master)

Fix proposed to branch: master
Review: https://review.opendev.org/703821

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/703822

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/703822
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=342bdacdf98864d7a4265be564a30f818e96ea87
Submitter: Zuul
Branch: master

commit 342bdacdf98864d7a4265be564a30f818e96ea87
Author: Andy Ning <email address hidden>
Date: Wed Jan 22 10:24:53 2020 -0500

    Restart Barbican after region_name is updated

    Restart Barbican after its region_name (among other parameters) is
    updated during bootstrap service endpoints reconfiguration to make it
    consistent with keystone service catalog.

    Change-Id: Id48078a4d1a429b1e6adc8d8fbcec7eac0264965
    Closes-Bug: 1859726
    Signed-off-by: Andy Ning <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/703821
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=387a20ab23b000b99692abab494c42bc6b6a76cb
Submitter: Zuul
Branch: master

commit 387a20ab23b000b99692abab494c42bc6b6a76cb
Author: Andy Ning <email address hidden>
Date: Wed Jan 22 09:11:09 2020 -0500

    Populate barbican region_name during bootstrap

    During DC System Controller deployment, the ansible script
    intermittently fails during the barbican credential creation
    on ansible re-play. Even in the re-play case, it is a random
    failure depending on the order of the endpoints in the service
    catalog.

    The reason for this to happen is that, during the initial play, the
    barbican secrets are created prior to initial system configuration
    population so endpoints for SystemController region are not created.
    Barbican will use the RegionOne keystone endpoint. But after initial
    play finished, endpoints for SystemController region are created thus
    there are two keystone endpoints (RegionOne and SystemController).

    With two keystone endpoints present during re-play, Barbican may pick
    up the SystemController region keystone endpoint during credential
    creation. Yet the service behind that endpoint (dcorch identity proxy)
    has not started, causing the credential creation to fail.

    The fix is to explicitly configure Barbican region_name to RegionOne
    during bootstrap so re-play will use RegionOne keystone endpoint. Then
    update Barbican region_name after service endpoints reconfiguration to
    make region_name consistent with keystone service catalog, so requests
    to Barbican will always succeed.

    Change-Id: I7afda2806aad6437f746ca8ff39adee2d29571cf
    Closes-Bug: 1859726
    Signed-off-by: Andy Ning <email address hidden>
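
The fix described in the commit message above amounts to writing an explicit region_name into the Barbican configuration instead of leaving it unset. A minimal sketch of that idea follows. It is not the code from the review: the helper name set_region_name, the section name keystone_authtoken, and the sample config text are assumptions, and the actual change is applied through the bootstrap manifests rather than by editing the file this way.

```python
import configparser
import io


def set_region_name(conf_text, region, section="keystone_authtoken"):
    """Return conf_text with region_name set to region in the given section.

    The section name is an assumption for illustration; check the real
    Barbican config file for where region_name actually lives.
    """
    # interpolation=None keeps literal '%' characters in values intact.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "region_name", region)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical starting config with no region_name set.
sample_conf = "[keystone_authtoken]\nauth_type = password\n"

# Bootstrap step: pin the region explicitly so re-play uses RegionOne.
bootstrap_conf = set_region_name(sample_conf, "RegionOne")
```

Per the ansible-playbooks commit in this report, Barbican also has to be restarted after region_name is updated during endpoint reconfiguration, since a running service does not pick up the new value on its own.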

Revision history for this message
Yang Liu (yliu12) wrote :

Yosief Gebremariam has tested this 3 times with the fix and did not encounter this issue.
Build used: 01-24 and 01-28 loads.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/705831

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/705837

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (f/centos8)

Reviewed: https://review.opendev.org/705831
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=6670caf7ceda5fe0dc46f2f82033b68abf00ed5e
Submitter: Zuul
Branch: f/centos8

commit bf8d081a95a9b1776964960a6d9089b1449f2c58
Author: Angie Wang <email address hidden>
Date: Thu Jan 30 17:57:05 2020 -0500

    Support k8s networking upgrade based on k8s version

    Update to support a set of k8s networking templates
    based on kubernetes release. The kubernetes version
    needs to be passed to the ansible playbook
    k8s-networking-upgrade.yml to determine which set
    of networking manifests should be applied for the
    current kubernetes.

    Story: 2006781
    Task: 37584
    Change-Id: I3a0b9f56608ddb1323b36f9ecedb8a5488c222c9
    Signed-off-by: Angie Wang <email address hidden>

commit 2b0cd43e5fa75628d8eff78be7045ba4fc82d980
Author: Jerry Sun <email address hidden>
Date: Thu Dec 19 13:22:50 2019 -0500

    Add Dex parameters to ansible bootstrap

    Add oidc_groups_claim as a new parameters for ansible
    config. We now have 2 valid configs: the previous 3 parameters
    for a microsoft azure authentication deployment, and the previous
    3 in addition to oidc_groups_claim for a dex authentication
    deployment.

    Story: 2006711
    Task: 37850
    Change-Id: I265d2f7872eb31e2b295eeff6a3165543673497c
    Depends-On: https://review.opendev.org/702798
    Signed-off-by: Jerry Sun <email address hidden>

commit 92ca122652733805b62fc16940861ca4e83e2bb1
Author: David Sullivan <email address hidden>
Date: Wed Jan 22 21:33:19 2020 -0500

    Install secondary controller nodes with kubeadm join

    Kubeadm init is no longer supported for installing secondary nodes in an
    HA kubernetes cluster. kubeadm join with the --controller-plane option
    should be used.

    Change-Id: I64aaf02b09053608c884149d73bc1a3f2b62d98a
    Partial-Bug: 1846829
    Depends-On: https://review.opendev.org/702797
    Signed-off-by: David Sullivan <email address hidden>

commit 393379bd7671aeec5e9852679a69bdc29577426a
Author: Angie Wang <email address hidden>
Date: Tue Jan 28 14:01:10 2020 -0500

    Fix the image download failure on IPv6 system

    "crictl pull" failed to pull images on IPv6 system with
    proxy setting since Containerd doesn't work with the
    NO_PROXY environment variable that has IPv6 addresses
    with square brackets. This commit updates to strip out
    the square brackets from NO_PROXY environment variable.

    Verified on both IPv4 and IPv6 labs.

    Change-Id: I70bd00439b2cc39d2b25dd62746994a524be4998
    Partial-Bug: 1859835
    Signed-off-by: Angie Wang <email address hidden>

commit 792ea357e2b6d2bd23b441aa1657e0dc46f7ef5d
Author: Jim Somerville <email address hidden>
Date: Mon Jan 27 16:08:48 2020 -0500

    Security: Add nospectre_v1 to the default setting

    Most of the v1 mitigation is baked into the kernel and not
    optional. The swapgs barriers are, however, optional.
    They have a negative performance impact so we disable them
    by using the nospectre_v1 kernel bootarg.

    C...


tags: added: in-f-centos8
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (f/centos8)

Reviewed: https://review.opendev.org/705837
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=8ac6ec70cb8a787a274fd7227eb34d2b7bcd5f5b
Submitter: Zuul
Branch: f/centos8

commit 7995dd436954b92f1c4e3f760a7609af670c84c8
Author: Jessica Castelino <email address hidden>
Date: Mon Feb 3 12:07:26 2020 -0500

    Unit test cases for helm charts

    Test cases added for API endpoints used by:
     1. helm-override-delete
     2. helm-override-show
     3. helm-override-list
     4. helm-override-update
     5. helm-chart-attribute-modify

    Story: 2007082
    Task: 38012
    Change-Id: I86763496bb41084c006f2486702c3b15bde039d2
    Signed-off-by: Jessica Castelino <email address hidden>

commit 7e2fda010299f7305b630d6db97bbe1e169a38b1
Author: Angie Wang <email address hidden>
Date: Wed Jan 29 21:18:18 2020 -0500

    Finish kubernetes networking upgrade support

    The commit completes the RPC kube_upgrade_networking
    in sysinv-conductor to run ansible playbook
    upgrade-k8s-networking.yml to upgrade networking pods
    and also updates the networking upgrade function called
    as part of sysinv-conductor startup to provide a current
    kubernetes version when running the upgrade playbook.

    The second control plane upgrade can only be performed
    after the networking upgrade is done, fix the semantic
    check in sysinv api.

    Change-Id: I8dcf5a2baedfaefb0a7ca037eb47bf7cacd686f8
    Story: 2006781
    Task: 37584
    Depends-On: https://review.opendev.org/#/c/705310/
    Signed-off-by: Angie Wang <email address hidden>

commit 52c37a35d2cd62fa1cc1933765c76c1ba8616864
Author: Jerry Sun <email address hidden>
Date: Fri Jan 31 16:10:25 2020 -0500

    Add Unit Tests for Dex Sysinv Changes

    Add unit tests for the dex helm chart changes under the same story
    and task

    Story: 2006711
    Task: 37857

    Depends-On: https://review.opendev.org/#/c/705297/

    Change-Id: I3a0e1c490e56188adfbd614fd6ebb21bfdddf49e
    Signed-off-by: Jerry Sun <email address hidden>

commit 144587a6ac9fc48b9249be99abadd35dfa49e7a7
Author: Teresa Ho <email address hidden>
Date: Fri Jan 31 15:35:04 2020 -0500

    Tox tests for OIDC client helm overrides

    Added some tox tests for OIDC client helm overrides.

    Story: 2006711
    Task: 38481

    Change-Id: If4aeaf0010c7076d1d83bacd00d6fd0122d4ffad
    Signed-off-by: Teresa Ho <email address hidden>

commit 763ddeadd4e83af6cebf752d693ee3e7d3b005b1
Author: Thomas Gao <email address hidden>
Date: Wed Jan 29 16:30:40 2020 -0500

    Fixed errors in address deletion

    Allowed address deletion despite missing associated interface or host.

    Enabled relevant unit test.

    Closes-Bug: 1860186

    Change-Id: Ie6e6358aa75091e92914a8b581b4d5203a596f56
    Signed-off-by: Thomas Gao <email address hidden>

commit 61463608169e75601b8a4f9db7c98190788d6f6a
Author: Thomas Gao <email address hidden>
Date: Tue Jan 28 15:32:58 2020 -0500

    Fixed broken sysinv address get-all api call

    Removed unexpected keyword argument that caused the error....
