platform armada apps should not set their replica to 0

Bug #1922278 reported by Isac Sacchi e Souza
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Isac Sacchi e Souza
Milestone: (none listed)

Bug Description

Brief Description
-----------------
After a lock/unlock, platform-integ-apps and oidc-auth-apps fail to apply because their replica count is set to 0. This is an issue on systems where the admin endpoint certificate was renewed.

The cert-manager app already has a check that prevents this.

https://opendev.org/starlingx/cert-manager-armada-app/src/branch/master/python-k8sapp-cert-manager/k8sapp_cert_manager/k8sapp_cert_manager/helm/cert_manager.py#L37

The check should be added to the platform-armada-app calculations:
https://opendev.org/starlingx/platform-armada-app/src/branch/master/python-k8sapp-platform/k8sapp_platform/k8sapp_platform/helm/ceph_fs_provisioner.py#L173
https://opendev.org/starlingx/platform-armada-app/src/branch/master/python-k8sapp-platform/k8sapp_platform/k8sapp_platform/helm/rbd_provisioner.py#L86

and oidc-auth-apps calculations:
https://opendev.org/starlingx/oidc-auth-armada-app/src/branch/master/python-k8sapp-oidc/k8sapp_oidc/k8sapp_oidc/helm/dex.py#L59
https://opendev.org/starlingx/oidc-auth-armada-app/src/branch/master/python-k8sapp-oidc/k8sapp_oidc/k8sapp_oidc/helm/oidc_client.py#L39
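
A minimal sketch of the kind of check being requested, assuming the replica count is derived from the number of provisioned controllers. The function name and signature below are hypothetical and only illustrate the "never less than 1" rule; they are not copied from the cert-manager or sysinv code.

```python
def num_replicas_for_platform_app(provisioned_controllers: int) -> int:
    """Return the replica count for a platform app, never less than 1.

    Hypothetical helper: the input is assumed to be the number of
    provisioned controllers, which can be 0 while the only controller
    of an AIO-SX system is locked.
    """
    return max(1, provisioned_controllers)


# A locked AIO-SX controller would otherwise yield 0 replicas.
for count in (0, 1, 2):
    print(count, "->", num_replicas_for_platform_app(count))
```

With a floor of 1, the app is still asked for one replica when the calculation runs while no controller is counted as provisioned.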

Severity
--------
Major

Steps to Reproduce
------------------

On an AIO-SX subcloud where the admin cert has been renewed:

system host-lock controller-0
system host-label-remove controller-0 kube-ignore-isol-cpus
system host-cpu-modify -f application-isolated -p0 19 controller-0
system host-unlock controller-0

Expected Behavior
------------------
Armada apps should have replicas set to 1

Actual Behavior
----------------
Armada apps have replicas set to 0 and fail to re-apply

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
AIO-SX subcloud

Branch/Pull Time/Commit
-----------------------
stx 4.0

Last Pass
---------
N/A

Timestamp/Logs
--------------
Example of a deployment with replicas set to 0:
[sysadmin@controller-0 ~(keystone_admin)]$ kubectl describe deployments.apps -n kube-system oidc-dex
Name:               oidc-dex
Namespace:          kube-system
CreationTimestamp:  Wed, 17 Mar 2021 00:38:45 +0000
Labels:             app=dex
                    chart=dex-0.8.0
                    heritage=Tiller
                    release=oidc-dex
Annotations:        deployment.kubernetes.io/revision: 1
Selector:           app=dex,release=oidc-dex
Replicas:           0 desired | 0 updated | 0 total | 0 available | 0 unavailable
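
To check for this condition in a script, the desired replica count can be read from the "Replicas:" line of the describe output. The following is a rough sketch only; the function name is hypothetical and it assumes the output has already been captured as text in the format shown above.

```python
import re


def desired_replicas(describe_output: str) -> int:
    """Extract the desired replica count from `kubectl describe deployment` text."""
    # Matches a line such as "Replicas: 0 desired | 0 updated | ..."
    match = re.search(r"^Replicas:\s+(\d+) desired", describe_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Replicas:' line found in the output")
    return int(match.group(1))


sample = "Replicas: 0 desired | 0 updated | 0 total | 0 available | 0 unavailable"
if desired_replicas(sample) == 0:
    print("deployment is scaled to 0 replicas")
```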

Test Activity
-------------
N/A

Workaround
----------
N/A

Changed in starlingx:
status: New → In Progress
assignee: nobody → Isac Sacchi e Souza (isouza)
Ghada Khalil (gkhalil) wrote :
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.5.0 stx.apps stx.containers
Ghada Khalil (gkhalil) wrote :

All commits merged as of 2021-04-09

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to platform-armada-app (f/centos8)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793460

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793696

OpenStack Infra (hudson-openstack) wrote : Fix merged to platform-armada-app (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/platform-armada-app/+/792234
Committed: https://opendev.org/starlingx/platform-armada-app/commit/f3bb236173d8bb336513a868d36f601a23ab87dc
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 66fa48f04d926848a37f5fbb7689cf4b114cb3ba
Author: Pedro Henrique Linhares <email address hidden>
Date: Tue Apr 6 21:45:28 2021 -0300

    Update helm charts config maps after sx-dx migration with new CEPH monitors

    This commit adds annotations that allows config maps to be recreated
    after ceph monitor IP changes due to DX migration so that existing
    StorageClasses can get a reference to the correct monitor. StorageClasses
    and provisioners are recreated during platform-integ-apps auto re-apply.

    Story: 2008587
    Task: 42242

    Signed-off-by: Pedro Linhares <email address hidden>
    Depends-On: https://review.opendev.org/c/starlingx/stx-puppet/+/783727
    Change-Id: I9cedc70326e92796f03520deed7f0857e119257f

commit 9d45149d29b8c05808bcd7e5a129f135b3386931
Author: Isac Souza <email address hidden>
Date: Mon Apr 5 11:07:21 2021 -0300

    Use new method for setting num of replicas

    Use the new _num_replicas_for_platform_app method from the
    helm base class to set the number of replicas in the chart.
    The new method will return the number of provisioned
    controllers with a minimum of 1.

    Tested by building an ISO and installing the armada apps.

    Partial-Bug: 1922278
    Signed-off-by: Isac Souza <email address hidden>
    Change-Id: Idb3c93274a1cb5c410d885d459784382525427a0

commit 1021d50142af6422c9a3f0853f4b7c525e724ab8
Author: Daniel Safta <email address hidden>
Date: Wed Mar 24 12:56:29 2021 +0000

    Removed extra serviceAccount from cephfs-provisioner

    cephfs-provisioner may need to create new resources
    in the kubernetes cluster. It was granted access to
    some of the resources including namespaces but when
    https://review.opendev.org/c/starlingx/platform-armada-app/+/778746
    got merged the serviceAccount was changed.

    I have updated the serviceAccount with access to
    creating new namespaces and secrets.

    The serviceAccount that was initially used to create
    namespaces and secrets is not needed anymore, so I
    have removed it.

    Closes-bug: 1921197
    Change-Id: I3c683776f3ecaf9c78d1a6b5b1108e9582497dde
    Signed-off-by: Daniel Safta <email address hidden>

commit 45fd0a6b2cd2dbaaf7ff21ef989824377c5b17dc
Author: Robert Church <email address hidden>
Date: Thu Mar 18 11:43:21 2021 -0400

    Build: Isolate platform plugins to an app specific directory

    When building the stx-platform-helm RPM for platform-integ-apps the helm
    plugins are installed in a location that could be populated with other
    app plugins if their spec files are not properly set up.

    Adjust the spec to provide an app specific location for the plugins to
    ensure that no other app plugins are included in the application tarball

    Closes-Bug: #1920066
    Change-Id: Id24227cd100a3c29809f1dd01f61ea717...

tags: added: in-f-centos8
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/794611

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/794906

OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/794611

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (f/centos8)

Reviewed: https://review.opendev.org/c/starlingx/config/+/794906
Committed: https://opendev.org/starlingx/config/commit/75758b37a5a23c8811355b67e2a430a1713cd85b
Submitter: "Zuul (22348)"
Branch: f/centos8

commit 9e420d9513e5fafb1df4d29567bc299a9e04d58d
Author: Bin Qian <email address hidden>
Date: Mon May 31 14:45:52 2021 -0400

    Add more logging to run docker login

    Add error log for running docker login. The new log could
    help identify docker login failure.

    Closes-Bug: 1930310
    Change-Id: I8a709fb6665de8301fbe3022563499a92b2a0211
    Signed-off-by: Bin Qian <email address hidden>

commit 31c77439d2cea590dfcca13cfa646522665f8686
Author: albailey <email address hidden>
Date: Fri May 28 13:42:42 2021 -0500

    Fix controller-0 downgrade failing to kill ceph

    kill_ceph_storage_monitor tried to manipulate a pmon
    file that does not exist in an AIO-DX environment.

    We no longer invoke kill_ceph_storage_monitor in an
    AIO SX or DX env.

    This allows: "system host-downgrade controller-0"
    to proceed in an AIO-DX environment where that second
    controller (controller-0) was upgraded.

    Partial-Bug: 1929884
    Signed-off-by: albailey <email address hidden>
    Change-Id: I633853f75317736084feae96b5b849c601204c13

commit 0dc99eee608336fe01b58821ea404286371f1408
Author: albailey <email address hidden>
Date: Fri May 28 11:05:43 2021 -0500

    Fix file permissions failure during duplex upgrade abort

    When issuing a downgrade for controller-0 in a duplex upgrade
    abort and rollback scenario, the downgrade command was failing
    because the sysinv API does not have root permissions to set
    a file flag.
    The fix is to use RPC so the conductor can create the flag
    and allow the downgrade for controller-0 to get further.

    Partial-Bug: 1929884
    Signed-off-by: albailey <email address hidden>
    Change-Id: I913bcad73309fe887a12cbb016a518da93327947

commit 7ef3724dad173754e40b45538b1cc726a458cc1c
Author: Chen, Haochuan Z <email address hidden>
Date: Tue May 25 16:16:29 2021 +0800

    Fix bug rook-ceph provision with multi osd on one host

    Test case:
    1, deploy simplex system
    2, apply rook-ceph with below override value
    value.yaml
    cluster:
      storage:
        nodes:
        - name: controller-0
          devices:
          - name: sdb
          - name: sdc
    3, reboot

    Without this fix, only osd pod could launch successfully after boot
    as vg start with ceph could not correctly add in sysinv-database

    Closes-bug: 1929511

    Change-Id: Ia5be599cd168d13d2aab7b5e5890376c3c8a0019
    Signed-off-by: Chen, Haochuan Z <email address hidden>

commit 23505ba77d76114cf8a0bf833f9a5bcd05bc1dd1
Author: Angie Wang <email address hidden>
Date: Tue May 25 18:49:21 2021 -0400

    Fix issue in partition data migration script

    The created partition dictonary partition_map is not
    an ordered dict so we need to sort it by its key -
    device node when iterating it to adjust the device
    nodes/paths for user created extra partitions to ensure
    the number of device node...

OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (f/centos8)

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793696

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https://review.opendev.org/c/starlingx/config/+/793460
