Playbook fails at alarm check task

Bug #1958885 reported by Reinildes Oliveira
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Reinildes Oliveira
Milestone: (none listed)

Bug Description

Brief Description
------------------------------------------

The platform certificate migration playbook (migrate-platform-certificates-to-certmanager.yml) fails at the management affecting alarms check task.

Severity
-------------------------------------------

Major

Steps to Reproduce
-------------------------------------------

1) Create the following manifest and save it as issuer.yaml:
{code:java}
---
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: system-selfsigning-issuer
spec:
  selfSigned: {}
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: cloudplatform-rootca-certificate
spec:
  secretName: cloudplatform-rootca-certificate
  commonName: "cloudplatform-rootca"
  isCA: true
  duration: 30681h0m0s
  renewBefore: 720h0m0s
  issuerRef:
    name: system-selfsigning-issuer
    kind: ClusterIssuer
---
apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: cloudplatform-rootca-issuer
spec:
  ca:
    secretName: cloudplatform-rootca-certificate
---
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
  name: cloudplatform-interca-certificate
spec:
  secretName: cloudplatform-interca-certificate
  commonName: "cloudplatform-interca"
  isCA: true
  duration: 30681h0m0s
  renewBefore: 720h0m0s
  issuerRef:
    name: cloudplatform-rootca-issuer
    kind: Issuer

{code}
Then apply it:
kubectl create -f issuer.yaml

2) Once the certificate is issued, gather the base64-encoded certificate and key data:
{code:java}
[sysadmin@controller-0 ~(keystone_admin)]$ echo $(kubectl get secrets cloudplatform-interca-certificate -o jsonpath='{.data.tls\.crt}')
base64...
[sysadmin@controller-0 ~(keystone_admin)]$ echo $(kubectl get secrets cloudplatform-interca-certificate -o jsonpath='{.data.tls\.key}')
base64...

{code}
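Before pasting the two values into the inventory, it can help to confirm that each one is non-empty and decodes as base64. The following is a minimal sketch, not part of the playbook; the helper name is_valid_b64 is hypothetical, and it assumes a base64 utility that accepts -d (as GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical helper (not part of the playbook): succeed only when the
# argument is non-empty and decodes cleanly as base64.
is_valid_b64() {
    [ -n "$1" ] && printf '%s' "$1" | base64 -d >/dev/null 2>&1
}

# Example use on the controller (commented out because it needs kubectl):
# ica_cert=$(kubectl get secrets cloudplatform-interca-certificate -o jsonpath='{.data.tls\.crt}')
# is_valid_b64 "$ica_cert" || echo "tls.crt is missing or not valid base64" >&2
```

The same check applies to the tls.key value before it is used as ica_key.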
3) Create the following inventory file (migration-inventory.yml in step 4), substituting the values gathered above for ica_cert and ica_key:
{code:java}
all:
  vars:
    ica_cert: base64
    ica_key: base64
  children:
    target_group:
      vars:
        dns_domain: mydomain
        duration: 2160h # 90d
        renewBefore: 360h # 15d
        subject_C: Canada
        subject_ST: Ontario
        subject_L: Ottawa
        subject_O: myorganization
        subject_OU: engineering
        subject_CN: myorganization.com
        subject_prefix: starlingx2
        # SSH password to connect to all subclouds
        ansible_ssh_user: sysadmin
        ansible_ssh_pass: pwd*
        # Sudo password
        ansible_become_pass: pwd*

{code}
4) Now run the playbook:
{code:java}
[sysadmin@controller-0 ~(keystone_admin)]$ ansible-playbook /usr/share/ansible/stx-ansible/playbooks/migrate-platform-certificates-to-certmanager.yml -i migration-inventory.yml --extra-vars "target_list=localhost mode=update" --ask-vault-pass
Vault password:
 [WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

PLAY [localhost] ***************************************************************************************************************************************************************************************

TASK [Fail if target_list is not defined] **************************************************************************************************************************************************************
Friday 21 January 2022 21:47:14 +0000 (0:00:00.064) 0:00:00.064 ********
skipping: [localhost]

TASK [Get online subclouds from dcmanager] *************************************************************************************************************************************************************
Friday 21 January 2022 21:47:14 +0000 (0:00:00.015) 0:00:00.080 ********
skipping: [localhost]

TASK [Add host to target_group] ************************************************************************************************************************************************************************
Friday 21 January 2022 21:47:14 +0000 (0:00:00.013) 0:00:00.093 ********
skipping: [localhost]

TASK [Get subcloud from extra-vars] ********************************************************************************************************************************************************************
Friday 21 January 2022 21:47:14 +0000 (0:00:00.015) 0:00:00.108 ********
changed: [localhost] => (item=localhost)
 [WARNING]: A duplicate localhost-like entry was found (localhost). First found localhost was localhost

TASK [migrate-platform-certificates-to-certmanager/install-trusted-ca : Save the specified ICA to a file] **********************************************************************************************
Friday 21 January 2022 21:47:14 +0000 (0:00:00.026) 0:00:00.135 ********
changed: [localhost]

TASK [migrate-platform-certificates-to-certmanager/install-trusted-ca : Get CA information from certificate] *******************************************************************************************
Friday 21 January 2022 21:47:15 +0000 (0:00:00.402) 0:00:00.538 ********
changed: [localhost]

TASK [migrate-platform-certificates-to-certmanager/install-trusted-ca : Fail when ICA certificate is not an actual CA certificate] *********************************************************************
Friday 21 January 2022 21:47:15 +0000 (0:00:00.208) 0:00:00.746 ********
skipping: [localhost]

TASK [migrate-platform-certificates-to-certmanager/install-trusted-ca : Get years for ICA duration validation] *****************************************************************************************
Friday 21 January 2022 21:47:15 +0000 (0:00:00.025) 0:00:00.771 ********
ok: [localhost]

TASK [migrate-platform-certificates-to-certmanager/install-trusted-ca : Check that ICA certificate remaining duration is longer than 3 years] **********************************************************
Friday 21 January 2022 21:47:15 +0000 (0:00:00.062) 0:00:00.834 ********
changed: [localhost]

TASK [migrate-platform-certificates-to-certmanager/install-trusted-ca : Fail when ICA certificate remaining duration is shorter than 3 years] **********************************************************
Friday 21 January 2022 21:47:15 +0000 (0:00:00.211) 0:00:01.045 ********
skipping: [localhost]

TASK [migrate-platform-certificates-to-certmanager/install-trusted-ca : Install ICA] *******************************************************************************************************************
Friday 21 January 2022 21:47:15 +0000 (0:00:00.017) 0:00:01.063 ********
changed: [localhost]

PLAY [target_group] ************************************************************************************************************************************************************************************
Friday 21 January 2022 21:47:22 +0000 (0:00:07.016) 0:00:08.079 ********
included: /usr/share/ansible/stx-ansible/playbooks/roles/migrate-platform-certificates-to-certmanager/migrate-certificates/tasks/check-for-management-alarms.yml for localhost
Friday 21 January 2022 21:47:22 +0000 (0:00:00.034) 0:00:08.114 ********

TASK [migrate-platform-certificates-to-certmanager/migrate-certificates : Check for management affecting alarms] ***************************************************************************************
changed: [localhost]
Friday 21 January 2022 21:47:25 +0000 (0:00:02.941) 0:00:11.055 ********

TASK [migrate-platform-certificates-to-certmanager/migrate-certificates : Fail when there are management alarms] ***************************************************************************************
fatal: [localhost]: FAILED! => changed=false
  msg: There are management affecting alarms present on the target system. Execution will not continue. No certificates were migrated. After a careful analysis of the alarms, retry this target with extra-var ignore-alarms=yes
Friday 21 January 2022 21:47:25 +0000 (0:00:00.048) 0:00:11.104 ********

TASK [migrate-platform-certificates-to-certmanager/migrate-certificates : debug] ***********************************************************************************************************************
ok: [localhost] =>
  msg: Failed to migrate platform certificates to cert-manager. Please find backups of the previous certificates in /home/sysadmin/certificates_backup.
Friday 21 January 2022 21:47:25 +0000 (0:00:00.044) 0:00:11.149 ********

TASK [migrate-platform-certificates-to-certmanager/migrate-certificates : Show backups of certificates] ************************************************************************************************
fatal: [localhost]: FAILED! => changed=true
  cmd:
  - ls
  - -lR
  - /home/sysadmin/certificates_backup
  delta: '0:00:00.002146'
  end: '2022-01-21 21:47:26.042180'
  msg: non-zero return code
  rc: 2
  start: '2022-01-21 21:47:26.040034'
  stderr: 'ls: cannot access /home/sysadmin/certificates_backup: No such file or directory'
  stderr_lines:
  - 'ls: cannot access /home/sysadmin/certificates_backup: No such file or directory'
  stdout: ''
  stdout_lines: <omitted>

PLAY RECAP *********************************************************************************************************************************************************************************************
localhost : ok=9 changed=6 unreachable=0 failed=2

Friday 21 January 2022 21:47:26 +0000 (0:00:00.288) 0:00:11.437 ********
===============================================================================
migrate-platform-certificates-to-certmanager/install-trusted-ca : Install ICA ------------------------------------------------------------------------------------------------------------------- 7.02s
migrate-platform-certificates-to-certmanager/migrate-certificates : Check for management affecting alarms --------------------------------------------------------------------------------------- 2.94s
migrate-platform-certificates-to-certmanager/install-trusted-ca : Save the specified ICA to a file ---------------------------------------------------------------------------------------------- 0.40s
migrate-platform-certificates-to-certmanager/migrate-certificates : Show backups of certificates ------------------------------------------------------------------------------------------------ 0.29s
migrate-platform-certificates-to-certmanager/install-trusted-ca : Check that ICA certificate remaining duration is longer than 3 years ---------------------------------------------------------- 0.21s
migrate-platform-certificates-to-certmanager/install-trusted-ca : Get CA information from certificate ------------------------------------------------------------------------------------------- 0.21s
migrate-platform-certificates-to-certmanager/install-trusted-ca : Get years for ICA duration validation ----------------------------------------------------------------------------------------- 0.06s
migrate-platform-certificates-to-certmanager/migrate-certificates : Fail when there are management alarms --------------------------------------------------------------------------------------- 0.05s
migrate-platform-certificates-to-certmanager/migrate-certificates : debug ----------------------------------------------------------------------------------------------------------------------- 0.04s
migrate-platform-certificates-to-certmanager/migrate-certificates : Check for management affecting alarms --------------------------------------------------------------------------------------- 0.03s
Get subcloud from extra-vars -------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.03s
migrate-platform-certificates-to-certmanager/install-trusted-ca : Fail when ICA certificate is not an actual CA certificate --------------------------------------------------------------------- 0.03s
migrate-platform-certificates-to-certmanager/install-trusted-ca : Fail when ICA certificate remaining duration is shorter than 3 years ---------------------------------------------------------- 0.02s
Fail if target_list is not defined -------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.02s
Add host to target_group ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 0.02s
Get online subclouds from dcmanager ------------------------------------------------------------------------------------------------------------------------------------------------------------- 0.01s

{code}
The playbook fails at the alarm check task even though there are no alarms on the system:
{code:java}
 [sysadmin@controller-0 ~(keystone_admin)]$ fm alarm-list

[sysadmin@controller-0 ~(keystone_admin)]$

{code}
Expected Behavior
-------------------------------------------

The playbook should complete without errors.

Actual Behavior
-------------------------------------------
The playbook fails at the "Fail when there are management alarms" task.

Reproducibility
-------------------------------------------

100%

System Configuration
-------------------------------------------

IPv4

Last Pass
-------------------------------------------

new feature testing

Test Activity
-------------------------------------------

Feature testing

Workaround
-------------------------------------------

Run the playbook with the extra-var ignore_alarms=yes (spelled with an underscore, unlike the ignore-alarms shown in the failure message).

Example:
ansible-playbook /usr/share/ansible/stx-ansible/playbooks/migrate-platform-certificates-to-certmanager.yml -i inventory.yml --extra-vars "target_list=localhost mode=update ignore_alarms=yes" --ask-vault-pass

description: updated

Changed in starlingx:
assignee: nobody → Reinildes Oliveira (rjosemat)

OpenStack Infra (hudson-openstack) wrote: Fix proposed to ansible-playbooks (master)

Changed in starlingx:
status: New → In Progress

Ghada Khalil (gkhalil) wrote:

screening: stx.7.0 / medium - issue related to a new certificate migration playbook introduced in stx.7.0

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.7.0 stx.security

OpenStack Infra (hudson-openstack) wrote: Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/826110
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/3c30097321208e480a234309681a6db80325dc72
Submitter: "Zuul (22348)"
Branch: master

commit 3c30097321208e480a234309681a6db80325dc72
Author: Rei Oliveira <email address hidden>
Date: Mon Jan 24 12:49:34 2022 -0300

    Playbook fails at alarm check task

    This commit adds retry for the alarm check in order to make it more
    resilient to temporary alarms.

    It also includes some minor improvements regarding certificate backups.

    Test Plan:
    PASS: Trigger temporary alarm and run the playbook. It should be
          resilient enough to wait for about 2 minutes for alarm to clear.
    PASS: Trigger a persistent alarm and check that playbook waits for
          about 2 minutes for alarm to clear and fails when it does not.

    Closes-Bug: 1958885
    Signed-off-by: Rei Oliveira <email address hidden>
    Change-Id: I44d3a885d0f470263fd9feb8947755b4cb67326b
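
For illustration only, the retry behaviour described in the commit message could be sketched in shell as below. This is not the merged Ansible change; the function name wait_for_no_alarms and the 12 checks at 10 second intervals (roughly 2 minutes) are assumptions, and it relies on fm alarm-list printing nothing when no alarms are active, as shown earlier in this report.

```shell
#!/bin/sh
# Hypothetical sketch, not the merged change: poll a command until it
# succeeds and prints nothing (no alarms), or the retry budget runs out.
#   $1 = command that prints active alarms (assumed empty when there are none)
#   $2 = maximum number of checks
#   $3 = seconds to sleep between checks
wait_for_no_alarms() {
    check_cmd=$1
    retries=$2
    delay=$3
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        # $check_cmd is unquoted on purpose so "fm alarm-list" splits into
        # a command and its argument. A failing command counts as "not clear".
        if out=$($check_cmd 2>/dev/null) && [ -z "$out" ]; then
            return 0
        fi
        # Sleep only when another attempt will follow.
        if [ "$attempt" -lt "$retries" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Example use on the controller (commented out because it needs the fm CLI):
# wait_for_no_alarms "fm alarm-list" 12 10 || echo "alarms still present" >&2
```

A temporary alarm that clears between checks lets the function return success, while a persistent alarm makes it return failure after the last check, which mirrors the two cases in the test plan above.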

Changed in starlingx:
status: In Progress → Fix Released