ansible replay fails at adding ssl_ca cert

Bug #1881216 reported by Yang Liu
This bug affects 1 person
Affects: StarlingX
Status: Invalid
Importance: Medium
Assigned to: Yang Liu
Milestone: (not set)

Bug Description

Brief Description
-----------------
The Ansible bootstrap replay failed while installing the ssl_ca certificate. The initial bootstrap succeeded at this step; the failure occurred only on replay.

I tried the replay again, but the same issue occurred, and I had to reinstall the system.

E TASK [bootstrap/persist-config : Add ssl_ca certificate] ***********************
E fatal: [localhost]: FAILED! => {"changed": true, "cmd": "source /etc/platform/openrc; system certificate-install -m ssl_ca /tmp/ca-cert.pem", "delta": "0:01:03.648569", "end": "2020-05-28 23:45:58.636711", "msg": "non-zero return code", "rc": 1, "start": "2020-05-28 23:44:54.988142", "stderr": "Certificate /tmp/ca-cert.pem not installed: Expecting value: line 1 column 1 (char 0)", "stderr_lines": ["Certificate /tmp/ca-cert.pem not installed: Expecting value: line 1 column 1 (char 0)"], "stdout": "WARNING: For security reasons, the original certificate, \ncontaining the private key, will be removed, \nonce the private key is processed.", "stdout_lines": ["WARNING: For security reasons, the original certificate, ", "containing the private key, will be removed, ", "once the private key is processed."]}
E
E PLAY RECAP *********************************************************************
E localhost : ok=139 changed=34 unreachable=0 failed=1

Severity
--------
Major

Steps to Reproduce
------------------
- Install controller-0
- Run ansible bootstrap (bootstrap passed)
- Rerun ansible bootstrap using same localhost.yml

Expected Behavior
------------------
- Bootstrap passed in replay

Actual Behavior
----------------
- Bootstrap failed in replay

Reproducibility
---------------
Not sure yet

System Configuration
--------------------
AIO-DX
Lab-name: wolfpass8-12

Branch/Pull Time/Commit
-----------------------
stx master as of "2020-05-27_20-00-00"

Last Pass
---------
Not sure.

Timestamp/Logs
--------------
# Initial play:
[2020-05-28 21:19:53,592] 140 INFO MainThread telnet.send :: Send: ansible-playbook lab-install-playbook.yaml -e "@local-install-overrides.yaml"
...
TASK [bootstrap/persist-config : Restart Docker and containerd] ****************

TASK [bootstrap/persist-config : Copy ssl_ca certificate] **********************
changed: [localhost]

TASK [bootstrap/persist-config : Remove ssl_ca complete flag] ******************
changed: [localhost]

TASK [bootstrap/persist-config : Add ssl_ca certificate] ***********************
changed: [localhost]

TASK [bootstrap/persist-config : Wait for certificate install] *****************
ok: [localhost]

TASK [bootstrap/persist-config : Cleanup temporary certificate] ****************
ok: [localhost]
...

# Replay:
[2020-05-28 23:28:04,681] 140 INFO MainThread telnet.send :: Send: ansible-playbook lab-install-playbook.yaml -e "@local-install-overrides.yaml"
...
E TASK [bootstrap/persist-config : Restart Docker and containerd] ****************
E
E TASK [bootstrap/persist-config : Copy ssl_ca certificate] **********************
E changed: [localhost]
E
E TASK [bootstrap/persist-config : Remove ssl_ca complete flag] ******************
E changed: [localhost]
E
E TASK [bootstrap/persist-config : Add ssl_ca certificate] ***********************
E fatal: [localhost]: FAILED! => {"changed": true, "cmd": "source /etc/platform/openrc; system certificate-install -m ssl_ca /tmp/ca-cert.pem", "delta": "0:01:04.271207", "end": "2020-05-28 23:30:51.628429", "msg": "non-zero return code", "rc": 1, "start": "2020-05-28 23:29:47.357222", "stderr": "Certificate /tmp/ca-cert.pem not installed: Expecting value: line 1 column 1 (char 0)", "stderr_lines": ["Certificate /tmp/ca-cert.pem not installed: Expecting value: line 1 column 1 (char 0)"], "stdout": "WARNING: For security reasons, the original certificate, \ncontaining the private key, will be removed, \nonce the private key is processed.", "stdout_lines": ["WARNING: For security reasons, the original certificate, ", "containing the private key, will be removed, ", "once the private key is processed."]}
E
E PLAY RECAP *********************************************************************
E localhost : ok=152 changed=49 unreachable=0 failed=1

Test Activity
-------------
Normal use

Yang Liu (yliu12) wrote :
Yang Liu (yliu12)
description: updated
Ghada Khalil (gkhalil) wrote :

stx.4.0 / medium priority - ansible replay failure

Changed in starlingx:
status: New → Triaged
importance: Undecided → Medium
tags: added: stx.4.0 stx.config stx.security
Changed in starlingx:
assignee: nobody → Andy (andy.wrs)
Ghada Khalil (gkhalil) wrote :

I'd like the reporter to re-test this scenario after the following commit is merged:
https://review.opendev.org/#/c/732402/

It appears that the Ansible replay cannot be run after the bootstrap is finalized and host configuration has started. Right now, an unrelated-looking error message about the certificate is returned in this case. The above commit will return a better error message. This will help us determine whether this is indeed a certificate issue or the replay being triggered too late (in which case the new error message would be expected).
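
For context, the stderr text "Expecting value: line 1 column 1 (char 0)" matches the message that Python's standard json module produces when it is asked to decode an empty or non-JSON string. That would be consistent with the client receiving an unexpected response body rather than a certificate-specific failure, but this report does not confirm where the message originates. A minimal sketch, assuming a Python json decoder is involved (the helper name decode_error_message is hypothetical and not part of any StarlingX code):

```python
import json


def decode_error_message(body: str) -> str:
    """Return the JSONDecodeError text for `body`, or '' if it decodes cleanly.

    Hypothetical helper for illustration only; it is not StarlingX code.
    """
    try:
        json.loads(body)
    except json.JSONDecodeError as exc:
        return str(exc)
    return ""


# An empty body reproduces the same wording seen in the failed task's stderr.
print(decode_error_message(""))
```

If the improved error message from the commit above appears instead on a late replay, that would point away from the certificate itself.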

Changed in starlingx:
status: Triaged → Incomplete
assignee: Andy (andy.wrs) → Yang Liu (yliu12)
Ghada Khalil (gkhalil) wrote :

Note: The commit noted above was merged in stx master on 2020-06-11

Ghada Khalil (gkhalil)
tags: added: stx.retestneeded
Ghada Khalil (gkhalil) wrote :

I am not sure this should still be tagged stx.4.0, given that it deals with an Ansible replay scenario (which is a failure test scenario).

Ghada Khalil (gkhalil) wrote :

@Yang, is there any more information on this test scenario? Is this issue still reproducible?

Ghada Khalil (gkhalil) wrote :

No re-test information is available yet. Given that this deals with an Ansible replay scenario (which is a failure test case), I'm moving this to stx.5.0.

tags: added: stx.5.0
removed: stx.4.0
Ghada Khalil (gkhalil)
tags: removed: stx.security
Ghada Khalil (gkhalil) wrote :

Any update on re-testing this scenario?

Yang Liu (yliu12) wrote :

This was tested 10+ times in the past two weeks, and the issue was not seen.

Ghada Khalil (gkhalil) wrote :

Closing as the issue is no longer reproducible

Changed in starlingx:
status: Incomplete → Invalid
Ghada Khalil (gkhalil)
tags: removed: stx.retestneeded