Subcloud install playbook failed to ping OAM interface of subclouds
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Li Zhu |
Bug Description
Brief Description
------------------
Failed to ping OAM interface of subclouds in the install playbook.
Failure:
TASK [common/prepare-env : Fail if host is unreachable] *******
Wednesday 23 November 2022 01:43:16 +0000 (0:00:10.177) 1:08:56.357 ****
skipping: [subcloudXXX] => (item=PING <ip> 56 data bytes)
skipping: [subcloudXXX] => (item=)
skipping: [subcloudXXX] => (item=--- <ip> ping statistics ---)
failed: [subcloudXXX] (item=1 packets transmitted, 0 received, 100% packet loss, time 0ms) => changed=false
ansible_loop_var: item
item: 1 packets transmitted, 0 received, 100% packet loss, time 0ms
msg: Host <ip> is unreachable!
PLAY RECAP
The install completed at 01h43min05s and the playbook failed at 01h43min16s, 11 seconds after the install completed.
The daemon logs show that the server was rebooted at 01h43min05s. The OAM interface seems to have taken longer than that to become available. The step "[Waiting 9000 seconds for port 22 become open on <ip>]" also checks whether the interface is available, so the OAM interface connection may NOT have been stable for a few seconds. I was able to connect to the host after the failure.
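One way to tolerate an interface that is unstable for a few seconds after a reboot is to retry the reachability probe and require a few consecutive successes, rather than failing on a single lost packet. The Python sketch below only illustrates that idea; it is not the change proposed for the playbook. The function name `wait_until_reachable`, its parameters and default values, and the injected `probe` callable are all hypothetical.

```python
import time
from typing import Callable


def wait_until_reachable(
    probe: Callable[[], bool],
    attempts: int = 10,
    delay: float = 3.0,
    required_successes: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `probe` succeeds `required_successes` times in a row.

    `probe` is a hypothetical callable that returns True when one
    reachability check (for example a single ping) succeeds.
    """
    streak = 0
    for attempt in range(attempts):
        if probe():
            streak += 1
            if streak >= required_successes:
                return True
        else:
            # A lost packet resets the streak: the link is not yet stable.
            streak = 0
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Requiring consecutive successes is a design choice aimed at the flapping behaviour described above: a single reply from an interface that is still settling would not count as reachable.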
Severity
--------
<Critical: System/Feature is not usable after the defect>
Steps to Reproduce
------------------
Run remote subcloud install
Expected Behavior
-----------------
The subcloud deployment should complete successfully
Actual Behavior
---------------
The subcloud installed successfully; however, the OAM interface was not stable for a few seconds, and the install playbook then failed to ping it.
Reproducibility
---------------
Intermittent
System Configuration
------------------
DC lab
SW_VERSION="22.12"
BUILD_ID=
Timestamp/Logs
---------------
daemon logs: reboot system
/var/log/
ansible logs: - subcloudXXX
2022-11-23-00:34:19 Executing playbook command: ['ansible-
...
changed: [subcloudXXX]
TASK [common/prepare-env : Fail if host is unreachable] *******
Wednesday 23 November 2022 01:43:16 +0000 (0:00:10.177) 1:08:56.357 ****
skipping: [subcloudXXX] => (item=PING <ip> 56 data bytes)
skipping: [subcloudXXX] => (item=)
skipping: [subcloudXXX] => (item=--- <ip> ping statistics ---)
failed: [subcloudXXX] (item=1 packets transmitted, 0 received, 100% packet loss, time 0ms) => changed=false
ansible_loop_var: item
item: 1 packets transmitted, 0 received, 100% packet loss, time 0ms
msg: Host 2620:10a:
subcloudXXX : ok=21 changed=9 unreachable=0 failed=1 skipped=13 rescued=0 ignored=0
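The failing task above iterates over the lines of the ping output and fails on the statistics line. As a rough illustration of how that line can be interpreted, the Python sketch below extracts the received-packet count from ping output shaped like the log shown here. The helper name `packets_received` and the regular expression are assumptions made for this example, not part of the playbook.

```python
import re
from typing import Optional

# Matches the summary line printed by ping, e.g.
# "1 packets transmitted, 0 received, 100% packet loss, time 0ms"
_SUMMARY = re.compile(
    r"(?P<sent>\d+) packets transmitted, (?P<received>\d+) (?:packets )?received"
)


def packets_received(ping_output: str) -> Optional[int]:
    """Return the number of replies reported by ping, or None if no summary line is found."""
    for line in ping_output.splitlines():
        match = _SUMMARY.search(line)
        if match:
            return int(match.group("received"))
    return None
```

With a single transmitted packet, as in the log, a count of 0 is enough to mark the host unreachable, which is why a brief interruption right after the reboot fails the whole play.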
Alarms
--------
n/a
Test Activity
-----------------
Feature Testing
Workaround
-----------------
Resume the subcloud deployment from the bootstrap playbook phase.
Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/ansible-playbooks/+/867544