Distributed Cloud: 5 subclouds fail deployment
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | Medium | Tee Ngo | | |
Bug Description
Brief Description
-----------------
Tested parallel deployment of subclouds. Of the group, 3 subclouds failed at bootstrap (bootstrap-failed) and 2 failed at deploy prep (deploy-prep-failed).
Severity
--------
Major
Steps to Reproduce
------------------
Set up a distributed cloud system with many subclouds and deploy the subclouds in parallel.
Expected Behavior
------------------
All subclouds should deploy successfully.
Actual Behavior
----------------
5 subclouds failed to deploy (3 bootstrap-failed, 2 deploy-prep-failed).
Reproducibility
---------------
Occurred on 1 out of 2 attempts.
System Configuration
-------
Distributed Cloud - system controller
Branch/Pull Time/Commit
-------
2020-06-27_18-35-20
Last Pass
---------
none
Timestamp/Logs
--------------
controller-0 was active at the time of test:
2 deploy prep failed (subcloud81 and subcloud99):
2020-08-11 22:43:14.859 13677 ERROR dcmanager.
2020-08-11 22:43:16.331 13677 ERROR dcmanager.
3 bootstrap failed
subcloud98:
2020-08-11 22:43:15.042 13677 ERROR dcmanager.
[sysadmin@
changed: [subcloud98] => (item=DOCKER_
changed: [subcloud98] => (item=ELASTIC_
changed: [subcloud98] => (item=USE_
changed: [subcloud98] => (item=RECONFIGU
changed: [subcloud98] => (item=INITIAL_
fatal: [subcloud98]: FAILED! => {"msg": "Timeout (12s) waiting for privilege escalation prompt: "}
PLAY RECAP *******
subcloud98 : ok=115 changed=23 unreachable=0 failed=1
subcloud80:
2020-08-11 22:43:15.092 13677 ERROR dcmanager.
[sysadmin@
TASK [bootstrap/
TASK [bootstrap/
TASK [bootstrap/
fatal: [subcloud80]: FAILED! => {"msg": "Timeout (12s) waiting for privilege escalation prompt: "}
PLAY RECAP *******
subcloud80 : ok=112 changed=21 unreachable=0 failed=1
subcloud97:
2020-08-11 22:48:00.198 13677 ERROR dcmanager.
[sysadmin@
changed: [subcloud97] => (item=grubby --update-
changed: [subcloud97] => (item=grubby --efi --update-
TASK [bootstrap/
changed: [subcloud97] => (item=lvextend -L1G /dev/cgts-
fatal: [subcloud97]: FAILED! => {"msg": "Timeout (12s) waiting for privilege escalation prompt: "}
PLAY RECAP *******
subcloud97 : ok=152 changed=47 unreachable=0 failed=1
Performance data was being collected at the time. These errors correspond to periods of high CPU usage on the system controller caused by the kswapd, postgres, and ansible-playbook processes.
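To make the timing of the failures easier to compare against performance data, the dcmanager ERROR timestamps quoted above can be grouped into bursts. This is a rough sketch only: the `cluster` helper and the 30-second gap threshold are assumptions chosen for illustration, and only the five timestamps come from this report.

```python
from datetime import datetime, timedelta

# dcmanager ERROR timestamps copied from the log excerpts in this report
# (subcloud81/99 deploy prep, then subcloud98, subcloud80, subcloud97).
error_times = [
    "2020-08-11 22:43:14.859",
    "2020-08-11 22:43:16.331",
    "2020-08-11 22:43:15.042",
    "2020-08-11 22:43:15.092",
    "2020-08-11 22:48:00.198",
]


def cluster(timestamps, gap=timedelta(seconds=30)):
    """Group timestamps so that consecutive entries in a group are within `gap`.

    The 30 s default is an arbitrary, assumed threshold.
    """
    parsed = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S.%f") for t in timestamps)
    groups = []
    for ts in parsed:
        if groups and ts - groups[-1][-1] <= gap:
            groups[-1].append(ts)
        else:
            groups.append([ts])
    return groups


groups = cluster(error_times)
for group in groups:
    span = (group[-1] - group[0]).total_seconds()
    print(f"{group[0]:%H:%M:%S} {len(group)} error(s) over {span:.3f}s")
```

With these inputs, four of the five errors fall in a single burst starting at 22:43:14 and the subcloud97 failure stands alone at 22:48:00, so those are the two windows to check against the CPU usage samples.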
Test Activity
-------------
Distributed Cloud system testing
Workaround
----------
The failed subclouds deployed successfully when deployed individually on a subsequent attempt.
Changed in starlingx: | |
assignee: | nobody → Gerry Kopec (gerry-kopec) |
summary: |
- Distributed Cloud: some subclouds fail deployment
+ Distributed Cloud: 5 subclouds fail deployment |
tags: | added: stx.distcloud |
Changed in starlingx: | |
assignee: | Gerry Kopec (gerry-kopec) → Tee Ngo (teewrs) |
stx.5.0 - scalability issue with distributed cloud