Default timeout of enable_ssh_admin is too short in case multiple servers deployment
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
tripleo |
Fix Released
|
Medium
|
Yossi Ovadia |
Bug Description
Description
===========
In case of deployment of multiple servers ( in my case , 8 ) where some are storage computes , the workflow of enable_ssh_admin timed-out with the following Err -
---
2018-11-27 22:43:45Z [overcloud]: CREATE_COMPLETE Stack CREATE completed successfully
Stack overcloud/
Deploying overcloud configuration
Enabling ssh admin (tripleo-admin) for hosts:
172.31.0.9 172.31.0.24 172.31.0.5 172.31.0.6 172.31.0.21 172.31.0.15 172.31.0.13 172.31.0.11
Using ssh user heat-admin for initial connection.
Using ssh key at /home/stack/
Inserting TripleO short term key for 172.31.0.9
Inserting TripleO short term key for 172.31.0.24
Inserting TripleO short term key for 172.31.0.5
Inserting TripleO short term key for 172.31.0.6
Inserting TripleO short term key for 172.31.0.21
Inserting TripleO short term key for 172.31.0.15
Inserting TripleO short term key for 172.31.0.13
Inserting TripleO short term key for 172.31.0.11
Starting ssh admin enablement workflow
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
ssh admin enablement workflow - RUNNING.
..
ssh admin enablement workflow - RUNNING.
Removing short term keys locallyException caught during deployment: ssh admin enablement workflow - TIMED OUT.
Exception caught during deployment: ssh admin enablement workflow - TIMED OUT.
stack named overcloud found.
exit status: 255
----
looking at the relevant workflow shows that it too ** slightly ** more than 300 seconds -
(undercloud) [stack@undercloud templates]$ os workflow execution list |grep enable_ssh_admin
(undercloud) [stack@undercloud templates]$ os workflow execution show b1d74029-
+------
| Field | Value |
+------
| ID | b1d74029-
| Workflow ID | 0a607914-
| Workflow name | tripleo.
| Workflow namespace | |
| Description | |
| Task Execution ID | <none> |
| Root Execution ID | <none> |
| State | SUCCESS |
| State info | None |
| Created at | 2018-11-28 06:06:43 |
| Updated at | 2018-11-28 06:11:55 |
+------
Trying to deploy with less server deployment did passed successfully.
Manually increasing the timeout of the constand timeout from 300 to 600 resolved the problem.
Steps to reproduce
==================
Deploy on relatively large amount of real servers.
Expected result
===============
Should work.
Actual result
=============
Did not.
Environment:
============
Nokia Airframe 3 storage servers + 5 computes.
Version:
=========
Containerized Rocky
Changed in tripleo: | |
assignee: | nobody → Yossi Ovadia (jabadia) |
Changed in tripleo: | |
importance: | Undecided → Medium |
status: | New → Fix Released |
Suggested fix https:/ /review. openstack. org/620754