Overcloud Deploy Hangs During a Large Deployment
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
tripleo |
Fix Released
|
High
|
Luke Short |
Bug Description
Description
===========
When deploying a large number of nodes, the output of from Ansible during the config_
We have ruled out the problem being related to our usage of Python's subprocess to execute the ansible-playbook command. This means that the issue is definitely isolated to Mistral and/or Zaqar. It is believed that the large amount of memory/buffer is overloading the services when trying to send messages to/from Zaqar.
As a workaround, using `openstack overcloud deploy --quiet` can help. The config-download playbooks can also be ran manually.
Steps to reproduce
==================
This issue is difficult to replicate all the time. The more Overcloud nodes, the more likely it is to happen.
* Deploy an Overcloud with at least 50 Compute nodes using Train.
Expected result
===============
The full Ansible output should be displayed and tripleoclient should exit with status 0.
Actual result
=============
Normally during step 4 or 5 the CLI will hang and stop outputting the Ansible stdout/stderr. It stops at random points in every re-deployment. No errors are reported in the Mistral or Zaqar logs.
Environment
===========
All OpenStack releases using Mistral and Zaqar for the deployment (<= Train).
Logs & Configs
==============
BZ with more information: https:/
Changed in tripleo: | |
milestone: | ussuri-rc1 → ussuri-rc3 |
tags: | added: queens-backport-potential train-backport-potential |
Changed in tripleo: | |
milestone: | ussuri-rc3 → victoria-1 |
Changed in tripleo: | |
milestone: | victoria-1 → victoria-3 |
Changed in tripleo: | |
milestone: | victoria-3 → wallaby-1 |
Changed in tripleo: | |
milestone: | wallaby-1 → wallaby-2 |
Changed in tripleo: | |
milestone: | wallaby-2 → wallaby-3 |
Fix proposed to branch: master /review. opendev. org/720083
Review: https:/