tripleo

Overcloud Deploy Hangs During a Large Deployment

Bug #1872823 reported by Luke Short on 2020-04-14

This bug affects 1 person

Affects    Status          Importance    Assigned to    Milestone
tripleo    Fix Released    High          Luke Short     tripleo wallaby-3

Bug Description

Description
===========
When deploying a large number of nodes, the output from Ansible during the config_download_deploy workflow stops. The ansible.log file continues to completion, and it shows that the deployment does complete successfully. However, the tripleoclient CLI hangs indefinitely. This leads operators to think that the deployment is stuck or has failed.
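
Because ansible.log runs to completion even when the CLI hangs, its final PLAY RECAP is one way to confirm the outcome. The following is a minimal sketch, assuming the default Ansible recap format (lines such as `host : ok=N changed=N unreachable=N failed=N ...`) and that the contents of ansible.log have already been read into a string. The function names are hypothetical and are not part of tripleoclient.

```python
import re

# One host line of an Ansible PLAY RECAP, for example:
#   overcloud-controller-0 : ok=120 changed=45 unreachable=0 failed=0 ...
# re.search() is used so that any prefix before the host name (such as a
# timestamp added when Ansible logs to a file) is ignored.
RECAP_HOST = re.compile(r"(?P<host>\S+)\s+:\s+(?P<counts>(?:\w+=\d+\s*)+)$")


def recap_summary(log_text):
    """Return {host: {counter: value}} parsed from the last PLAY RECAP."""
    lines = log_text.splitlines()
    # The log may hold several runs; only the most recent recap is used.
    starts = [i for i, line in enumerate(lines) if "PLAY RECAP" in line]
    if not starts:
        return {}
    summary = {}
    for line in lines[starts[-1] + 1:]:
        match = RECAP_HOST.search(line.rstrip())
        if match is None:
            continue  # skip blank lines and unrelated output
        counts = re.findall(r"(\w+)=(\d+)", match.group("counts"))
        summary[match.group("host")] = {k: int(v) for k, v in counts}
    return summary


def deployment_succeeded(log_text):
    """True if a recap exists and no host is failed or unreachable."""
    summary = recap_summary(log_text)
    return bool(summary) and all(
        c.get("failed", 0) == 0 and c.get("unreachable", 0) == 0
        for c in summary.values()
    )
```

The location of ansible.log varies between releases and configurations, so the sketch leaves reading the file to the caller.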

We have ruled out our use of Python's subprocess module to execute the ansible-playbook command as the cause. This means that the issue is definitely isolated to Mistral and/or Zaqar. The large amount of memory/buffer usage is believed to be overloading these services when messages are sent to and from Zaqar.
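
For context, the general pattern of running a command such as ansible-playbook through Python's subprocess module and relaying its output as it arrives can be sketched as follows. This is a minimal illustration, not tripleoclient's actual code; `run_streaming` is a hypothetical name.

```python
import subprocess
import sys


def run_streaming(cmd, sink=sys.stdout):
    """Run cmd, forwarding each output line to sink as soon as it is read.

    stderr is merged into stdout so only one pipe has to be drained.
    Returns the child's exit status.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into the same pipe
        universal_newlines=True,   # text mode, so lines are str
        bufsize=1,                 # line-buffered on the reading side
    )
    with proc.stdout:
        for line in proc.stdout:
            sink.write(line)
            sink.flush()  # relay immediately instead of batching
    return proc.wait()
```

Merging stderr into stdout keeps a single pipe to read, which avoids the classic deadlock where the child blocks writing to a full pipe that the parent is not reading.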

As a workaround, running `openstack overcloud deploy --quiet` can help. The config-download playbooks can also be run manually.

Steps to reproduce
==================
This issue is difficult to reproduce consistently. The more Overcloud nodes there are, the more likely it is to happen.

* Deploy an Overcloud with at least 50 Compute nodes using Train.

Expected result
===============
The full Ansible output should be displayed and tripleoclient should exit with status 0.

Actual result
=============
Typically, during step 4 or 5, the CLI hangs and stops printing the Ansible stdout/stderr. The point at which it stops differs on every re-deployment. No errors are reported in the Mistral or Zaqar logs.

Environment
===========
All OpenStack releases using Mistral and Zaqar for the deployment (<= Train).

Logs & Configs
==============
BZ with more information: https://bugzilla.redhat.com/show_bug.cgi?id=1792500

OpenStack Infra (hudson-openstack) wrote on 2020-04-14: Fix proposed to python-tripleoclient (master)

Fix proposed to branch: master
Review: https://review.opendev.org/720083

OpenStack Infra (hudson-openstack) wrote on 2020-04-17: Fix proposed to tripleo-common (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/720845

OpenStack Infra (hudson-openstack) wrote on 2020-04-28: Fix proposed to puppet-tripleo (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/724181

OpenStack Infra (hudson-openstack) wrote on 2020-05-11: Change abandoned on tripleo-common (stable/train)

Change abandoned by Luke Short (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/720845
Reason: I will hold off on this patch to see what results using only https://review.opendev.org/#/c/720083/ provides. This patch may be unnecessary.

wes hayutin (weshayutin) on 2020-05-11

Changed in tripleo:
milestone:	ussuri-rc1 → ussuri-rc3

OpenStack Infra (hudson-openstack) wrote on 2020-05-18: Change abandoned on python-tripleoclient (master)

Change abandoned by Luke Short (<email address hidden>) on branch: master
Review: https://review.opendev.org/720083

OpenStack Infra (hudson-openstack) wrote on 2020-05-18: Related fix proposed to puppet-tripleo (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/728974

OpenStack Infra (hudson-openstack) wrote on 2020-05-24: Related fix merged to puppet-tripleo (master)

Reviewed: https://review.opendev.org/728974
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=5c3e736e409e661b7e1db51749719eafb86f2f9a
Submitter: Zuul
Branch: master

commit 5c3e736e409e661b7e1db51749719eafb86f2f9a
Author: Luke Short <email address hidden>
Date: Mon May 18 14:06:47 2020 -0400

Allow the Mistral tunnel timeout to be configurable.

    Change-Id: Ibfd5587476d5a411206f62e8b4b886db662bf7d1
    Related-Bug: #1872823
    Signed-off-by: Luke Short <email address hidden>

Bogdan Dobrelya (bogdando) on 2020-05-25

tags:

added: queens-backport-potential train-backport-potential

OpenStack Infra (hudson-openstack) wrote on 2020-05-26: Related fix proposed to puppet-tripleo (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/730805

wes hayutin (weshayutin) on 2020-05-26

Changed in tripleo:
milestone:	ussuri-rc3 → victoria-1

OpenStack Infra (hudson-openstack) wrote on 2020-05-27: Related fix proposed to puppet-tripleo (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/731031

OpenStack Infra (hudson-openstack) wrote on 2020-05-27: Related fix merged to puppet-tripleo (stable/train)

#10

Reviewed: https://review.opendev.org/730805
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=9bd8331053afbd2a3dd7056d2ba377641ac8dd7a
Submitter: Zuul
Branch: stable/train

commit 9bd8331053afbd2a3dd7056d2ba377641ac8dd7a
Author: Luke Short <email address hidden>
Date: Mon May 18 14:06:47 2020 -0400

Allow the Mistral tunnel timeout to be configurable.

    Change-Id: Ibfd5587476d5a411206f62e8b4b886db662bf7d1
    Related-Bug: #1872823
    Signed-off-by: Luke Short <email address hidden>
    (cherry picked from commit 5c3e736e409e661b7e1db51749719eafb86f2f9a)

tags:

added: in-stable-train

OpenStack Infra (hudson-openstack) wrote on 2020-05-27: Related fix merged to puppet-tripleo (stable/ussuri)

#11

Reviewed: https://review.opendev.org/731031
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=ec4e58927f33c77f65ec3b7a8f580b54cf4589e4
Submitter: Zuul
Branch: stable/ussuri

commit ec4e58927f33c77f65ec3b7a8f580b54cf4589e4
Author: Luke Short <email address hidden>
Date: Mon May 18 14:06:47 2020 -0400

Allow the Mistral tunnel timeout to be configurable.

tags:

added: in-stable-ussuri

OpenStack Infra (hudson-openstack) wrote on 2020-06-24: Change abandoned on puppet-tripleo (stable/train)

#12

Change abandoned by Luke Short (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/724181
Reason: Abandoning in favor of the patches mentioned in my previous comment.

Luke Short (ekultails) wrote on 2020-06-29:

#13

This issue has also been reported a few times recently on small deployments. Adding the argument `--quiet` continues to be the recommended workaround.

Emilien Macchi (emilienm) on 2020-07-28

Changed in tripleo:
milestone:	victoria-1 → victoria-3

OpenStack Infra (hudson-openstack) wrote on 2020-08-21: Change abandoned on tripleo-common (stable/train)

#14

Change abandoned by Luke Short (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/720845

Marios Andreou (marios-b) on 2020-11-03

Changed in tripleo:
milestone:	victoria-3 → wallaby-1

Marios Andreou (marios-b) on 2020-12-08

Changed in tripleo:
milestone:	wallaby-1 → wallaby-2

Marios Andreou (marios-b) on 2021-01-29

Changed in tripleo:
milestone:	wallaby-2 → wallaby-3

Rabi Mishra (rabi) wrote on 2021-02-19:

#15

I think https://review.opendev.org/c/openstack/python-tripleoclient/+/776612 would fix it.

Rabi Mishra (rabi) wrote on 2021-03-03:

#16

Fixed with https://review.opendev.org/c/openstack/python-tripleoclient/+/776612

Changed in tripleo:
status:	In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2021-04-05: Fix included in openstack/python-tripleoclient 12.5.0

#17

This issue was fixed in the openstack/python-tripleoclient 12.5.0 release.