Overcloud Deployment fails at a very late stage with "500 Internal Server Error"

Bug #1774895 reported by Ameen Ali
This bug affects 1 person
Affects: tripleo
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

Description
===========
I'm performing a TripleO setup (Pike) with a containerized overcloud on top of VMware ESXi. Undercloud setup, introspection, and the initial steps of the overcloud deployment (booting from PXE, loading the deploy images, rebooting with new hostnames, etc.) work fine. At a later stage, the overcloud deployment exits with a "500 Internal Server Error" message (shown under "Actual result" below).

After more debugging, I collected detailed errors from the failed stacks (linked under "Logs & Configs" below).

Steps to reproduce
==================
* Install and prepare 4 VMs (1x undercloud, 2x Compute, 1x Controller) plus 1 router (VyOS). See the Environment section below.

* Perform Undercloud setup + Introspection of nodes successfully

* Launch overcloud deployment using the following command:
# openstack overcloud deploy --templates -e /home/stack/node-info.yaml -e /home/stack/local_overcloud_images -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml

Expected result
===============
The overcloud deployment finishes successfully.

Actual result
=============
Overcloud deployment exits with the following messages:

2018-06-02 02:34:42Z [overcloud.AllNodesDeploySteps.ControllerHostPrepDeployment.0]: CREATE_COMPLETE state changed
2018-06-02 02:34:42Z [overcloud.AllNodesDeploySteps.ControllerHostPrepDeployment]: CREATE_COMPLETE Stack CREATE completed successfully
2018-06-02 02:34:43Z [overcloud.AllNodesDeploySteps.ControllerHostPrepDeployment]: CREATE_COMPLETE state changed
2018-06-02 02:34:43Z [overcloud.AllNodesDeploySteps.ControllerPreConfig]: CREATE_IN_PROGRESS state changed
2018-06-02 02:34:44Z [overcloud.AllNodesDeploySteps.ControllerPreConfig]: CREATE_COMPLETE state changed
2018-06-02 02:34:44Z [overcloud.AllNodesDeploySteps.ObjectStorageDeployment_Step1]: CREATE_IN_PROGRESS state changed
2018-06-02 02:34:47Z [overcloud.AllNodesDeploySteps.ComputeDeployment_Step1]: CREATE_IN_PROGRESS state changed
2018-06-02 02:34:55Z [overcloud.AllNodesDeploySteps.ControllerDeployment_Step1]: CREATE_IN_PROGRESS state changed
2018-06-02 02:35:03Z [overcloud.AllNodesDeploySteps.BlockStorageDeployment_Step1]: CREATE_IN_PROGRESS state changed
2018-06-02 02:35:05Z [overcloud.AllNodesDeploySteps.CephStorageDeployment_Step1]: CREATE_IN_PROGRESS state changed
2018-06-02 02:35:10Z [overcloud.AllNodesDeploySteps.ObjectStorageDeployment_Step1]: CREATE_COMPLETE state changed
2018-06-02 02:35:10Z [overcloud.AllNodesDeploySteps.BlockStorageDeployment_Step1]: CREATE_COMPLETE state changed
2018-06-02 02:35:10Z [overcloud.AllNodesDeploySteps.CephStorageDeployment_Step1]: CREATE_COMPLETE state changed
2018-06-02 02:35:22Z [0]: CREATE_IN_PROGRESS state changed
ERROR: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at
 [no address given] to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
</body></html>
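
To narrow down which step was still running when the 500 error came back, the event lines above can be filtered for resources that entered CREATE_IN_PROGRESS but never reported CREATE_COMPLETE. The following is only a minimal sketch: the function name pending_resources is hypothetical, and it assumes the deploy output was saved to a file in the same "DATE TIME [resource]: STATUS message" layout shown above.

```shell
# List Heat resources that entered CREATE_IN_PROGRESS but never reported
# CREATE_COMPLETE. Reads saved deploy output on stdin.
# Assumption: event lines look like "DATE TIME [resource]: STATUS message".
pending_resources() {
    awk '
        $3 ~ /^\[.*\]:$/ {
            name = substr($3, 2, length($3) - 3)   # strip "[" and "]:"
            if ($4 == "CREATE_IN_PROGRESS") pending[name] = 1
            else if ($4 == "CREATE_COMPLETE") delete pending[name]
        }
        END { for (name in pending) print name }
    ' | sort
}
```

Usage would be along the lines of `pending_resources < deploy.log`, where deploy.log is a hypothetical file holding the output above. Lines that do not match the event layout, such as the HTML error body, are ignored.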

Environment
===========
OpenStack Pike
For more info about the overall network/compute setup, kindly refer to the first part of this article:
https://messeiry.com/deploying-redhat-openstack-rhos-12-pike-on-vmware-esxi/

Logs & Configs
==============

-- Contents of local_overcloud_images: http://paste.openstack.org/show/722609/
-- Contents of node-info.yaml: http://paste.openstack.org/show/722610/
-- Stack failures and more: http://paste.openstack.org/show/722611/

Ameen Ali (a-root-ameen) wrote :

Edit: the stack failures paste seems to be cut off on paste.openstack.org. The full content is here:
https://pastebin.com/yWR7t5RV

Ameen Ali (a-root-ameen) wrote :

Update 01:

Running deploy_steps_playbook.yaml from the undercloud directly against the failed compute node produces the following:

=================================================

(undercloud) [stack@director ~]$ openstack software deployment show 2ed6c2cc-3257-44b1-93ef-f46098b15491 --format value --column server_id
6530c389-54c8-4b54-9669-1ac2e7b6c497

(undercloud) [stack@director ~]$ openstack server list | grep 6530c389-54c8-4b54-9669-1ac2e7b6c497
| 6530c389-54c8-4b54-9669-1ac2e7b6c497 | overcloud-novacompute-0 | ACTIVE | ctlplane=192.0.2.11 | overcloud-full | compute |

(undercloud) [stack@director ~]$ openstack overcloud config download
The TripleO configuration has been successfully generated into: /home/stack/tripleo-6IXCCo-config

(undercloud) [stack@director tripleo-6IXCCo-config]$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml --limit overcloud-novacompute-0

PLAY [overcloud] ************************************************************************************************

TASK [Gathering Facts] ******************************************************************************************
The authenticity of host '192.0.2.11 (192.0.2.11)' can't be established.
ECDSA key fingerprint is SHA256:ZQh57G38E/H8wdYKkMNIX9pzIAsZ+yaTQ68/hQYGEYg.
ECDSA key fingerprint is MD5:07:44:a4:7a:0a:bd:38:5f:12:b0:2f:ab:af:ba:75:5d.
Are you sure you want to continue connecting (yes/no)? ues
Please type 'yes' or 'no': yes
ok: [192.0.2.11]

TASK [create persistent logs directory] *************************************************************************
skipping: [192.0.2.11] => (item=/var/log/containers/aodh)
skipping: [192.0.2.11] => (item=/var/log/containers/httpd/aodh-api)

TASK [aodh logs readme] *****************************************************************************************
skipping: [192.0.2.11]

TASK [create persistent logs directory] *************************************************************************
skipping: [192.0.2.11]

TASK [create persistent logs directory] *************************************************************************
skipping: [192.0.2.11]

TASK [ceilometer logs readme] ***********************************************************************************
skipping: [192.0.2.11]

TASK [Mount NFS on host] ****************************************************************************************
skipping: [192.0.2.11] => (item={u'NFS_OPTIONS': u'_netdev,bg,intr,context=system_u:object_r:glance_var_lib_t:s0', u'NFS_SHARE': u''})

TASK [create persistent logs directory] *************************************************************************
skipping: [192.0.2.11] => (item=/var/log/containers/glance)

TASK [glance logs readme] ***************************************************************************************
skipping: [192.0.2.11]

TASK [ensure ceph configurations exist] *************************************************************************
skipping: [192.0.2.11]

TASK [create persistent logs directory] *************************************************************************
skipping: [192.0.2.11] => (item=/var/log/containers/gnocchi)
skipping: [192.0.2.11] => (item=/var/log/containers/http...
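
The lookup at the start of the session above (deployment ID to server ID to host name) could be scripted so the name can be passed to --limit. This is only a rough sketch: server_name_for_id is a hypothetical helper, and it assumes the default table layout of `openstack server list` shown above, with the ID in the first column and the name in the second.

```shell
# Print the server name for a given server ID.
# Reads `openstack server list` table output on stdin.
# Assumption: columns are "| ID | Name | ..." as in the session above.
server_name_for_id() {
    awk -F'|' -v id="$1" '
        { gsub(/ /, "", $2) }                          # trim the ID column
        $2 == id { gsub(/^ +| +$/, "", $3); print $3 } # trimmed Name column
    '
}
```

For example, `openstack server list | server_name_for_id 6530c389-54c8-4b54-9669-1ac2e7b6c497` would be expected to print the matching host name.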

Changed in tripleo:
milestone: none → rocky-2
status: New → Triaged
importance: Undecided → Medium
Ameen Ali (a-root-ameen) wrote :

Update 02:
By using -b, the playbook ran successfully. I also omitted the --limit option so that it ran on all nodes, and that run finished successfully as well. However, no overcloud endpoint was shown in the output. Am I supposed to re-run the openstack overcloud deploy command?
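
For reference, the invocation from Update 01 combined with -b (Ansible's short form of --become, which runs tasks with privilege escalation) might look like the sketch below. The helper name deploy_steps_cmd is hypothetical, the position of -b on the command line is an assumption, and the function only prints the command so it can be reviewed before running it from the config directory generated by `openstack overcloud config download`.

```shell
# Print the ansible-playbook command from Update 01 with -b added.
# Optional first argument: a host name to restrict the run via --limit.
# Assumption: run from the directory that contains deploy_steps_playbook.yaml.
deploy_steps_cmd() {
    cmd="ansible-playbook -b -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml"
    if [ -n "${1:-}" ]; then
        cmd="$cmd --limit $1"
    fi
    printf '%s\n' "$cmd"
}
```

Calling `deploy_steps_cmd overcloud-novacompute-0` prints the single-node form, and calling it with no argument prints the all-nodes form described in Update 02.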

Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
milestone: stein-3 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Incomplete
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Marios Andreou (marios-b) wrote :

This is an automated action. Bug status has been set to 'Incomplete' and target milestone has been removed due to inactivity. If you disagree please re-set these values and reach out to us on freenode #tripleo

Changed in tripleo:
milestone: wallaby-3 → none
Launchpad Janitor (janitor) wrote :

[Expired for tripleo because there has been no activity for 60 days.]

Changed in tripleo:
status: Incomplete → Expired