Failure when deploying the overcloud on a predeployed server

Bug #1742237 reported by emanoel
This bug affects 1 person
Affects | Status     | Importance | Assigned to | Milestone
tripleo | Incomplete | Medium     | Unassigned  |

Bug Description

Description
===========
When following the steps documented here http://tripleo.org/install/advanced_deployment/deployed_server.html the overcloud deployment gets stuck and eventually times out during stack creation.

Steps to reproduce
==================
* Deployed the undercloud using the steps here https://docs.openstack.org/tripleo-docs/latest/install/installation/installation.html#installing-the-undercloud
* The repo used for the undercloud was current-tripleo-dev
* Followed the steps here http://tripleo.org/install/advanced_deployment/deployed_server.html to deploy the overcloud. All the documented steps passed except for the final overcloud deploy command (see logs and config used below)
* The VM used for the undercloud ran CentOS 7, and all the required packages were installed as documented above.

Expected result
===============
Overcloud deployment completes successfully and the overcloud services are present in the controller node as containers.

Actual result
=============
Overcloud deployment is stuck at this step `[overcloud.AllNodesDeploySteps.ControllerDeployedServerDeployment_Step1.0]: CREATE_IN_PROGRESS state change`

Logs & Configs
==============

Overcloud deploy command and output: http://paste.openstack.org/show/641415/
os-collect-config log: http://paste.openstack.org/show/641442/
deployed-server-ctrlr-data.yaml: http://paste.openstack.org/show/641433/
deployed-server-ips.yaml: http://paste.openstack.org/show/641438/
deployment-swift-data-map.yaml: http://paste.openstack.org/show/641439/

Changed in tripleo:
milestone: none → rocky-1
importance: Undecided → Medium
status: New → Triaged
Bogdan Dobrelya (bogdando) wrote :

It seems that
OS::TripleO::DeployedServer::ControlPlanePort: ../deployed-server/deployed-neutron-port.yaml

needs to have an absolute path instead?
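
To act on this suggestion, it can help to first list which resource_registry entries in an environment file point at relative template paths. The following is a minimal sketch, not part of TripleO: the helper name `list_relative_templates` is hypothetical, and it only reports matching lines without deciding whether a relative path is actually the cause of the failure.

```shell
# list_relative_templates FILE   (hypothetical helper, not a TripleO tool)
# Print "OS::...: <template>.yaml" mappings whose path does not start with "/".
list_relative_templates() {
    # First grep: keep OS:: mappings whose value ends in .yaml and does not
    # begin with "/". Second grep: drop URL-style values such as http://...
    grep -E '^[[:space:]]*OS::[^[:space:]]+:[[:space:]]+[^/[:space:]][^[:space:]]*\.yaml[[:space:]]*$' "$1" |
        grep -v '://'
}
```

It would be run against each environment file passed to the deploy command with -e, for example `list_relative_templates deployed-server-environment.yaml` (file name assumed for illustration).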

emanoel (emanoelxavier) wrote :

Could be. After changing that, deleting the existing failed overcloud stack, and retrying the deployment, I got a different error. The overcloud deploy command, now executed with the --debug option, produced this output: http://paste.openstack.org/show/642463/

Bogdan Dobrelya (bogdando) wrote :

The failed software deployment's resources.NetworkDeployment should be analyzed then,
perhaps with commands like:

openstack software deployment show <id>
openstack stack resource list --nested-depth 5 <stack>
openstack --os-cloud rdo-cloud stack resource show <stack> <resource>

though, I'm not too good with navigating nested heat entities :/
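
The commands above can be combined so that the deployment IDs do not have to be looked up by hand. This is a rough sketch under a few assumptions: the undercloud credentials are already loaded in the shell, the helper names `failed_deployment_ids` and `show_failed_deployments` are made up for illustration, and the status column of `openstack software deployment list` reports FAILED for the deployments of interest.

```shell
# Print the IDs of software deployments whose status is FAILED.
# Assumes undercloud credentials are already loaded in this shell.
failed_deployment_ids() {
    openstack software deployment list -f value -c id -c status |
        awk '$2 == "FAILED" { print $1 }'
}

# Show full details (including outputs) of every failed deployment.
show_failed_deployments() {
    for id in $(failed_deployment_ids); do
        openstack software deployment show "$id" --long
    done
}
```

Calling `show_failed_deployments` on the undercloud would then print the details of each failed deployment, which is usually where the error from the deployed server ends up.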

emanoel (emanoelxavier) wrote :

I followed steps similar to the ones above. It looks like the issue may be related to discovering or pinging the $METADATA_IP: http://paste.openstack.org/show/645737/

Bogdan Dobrelya (bogdando) wrote :

It seems that you should double-check the EC2MetadataIp value that Heat takes from quickstart inventory vars, see network_environment_args and use_resource_registry_nic_configs that enables the former. If you configure networking via custom tht templates, look there instead.
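
One quick way to double-check the value is to search the environment files passed to the deploy command for the EC2MetadataIp parameter. The sketch below is only illustrative: `show_metadata_ip` is a hypothetical helper, and it assumes the parameter is written on a single line as `EC2MetadataIp: <value>`.

```shell
# show_metadata_ip FILE...   (hypothetical helper)
# Print "file: value" for every EC2MetadataIp assignment found in the files.
show_metadata_ip() {
    for f in "$@"; do
        awk -v file="$f" '
            /^[ \t]*EC2MetadataIp:/ {
                sub(/^[ \t]*EC2MetadataIp:[ \t]*/, "")  # drop the key
                sub(/[ \t]*(#.*)?$/, "")                # drop trailing comment
                print file ": " $0
            }' "$f"
    done
}
```

If more than one file sets the parameter, every assignment is listed, so conflicting values become visible.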

emanoel (emanoelxavier) wrote :

The value I currently have for EC2MetadataIp is the undercloud API IP. From the overcloud VM:
curl 192.168.24.1:8000
{"versions": [{"status": "CURRENT", "id": "v1.0", "links": [{"href": "http://192.168.24.1:8000/v1/", "rel":
curl 192.168.24.1:8080
<html><h1>Not Found</h1><p>The resource could not be found.</p></html>
[root@overcloud ~]

There is a route to that IP from the overcloud VM, and the behavior above is the expected one according to http://tripleo.org/install/advanced_deployment/deployed_server.html#testing-connectivity. The configuration I am currently using is based on http://tripleo.org/install/advanced_deployment/deployed_server.html#testing-connectivity. Should the value of the EC2MetadataIp be something else?
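
The same check can be repeated in a form that prints only the HTTP status for each port, which makes the results easier to compare between runs. This is a minimal sketch: `probe_endpoint` is a hypothetical helper, and the IP and ports are simply the ones used in the curl commands above.

```shell
# probe_endpoint HOST PORT   (hypothetical helper)
# Print "HOST:PORT <http status>"; curl reports 000 when no HTTP response
# is received (for example on a timeout or a refused connection).
probe_endpoint() {
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "http://$1:$2/")
    printf '%s:%s %s\n' "$1" "$2" "$code"
}
```

For example, `probe_endpoint 192.168.24.1 8000` and `probe_endpoint 192.168.24.1 8080` run from the overcloud VM cover both ports tested above.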

Sanjay Upadhyay (saneax) wrote :

EC2MetadataIp should be your undercloud IP, so the variable is probably set correctly. However, the failure is at the NetworkConfig stage, so the error could be caused by some other network configuration parameter. We might need to check whether the ping packets are being received on the undercloud node. Most likely this is a network setup issue.

emanoel (emanoelxavier) wrote :

Is the ping happening from inside of one of the containers deployed in the overcloud guest VM? I did the steps here http://tripleo.org/install/advanced_deployment/deployed_server.html#testing-connectivity (ping, curl the undercloud IP 192.168.24.1) and everything worked as mentioned above. See traces and ip config here http://paste.openstack.org/show/653451/

Changed in tripleo:
milestone: rocky-1 → rocky-2
Changed in tripleo:
milestone: rocky-2 → rocky-3
Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
milestone: stein-3 → train-1
Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
milestone: train-2 → train-3
Changed in tripleo:
milestone: train-3 → ussuri-1
Changed in tripleo:
milestone: ussuri-1 → ussuri-2
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Incomplete
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3