Overcloud fails with misleading error 504 Gateway Time-out when there is an error in templates

Bug #1801737 reported by Sai Sindhur Malleni
Affects: tripleo
Status: Fix Released
Importance: Medium
Assigned to: Thomas Herve
Milestone: (none shown)

Bug Description

Description of problem:
We tried to deploy an overcloud with the following command:
 openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e templates/network-environment.yaml -e templates/deploy.yaml -e /home/stack/docker_registry.yaml -e templates/neutron-policy.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/disable-telemetry.yaml --ntp-server clock.redhat.com

The deployment fails with this error:

APIException: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

The debug output of the command shows:
RESP: [504] Cache-Control: no-cache Connection: close Content-Type: text/html
RESP BODY: Omitted, Content-Type is set to text/html. Only application/json responses have their bodies logged.
Request returned failure status: 504
<html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 402, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python2.7/site-packages/tripleoclient/command.py", line 25, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python2.7/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python2.7/site-packages/cliff/command.py", line 184, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 887, in take_action
    self._deploy_tripleo_heat_templates_tmpdir(stack, parsed_args)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 359, in _deploy_tripleo_heat_templates_tmpdir
    new_tht_root, tht_root)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 452, in _deploy_tripleo_heat_templates
    parsed_args.plan_environment_file)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 467, in _try_overcloud_deploy_with_compat_yaml
    plan_env_file)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 216, in _heat_deploy
    stack_name, env, moved_files, tht_root)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 292, in _process_and_upload_environment
    parameters=params)
  File "/usr/lib/python2.7/site-packages/tripleoclient/workflows/parameters.py", line 20, in update_parameters
    **input_)
  File "/usr/lib/python2.7/site-packages/tripleoclient/workflows/base.py", line 25, in call_action
    save_result=True, run_sync=True)
  File "/usr/lib/python2.7/site-packages/mistralclient/api/v2/action_executions.py", line 44, in create
    dump_json=True
  File "/usr/lib/python2.7/site-packages/mistralclient/api/base.py", line 97, in _create
    self._raise_api_exception(ex.response)
  File "/usr/lib/python2.7/site-packages/mistralclient/api/base.py", line 160, in _raise_api_exception
    error_message=error_data)
APIException: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

clean_up DeployOvercloud: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/osc_lib/shell.py", line 135, in run
    ret_val = super(OpenStackShell, self).run(argv)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 281, in run
    result = self.run_subcommand(remainder)
  File "/usr/lib/python2.7/site-packages/osc_lib/shell.py", line 175, in run_subcommand
    ret_value = super(OpenStackShell, self).run_subcommand(argv)
  File "/usr/lib/python2.7/site-packages/cliff/app.py", line 402, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python2.7/site-packages/tripleoclient/command.py", line 25, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python2.7/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python2.7/site-packages/cliff/command.py", line 184, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 887, in take_action
    self._deploy_tripleo_heat_templates_tmpdir(stack, parsed_args)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 359, in _deploy_tripleo_heat_templates_tmpdir
    new_tht_root, tht_root)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 452, in _deploy_tripleo_heat_templates
    parsed_args.plan_environment_file)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 467, in _try_overcloud_deploy_with_compat_yaml
    plan_env_file)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 216, in _heat_deploy
    stack_name, env, moved_files, tht_root)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 292, in _process_and_upload_environment
    parameters=params)
  File "/usr/lib/python2.7/site-packages/tripleoclient/workflows/parameters.py", line 20, in update_parameters
    **input_)
  File "/usr/lib/python2.7/site-packages/tripleoclient/workflows/base.py", line 25, in call_action
    save_result=True, run_sync=True)
  File "/usr/lib/python2.7/site-packages/mistralclient/api/v2/action_executions.py", line 44, in create
    dump_json=True
  File "/usr/lib/python2.7/site-packages/mistralclient/api/base.py", line 97, in _create
    self._raise_api_exception(ex.response)
  File "/usr/lib/python2.7/site-packages/mistralclient/api/base.py", line 160, in _raise_api_exception
    error_message=error_data)
APIException: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

END return value: 1

However, the real error, a templating issue, is in the heat-engine logs:
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters [req-b6fedba5-8abf-4ed0-b741-a3e2c60182c9 f75430b12a2247cf990a7a1e25e81b5c d9b2c711458e44eca60f592ebb81ed31 - default default] Error validating environment for plan overcloud: ERROR: Internal Error: HTTPBadRequest: ERROR: Internal Error
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters Traceback (most recent call last):
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters File "/usr/lib/python2.7/site-packages/tripleo_common/actions/parameters.py", line 180, in run
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters 'heat_resource_tree': heat.stacks.validate(**fields),
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters File "/usr/lib/python2.7/site-packages/heatclient/v1/stacks.py", line 332, in validate
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters resp = self.client.post(url, **args)
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 292, in post
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters return self.client_request("POST", url, **kwargs)
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 282, in client_request
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters resp, body = self.json_request(method, url, **kwargs)
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 271, in json_request
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters resp = self._http_request(url, method, **kwargs)
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 234, in _http_request
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters raise exc.from_response(resp)
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters HTTPBadRequest: ERROR: Internal Error
2018-11-05 09:01:48.062 1 ERROR tripleo_common.actions.parameters
2018-11-05 09:01:48.934 1 DEBUG mistral.services.action_execution_reporter [-] Running heartbeat reporter... report /usr/lib/python2.7/site-packages/mistral/services/action_execution_reporter.py:60
executor.log
2018-11-05 08:59:55.321 26 DEBUG heat.engine.service [req-671a1e82-6896-4859-98c1-92762ac9bb37 - - - - -] Service 65f5c8aa-fb5c-4c1b-a193-65c3486c22fd is updated service_manage_report /usr/lib/python2.7/site-packages/heat/engine/service.py:2341
2018-11-05 09:00:29.821 45 INFO heat.engine.service [req-6968e284-d03a-4efa-83c6-345500df6977 - admin - default default] validate_template
2018-11-05 09:00:30.264 45 DEBUG heat.engine.parameter_groups [req-6968e284-d03a-4efa-83c6-345500df6977 - admin - default default] Validating Parameter Groups: ControllerParameters, OS::project_id, ControllerCount, ExtraConfig, BlockStorageParameters, ServerMetadata, ComputeCount, PublicVirtualFixedIPs, ObjectStorageParameters, StorageMgmtVirtualFixedIPs, InternalApiVirtualFixedIPs, BlockStorageRemovalPolicies, CloudNameInternal, ControllerExtraConfig, ObjectStorageHostnameFormat, ControllerHostnameFormat, CloudDomain, NovaComputeExtraConfig, ControllerRemovalPolicies, CephStorageCount, CephStorageSchedulerHints, ControlPlaneSubnetCidr, HypervisorNeutronPhysicalBridge, CephStorageServices, BlockStorageExtraConfig, ExtraHostFileEntries, AddVipsToEtcHosts, NeutronControlPlaneID, CephStorageParameters, ComputeHostnameFormat, ComputeRemovalPolicies, RedisVirtualFixedIPs, ComputeServices, HypervisorNeutronPublicInterface, ObjectStorageCount, BlockStorageServices, CloudName, CloudNameCtlplane, CephStorageRemovalPolicies, NodeCreateBatchSize, CloudNameStorage, EndpointMapOverride, CloudNameStorageManagement, DeployIdentifier, NeutronPublicInterface, BlockStorageCount, BlockStorageSchedulerHints, ControlFixedIPs, StorageVirtualFixedIPs, CephStorageExtraConfig, ObjectStorageSchedulerHints, BlockStorageHostnameFormat, DeploymentServerBlacklist, RabbitCookieSalt, OS::stack_id, ObjectStorageServices, ControllerServices, NovaComputeSchedulerHints, CephStorageHostnameFormat, ComputeSchedulerHints, ComputeParameters, controllerExtraConfig, ControllerSchedulerHints, ObjectStorageExtraConfig, ObjectStorageRemovalPolicies, ControlPlaneSubnet, UpdateIdentifier, ComputeExtraConfig validate /usr/lib/python2.7/site-packages/heat/engine/parameter_groups.py:42
2018-11-05 09:00:32.014 45 ERROR oslo_messaging.rpc.server [req-6968e284-d03a-4efa-83c6-345500df6977 - admin - default default] Exception during message handling: ValueError: Error parsing template https://192.168.24.2:13808/v1/AUTH_d9b2c711458e44eca60f592ebb81ed31/overcloud/user-files/home/stack/templates/nic-configs/controller.yaml while parsing a block mapping
  in "<unicode string>", line 176, column 17:
                  - type: interface
                    ^
expected <block end>, but found '-'
  in "<unicode string>", line 192, column 17:
                    - ip_netmask: 169.254.169.254/32

The user should be shown the actual error instead of a vague 504 error.
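
Until the client reports the underlying error, one way to catch this class of problem before deploying is to check the custom templates for YAML syntax errors locally. The following is a minimal sketch, not part of tripleoclient: it assumes PyYAML is installed, and check_yaml_text is a hypothetical helper name. It only checks YAML syntax, not Heat template semantics.

```python
import sys

import yaml  # PyYAML; assumed to be installed


def check_yaml_text(text, name="<string>"):
    """Return None if text parses as YAML, else a one-line error description.

    Hypothetical helper for illustration; it checks syntax only.
    """
    try:
        yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # Parser and scanner errors carry the position of the problem.
        mark = getattr(exc, "problem_mark", None)
        if mark is not None:
            return "%s:%d:%d: %s" % (
                name, mark.line + 1, mark.column + 1, exc.problem)
        return "%s: %s" % (name, exc)
    return None


if __name__ == "__main__":
    # Usage: python check_templates.py templates/nic-configs/*.yaml
    failed = False
    for path in sys.argv[1:]:
        with open(path) as handle:
            error = check_yaml_text(handle.read(), path)
        if error:
            failed = True
            print(error, file=sys.stderr)
    sys.exit(1 if failed else 0)
```

A list item indented at the same level as the keys of the surrounding mapping, as in the controller.yaml excerpt above, is the kind of mistake this should flag.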

Version-Release number of selected component (if applicable):
Rocky

How reproducible:
100% when there is a template error

Steps to Reproduce:
1. Deploy undercloud
2. Introduce a template error in the nic-configs
3. Deploy overcloud

Actual results:
Vague 504 error from overcloud deploy command

Expected results:
STDERR should have the correct error message printed on a deploy failure. Something like "Templating error" or the error from the heat-engine logs.
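
A rough sketch of the requested behaviour: when the server-side action reports a failure, raise an exception that carries the server's own error text so that it reaches STDERR. This is not the tripleoclient or Mistral code. ActionFailed, check_action_result, and the "state"/"result" field names are all assumptions made for illustration.

```python
class ActionFailed(Exception):
    """Hypothetical exception carrying the server-side error text."""

    def __init__(self, action, message):
        self.action = action
        self.message = message
        super().__init__(
            "Action %s execution failed: %s" % (action, message))


def check_action_result(action, state, output):
    """Return output unchanged, or raise ActionFailed with the real error.

    state and output stand in for whatever the workflow service returns;
    the "ERROR" state and the "result" key are assumed names.
    """
    if state == "ERROR":
        if isinstance(output, dict):
            message = output.get("result", output)
        else:
            message = output
        raise ActionFailed(action, message)
    return output
```

With something along these lines, a failed validation would surface as a message naming the action and the validation error rather than as a gateway timeout.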

Additional info:

Tags: ux
Thomas Herve (therve)
Changed in tripleo:
assignee: nobody → Thomas Herve (therve)
Changed in tripleo:
status: New → Triaged
importance: Undecided → Medium
milestone: none → stein-2
tags: added: ux
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
milestone: stein-3 → train-1
Alex Schultz (alex-schultz) wrote :

I checked this and it appears we're now throwing an internal error with a message about validating the environment files.

(undercloud) [centos@undercloud ~]$ openstack overcloud deploy --templates -e /usr/share/openstack-tripleo-heat-templates/environments/enable-swap.yaml -e container-prepare-params.yaml -e templates/environments/network-environment.yaml
Creating Swift container to store the plan
Creating plan from template files in: /tmp/tripleoclient-9pJm7M/tripleo-heat-templates
Plan created.
Processing templates in the directory /tmp/tripleoclient-9pJm7M/tripleo-heat-templates
Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tripleoclient/command.py", line 30, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python2.7/site-packages/osc_lib/command/command.py", line 41, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python2.7/site-packages/cliff/command.py", line 184, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 910, in take_action
    self._deploy_tripleo_heat_templates_tmpdir(stack, parsed_args)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 365, in _deploy_tripleo_heat_templates_tmpdir
    new_tht_root, tht_root)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 468, in _deploy_tripleo_heat_templates
    deployment_options=deployment_options)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 485, in _try_overcloud_deploy_with_compat_yaml
    deployment_options=deployment_options)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 219, in _heat_deploy
    stack_name, env, moved_files, tht_root)
  File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 296, in _process_and_upload_environment
    parameters=params)
  File "/usr/lib/python2.7/site-packages/tripleoclient/workflows/parameters.py", line 20, in update_parameters
    **input_)
  File "/usr/lib/python2.7/site-packages/tripleoclient/workflows/base.py", line 31, in call_action
    raise exceptions.WorkflowActionError(action, output)
WorkflowActionError: Action tripleo.parameters.update execution failed: Error validating environment for plan overcloud: ERROR: Internal Error
None
Action tripleo.parameters.update execution failed: Error validating environment for plan overcloud: ERROR: Internal Error
None

Changed in tripleo:
status: Triaged → Fix Released